Friday, June 02, 2006

Things Google Should Invent: enable users to mark parallel texts

Note for the non-linguists: parallel texts are texts that say the same thing in two or more different languages. The multilingual instructions for setting up your Ikea furniture or your new electronic widget are parallel texts. Your Canadian cereal box is a parallel text. EU websites are generally parallel texts.

Google already has what is probably the largest multilingual indexed corpus in the world, and millions, if not billions, of users. If Google cannot learn to recognize parallel texts itself, it should include a function on the toolbar or the interface (I'd prefer the interface because I don't have the toolbar at work and am not permitted to install it) that lets users mark two webpages as parallel texts: either tightly parallel (for direct translations) or loosely parallel (for texts that discuss the same topic but aren't translations). Google could then add a function that lets users retrieve pages that other users have marked as parallel to the one they are currently viewing.
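For the programmers in the audience, here is a rough sketch of what the marking and retrieval functions might look like. Everything in it is my own assumption for illustration: the ParallelIndex class, the mark and parallels methods, and the idea of ranking results by how many users agreed are all hypothetical, and none of it reflects anything Google actually offers.

```python
from collections import defaultdict

# The two kinds of parallelism described in the post.
TIGHT = "tight"   # direct translations
LOOSE = "loose"   # same topic, but not translations


class ParallelIndex:
    """Hypothetical in-memory store of user-submitted parallel-text marks."""

    def __init__(self):
        # url -> other_url -> kind -> set of users who made that mark
        self._marks = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))

    def mark(self, user, url_a, url_b, kind):
        """Record that `user` considers url_a and url_b parallel texts."""
        if kind not in (TIGHT, LOOSE):
            raise ValueError("kind must be TIGHT or LOOSE")
        if url_a == url_b:
            raise ValueError("a page cannot be parallel to itself")
        # Store the mark in both directions so lookup works from either page.
        self._marks[url_a][url_b][kind].add(user)
        self._marks[url_b][url_a][kind].add(user)

    def parallels(self, url, kind=None):
        """Return (other_url, kind, user_count) tuples, most-marked first."""
        results = []
        for other, kinds in self._marks.get(url, {}).items():
            for k, users in kinds.items():
                if kind is None or k == kind:
                    results.append((other, k, len(users)))
        return sorted(results, key=lambda r: (-r[2], r[0], r[1]))


index = ParallelIndex()
index.mark("alice", "http://example.com/en", "http://example.com/fr", TIGHT)
index.mark("bob", "http://example.com/fr", "http://example.com/en", TIGHT)
index.mark("alice", "http://example.com/en", "http://example.org/blog", LOOSE)
print(index.parallels("http://example.com/en"))
```

Storing each mark under both pages means a reader can start from either language version, and counting distinct users per pair gives a simple way to put the pairs most people agree on at the top.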

Apart from the obvious applications in the language industry, this could be very helpful for people who are visiting a country whose language they don't read fluently, or who are researching a topic where information is more readily available in a language other than their preferred one.
