Words of the mouse can mislead: Google Translator exposed through more fundamental evidence

My previous post prompted advocates of artificial intelligence to accuse me of building my case against Google Translator (GT) on proper nouns and misspelled words. These, they contended, made GT confuse translation with transliteration in that particular case. However, I have since found more compelling evidence that GT does use English as a mediating language even when it ‘offers’ to translate from any source language to any target language. In doing so, it ignores fundamental ways in which other languages differ from English.

Spanish, like Hindi, has distinct formal and informal second-person forms. So “Aap kaise ho?” (How are you?, formal) becomes “¿Cómo está?” in Spanish, and “Tum kaise ho?” (How are you?, informal) translates to “¿Cómo estás?”. Enter either of these in GT in Hindi and the Spanish output is the same: “¿Cómo estás?” (the informal form). Puzzled? Well, English (the language through which the translation is mediated) has only one second-person singular form, “you”. So either Hindi expression is first translated to “How are you?” and only then translated into Spanish, by which point the formal/informal distinction is already lost.
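To make the mechanism concrete, here is a minimal Python sketch of what I suspect is happening. The phrase tables and function names are my own hypothetical stand-ins, not Google's data or API; the only point is that once two different Hindi inputs map to the same English pivot sentence, no later step can tell them apart.

```python
# Toy phrase tables (hypothetical, for illustration only; not Google's data or API).
HINDI_TO_ENGLISH = {
    "Aap kaise ho?": "How are you?",  # formal in Hindi
    "Tum kaise ho?": "How are you?",  # informal in Hindi
}

# English has a single "you", so this table has only one entry to offer.
ENGLISH_TO_SPANISH = {
    "How are you?": "¿Cómo estás?",
}

# A direct Hindi-to-Spanish table can keep the formal/informal distinction.
HINDI_TO_SPANISH_DIRECT = {
    "Aap kaise ho?": "¿Cómo está?",   # formal
    "Tum kaise ho?": "¿Cómo estás?",  # informal
}


def translate_via_english(hindi_phrase):
    """Translate Hindi to Spanish in two hops, using English as the pivot."""
    english = HINDI_TO_ENGLISH[hindi_phrase]
    return ENGLISH_TO_SPANISH[english]


def translate_direct(hindi_phrase):
    """Translate Hindi to Spanish in one hop, with no pivot language."""
    return HINDI_TO_SPANISH_DIRECT[hindi_phrase]


if __name__ == "__main__":
    for phrase in ("Aap kaise ho?", "Tum kaise ho?"):
        print(phrase, "| via English:", translate_via_english(phrase),
              "| direct:", translate_direct(phrase))
```

In this toy setup the pivoted route returns the same Spanish sentence for both Hindi inputs, while the direct route returns two different ones. Which single form the pivoted route settles on depends entirely on the English-to-Spanish table; here I simply copied the informal output I observed in GT.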

Here I used no proper nouns and no word that is hard to spell or understand, only the first expression one learns when starting any new language (even before the alphabet or any vocabulary). And yet this probabilistic and/or intelligent algorithm fails to preserve a fundamental distinction that Spanish and Hindi make and English does not.

Anyway, until Big Brother achieves such perfection that humans can learn only the newest Newspeak (ref: Orwell’s 1984), the language of the mouse, I urge you to continue enrolling in real language classes and to turn to real people for humanistic tasks!

PostScript: I was told that my previous post was being circulated within Google and that the evidence presented was being used as a case study of sorts. I would have expected them to offer some kind of acknowledgement, but I see no signs that they even visited my website. Perhaps they (or some employee) have conveniently copied the text, and someone may be passing it off as his own discovery.


9 thoughts on “Words of the mouse can mislead: Google Translator exposed through more fundamental evidence”

  1. Harsh, I believe it is a good idea to investigate possible failures of GT translations and raise awareness of them. However, some of them are quite predictable if you know how the stuff works.

    The system is not based on rules of language; it is a learning system based on statistics. Compare it to a child. The child starts learning the language without knowing any rules, just by memorizing and reproducing whatever he has heard from parents or friends. Later, the child figures out some rules and patterns of the language by himself and applies them to construct new sentences. Similarly, GT is based on whatever translations they were able to find and feed to the machine as the basis for its knowledge of the language. The huge difference is that the child uses the language to deliver some meaningful information, and interprets it based on current circumstances, common knowledge, emotions, etc. That is something the machine cannot (will never be able to?) do. Hence, the machine uses just a statistical approach. This more or less accounts for many of your findings.

    For one thing, if you remember, initially Google offered only from- and to-English translations. This was mainly because they had a huge database of documents translated to English, and used that database as the initial knowledge for their translating AI. Now, I believe, they have a large database of documents translated between other languages, but still it is explicitly stated that English is used as an intermediary whenever needed. At the same time, there are some languages that have different intermediaries. For example, as far as I know, the Ukrainian language is translated to English using the Russian language as the intermediary (there is a huge database: check almost any Ukrainian news web site and you will see that it is either written in Russian, or it has a button to see the same text in Russian language).

    And just to finish this short comment… I think GT should mainly be used for quick and rough translations, to get some idea of what is being written in another language. But even for that you should not trust it 100%. Still, it is better than word-by-word translation. More importantly, GT learns whatever you (yes, you, Harsh!) put into it. When you see that your name is translated as Horse, just click on it and type in Harsh instead. It will learn it. When you see an error or a missing page in Wikipedia, the best thing you can do for the rest of us is fill in the gap. The same applies here. And even if you are a Google-hater and hate the idea that they will monetize your effort, you may still want to do that, simply because Google will eventually die, but your contribution will live forever. 🙂

    1. Vadim,
      Your points are all valid and well taken. I am of course writing this post imagining usage by a lay person, evaluating how well monolingual, non-English-speaking people in many parts of the world can benefit from this service. Such users are even less likely to appreciate the difference between deterministic and probabilistic outputs. And why should that matter to an ordinary user of a service?
      As Arvind very rightly points out below, using more popular languages (English or Russian) as mediators is indeed low-hanging fruit that AI researchers should exploit. My reservation, which I shall soon articulate in a follow-up post, is that they are anything but explicit about this. The page therefore creates the impression that they translate directly between all those pairs of languages. That is my problem.
      I hope to offer a full response to all the responses that this post has generated.

  2. I think the fundamental difference lies in the expectation. As a CS engineer, I think both AI and translation have come a long way over the last decade. But is the problem completely solved yet? No. Will GT be able to differentiate formal from informal in the future? Yes. Will it ever be perfect? No. Every CS guy recognizes that language translation is a very difficult problem with many nuances. So the fact that we can do a decent enough job with AI is in itself a great step. Also, there is nothing wrong with routing translation via English to begin with. As technology develops, inter-language facets will be modelled into AI, but until then, routing via English is a low-hanging fruit to get started with. If we (or GT) went the ideal way and took care of all the finer details of a language, it would take forever before the product was launched. Examples like the ones you have cited are good ones to work on and improve. But in no way should this be termed an “abject failure” or described as pushing us towards “Newspeak”.

    1. Arvind, I understand these terms are ‘harsh’ (in English) and loaded. There are humanistic and sociological reasons why I used them. Wait for a full follow-up post. I am just waiting for any further responses there might be.
