Avoiding combat in a compelling disagreement is no easy task. It is a delicate balance that I, unfortunately, struggle to strike. Mallesh Pai is more than good on both fronts in his response to my two recent posts (1, 2). His principal disagreement is with my near dismissal of the product Google Translate (GT) while reporting some bugs and omissions of information. Vadim, Arvind and Siddhartha (elsewhere) also offered insightful comments. Google would surely benefit from this team of social and computer scientists testing and discussing GT with little incentive.
I will begin with what I found problematic in Mallesh’s largely reasonable response. To test the efficacy of GT, he translated one piece from Hindi to English, an output he calls “less than good”, and another from French to English, where GT was “unsurprisingly [much better]”. I agree that the results are a darn good start. But in my original posts, my claims were explicitly about cases where both the source AND the target language are other than English. Apply my deduction, that GT uses English as a mediating language, to the results shown by Mallesh: the ‘unsurprisingly better’ translation would ‘border on poor’ when it passes from French to Hindi (via English). The reverse, I am afraid, would be an even poorer re-translation of an already “less than good” translation. Still, as Arvind said, such experiments have to begin somewhere, and mediating via English (or Russian for Slavic languages, as per Vadim) is indeed low-hanging fruit that should be plucked. These technical glitches ‘alone’, I agree, are not “abject failures”.
However, GT is already positioned as much more than just an experiment. It is an institution and should be evaluated as such. In this frame of evaluation, my invective is quite justified, as I will soon show by situating this institution in two social settings.
The first is GT’s use by its non-paying customers: ordinary people who couldn’t care less whether its outcome is probabilistic or deterministic. Many of them don’t know English, and perhaps speak only one language. A monolingual Hindi speaker, referred to as ‘haurs’ in Bengali script (a typo for Harsh, meaning Joy), would be bewildered to see himself addressed as ‘Ghoda’ (the animal, Horse). Remember that he doesn’t know any English or Bengali, so he cannot tell whether the translation acted up or someone actually called him a horse. There begins my problem.
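To make the failure mode concrete, here is a toy sketch of translation through a pivot language. It is only an illustration of my deduction, not Google's implementation: the two lookup tables, their romanized entries and the function name `pivot_translate` are all assumptions made up for this example.

```python
# Toy lookup tables. Every entry is a made-up illustration (romanized for
# readability), not data from Google Translate.
BENGALI_TO_ENGLISH = {
    "haurs": "horse",  # the typo is resolved to the nearest English word
    "harsh": "Harsh",  # the correctly spelled name survives as a name
}
ENGLISH_TO_HINDI = {
    "horse": "ghoda",
    "Harsh": "Harsh",
}


def pivot_translate(word, to_pivot, from_pivot):
    """Translate a word via an intermediate (pivot) language.

    Unknown words pass through unchanged at each step.
    """
    pivot_word = to_pivot.get(word, word)
    return from_pivot.get(pivot_word, pivot_word)


if __name__ == "__main__":
    for source in ("haurs", "harsh"):
        print(source, "->", pivot_translate(source, BENGALI_TO_ENGLISH, ENGLISH_TO_HINDI))
```

The reader of the Hindi output sees only the final word. The English step where the name turned into an animal is invisible to him, which is why the mistake cannot be diagnosed from his side.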
The possibility of such mistakes warrants that GT explicitly state, either at the site of translation or in its product description, that it uses an intermediate language for translation between certain pairs. Such information is absent even on the GT blog. Why would they hide such a blatant aspect? Perhaps it would make a service that they themselves term the output of a “very smart” program appear less so.
A more plausible explanation is that hiding this information is in Google’s economic self-interest. For this, let’s look at GT’s second set of “more valued” customers. Google’s principal revenue is from paid search, or keyword-based advertising (also called AdWords). Google lets advertisers buy keywords in languages they are not conversant in, for advertising in global markets, by generating copy (keywords) with GT (the main GT page links to a toolkit for businesses). The attractiveness of this proposition would go down many notches if advertisers were explicitly told that the translation between many language pairs is actually mediated through a third language such as English or Russian. Aren’t they less likely to use a service that relies on further translation of an initially rough translation? And there Google risks losing millions of advertising dollars. (In the end these keywords are seen by human advertising agents, but initially an advertiser has to generate them through GT.)
A traditional analogy: how would a Russian author react if a translator, hired to translate from Russian to Hindi, instead translated an English translation into Hindi? Had he known that the translator would use an English version, he might have looked for an alternative provider in the first place.
The point of my earlier posts was not to paint a dystopian picture of the Internet or Artificial Intelligence. Instead, here is an alternative interpretation of a “successful experiment”: one that, despite being very smart, has limitations which need to be stated upfront when it is rolled out as an institution.