Reimagining the Internet as a mosaic of regional cultures

Angela Xiao Wu, Chinese University of Hong Kong and Harsh Taneja, University of Missouri-Columbia

Most online maps of the Internet are architectural plans, engineering blueprints, anatomical drawings or statistical graphics. For example, the Internet has been represented as millions of devices connected to each other by 300 “[c]ables lying on the seafloor” with its center in a huge hotel in Manhattan.

The Internet can also be viewed as a network of hyperlinks between world languages used to produce online content or represented through Wikipedia as a map of human knowledge.

Yet we learn from historians of cartography that maps reflect the preexisting interests, desires and preconceptions of the society from which they emerge. The same goes for how the vast virtual territory is mapped.

Consistent with the rhetoric emphasizing technical connectivity led by US-based transnational corporations, the prevalent maps of the Internet privilege technical features – such as hyperlinks, content of Web pages, Internet infrastructures and service providers. In these maps, the Web tends to center on the West with the rest of the world at its “peripheries.” These, together with other representations of the global digital divide, highlight the dominance of the West.

Such views limit the public’s ability to envision the Internet as a globally inhabited cyberspace.

We mapped usage of the Internet, as distinct from its technical features. Viewed this way, the Internet is much less West-centric, and rapidly diversifying as the world’s populations engage with it in their own ways.

Mapping global Internet usage

Actual traffic patterns on the Internet differ from its technical architecture. Reimagining the Internet according to global usage, our research reveals a fairly decentralized Web with significant participation from the global South. Our mapping makes visible, on an unprecedented scale, aspects of Internet use that remain “largely invisible” when “viewed from the perspective of network centers.”

We analyzed traffic to the world’s most popular 1,000 websites – which consistently account for 99 percent of global Web traffic – during the month of September in 2009, 2011 and 2013, respectively. These data come from comScore, a world leader in Web audience measurement.

For each of the possible pairs of those 1,000 websites – just under half a million pairs in total – we looked at the traffic shared by its two constituent websites. For example, for the pair comprising The New York Times and Google USA, we looked at how many people visited both sites.

We viewed website pairs as connected if they had traffic overlaps greater than would be expected by random chance, as with the Times-Google pair above, or the pair comprising the Times of India and Google India. Examined in this way, pairs of websites serving users from different cultural backgrounds – such as the Times of India and Yahoo Japan – tend not to be connected.
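The pairing rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the study: the visitor sets, the panel size of 10 users and the function names are hypothetical, and "expected by random chance" is simplified here to the overlap two sites would share if their audiences were independent, with no statistical significance test.

```python
from itertools import combinations
from math import comb

# Number of unordered pairs among 1,000 websites.
TOTAL_PAIRS = comb(1000, 2)  # 499500


def expected_overlap(n_a, n_b, population):
    """Overlap expected if the two audiences were independent (an assumption)."""
    return n_a * n_b / population


def connected_pairs(visitors, population):
    """Return site pairs whose shared audience exceeds the chance expectation."""
    edges = []
    for a, b in combinations(sorted(visitors), 2):
        observed = len(visitors[a] & visitors[b])
        expected = expected_overlap(len(visitors[a]), len(visitors[b]), population)
        if observed > expected:
            edges.append((a, b))
    return edges


# Hypothetical toy panel of 10 users, identified by integers.
visitors = {
    "nytimes": {1, 2, 3, 4},
    "google_us": {1, 2, 3, 4, 5},
    "times_of_india": {6, 7, 8},
    "yahoo_japan": {9, 10},
}

edges = connected_pairs(visitors, population=10)
print(edges)
```

In this toy panel only the Times-Google pair ends up connected, because its four shared visitors exceed the two expected by chance, while the sites serving different audiences share no visitors at all.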

The Internet as Global Usage: 2009 (left), 2011, 2013 (right).
The dots are websites and the lines represent the existence of significant traffic overlap between them. These show that global Web usage clusters itself into many communities of websites based on shared traffic. What the member websites of these clusters have in common with each other allows us to identify them as expressions of online regional cultures (see legend).

Analyzing online regional cultures

Mapping sites based on how much traffic they share with each other revealed interconnected clusters or communities of shared Web use. These corresponded well with major geo-linguistic regions, and we called them “online regional cultures.” In addition, there are a few online cultures that span geographic regions; they tend to include either user-generated or adult content.

To conduct our analysis, we borrow the anthropological concept of ethnology, a scholarly tradition that characterizes relationships between cultures based on common traits in beliefs, emotions or practices. To examine these regional cultures comparatively and historically, we calculated how distinctly a regional cultural community stands out on the Web, and the strength of its online activities.

In general, we find that geographical regions where people speak languages not widely spoken elsewhere (such as Japan and Korea) are the most distinct online cultures; regions with geographically dispersed languages (such as Spanish or Russian) or those of multilingual geographies (such as India) less so.

Our study suggests that the Web, when mapped based on its usage, does not have its core in the West, but is a mosaic of online regional cultures that associate with physical places.

In such maps, the Internet is becoming more decentralized, or to be more precise, de-Westernizing, as more users from disparate cultures are taking over its topography by bringing in their own cultural identities. Between 2009 and 2013 the Web witnessed a gradual process of “de-Americanization”; the cluster corresponding to the U.S. has separated from the “global” websites such as Twitter and Instagram – primarily user-generated websites, which are neither centered in North America nor on the English language.

In this process, the American sites have taken their own “corner” of the Web, just like other online regional cultures. Simultaneously, non-Western online cultures have strengthened, especially those linked to Brazil, Russia, China and India. Unsurprisingly, in these places, local Internet industries are thriving and domestic content is flourishing.

Compared to the prevalent technological Internet maps, our user-centric maps from 2009 to 2013 challenge, rather than reinforce, the existing concept of an Internet anchored by Western knowledge, norms and activities. They encourage the (Anglophone, especially) general public to confront the narrow online world with which it is familiar. Further, the trend captured by our maps may encourage Westerners to refresh their own preconceptions by exploring the vastly heterogeneous cyberspace.

These user-centric maps also inform policymakers about how better to empower the global South. Technical connectivity alone is not enough. For online regional cultures around the globe to strengthen, users must be able to build and shape the content they find appealing. For this to happen, local governments need to introduce civic, economic and social opportunities with new technologies. Left to a market dominated by West-based transnational corporations, the global South may not achieve healthy domestic Internet landscapes and online cultures.

Angela Xiao Wu, Assistant Professor of Journalism and Communication, Chinese University of Hong Kong and Harsh Taneja, Assistant Professor of Media and Communication, University of Missouri-Columbia

Here’s how not to capture India in a day

Ridley Scott is asking Indians to capture what they did on October 10 to create a reel of “India in a Day”. Scroll talks about it here. The article shows two YouTube videos. The first explains the concept. The second (scroll down a bit) shows “what exactly do they want!”.

Will he be able to?

I say this project will yield a very skewed slice of the country: the “urban, left liberal, yuppie anglophone” India. I say this because of the videos explaining the concept, including the example video in which the implementing director (some American Desi) tells people, in his very American accent, what he wants them to do.

I am all for people satisfying their creative pursuits. However, I would have appreciated some localization of this effort. Some attempt to make the request a better cultural fit with a wider cross-section of India.

The current video won’t appeal to the large masses of Indians who are bilingual and proficient in English, because they are trained in English, not in “American”. At least an American Desi should understand that, if not Ridley Scott.

Let’s wait and see what this turns into.

Why mobile Internet in India won’t boom anytime soon

Haven’t we all been hearing that the Internet boom in India is waiting to happen? We will soon get past the tipping point; we just need to overcome a few barriers. So what are these barriers? About 10 years ago, it was the lack of a broadband policy and expensive computing. Five years ago, it was the lack of inexpensive broadband access. Finally, the advent of smartphones, it is believed, will help India cross both hurdles of access and affordability. I remain skeptical.

I believe all these arguments have focused on the availability of the technology itself, not on how or why people are expected to use it. It is overly simplistic to assume that just because more and more mobile users have Internet-enabled phones, they will take to the mobile Internet. This may remain a fond hope.

First, let’s talk about cellphones. No one in India or worldwide had expected cellphones to become ubiquitous. Far from it: they were expected to fail. Yet they found their most enthusiastic customers (surprisingly for some) in the emerging markets of Asia and Africa, and not in the US or UK. To me this was quite unsurprising.

Let’s examine the case of India. There were fewer than 20 million phones in the country at the time cellphones first arrived (a teledensity of about 1 in 1995-96). This was 100 years after the first telephone exchange was set up in the country. The low teledensity was not because people did not want or could not afford telephones. Most people wanted one, and badly. With one state-owned provider, you had to wait years (often decades) to get a connection (we got ours in 1994 after applying in 1985 because we were high-priority customers, my parents being doctors).

Why did everyone want a phone? Because everyone had seen it being used. A worker at a grocery store or a restaurant had seen the owner use it. A clerk had seen it being used in his office. Even housewives had more than once flocked to STD/PCO booths to convey important messages. The phenomenal success of STD/PCOs in the late 80s and early 90s made every Indian aware of what the telephone can do.

The entry and falling prices of mobile phones were then a sweetly timed coincidence with the need for a telephone. What’s more, there were no waiting lists and no favors to be offered to linesmen, yet within days (or hours) you could get a working telephone. Soon, everybody knew somebody who had a telephone or a cellphone. Hence, 100 years after its first introduction in the country, people were able to get a device they had longed for. A device that enabled even the illiterate to communicate! They indeed embraced it, and in no insignificant measure.

That this burgeoning cellphone population would start accessing the Internet on mobile phones is a fairly rational thought. Accessible, affordable and reliable services further the case. Yet there is one catch.

Unlike phones, most haven’t longed for the Web. Do they even know what the Internet is or what it can do for them? The computer for them is a device that banks maintain ledgers on, that people in big offices use for official work, or that some students use in colleges. Everyone they talk to or want to talk to has a cellphone. By contrast, 9 out of 10 people do not use the Internet, and even fewer use it more than once a month.

Then what about the claim by companies such as Facebook that a large chunk of their users (even in India) use the service from mobile devices? These claims may be perfectly valid. However, how many of these users began using the Internet on their phones? My understanding is that most mobile Web users are people who were first introduced to the Internet on a regular device, and who use the mobile Web in addition to surfing the Web on a computer.

Agreed, for many the phone may have become the primary device for Internet access. However, this does not mean that those who remain unexposed to computers, and consequently to the Internet, will also start using Web services on their phones. They have to have a need for those services in the first place. How we develop that need is a matter for another post. Till then, good luck to all those waiting for the Internet in India to boom!

Regional languages are the key to growth in Internet use in India

Hello friends,

These days I am quite busy with my PhD thesis. Some of you may know that my research concerns media, community and technology. In particular, my thesis examines the role of language in Internet use.

So far, my findings show that the availability of websites in local languages is essential to increasing Internet use. Not only that, but in many countries computer keyboards and software are also available in local languages. I have seen this trend not only in prosperous countries such as France, Germany and Italy, but also in developing countries such as Vietnam, Indonesia and Turkey.

In India the situation is quite disappointing. One can often see a direct relationship between computer use and proficiency in English. According to some people this is not a problem, because every literate Indian knows English. But this claim is far from the truth. Over the past 30 years, audiences for regional-language newspapers, magazines, films and TV programs in India have grown enormously, while audiences for English-language media have seen no significant growth.

So why are we so far behind other countries in Internet use? In my view, if we make computer education and devices available in regional languages, we will see considerable growth in Internet use. But those who run our country never want anyone who cannot speak English to get ahead and compete with them. They learned this from the British, and every person proficient in English considers himself a superior citizen.

We will have to do something soon to improve the situation.


The Economist concurs with some of my thoughts on Machine Translation

It is always gratifying to see media you trust think about the same issues.

I had written some posts (1, 2) on problems with Google’s machine translation, and responses to some people who responded to me here. The gist of these arguments was:

  1. Google uses English as an intermediate language to translate between many language pairs
  2. It does not declare that it does this
  3. It has a vested interest in hiding this fact: getting AdWords users to advertise in multiple languages.

My conclusion was that machine translation is a successful experiment but not yet ready to be rolled out as an institution.

Interestingly, the Economist highlights some of these issues here, in a post appearing weeks after my own.

Responding to responses on my earlier posts on Google Translate

Avoiding combat in a compelling disagreement is no easy task. It requires a delicate balance that I, unfortunately, struggle to strike. Mallesh Pai is more than good on both fronts in his response to my two recent posts (1, 2). His principal disagreement is with my near dismissal of the product Google Translate (GT) when reporting some bugs and omissions of information. Vadim, Arvind and Siddhartha (elsewhere) also offered insightful comments. Google would surely benefit from this team of social and computer scientists testing and discussing GT with little incentive.

I will begin with what I found problematic in Mallesh’s largely reasonable response. To test the efficacy of GT, he translated one piece from Hindi to English, an output he calls “less than good”, and another from French to English, where GT was “unsurprisingly [much better]”. I agree that the results are a darn good start. But in my original posts, my claims were explicitly about source AND target languages both being other than English. Apply my deduction that GT uses English as a mediating language to the results shown by Mallesh: the ‘unsurprisingly better’ translation would ‘border on poor’ when it passes from French to Hindi (via English). The reverse, I am afraid, would be an even poorer re-translation of an already “less than good” translation. That said, as Arvind pointed out, such experiments have to begin somewhere, and mediating via English (or via Russian for Slavic languages, as per Vadim) is indeed low-hanging fruit that should be plucked. These technical glitches ‘alone’, I agree, are not “abject failures”.

However, GT is already positioned as much more than just an experiment. It is an institution and should be evaluated as such. In this frame of evaluation, my invective is quite justified, as I will soon show by situating this institution in two social settings.

The first is GT’s use by its non-paying customers: ordinary people who couldn’t care less whether its outcome is probabilistic or deterministic. Many of them don’t know English, and perhaps speak only one language. A monolingual Hindi speaker, referred to as ‘haurs’ in Bengali script (a typo for Harsh, meaning Joy), would be bewildered to see himself addressed as ‘Ghoda’ (the animal, horse). Remember, he doesn’t know any English or Bengali with which to work out whether the translation acted up or someone actually called him a horse. There begins my problem.

The possibility of such mistakes warrants that GT state explicitly, either at the site of translation or in its product description, that it uses an intermediate language for translation between certain pairs. Such information is absent even on the GT blog. Why would they hide such a blatant aspect? Perhaps it would make a service that they themselves call the output of a “very smart” program appear less so.

A more plausible explanation is that hiding this information is in Google’s economic self-interest. For this, let’s look at GT’s second set of “more valued” customers. Google’s principal revenue is from paid search or keyword-based advertising (also called AdWords). It lets advertisers buy keywords in languages they are not conversant in, for advertising in global markets, by generating copy (keywords) with GT (the main GT page links to a toolkit for businesses). The attractiveness of this proposition would go down many notches if advertisers were explicitly told that the translation between many language pairs is actually mediated through a third language such as English or Russian. Aren’t they less likely to use a service that relies on further translation of an initially rough translation? And there Google risks losing millions of advertising dollars. (In the end these are seen by human advertising agents, but initially an advertiser has to generate them through GT.)

A traditional analogy: how would a Russian author react if a translator, hired to translate from Russian to Hindi, instead translated an English translation into Hindi? Had he known that the translator would use an English version, he might have looked for an alternative provider in the first place.

The point of my earlier posts was not to paint a dystopian picture of the Internet or Artificial Intelligence. Instead, here’s an alternative interpretation: a “successful experiment” that, despite being very smart, has limitations which need to be stated upfront when it is rolled out as an institution.

Words of the mouse can mislead: Google Translator exposed through more fundamental evidence

My previous post got advocates of artificial intelligence to accuse me of making arguments against Google Translator (GT) based on proper nouns or misspelled words. These, they contended, made GT confound translation with transliteration in that particular case. However, I have recently found more compelling evidence that Google does indeed use English as a mediating language even when it ‘offers’ to translate from any source language to any target language. In doing so, it ignores fundamental ways in which other languages differ from English.

Spanish, like Hindi, has distinct formal and informal second-person forms. So “Aap kaise ho?” (how are you, formal) becomes “¿Cómo está?” in Spanish, and “Tum kaise ho?” (how are you, informal) translates to “¿Cómo estás?”. Enter either of these in Hindi in GT and the Spanish output is the same: “¿Cómo estás?” (the informal form). Puzzled? Well, English (the language through which the translation is mediated) has only one second-person singular form, “you”. So either of the Hindi expressions is first translated to “How are you?” and then further translated into Spanish.
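To make the loss concrete, here is a toy sketch in Python. It is only an illustration of the pivot effect I am describing, built on three tiny hand-written lookup tables; the table and function names are my own inventions, and nothing here reflects how GT is actually implemented.

```python
# Hypothetical lookup tables, written by hand for this example only.
HINDI_TO_ENGLISH = {
    "Aap kaise ho?": "How are you?",  # formal "you"
    "Tum kaise ho?": "How are you?",  # informal "you"
}
ENGLISH_TO_SPANISH = {
    # English has one "you", so only one Spanish form can come out.
    "How are you?": "¿Cómo estás?",
}
HINDI_TO_SPANISH = {
    "Aap kaise ho?": "¿Cómo está?",   # formal preserved
    "Tum kaise ho?": "¿Cómo estás?",  # informal preserved
}


def translate_via_pivot(hindi_text):
    """Hindi -> English -> Spanish: the formality distinction is lost in English."""
    return ENGLISH_TO_SPANISH[HINDI_TO_ENGLISH[hindi_text]]


def translate_direct(hindi_text):
    """Hindi -> Spanish without a pivot: the distinction survives."""
    return HINDI_TO_SPANISH[hindi_text]


for phrase in ("Aap kaise ho?", "Tum kaise ho?"):
    print(phrase, "| pivot:", translate_via_pivot(phrase),
          "| direct:", translate_direct(phrase))
```

Both Hindi phrases collapse to the same English sentence, so whatever the second step does, it has no way of telling the formal from the informal.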

Here I used no proper nouns and no words that are hard to spell or understand, just the first expression one learns when starting any new language (even before the alphabet or any vocabulary). And yet this probabilistic and/or intelligent algorithm fails to capture a fundamental distinction that separates Spanish and Hindi from English.

Anyway, till Big Brother achieves such perfection that humans can only learn the newest newspeak (ref: Orwell’s 1984), the language of the mouse, I urge you to continue enrolling in real language classes and to turn to real people for humanistic tasks!

Postscript: I was told that my previous post was being circulated within Google and that they were using the evidence presented as a case study of sorts. I would have expected them to offer some kind of acknowledgement, but I see no signs that they even visited my website. Perhaps they (or some employee) have conveniently copied the text and may be passing it off as their own discovery.