Monday, March 27, 2006

Counteracting Sameness on the Internet

The internet is considered a place where everyday people can creatively express themselves. That is certainly true, but you also see a lot of outright copying (and illicit republishing) of online material. Of course, the popular stuff gets copied the most, and is also the original source material for most of the derivative works, criticism, and analysis as well. In Science Studies, it's what we call the Matthew Effect (described by Robert Merton in 1968), which is essentially a restatement of the fact that the rich get richer and the poor get poorer. It's why, for example, certain blogs have thousands of inbound links, while the majority of blogs have none or just a handful. (Economists use the term "power law distribution" to describe this type of phenomenon. In describing the Matthew Effect, Merton gets into the psychosocial causes of this particular behavior and its implications.)

Because Google ranks pages according to popularity, popular content on the Web gets noticed and becomes more popular over time, while less popular content gets buried deeper and deeper. This makes it especially hard for new content and websites to get on an equal footing with old websites that have pre-established audiences (and therefore good PageRank).
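For the curious, here is a toy sketch of that rich-get-richer dynamic. It is not how Google actually ranks anything, just a minimal "preferential attachment" simulation under assumptions I made up for illustration: the function name `simulate_web`, the 10% chance that each step creates a brand-new page, and the rule that every new page starts with a single inbound link are all hypothetical. Otherwise, each new link goes to an existing page with probability proportional to the links it already has.

```python
import random

def simulate_web(steps=20000, new_page_prob=0.1, seed=0):
    """Toy rich-get-richer model (illustrative assumptions only).

    At each step, either a new page appears with one inbound link, or an
    existing page gains a link with probability proportional to the number
    of links it already has.
    """
    rng = random.Random(seed)
    links = [1]  # links[i] = inbound links of page i; page 0 is the oldest
    pool = [0]   # page i appears in the pool once per link it holds
    for _ in range(steps):
        if rng.random() < new_page_prob:
            # A newcomer joins the Web with a single inbound link.
            links.append(1)
            pool.append(len(links) - 1)
        else:
            # A uniform draw from the pool favors already-popular pages.
            page = rng.choice(pool)
            links[page] += 1
            pool.append(page)
    return links

links = simulate_web()
n = len(links)
oldest = links[: n // 10]     # the first 10% of pages to appear
newest = links[-(n // 10):]   # the last 10% of pages to appear
print(f"pages: {n}")
print(f"most-linked page: {max(links)} links")
print(f"median page: {sorted(links)[n // 2]} links")
print(f"average links, oldest 10% of pages: {sum(oldest) / len(oldest):.1f}")
print(f"average links, newest 10% of pages: {sum(newest) / len(newest):.1f}")
```

Running it and comparing the oldest pages with the newest ones gives a rough feel for the head start that established sites enjoy in a model like this. Changing `new_page_prob` or `seed` changes the numbers, but the older pages should keep their edge.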

The content of popular sites is what tends to get replicated on other sites as well. With open sources of information like Wikipedia (published under the GNU Free Documentation License) becoming more and more popular with internet users, that content is also being heavily replicated (word-for-word in many cases) on other websites (including other popular online encyclopedias). As a result, alternative and less popular sources of knowledge become harder to find amidst the sea of copied information.

The ability to copy information without directly altering the original might avoid the 'tragedy of the commons', but it can result in a tragedy of another sort, a Tragedy of Sameness, a type of information entropy where all knowledge moves towards an equilibrium where everyone knows the same things about the same topics, and nothing really new is said or created.

Web content that can be copied and republished freely often becomes more popular than similar content that is more strictly controlled by its original authors, especially when the free content has been around longer. The tragedy occurs when this free (as in beer and as in speech) content is of lower quality than the non-free content, and the non-free high quality content can't compete in the long run and therefore disappears forever (to the detriment of everyone).

The precariousness of being in The Long Tail

It is said that The Long Tail is good for content that is not in high demand, and that sites such as eBay, Amazon, and Netflix make it possible for niche content to survive and get exposure it wouldn't get otherwise (while making those companies a lot of money). Such sites and Long Tail content are mutually beneficial.

Wikipedia also benefits from having a lot of niche content that could be considered in the Long Tail. There are many specialized articles that don't get read a lot, but the fact that they exist adds tremendous value to Wikipedia. (Print encyclopedias, on the other hand, are much more constrained regarding the types of articles that get printed.) I'm not convinced, however, that Wikipedia is reciprocally beneficial to independent website creators, many of whom initially research and publish the valuable information that Wikipedia later incorporates. As Wikipedia and other megasites get popular, the independent websites are increasingly being pushed into the Long Tail of the Web and face the very real threats of obscurity and/or extinction.

The growing usage of megasites to get information on niche subjects can, in many cases, reduce the demand for independently created sites that cover the same content. With reduced demand, there is less incentive (monetary or otherwise) to produce such content independently, and the Web becomes populated by clones of Wikipedia articles and the other most popular sites on Google for any given subject. What used to be a collection of incredibly diverse (and sometimes quite expert) opinions and analysis on a multitude of subjects is increasingly becoming a boring monolith of sameness.

Even the new material posted on the internet these days that is considered "fresh and exciting content" is mostly superficial and ephemeral in its appeal, intended for audiences with brief attention spans. Scouring the Web for fascinating, well-researched, and wonderfully trivial details about arcane subjects has been replaced by surfing the Web for funny little photos and video clips, people's daily diaries, fanlistings, virtual slambooks, quickly accessible facts from reference pages, and easily digestible news articles (about current events, celebrity gossip, the latest tech gadgets, etc.). I browse and am entertained by those things too, I admit, but I'm also worried that "heavy" content on the Web is getting more and more scarce.

My Modest Attempts at Cultural Environmentalism

I try, even though it's hard, to avoid replicating the content (e.g. stuff on Slashdot or Digg) I read every day on the internet, a lot of which is actually fun stuff that lainspotting readers might be interested in. When I get the urge, however, I just remind myself that a ton of people are already linking to those things, and I don't need to add noise to the system.

I also try (with varying degrees of success) to avoid linking to Wikipedia articles. Wikipedia is a great resource to quickly learn about new stuff, but I much prefer to support those independent website operators whose hard work results in truly original and in-depth content (that benefits from the accountability of having the author's name attached to it). Much of that content ends up being the (often uncredited) source material for Wikipedia and other megasites anyway, but in the process of being transferred, details get lost or are presented out of context.

Furthermore, Wikipedia contributors and editors tend to err on the side of caution (or stinginess, depending on your point of view) regarding external links (whether they point to the source of the info or to alternative information) for fear of link-spamming. Yet robust linking is what made the Web great to begin with. It's becoming a lost tradition. Few people, even small website owners, want to link to potential competitors for fear of losing ad revenue in the long run. That search engine optimizers have to debate whether or not outbound links will negatively affect their PageRank is another tragedy, and the quality of the Web is suffering because of it.

Google and Originality

Ironically, I started this post with the intention of writing just one or two lines on how Google can be used to aid creativity. I just wanted to note that Google is a great tool for checking to see if a phrase you came up with has been coined or used before by anyone else. In something I wrote a few weeks back, I used the phrase "toothless ambition" and wondered if anyone had used it before me. Google says no, so I'll claim it as my own. =)

Related links

Browsers and the quality of web content [edit: added 3/28/06]

What happened to anime "shrines" on the web?


  1. the biggest problem I see with the net is its discouragement of activity. after all, 2 seconds on google can turn up just how many people are already doing just about anything you can conceive of doing. before, you might just start a sci-fi club or anime zine or something like that, because you didn't know any better. Now, you can bump into 20,000 others doing the same.

    it tends to discourage one from getting in the game, makes us all into passive watchers, which then feeds into more of that sameness of content, the lack of different/new/other voices.

  2. Speaking of a lack of "deepness" I noticed something else. When stories are created they use a lot of symbols, or allusions, but usually don't reappear as allusions or become a new source of symbols. Someone needs to create something devoid of symbols to restore creativity, in general.

  3. I continue the discussion by asking the following question: "Can anything be done at the browser level to encourage independent website owners to publish high quality original content on the internet, and to help out those who already do?". See my Opera blog for more details: Browsers and the quality of web content

  4. I should also mention that the idea of popular sites getting more popular and less popular sites getting less popular on the Web is a matter of debate and discussion in academic circles. Here are some varying viewpoints:

    The egalitarian effect of search engines

    Impact of Web Search Engines on Page Popularity (click on the first link)

    "Googlearchy": How a Few Heavily-Linked Sites Dominate Politics on the Web

    In general, I remain convinced that popular sites maintain a distinct advantage over unpopular sites on the Web in terms of getting more popular over time.

    While low-ranked niche pages get more exposure from search engines than they would without them (due to people who use very specific search terms), their overall visibility is still comparatively low, and the situation will only get worse as megasites continue to replicate the content found on the less popular niche sites.

    Furthermore, the mere perception and common intuition (right or wrong) that people have regarding the difficulty of getting new websites to be recognized by Google acts as a deterrent, preventing potential independent content producers from even trying.