12 May 2009

The Effective Anti-Commons

I occasionally stumble across trends on the internet that frustrate and annoy me enough to write about. One of the most recent is what I like to call the "effective anti-commons." The term is a play on the phrase "the tragedy of the anti-commons," coined by Michael Heller. In a nutshell, this tragedy occurs when numerous rights-holders each control part of a resource, to the detriment of everybody involved. Last summer I saw Professor Heller give a talk* in Redmond, WA about his book Gridlock Economy, during which he described the phenomenon in some detail. One of the examples he provided is that most airports are basically unable to add runways because the nearby land is owned or controlled by too many competing interests. If you want more examples, check out the links above.

* side note: Quite a few of the Microsoft Research talks can be found at researchchannel.org, but god forbid you try to watch the videos on a Linux box.

I'm taking the "effective" anti-commons to refer to those situations where control of a resource is split between multiple parties through technological barriers rather than through legal rights and restrictions. This happens fairly frequently when dealing with information rather than physical resources. Technological barriers are necessary because data and other factual information are not copyrightable in and of themselves (although the display or compilation of the information may be... the copyrightability of databases is somewhat hazy). So in order to protect a database, companies keep it behind closed doors and throw up a scary license that says you cannot copy the facts they display on their website. There have also been attempts to apply the legal concept of trespass to chattels to prevent data extraction techniques such as web scraping.

These attempts to legally control factual content have been hit or miss at best, so organizations have resorted to using technology to protect the data instead, partly because it is so easy to do. In general these barriers exist by default, and a certain amount of effort must be spent to remove them (by providing web services, periodic database dumps, etc.). This leaves few alternatives beyond web scraping for a third party to access the data. Many third-party sites do take this scraping approach; the most popular are probably airfare aggregators.
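
To make the scraping approach concrete, here is a minimal sketch of the technique. The URL, markup, and CSS class below are hypothetical stand-ins; a real scraper has to be tailored to whatever HTML each site happens to emit, which is exactly why this is such a fragile way to move facts around.

    # Minimal web-scraping sketch. The URL and the "recipe-title" class are
    # hypothetical; every real site needs its own site-specific parsing.
    import requests
    from bs4 import BeautifulSoup

    def scrape_recipe_titles(url):
        """Fetch a listing page and extract recipe titles from its HTML."""
        html = requests.get(url, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        # Assumes each title is marked up as <h2 class="recipe-title">...</h2>
        return [h2.get_text(strip=True)
                for h2 in soup.find_all("h2", class_="recipe-title")]

    if __name__ == "__main__":
        for title in scrape_recipe_titles("http://example.com/recipes"):
            print(title)

The moment the site changes its markup, this code breaks silently, and every aggregator scraping that site has to update its parser independently.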

In many domains this sectioning off of information is harmful both to the consumer and the provider of the data. A few examples of where this is a problem are listed below.
  • Recipes -- There are hundreds of recipe collection sites on the web; some of the most notable are Allrecipes, Epicurious, RecipeZaar, etc. I still haven't found one with an open API. There are also a few web-scraping aggregate sites like Supercook and Food.com, but, surprise surprise, they don't have an API either.

  • Car Pooling -- There are many carpooling websites, many of which sprang up in the last few years when gas prices were on the rise. Here is a list of 25 of them.

  • Guitar Tabs -- Just searching for guitar tabs will turn up quite a few different websites, each with its own collection of tabs. Lyrics websites are the same way.

  • Events -- Let's say you have an event coming up in Omaha, NE that you want people to know about. Where would you post that event so that people would see it? Yahoo? We Go Places? Eventful? Or maybe a city-specific site like Hello Omaha? Yahoo and Eventful at least realize the importance of data sharing in this domain and provide developer APIs for access to their data (a sketch of what consuming such an API might look like follows this list).
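
For contrast with scraping, here is a rough sketch of what consuming an open events API might look like. The endpoint, parameters, and response shape are invented for illustration; this shows the general pattern such APIs follow, not Yahoo's or Eventful's actual interface.

    # Hypothetical events-API client. The endpoint and its "location"/"q"
    # parameters are made up for illustration, not a real service's API.
    import requests

    def search_events(location, keywords):
        """Query a (hypothetical) JSON events API for matching events."""
        resp = requests.get(
            "http://api.example.com/events/search",
            params={"location": location, "q": keywords},
            timeout=10,
        )
        resp.raise_for_status()
        # Assumes the service returns {"events": [{"title": ..., "date": ...}]}
        return resp.json()["events"]

    if __name__ == "__main__":
        for event in search_events("Omaha, NE", "concert"):
            print(event["date"], event["title"])

Unlike a scraper, this client only depends on a published interface, so it keeps working regardless of how the site's pages are laid out.
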
Examining these examples illuminates a few specific problems with this setup.

For consumers:
  1. Where do I find information? An obvious problem when the information for a domain is split across multiple locations is knowing where to look for something you need. Using recipes as an example, where would one look for a desired recipe or recipe type? There is little to no way to tell which website has the highest chance of providing the best results, so you have little choice but to search all of them (or hope Google provides decent results).

  2. Where do I contribute information? Similar to problem 1, a person has to choose where to contribute information so others can use it. In the case of events, how do you choose a site where the relevant group of users is likely to see your post? Different people probably check different websites, so you have to post the same information (facts) across many of them if you hope to reach the most people (this actually happens fairly often with guitar tab websites).

  3. How do I most effectively connect with other users? Carpooling is one of those domains where the goal is to connect people with each other. This breaks down when somebody advertising a ride and somebody looking for that same ride are on different websites. Connecting these people is only a problem because the relevant information is not shared between the sites.
For producers:
  1. How do I accumulate information? For sites that rely on user-generated content, the owner of the website has to convince users to actually generate that content. With an effective anti-commons, websites are forced to compete for users not only as consumers but also as producers. Through this competition some users choose one website while others choose another, and the total amount of usable content on any one website is a fraction of what it could be if the information were shared.

  2. How do I leverage that information to provide value and attract users? The goal of many web applications is to leverage a set of data to provide value to customers, and in many cases the value provided correlates directly with the size of the dataset. In many of the example domains listed above (carpooling, recipes, etc.), the amount of value possible increases as the data size increases. As mentioned in problem 1, this dataset can be increased quite a bit if information is shared among producers rather than fragmented. The current model of information hoarding leaves the door wide open for web-scraping mashups to come through, aggregate data from multiple websites, and win the market. If the data were shared to begin with, this would be far less of a concern.

  3. How do I differentiate myself from my competitors? In a free market competition is inevitable, and it can be a good thing. However, competing on data accumulation and hoarding that data is counter-productive for the reasons just mentioned. It is much more useful and attractive to compete on features, usability, integration, etc. built on top of a shared dataset rather than shooting yourself in the foot by competing on data accumulation itself. Knowing where to actually compete is a basic business principle, and it is also a reason many for-profit software companies leverage open source software (so they can focus on competing in more relevant areas of the software stack).
Consumers and producers are both harmed by technological barriers that restrict data sharing. I don't have the space to list every technological solution to this problem, but many of them exist and do not take much extra work; one is sketched below. In most cases it is in a business's best interest to explore these options, especially in the user-generated content space.
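
As one example of how little work this can take, here is a minimal sketch of exposing existing data through a read-only JSON API. It assumes Flask, and the in-memory list is a stand-in for whatever database the site already maintains; the point is only that publishing facts programmatically is a small addition to a site that already stores them.

    # Minimal read-only JSON API sketch using Flask. The in-memory RECIPES
    # list stands in for whatever database the site already has.
    from flask import Flask, jsonify

    app = Flask(__name__)

    RECIPES = [
        {"id": 1, "title": "Pancakes", "ingredients": ["flour", "milk", "eggs"]},
        {"id": 2, "title": "Omelette", "ingredients": ["eggs", "butter"]},
    ]

    @app.route("/api/recipes")
    def list_recipes():
        # Expose the full collection so third parties don't have to scrape it.
        return jsonify(RECIPES)

    @app.route("/api/recipes/<int:recipe_id>")
    def get_recipe(recipe_id):
        for recipe in RECIPES:
            if recipe["id"] == recipe_id:
                return jsonify(recipe)
        return jsonify({"error": "not found"}), 404

    if __name__ == "__main__":
        app.run()

A periodic database dump accomplishes much the same thing with even less code: export the facts, let others build on them, and compete on everything else.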

One area I didn't mention above, because it deserves its own post (or series of posts), is the identity metadata domain (i.e. social networking sites). However, many of the same problems pervade that domain as well.