Macro-analytic view of collaborative filtering
Thanks in part to collaborative filtering, Web-based retailers like Amazon can make money selling "niche" items that are too obscure for traditional brick-and-mortar stores to carry. Chris Anderson coined the term long tail to describe how selling lots of different niche items can add up to gross sales on par with a few old-fashioned blockbusters.
Two outstanding introductions to the long tail are:
- The Long Tail, by Chris Anderson. Wired, Oct 2004. "Forget squeezing millions from a few megahits at the top of the charts. The future of entertainment is in the millions of niche markets at the shallow end of the bitstream."
- Going Long, by John Cassidy. The New Yorker, July 2006. A cautionary review of the book-length 2006 version of Anderson's original Wired article. "The least convincing part of Anderson’s book is his treatment of what he calls 'the short head,' the part of the curve where popular products reside."
Cassidy sums up the technology that makes the long tail possible:
- Cheap computer hardware, which reduces the cost of making and storing information products;
- Ubiquitous broadband, which cuts the cost of distribution;
- Elaborate “filters,” such as search engines, blogs, and online reviews, which help to match supply and demand.
Anderson draws the canonical long tail, which emerges when the above forces are all in place:
The long tail (above) looks exactly like a power law distribution (below):
We have used the power law distribution previously to describe the effects of cumulative advantage, or "rich get richer." However, the long tail describes something utterly different from "rich get richer." Consider that megahits are the big winners in the "rich get richer" world described by Watts, but they are the dying paradigm of the "long tail" world described by Anderson.
How can the power law curve describe both the preeminence of megahits and their imminent demise?
To answer this question, it helps to examine more carefully the long tail curve and how it differs from the traditional power law distribution. Below we draw a generic long tail curve with units:
We see two important distinctions between the long tail curve and the power law distribution:
- Megahits and niche items are featured in both the long tail curve and the power law distribution, but they swap places depending on which curve is drawn. The long tail curve is, in a sense, a flipped-around version of the power law distribution.
- The x- and y-units of the long tail curve are different from the x- and y-units of the power law distribution. In particular:
  - The y-units of the long tail curve are rather similar to the x-units of a degree distribution (power law or otherwise): sales per unit time and node degree both indicate popularity.
  - The x-axis of the long tail curve depends on sorting items by popularity, from most popular to least. The result is akin to sorting people in a line by height and observing the curve described by the tops of their heads. This is not exactly a probability distribution; it is based on a different construction.
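The two constructions can be compared side by side in a minimal Python sketch. The sales figures are made up, and the function names `long_tail_curve` and `popularity_distribution` are ours, chosen only for illustration. Both functions start from the same list of per-item sales but plot different things on each axis.

```python
from collections import Counter

def long_tail_curve(sales):
    """Sort items from most to least popular: x = rank, y = sales."""
    ordered = sorted(sales, reverse=True)
    return [(rank, s) for rank, s in enumerate(ordered, start=1)]

def popularity_distribution(sales):
    """Count items at each sales level: x = sales, y = fraction of items."""
    counts = Counter(sales)
    total = len(sales)
    return [(s, counts[s] / total) for s in sorted(counts)]

# Hypothetical weekly sales for ten items (made-up numbers).
weekly_sales = [1, 1, 1, 1, 1, 2, 2, 3, 8, 40]

curve = long_tail_curve(weekly_sales)
dist = popularity_distribution(weekly_sales)

print(curve[0], curve[-1])  # first and last points of the long tail curve
print(dist[0], dist[-1])    # first and last points of the distribution
```

In `curve`, the one megahit sits at the far left (rank 1) and the niche items trail off to the right. In `dist`, the niche items sit at the far left (low sales, shared by many items) and the megahit sits at the far right. This is the "swap" described above.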
Niches, megahits, and the neglected middle
Two strong and fundamental forces of Web dynamics -- birds of a feather (homophily) and rich get richer (cumulative advantage) -- enhance the likelihood of extreme events at opposite ends of the spectrum from niches to megahits:
- "Birds of a feather" sustains niche items in the long tail of collaborative filtering
- "Rich get richer" launches huge megahits thanks to centrality-based search
The slide show below displays these forces against the backdrop of the long tail. The last slide in the series depicts the "neglected middle," as described by Cassidy.
Macro-analytic view of the long tail
The long tail provides a macro-analytic view of collaborative filtering. Cassidy's essay invites us to step back to a macro-analytic view of the long tail itself (i.e., a macro-macro-analytic view of collaborative filtering).
Six Degrees chapter 7 frames this perspective with the notion of collective decisions. In a collective decision, each individual incorporates his perceptions of others into his process of choosing among the options before him. Watts discusses four types of externalities that influence collective decisions:
- Information externalities: Knowing how others have acted under similar circumstances saves me the effort of evaluating all the options "objectively."
  - Example: I am hungry. McDonald's has sold 30 billion Big Macs. They must be OK.
  - Note: We use information externality synonymously with cumulative advantage and rich-get-richer.
- Coercive externalities: Anticipating the impact of my decision on others influences my choice.
  - Example: Everyone is drinking at this party. What will they think of me if I don't drink?
  - Note: If we define coercive externality as "aversion to difference" and homophily as "affinity for similarity" then we have roughly equated these two concepts (as double-negatives of each other). This is worth noting because homophily is otherwise not included in Watts' overview of forces that influence collective decisions.
- Market externalities: As a particular option is chosen by more and more people, that option becomes more and more valuable to all those who have chosen it.
  - Example: In 1970 very few people had fax machines, and so a fax machine was of very limited use. By 1990 many people had fax machines, and that popularity made fax machines far more useful to everyone who owned one.
  - Note: Market externalities are important to collaborative filtering, the long tail, and technology standards in general. See below.
- Coordination externalities: I will sacrifice my short-term selfish interests for long-term gains that depend on favors from others, to the extent that (1) I care about the future, and (2) I believe my actions affect the decisions of others.
  - Example: When my friend lends me $10, I will pay him back the next time I see him. I lose $10 when I pay him back but gain more than that in the long run.
  - Note: Coordination externalities are important to scenarios of group behavior such as the tragedy of the commons and the prisoner's dilemma. We will return to these topics later.
Market externalities and collaborative filtering: Market externalities heavily influence the competition among sites that perform collaborative filtering. For collaborative filtering to deliver its benefits, the niche items in a long-tail marketplace must be stocked by a huge aggregator like Amazon, which has enough customers and purchases to connect each niche item to the people who want it. This makes it very difficult for a new collaborative filtering site to challenge established sites like Amazon. Without a critical mass of user, inventory, and preference data, even the best new collaborative filter is useless -- like a fax machine in a world with no other fax machines. Cassidy puts it this way:
Successful long-tail aggregators can be counted on the fingers of one hand and have already established seemingly impregnable positions. Has the New Economy really moved past the familiar “winner take all” dynamic? That depends on whether you’re looking at the long tail—or at who’s wagging it.
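The critical-mass problem can be illustrated with a toy recommender. The following is a minimal Python sketch of one simple co-purchase scheme, not a description of how Amazon or any real site works. The function name `recommend`, the users, and the items are all hypothetical.

```python
from collections import Counter

def recommend(target, baskets, top_n=3):
    """Suggest items bought by other users who share items with the target."""
    owned = baskets[target]
    scores = Counter()
    for user, items in baskets.items():
        if user == target:
            continue
        overlap = owned & items
        if not overlap:
            continue  # no shared purchases, so this user tells us nothing
        for item in items - owned:
            scores[item] += len(overlap)
    return [item for item, _ in scores.most_common(top_n)]

# Hypothetical purchase histories (made-up users and items).
new_site = {"ann": {"zither album"}}
big_site = {
    "ann": {"zither album"},
    "bob": {"zither album", "theremin album"},
    "cat": {"zither album", "theremin album", "kazoo album"},
}

print(recommend("ann", new_site))
print(recommend("ann", big_site))
```

The same code returns nothing on the new site and useful suggestions on the established one. The filter itself is identical; only the amount of accumulated preference data differs.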
Macro view of Web programming
A new Web builder usually focuses on the micro-synthetic task of building a site that achieves her goals; however, Web building is equally informed by the macro-analytic issues above. For example:
- Market externalities and technology standards: The Web would not be possible without standardized protocols for storing, transmitting, receiving, and displaying information. In this context, technologies such as HTML, CSS, XML, and RSS represent social contracts that prescribe how Web builders are to go about their programming. Furthermore, Web browsers ultimately dictate how Web users will see (or not see) the results of any Web programming, and so each version of each browser (e.g., Internet Explorer 6.0) represents another possible technology standard to which a Web programmer may or may not adhere. The more popular any of these standards is, the more valuable that standard is to all those who already use it.
- Information externalities and site popularity: We have discussed this previously as cumulative advantage (rich get richer).
- Collaborative filtering and technical resources for Web programming: Our own "Share" page is one of thousands of online libraries dedicated to helping Web builders.
Think about how the above examples fit into the diagram below: