Our course introduces Web science. It has no prerequisites and has been used by non-technical undergraduates at Boston University since 2006. Our curriculum is guided by the following passage adapted from "Web Science: An Interdisciplinary Approach to understanding the Web," by James Hendler, Nigel Shadbolt, Wendy Hall, and Tim Berners-Lee:
Web science, an emerging interdisciplinary field, takes the Web as its primary object of study. This study incorporates both the social interactions enabled by the Web's design and the applications that support them.
The Web is often studied at the micro scale, as an infrastructure of protocols, programming languages, and applications. However, it is the interaction of human beings creating, linking, and consuming information that generates the Web's behavior as emergent properties at the macro scale. These properties often generate surprising properties that require new analytic methods to be understood.
For example, when Mosaic, the first popular Web browser, was released publicly in 1992, the number of users quickly grew by several orders of magnitude, with more than a million downloads in the first year. The wide deployment of Mosaic led to a need for a way to find relevant material on the growing Web, and thus search became an important application, and later an industry, in its own right. The enormous success of search engines has inevitably yielded techniques to game the algorithms (an unexpected result) to improve search rank, leading, in turn, to the development of better search technologies to defeat the gaming. More recent macro-scale examples include photo-sharing on Flickr, video-uploading on YouTube, and social-networking sites like mySpace and Facebook.
The essence of Web science is to understand how to design systems to produce the effects we want. The best we can do today is design and build in the micro, hoping for the best; but how do we know if we've built in the right functionality to ensure the desired macro-scale effects? How do we predict other side effects and the emergent properties of the macro? Further, as the success or failure of a particular Web technology may involve aspects of social interaction among users, understanding the Web requires more than a simple analysis of technological issues but also of the social dynamic of perhaps millions of users.
Given the breadth of the Web and its inherently multi-user (social) nature, its science is necessarily interdisciplinary, involving at least mathematics, computer science, sociology, psychology, and economics.
Micro vs. Macro: themes of Web science
Four important themes of Web Science are
- Micro: an individual acts
- Macro: the world responds (or not) to an individual's action
- Synthetic: something is created to produce a desired result
- Analytic: laws are stated to explain observed phenomena
We focus on these themes as they apply to Web builders -- people who contribute links and other content to the Web:
|Micro||An individual builds a Web |
site to produce a desired result.
|(We do not speak|
to this quadrant.)
|Macro||"The world" builds a Web site |
to produce a desired result.
|Laws are stated to explain|
large-scale Web phenomena.
Some Web builders consider themselves Web developers; others consider themselves bloggers; others merely post an occasional comment on someone else's blog or discussion forum. We say "Web builder" to encompass the full spectrum of people who contribute links and other content to the Web.
Our lab curriculum provides an informal hands-on approach to the task of building a Web site. Our Search and Share pages help Web builders leverage collectively engineered resources (such as WordPress). The formal chapters of the Study page (which you are now reading) explain large scale Web phenomena; they also explain the Amazon recommendation algorithm and the Google PageRank algorithm.
The sociology, psychology, and economics of this course follow Duncan Watts' Six Degrees, which we recommend as a narrative companion to our own material. Our complete suggested reading list is below.
Protecting yourself from evildoers
Privacy, trust, and ownership
Basic mathematical foundations of networks:
- Explicit Notation for Sets
- Venn Diagrams
- Union and Intersection
- Ordered Lists
- Implicit Notation for Sets
- Logical Expressions
- Compound expressions with "or"
- Compound expressions with "and"
- Union and intersection defined formally
- Similarity of Sets
- Undirected and Directed
- Neighborhood and Degree
- Density and Average Degree
- Paths in undirected graphs defined formally
- Paths in directed graphs
See also Facebook and Touchgraph
Hubs, clusters, and other basic structural features of the Web:
- Connected: a word of many meanings
- Induced Subgraphs
- "Connected" defined formally
- Connected graphs and connected components
- Defining clusters, part one: connected components
- Defining clusters, part two: cliques
- Defining clusters, part three
How randomness, homophily, and cumulative advantage shape the Web:
- Limitations of traditional graph theory
- Introduction to network dynamics
- Three models of dynamic graphs
- Random graphs
- Demonstration of random graph dynamics
- Random graph algorithm
- Clusters and homophily
- Triadic closure
- Triadic closure algorithm
- Hubs and cumulative advantage
- Preferential attachment algorithm
All the above are summarized in the following table:
| ||Random graphs ||Clustering ||Centrality|
|Real-world phenomenon explained by model ||Giant component forms quickly when |E| ≅ |V|. ||Clusters emerge, providing "table of contents" overview. ||Hubs emerge, indicating popularity and/or influence. |
|Web sites ||N/A ||Clusty, iBoogie, Grokker ||Google et al |
|Sociological force ||Chance ||Homophily ||Cumulative advantage |
|Mathematical model ||Random graph algorithm ||Triadic closure algorithm ||Preferential attachment algorithm |
Variables, Probability, and Scale-Free Networks
Understanding that the Web is a scale-free network requires some probability theory:
Variables and Probability
- Variables in mathematics
- Variables in algorithms
- Random variables
- Discrete vs. continuous variables
- Probability distributions
- Degree distributions
General discussion of scale-free networks:
- Six Degrees Chapter 4, pp 101-114
- From previous chapter on Network Dynamics
- Hubs and cumulative advantage
- Preferential attachment algorithm
Information and Computation
Applying fundamental concepts of computer science to the Web
Information and computation
- Information, computation, and algorithms
- Summation: an example of what computation is
- HTML: an example of what computation is not
- Computing distance, part one: Information diffusion
- Computing distance, part two: Example
- Computing distance, part three: Algorithm
Examples of information diffusion on the Web:
How to compute personalized recommendations:
- "Expert opinions" without the experts
- Delicious: example of CF
- Bookmarks: content of Delicious
- Tuples: content of CF
- Bipartite graphs: structure of CF
- Structural equivalence: computation of CF
- Delicious: algorithmic summary
- The four steps of collaborative filtering
Niches and blockbusters in the world of Web commerce:
The Long Tail
- Macro-analytic view of collaborative filtering
- Power law revisited
- Niches, megahits, and the neglected middle
- Macro-analytic view of the long tail
- Macro view of Web programming
- The Long Tail, by Chris Anderson. Wired, October 2004.
- Going Long, by John Cassidy. The New Yorker, July 2006.
- Six Degrees Chapter 7, pp 207-215: Information Externalities & Market Externalities
How to compute the influence of a Web page:
Influence in Networks
- Popularity, influence, and centrality
- Introduction to PageRank
- NetRank: a simplified version of PageRank
- Normalization and convergence
- The NetRank algorithm
- Dividing by outdegree: the NR* formula
- The PageRank formula
- The damping factor: PageRank as probability
See also PageRank Explained by Phil Craven
Competition and Cooperation
What happens when Web builders seek to increase their influence?
Games: Competition and Cooperation
- Dynamics of popularity and influence
- PageRank competition
- Doing the right thing
- Mutually assured construction
- Authority, reciprocity, reputation
- Game theory
- Winners' dilemma