Popularity, influence, and centrality 
Our understanding of collective decision-making relies on the notions of popularity and influence. Popularity is the quality of being commonly liked; it is often used to measure the outcome of a collective decision (e.g., counting votes). Influence is the power to affect others; it is often used to shape the process of a collective decision (e.g., lobbying). These two distinct qualities are often combined, such as when popularity confers influence. In the context of the Web, we mathematically model popularity and influence with indegree and PageRank™, respectively. Indegree and PageRank™ are two ways of measuring network centrality. Despite its intuitive appeal, network centrality is hard to define. For example, which node is most central in the following graph? There is no one right answer, because saying a node is "central" is ambiguous. A few ways to compute which node is most central include:
- Indegree Centrality: Just another way of saying indegree. Node 4 above has the highest indegree centrality: it is the only node with 3 incoming links.
- Outdegree Centrality: Just another way of saying outdegree. For a Web page, this indicates author activity. In the above graph, node 5 has the highest outdegree centrality: it is the only node with 3 outgoing links.
- PageRank™: An enhanced version of indegree centrality. In the above graph, node 2 has the highest PageRank™.
In this chapter we will explain PageRank™, including the reasons why node 2 is most "influential" in the above graph. Our explanation combines basic concepts of set theory, graph theory, and algorithms, all drawn from previous chapters. Those who explore PageRank™ in more depth will find the literature full of references to matrix algebra and recursion, which are useful topics in more advanced courses of mathematics and computer science. For a head-first dive into these topics, see Eigenvector centrality, which is the mathematical foundation on which PageRank™ is built.
PageRank™ is Google's patented algorithm "to examine the entire link structure of the web and determine which pages are most important." The details of the algorithm are secret, but the main ideas are well-known and much-copied. The principles of PageRank™ are similar to those of a political election. In particular, a hyperlink from page x to page y counts as a "vote" cast by x for y. The rules of voting in this political election combine both democracy and oligarchy. The privilege of voting is open to all (democracy), but the weight of a vote depends on who cast it (oligarchy). Famous features of PageRank™ include two ways votes are weighted:
- A vote from a high-ranked page is worth more than a vote from a low-ranked page.
- Each page can vote for any number of other pages; however, the weight of each of its individual votes decreases according to the total number of votes it has cast.
Most other parts of PageRank™ are closely guarded secrets. For example:
- Who has voted for whom? The contents of this ballot box are public, but who wants to count the votes? We rely on Google's database as a convenient substitute for verifiable vote-counting.
- Whose voting privileges have been revoked? As Web builders compete for PageRank™, some use techniques such as link-farming that Google considers to be cheating. Google works diligently and secretly to identify the cheaters and remove all their votes from the ballot box.

NetRank: a simplified version of PageRank 
Here we present our own NetRank algorithm as a simplified version of PageRank™. NetRank and PageRank™ accept exactly the same input: a directed graph. They also provide the same kind of output (influence scores for all the nodes of the graph), although they compute slightly different values for these scores. NetRank has two notable properties relative to PageRank™:
- NetRank captures the most important feature of PageRank™ that improves on indegree: votes from high-ranked pages are worth more than votes from low-ranked pages.
- NetRank is much easier to compute than PageRank™, mostly because NetRank ignores all the other ways that PageRank™ improves on indegree as a measure of influence.
Example: We demonstrate the NetRank calculation using the following directed graph G=(V,E): We write NR to denote NetRank. The calculation of NR proceeds iteratively. During iteration i, the calculation of NR_{i} improves upon the prior calculation of NR_{(i-1)}. We write NR_{i}(x) to denote the NetRank value of node x during the i^{th} iteration of the calculation.

Iteration 0: To get the NetRank calculation started, we begin with NR_{0}(x) = 1 for every node x. Below we redraw G and write each NR_{0} value inside the corresponding node.

Iteration 1: For each node x, NR_{1}(x) is the sum of the NR_{0} scores of all the nodes that vote for x:
- NR_{1}(1) = NR_{0}(2) + NR_{0}(3) = 1 + 1 = 2
- NR_{1}(2) = NR_{0}(1) = 1
- NR_{1}(3) = NR_{0}(2) = 1
Below is G with each NR_{1} value drawn inside the corresponding node. Note that for each node x, NR_{1}(x) equals the indegree of x. NR_{1} counts votes and gives every vote equal weight.

Iteration 2: For each node x, NR_{2}(x) is the sum of the NR_{1} scores of all the nodes that vote for x:
- NR_{2}(1) = NR_{1}(2) + NR_{1}(3) = 1 + 1 = 2
- NR_{2}(2) = NR_{1}(1) = 2
- NR_{2}(3) = NR_{1}(2) = 1
NR_{2} counts votes, which are then weighted according to the rank of who cast each vote. NR_{1} is used as the basis for weighting. Here is G with each NR_{2} value drawn inside the corresponding node.

Iteration 3: For each node x, NR_{3}(x) is the sum of the NR_{2} scores of all the nodes that vote for x:
- NR_{3}(1) = NR_{2}(2) + NR_{2}(3) = 2 + 1 = 3
- NR_{3}(2) = NR_{2}(1) = 2
- NR_{3}(3) = NR_{2}(2) = 2
NR_{3} also counts weighted votes; however, the weighting of the votes is improved because NR_{2} is used instead of NR_{1}. Here is G with each NR_{3} value drawn inside the corresponding node.

Summary of Iterations 0-3: Each of the iterations above is a column in the following table:

Node x | NR_{0}(x) | NR_{1}(x) | NR_{2}(x) | NR_{3}(x)
     1 |         1 |         2 |         2 |         3
     2 |         1 |         1 |         2 |         2
     3 |         1 |         1 |         1 |         2

Subsequent iterations: As we continue iterating, the pattern of computing NR_{i} by adding NR_{(i-1)} values remains the same. This pattern is determined by the original input to the algorithm: the nodes and edges of directed graph G. In our current example, the pattern can be drawn over the table of values. We also extend the table out to iteration 7: The arrows show how values from one column move (or are added) to determine the values in the next column.
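The iteration pattern above is easy to reproduce with a short program. Below is a minimal sketch in Python; the dictionary in_neighbors encodes, for each node of our 3-node example graph, the nodes that vote for it.

```python
# In-neighborhoods of the 3-node example graph:
# node 1 is voted for by 2 and 3; node 2 by 1; node 3 by 2.
in_neighbors = {1: [2, 3], 2: [1], 3: [2]}

def netrank_iterations(in_neighbors, num_iterations):
    """Return the list [NR_0, NR_1, ..., NR_num_iterations]."""
    nr = {x: 1 for x in in_neighbors}   # Iteration 0: NR_0(x) = 1
    history = [nr]
    for _ in range(num_iterations):
        # NR_i(x) is the sum of the NR_{i-1} scores of x's voters.
        nr = {x: sum(history[-1][y] for y in in_neighbors[x])
              for x in in_neighbors}
        history.append(nr)
    return history

history = netrank_iterations(in_neighbors, 7)
print(history[3])   # {1: 3, 2: 2, 3: 2}, matching the NR_3 column
print(history[7])   # {1: 9, 2: 7, 3: 5}, matching iteration 7
```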
Normalization and convergence 
Here again is our example input to the NetRank algorithm: The table below shows iterations 0-7 of the NetRank calculation for this graph:

Node x | NR_{0}(x) | NR_{1}(x) | NR_{2}(x) | NR_{3}(x) | NR_{4}(x) | NR_{5}(x) | NR_{6}(x) | NR_{7}(x)
     1 |         1 |         2 |         2 |         3 |         4 |         5 |         7 |         9
     2 |         1 |         1 |         2 |         2 |         3 |         4 |         5 |         7
     3 |         1 |         1 |         1 |         2 |         2 |         3 |         4 |         5

The scores are different and bigger with each iteration. Evidently these numbers will change and grow for as long as the NetRank calculation continues to iterate. These observations lead us to two questions:
- In what sense are the influence scores improving with each iteration?
- How many iterations must be done before the algorithm can stop?
We answer these questions by introducing normalization and convergence. Normalization is a calculation that makes it easier to compare one column of numbers to another column. Normalizing column NR_{i} of the above table involves two steps: (1) add the numbers in column NR_{i}, and (2) divide each of the numbers in column NR_{i} by that sum. The totals of columns NR_{0} through NR_{7} above are 3, 4, 5, 7, 9, 12, 16, and 21, respectively. Dividing by these totals yields the normalized data below:

Node x | NR_{0}(x) | NR_{1}(x) | NR_{2}(x) | NR_{3}(x) | NR_{4}(x) | NR_{5}(x) | NR_{6}(x) | NR_{7}(x)  (all normalized)
     1 |       1/3 |       2/4 |       2/5 |       3/7 |       4/9 |      5/12 |      7/16 |      9/21
     2 |       1/3 |       1/4 |       2/5 |       2/7 |       3/9 |      4/12 |      5/16 |      7/21
     3 |       1/3 |       1/4 |       1/5 |       2/7 |       2/9 |      3/12 |      4/16 |      5/21

The table above shows fractions to highlight the arithmetic of normalization. We write the exact same table a second time using decimals, in order to make it easier to compare one column of numbers to another (which is the point of normalization):

Node x | NR_{0}(x) | NR_{1}(x) | NR_{2}(x) | NR_{3}(x) | NR_{4}(x) | NR_{5}(x) | NR_{6}(x) | NR_{7}(x)  (all normalized)
     1 |      0.33 |      0.50 |      0.40 |      0.43 |      0.44 |      0.42 |      0.44 |      0.43
     2 |      0.33 |      0.25 |      0.40 |      0.29 |      0.33 |      0.33 |      0.31 |      0.33
     3 |      0.33 |      0.25 |      0.20 |      0.29 |      0.22 |      0.25 |      0.25 |      0.24

Convergence is a property illustrated by the above table of decimals. With each iteration, the changes from one column to the next become smaller and smaller. If we were to expand the above table to show column NR_{16}, we would see that, after normalization, NR_{16}(1) = 0.43, NR_{16}(2) = 0.32, and NR_{16}(3) = 0.25. Furthermore, the normalized NR_{i} values would never change (to the nearest 0.01) after iteration 16. This point, when further iterations of a calculation do not change the answer according to our desired level of precision (e.g., 0.01), is the point of convergence. By normalizing the results of the NetRank calculation, we find that after enough iterations the values will converge. How many iterations is enough depends on (1) our desired level of precision, and (2) the input graph. More precision and larger graphs will require more iterations for convergence to occur.

Conclusion: Having started with the following input graph: We have used NetRank to calculate the relative influence of each node of the graph.
- The normalized NetRank score of node 1 is 0.43
- The normalized NetRank score of node 2 is 0.32
- The normalized NetRank score of node 3 is 0.25
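Normalization and the convergence test can also be sketched in code. The following Python fragment iterates NetRank on the 3-node example, normalizes after each iteration, and stops once no normalized score changes by more than the desired precision between successive iterations; that stopping rule, and the precision 0.001 used in the demonstration call, are our choices for this sketch.

```python
in_neighbors = {1: [2, 3], 2: [1], 3: [2]}  # the 3-node example graph

def normalize(scores):
    """Divide each score by the column total, so the scores sum to 1."""
    total = sum(scores.values())
    return {x: v / total for x, v in scores.items()}

def netrank_until_convergence(in_neighbors, precision):
    nr = {x: 1 for x in in_neighbors}                  # NR_0
    prev = normalize(nr)
    iterations = 0
    while True:
        nr = {x: sum(nr[y] for y in in_neighbors[x])   # NR_i from NR_{i-1}
              for x in in_neighbors}
        iterations += 1
        cur = normalize(nr)
        # Converged: no normalized score moved by more than `precision`.
        if all(abs(cur[x] - prev[x]) <= precision for x in cur):
            return cur, iterations
        prev = cur

scores, n = netrank_until_convergence(in_neighbors, precision=0.001)
print(n, {x: round(v, 2) for x, v in scores.items()})
```

The returned scores agree, to the nearest 0.01, with the converged values 0.43, 0.32, and 0.25 reported above.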

Recall the NetRank calculation before we introduced normalization and convergence. For our 3-node example graph, we represented this calculation with the following table: To write the NetRank algorithm (which computes the values in the above table), we need a mathematical way to say, "everyone who has voted for node x." We define In-Neighborhood for this purpose. Given a directed graph G=(V,E) and a node x, the In-Neighborhood of x is:

InN(x) = {y : (y,x) ∈ E} = the set of all nodes with directed edges into x

We can now state the NetRank algorithm:

NetRank Algorithm:
Input: Directed graph G=(V,E)
Output: Influence scores for the nodes in V
Steps:
1. For each node x∈V: NR_{0}(x) = 1
2. i = 1
3. For each node x∈V: NR_{i}(x) = Σ_{y∈InN(x)} NR_{(i-1)}(y)
4. If normalized NR_{i} values have converged, then we're done
5. i = i + 1
6. Go to step 3
Example: We will calculate iterations 0-5 of NetRank using the graph from the very beginning of this chapter: The table below includes a column for the In-Neighborhood of each node. As we see in step 3 of the algorithm above, this is exactly the information that determines the pattern of computing NR_{i} from NR_{(i-1)}. When constructing a NetRank table, writing this information explicitly helps organize the subsequent arithmetic.

Node x | InN(x)  | NR_{0}(x) | NR_{1}(x) | NR_{2}(x) | NR_{3}(x) | NR_{4}(x) | NR_{5}(x)
     1 | {3,5}   |         1 |         2 |         3 |         5 |        10 |        18
     2 | {4,1}   |         1 |         2 |         5 |         8 |        15 |        26
     3 | {2,5}   |         1 |         2 |         3 |         7 |        11 |        22
     4 | {2,1,5} |         1 |         3 |         5 |        10 |        16 |        32
     5 | {3}     |         1 |         1 |         2 |         3 |         7 |        11

The NetRank scores above have not been normalized. Skipping the process of normalization makes it impossible to detect convergence but easy to do a pre-specified number of iterations manually (without a calculator). Node 4 is the most influential node according to NR_{5}. Using a spreadsheet, we can program the above calculation and include normalization. With a desired precision of 0.001, we observe convergence after 34 iterations. Node 4 is still the most influential node. The normalized NetRank values are
- normalized NR_{34}(1) = 0.166
- normalized NR_{34}(2) = 0.248
- normalized NR_{34}(3) = 0.195
- normalized NR_{34}(4) = 0.285
- normalized NR_{34}(5) = 0.107
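Instead of a spreadsheet, the algorithm can be programmed directly. The Python sketch below transcribes steps 1-6 for the 5-node graph above. Interpreting step 4 as "no normalized score changes by more than the desired precision between successive iterations" is our choice for this sketch, so the exact iteration count at convergence may differ slightly from the spreadsheet's.

```python
# In-neighborhoods of the 5-node example graph (the InN(x) column).
in_neighbors = {1: {3, 5}, 2: {4, 1}, 3: {2, 5}, 4: {2, 1, 5}, 5: {3}}

def netrank(in_neighbors, precision=0.001, max_iterations=1000):
    nr = {x: 1.0 for x in in_neighbors}                       # Step 1: NR_0(x) = 1
    prev = {x: v / len(in_neighbors) for x, v in nr.items()}  # normalized NR_0
    for i in range(1, max_iterations + 1):                    # Step 2: i = 1
        nr = {x: sum(nr[y] for y in in_neighbors[x])          # Step 3: sum voters
              for x in in_neighbors}
        total = sum(nr.values())
        cur = {x: v / total for x, v in nr.items()}
        if all(abs(cur[x] - prev[x]) <= precision for x in cur):
            return cur, i                                     # Step 4: converged
        prev = cur                                            # Steps 5-6: next i
    raise RuntimeError("no convergence within max_iterations")

scores, iterations = netrank(in_neighbors)
print(iterations, max(scores, key=scores.get))  # node 4 is most influential
```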

Dividing by outdegree: the NR* formula 
One way to represent the NetRank calculation is with the NR formula:

For each node x∈V: NR_{i}(x) = Σ_{y∈InN(x)} NR_{(i-1)}(y)

The NR formula is taken from step 3 of the NetRank algorithm. In order to transform the NetRank calculation into PageRank™, we can keep the basic iterative structure of the NetRank algorithm, but we must make two changes to the NR formula:
- Accounting for the following feature of PageRank™: for each page, the weight of each individual vote cast by that page decreases according to the total number of votes cast by that page.
- Accounting for the "damping factor" that is built into PageRank™.
The first change gives us the NR* formula:

For each node x∈V: NR*_{i}(x) = Σ_{y∈InN(x)} NR*_{(i-1)}(y) / Outdegree(y)
Besides renaming NR to NR*, the only difference between the NR and NR* formulas is that NR* divides the weight of each vote by the outdegree of the node casting that vote. This models the desired effect: the weight of each individual vote cast by a page decreases according to the total number of votes cast by that page.

Example: We will calculate iterations 0-5 of NR* with the same graph we just used to demonstrate NR: The table below includes a column for the outdegree of each node. Because each NR*_{i} calculation adds fractions with different denominators (based on outdegree values), the arithmetic is too messy to do manually. We have used a spreadsheet to calculate the NR*_{1} - NR*_{5} values below.

Node x | Outdegree(x) | InN(x)  | NR*_{0}(x) | NR*_{1}(x) | NR*_{2}(x) | NR*_{3}(x) | NR*_{4}(x) | NR*_{5}(x)
     1 |            2 | {3,5}   |          1 |      0.833 |      0.583 |      0.597 |      0.660 |      0.652
     2 |            2 | {4,1}   |          1 |      1.500 |      1.750 |      1.625 |      1.604 |      1.594
     3 |            2 | {2,5}   |          1 |      0.833 |      0.917 |      1.014 |      0.965 |      0.971
     4 |            1 | {2,1,5} |          1 |      1.333 |      1.333 |      1.306 |      1.264 |      1.301
     5 |            3 | {3}     |          1 |      0.500 |      0.417 |      0.458 |      0.507 |      0.483

The NR* scores above have not been normalized; however, it appears that NR* scores will converge even without normalization. This convergence occurs because dividing by outdegree has an effect similar to that of normalization. In particular, dividing by outdegree limits how much the NR* values can grow, unlike the endless growth of NR values that we have observed.

Note: Using NR* instead of NR gives us a different definition of the most influential node in our graph. Node 2 is the most influential node according to NR*_{5}. Using a spreadsheet, we can program the above calculation with a desired precision of 0.001 and observe convergence after 28 iterations. Node 2 is still the most influential node. The NR* values are
- NR*_{28}(1) = 0.645
- NR*_{28}(2) = 1.613
- NR*_{28}(3) = 0.968
- NR*_{28}(4) = 1.290
- NR*_{28}(5) = 0.484

Previously we have seen that node 4 is most influential according to NR; it is second-most influential according to NR*. Consider: Why does node 4 drop from most influential to second-most influential after we account for outdegree in the weighting of votes?
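The NR* formula changes only one line relative to the NR sketch shown earlier: each vote is divided by the voter's outdegree. Below is a minimal Python version using the same 5-node graph, with the edge list reconstructed from the InN(x) column above.

```python
# The 5-node example graph as (voter, votee) edges.
edges = [(3, 1), (5, 1), (4, 2), (1, 2), (2, 3), (5, 3),
         (2, 4), (1, 4), (5, 4), (3, 5)]
nodes = {1, 2, 3, 4, 5}
in_neighbors = {x: [y for (y, z) in edges if z == x] for x in nodes}
outdegree = {x: sum(1 for (y, z) in edges if y == x) for x in nodes}

def nr_star(num_iterations):
    nr = {x: 1.0 for x in nodes}    # NR*_0(x) = 1
    for _ in range(num_iterations):
        # Each vote's weight is divided by the voter's outdegree.
        nr = {x: sum(nr[y] / outdegree[y] for y in in_neighbors[x])
              for x in nodes}
    return nr

print({x: round(v, 3) for x, v in nr_star(5).items()})
# {1: 0.652, 2: 1.594, 3: 0.971, 4: 1.301, 5: 0.483} -- the NR*_5 column
```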
We create PageRank™ (PR for short) by adding a "damping factor" to the NR* formula. The damping factor d is a fixed decimal value, 0 ≤ d ≤ 1 (e.g., d = 0.85). The PR formula:

For each node x∈V: PR_{i}(x) = (1-d) + d * Σ_{y∈InN(x)} PR_{(i-1)}(y) / Outdegree(y)

Besides renaming NR* to PR, the only difference between the NR* and PR formulas is the use of the damping factor d. The meaning of the damping factor is subtle, so we first note what happens if we choose the most extreme possible values of d:
- If d=1 then PR and NR* are exactly the same formula.
- If d=0 then PR_{i}(x) = 1 and the rest of the formula is irrelevant.
Example: We will calculate iterations 0-5 of PR, using d=0.85 and the same graph as before:

Node x | Outdegree(x) | InN(x)  | PR_{0}(x) | PR_{1}(x) | PR_{2}(x) | PR_{3}(x) | PR_{4}(x) | PR_{5}(x)
     1 |            2 | {3,5}   |         1 |     0.858 |     0.678 |     0.686 |     0.719 |     0.715
     2 |            2 | {4,1}   |         1 |     1.425 |     1.606 |     1.529 |     1.518 |     1.513
     3 |            2 | {2,5}   |         1 |     0.858 |     0.919 |     0.978 |     0.953 |     0.955
     4 |            1 | {2,1,5} |         1 |     1.283 |     1.283 |     1.266 |     1.245 |     1.261
     5 |            3 | {3}     |         1 |     0.575 |     0.515 |     0.540 |     0.566 |     0.555

The PR scores are very similar to the NR* scores. Using a spreadsheet, we can program the above calculation with a desired precision of 0.001 and observe convergence after 13 iterations. Node 2 is still the most influential node. The PR values are
- PR_{13}(1) = 0.713
- PR_{13}(2) = 1.521
- PR_{13}(3) = 0.954
- PR_{13}(4) = 1.257
- PR_{13}(5) = 0.555
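Adding the damping factor to the NR* sketch is a one-line change. Here is a minimal Python version on the same 5-node graph, with d = 0.85:

```python
d = 0.85  # damping factor
in_neighbors = {1: [3, 5], 2: [4, 1], 3: [2, 5], 4: [2, 1, 5], 5: [3]}
outdegree = {1: 2, 2: 2, 3: 2, 4: 1, 5: 3}

def pagerank(num_iterations):
    pr = {x: 1.0 for x in in_neighbors}    # PR_0(x) = 1
    for _ in range(num_iterations):
        # PR_i(x) = (1-d) + d * (sum of outdegree-weighted votes for x)
        pr = {x: (1 - d) + d * sum(pr[y] / outdegree[y]
                                   for y in in_neighbors[x])
              for x in in_neighbors}
    return pr

print({x: round(v, 3) for x, v in pagerank(5).items()})
# {1: 0.715, 2: 1.513, 3: 0.955, 4: 1.261, 5: 0.555} -- the PR_5 column
```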

The damping factor: PageRank as probability 
In order to understand what the damping factor means, it helps to reframe our PageRank™ metaphor from a political election to a Web-surfing monkey. PageRank™ provides a kind of probability distribution for the following random variable: If a monkey were to sit down at a browser and start clicking and typing, what Web page would be left on the screen when the zookeeper finally takes the monkey away from the computer? PageRank™ tells us the likelihood that any specific page will be the monkey's final destination. Now suppose we also know how often the monkey likes to click on a random link vs. how often the monkey likes to type a random URL into the browser. In this scenario, the damping factor d expresses the monkey's preference for clicking instead of typing. So...
- If d=1 then the resulting PageRank™ function assumes that the monkey starts at a random page and then always clicks and never types in a new URL.
- If d=0 then the resulting PageRank™ function assumes that the monkey never clicks; he always types in another random URL.
Admittedly the d=0 model corresponds to an unrealistic and/or pathological version of Web surfing, equivalent to the famous team of monkeys still typing up King Lear. Hence our best intelligence says that Google uses d=0.85, which is closer to 1 than 0.
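The monkey metaphor can be simulated directly. The Python sketch below follows a random surfer on our 5-node example graph: with probability d the surfer clicks a random outgoing link, otherwise it jumps to a page chosen uniformly at random. Over a long run, the fraction of time spent on each page approximates that page's normalized PageRank™. The out_links dictionary lists each node's outgoing edges, reconstructed from the example.

```python
import random

# Outgoing links of the 5-node example graph (voter -> votees).
out_links = {1: [2, 4], 2: [3, 4], 3: [1, 5], 4: [2], 5: [1, 3, 4]}
d = 0.85  # probability of clicking a link instead of typing a random URL

def random_surfer(num_steps, seed=0):
    rng = random.Random(seed)
    nodes = list(out_links)
    page = rng.choice(nodes)
    visits = {x: 0 for x in nodes}
    for _ in range(num_steps):
        if rng.random() < d:
            page = rng.choice(out_links[page])  # click a random link
        else:
            page = rng.choice(nodes)            # type a random URL
        visits[page] += 1
    # Fraction of time spent on each page.
    return {x: visits[x] / num_steps for x in nodes}

print(random_surfer(100_000))  # node 2 gets the largest share of visits
```

In this graph every page has at least one outgoing link; a page with none would trap the surfer, a situation commonly handled by always jumping to a random page from such dead ends.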
