Algorithms and Computation
Information, computation, and algorithms

Computer science is "the study of the theoretical foundations of information and computation" [Wikipedia, 2009].

Examples of information includeAsabi Romera

  • The Adventures of Huckleberry Finn, novel by Mark Twain
  • Asabi Romera, photo by Annie Leibowitz (at right)
  • Results of the United States Census 2000
  • Letters of the alphabet

We will not attempt to define information. Instead, we simply note that sets and ordered lists, which we have already discussed, provide a basic foundation for the theory of information.

Computation is a form of action. Examples include

  • Add 1 + 2 + 3 + ... + 100
  • Digitize the photograph Asabi Romera as a JPG image
  • Sort a list of United States households according to their gross annual incomes
  • Find a shortest path (i.e., with fewest links) via the Internet from Webwhompers.com to your browser

An algorithm is a step-by-step procedure for performing a computation. We introduced algorithms to illustrate network dynamics. Our repertoire of algorithms so far includes the random graph algorithm, the triadic closure algorithm, and the preferential attachment algorithm.

 
Summation: an example of what computation is

Adding a series of numbers, called summation, is a simple example of computation. Below is a basic algorithm for summation:

Summation Algorithm:

Input:

Integer n, with n > 0

Output:

Variable t =1+2+3+...+n

Steps:

  1. i = 0
  2. t = 0
  3. i = i + 1
  4. t = t + i
  5. if i ≥ n then we're done
  6. go to step 3

The symbol Σ denotes summation and is used to represent calculations similar to the above.

Example: The simplest form of summation notation is

n
 
Σ
i
i=1
 

Given an integer n>0, the above notation represents the sum 1+2+3+...+n.

Example: We use sets to define a more general notion of summation. Let A = {1,2,3,...,n} with n>0. Then the previous example can also be written 

Σ
i
i∈A
 

Notice how the more general notation for summation is similar to implicit notation for sets. In the above example, the dummy variable is i and the true/false statement is i∈A. The symbol Σ denotes the summation of all values of the dummy variable for which the true/false statement is true.

Example: Let A = {1,5,8,20}. Consider

Σ
i
i∈A and
i is even
 

This is 8+20 = 28.

Example: Let A = {4,8,12}. Consider

Σ
i2
i∈A or
i = 7
 

This is 42+82+122+72 = 16+64+144+49 = 273.

 
HTML: an example of what computation is not

HTML is arguably the fundamental computer language of the Web and the basis of Web programming; however, many computer scientists do not consider HTML to be a programming language.

The distinction between the popular notion of "Web programming" and the computer science notion of "programming language" hinges on computation. A computer scientist uses a programming language to express algorithms that do computation. In contrast, Web programming does not require the author to write algorithms to do computation.

For example, if you want to add the numbers 1+2+3+...+100, then HTML is not helpful. Even the simplest arithmetic cannot be performed by HTML. The language is not designed to express algorithms of any kind.

HTML does not express algorithms, but it certainly does represent information. For example, if you wanted to write a story about adding the numbers 1+2+3+...+100, then HTML provides a language with which you can tell that story. You could also use HTML to tell the story of The Adventures of Huckleberry Finn. Both of these stories are examples of information, not computation.

If you used HTML to write a story about adding the numbers 1+2+3+...+100, then you would probably want to use an algorithm (expressed by some programming language other than HTML) in order to compute the actual sum. Then with HTML you could present that sum as a tidy conclusion to the story.

For example, you could use the Web scripting language PHP to add 1+2+3+...+100 and to generate the HTML that would present the above story. PHP can express algorithms that do computation; it is a programming language in the classic sense.

 
Computing distance, part one

Given an undirected graph G=(V,E), we previously defined the distance between two nodes x and y as the length of a shortest path in G from x to y. Now we describe how to compute this value.

Computing distance in a graph relates to several important topics in Web science:

  • Information diffusion: how does information spread over time through a population of Web users?
  • Packet routing: how is a discrete piece of information transmitted between two computers via the Internet?
  • Network navigation: how does one explore the connections in a network? (See "broadcast search" and "directed search" in Six Degrees for more on this.)

Computing distance in a graph is also very closely related to computing shortest paths in a graph. Finding shortest paths is a fundamental problem in computer science that is traditionally solved with breadth-first search. Our distance algorithm is a slightly simplified version of breadth-first search.

We motivate our distance algorithm with the following scenario:

  1. Suppose m is a meme that is 100% contagious: any person with m will immediately share it with everyone adjacent to him.
  2. Furthermore, m never changes: any person who has previously received m will forever more recognize it and be immune. (I.e., the sharing only happens the first time a person receives m.)
  3. The set of people and the connections joining them do not change.
  4. One person--the source--has m, and every other person does not have m.

Given the above scenario, who will receive meme m from whom, and how many steps will it take for m to reach each person in the network? Answering these questions drives the distance algorithm.

Example:

 0Node 9 is the source--the only node that starts with the meme
 1All nodes adjacent to 9 -- nodes 1, 3, and 8 -- receive the meme next. These nodes are distance 1 from source node 9.
 2All nodes that are adjacent to 1, 3, or 8 and that have not already received the meme are next. These nodes -- 2 and 10 -- are distance 2 from source node 9.
 3All nodes that are adjacent to 2 or 10 and that have not already received the meme are next. These nodes -- 5, 6, and 7 -- are distance 3 from source node 9.
 4

All nodes that are adjacent to 5, 6, or 7 and that have not already received the meme are next. There is one such node -- 4 -- which is distance 4 from source node 9.

 

 

 

 
Computing distance, part two: Example

Below is another example that illustrates how to compute distance in a graph. It explicitly counts iterations (of the algorithm we will soon write) and it introduces some more formal bookkeeping: Set Ai = the set of nodes that are distance i from the source node.

For example:

  • A0 is a set with one element: the source node.
  • A1 is the set of all nodes adjacent to the source node. 

Below we use Ai and a more complicated example to illustrate computing the distances from node 1 to every other node in the graph.

Beginning
0

Source node = 1


A0 = Set of nodes that are distance 0 from source
={1}

Iteration 1

Nodes not yet visited
= all nodes except source
= {2,3,4,5,...,20}
= all white nodes

A1 = Set of nodes that are distance 1 from source
= nodes adjacent to source
= {7,14,16,17,20}
= nodes with red arrows pointing at them
Iteration 2

Updated set of nodes not yet visited
= {2,3,4,5,6,8,9,10,11,12,13,15,18,19}
= white nodes

A2 = Set of nodes distance 2 from source
= unvisited nodes adjacent to any node in A1
= {4,5,8,9,11,13,18,19}
= nodes with red arrows pointing at them
Iteration 3

Updated set of nodes not yet visited
= {2,3,6,10,12,15}
= white nodes

A3 = Set of nodes distance 3 from source
= unvisited nodes adjacent to any node in A2
= {2,6,10,12,15}
= nodes with red arrows pointing at them
Iteration 4

Nodes not yet visited = {3}

A4 = Set of nodes distance 4 from source
= unvisited nodes adjacent to any node in A3
= {3}
= nodes with red arrows pointing at them
Iteration 5

Nodes not yet visited = { }

A5 = Set of nodes distance 5 from source
= unvisited nodes adjacent to any node in A4
= { }

At the end of iteration 5, we see that there are no nodes distance 6 from the source node. Therefore, there are no nodes distance 7 from the source node, no nodes distance 8, etc.

Note that if a graph is not connected, then when the algorithm ends, only nodes in the same connected component as the source node will have been visited, and nodes in other components will not have been visited. Nodes that are still unvisited at the end of the algorithm are distance infinity from the source.

 
Computing distance, part three: Algorithm

The algorithm below states more formally the steps of the computation illustrated above.

Distance Algorithm:

Input:

  • Undirected graph G=(V,E) with |V|>1
  • Source node s
  • Destination node t

Output:

  • Distance from s to t in G

Steps:

  1. if s = t then distance = 0 and we're done; otherwise continue
  2. A0 = {s}
  3. B = {x: x∈V and x≠s}
  4. i = 1
  5. Ai = {x: x is adjacent to at least one node in Ai-1} ∩ B
  6. If t∈Ai then distance = i and we're done; otherwise continue
  7. If Ai = { } then distance = ∞ and we're done; otherwise continue
  8. Remove elements of Ai from set B
  9. i = i + 1
  10. Go to step5

 

 

 

 

 


Joomla Templates by Joomlashack