We consider a part of the Internet consisting of 10 web pages, numbered here from 1 to 10. This network is represented by a directed graph (see above), whose arrows symbolize the links between the web pages. We model a random surfer who moves through this part of the Internet according to the following rules: the starting page is chosen uniformly at random, and at each step the surfer follows one of the outgoing links of the current page, each link being chosen with equal probability.
Which web page does the random surfer visit most often when he or she stays on the network for a long time?
We consider the probability p_i(k) that the surfer is on web page i at step k. Since the first page is chosen completely at random, the distribution is uniform at the beginning: p_i(0) = 1/n = 10%, where n = 10 is the number of web pages. If the probabilities p_i(k) for step k are known, the probabilities for the next step can be calculated by

p_i(k+1) = Σ_{j→i} (1/l_j) · p_j(k)

The sum runs over all web pages j that have a link to web page i, and l_j is the number of links starting from page j. In order to jump from j to i, the surfer must first have been on page j (with probability p_j(k)) and then have selected the link to page i (with probability 1/l_j). (This formula is Google's PageRank algorithm with damping factor d = 1.)
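To see the update formula in action, here is one step computed on a tiny three-page example (this is an illustrative graph, not the network from the figure): page 1 links to pages 2 and 3, page 2 links to page 3, and page 3 links back to page 1.

```python
# One update step of p_i(k+1) = sum over j->i of p_j(k) / l_j
# on a hypothetical 3-page network (not the 10-page graph above).
links = {1: [2, 3], 2: [3], 3: [1]}   # links[j] = pages that j links to
p = {1: 1/3, 2: 1/3, 3: 1/3}          # uniform start: p_i(0) = 1/3

p_next = {i: 0.0 for i in p}
for j, targets in links.items():
    l_j = len(targets)                # number of links leaving page j
    for i in targets:
        p_next[i] += p[j] / l_j       # contribution p_j(k) * 1/l_j

print(p_next)   # p_next ≈ {1: 0.333, 2: 0.167, 3: 0.5}
```

Page 3 ends up most likely after one step because it receives probability from both page 1 (halved, since page 1 has two links) and page 2.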
The Python program below computes the probability distribution of the network for the first 25 steps. After about 15 steps, the values have essentially converged. In the last step, we get the result prob ≈ [0.0738, 0.0369, 0.0185, 0.1291, 0.1935, 0.0645, 0.258, 0.0645, 0.0322, 0.129]. Therefore, the highest probability, p ≈ 26%, belongs to web page 7.
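Since the link structure of the 10-page network is only given in the figure, the program cannot be reconstructed exactly from the text alone. The following sketch implements the iteration described above for 25 steps, with a hypothetical adjacency list `links` standing in for the actual network:

```python
# Iterate p_i(k+1) = sum over j->i of p_j(k) / l_j for 25 steps.
# NOTE: `links` is a placeholder adjacency list (links[j] = pages that
# page j links to); the real structure is given only in the figure.
links = {
    1: [2, 5],
    2: [3, 6],
    3: [4],
    4: [5, 7],
    5: [6, 7, 8],
    6: [7],
    7: [4, 10],
    8: [9, 10],
    9: [7],
    10: [1, 5, 7],
}

n = 10
prob = [1 / n] * n                    # uniform start: p_i(0) = 10%

for step in range(25):
    new_prob = [0.0] * n
    for j, targets in links.items():
        l_j = len(targets)            # number of links leaving page j
        for i in targets:
            # surfer was on j (prob[j-1]) and picked the link to i (1/l_j)
            new_prob[i - 1] += prob[j - 1] / l_j
    prob = new_prob

print([round(p, 4) for p in prob])
```

Replacing `links` with the link structure from the figure would reproduce the probabilities quoted above; with any link structure in which every page has at least one outgoing link, the entries of `prob` remain a probability distribution (they sum to 1) at every step.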