This is a long overdue post which I wanted to write a year ago, after my first semester at IIT. The PageRank algorithm is a very simple probabilistic model for ranking the importance of a page. The PageRank (PR) of a web page gives the importance of that page on the web.
The PR of a page (A) is calculated from its inbound links, i.e. the pages pointing to A. Suppose B, C and D are three web pages that each have a link to A. Then B, C and D are said to cast a vote for A. But Google took another measure into account. It assigned a rank based not only on the number of inbound links but also on the importance of the pages which point to A, i.e. the importance of B, C and D is taken into account.
The PR of a page is calculated using the formula below:
PR(A) = (1-d) + d(PR(t1)/C(t1) + … + PR(tn)/C(tn))
d – The damping factor. Usually a value of 0.85 is set for this.
t1 … tn – The pages which point to A.
PR(t1) – The importance (PageRank) of t1, which points to A.
C(t1) – The total number of outbound links on page t1. So if the PageRank of t1 is 0.50 and t1 has 100 outgoing links, then each outgoing link gets 0.50/100 = 0.005 (the rank is distributed uniformly).
So the PageRank of A is the sum of the values contributed by each of its incoming links, scaled by d, plus the constant (1-d).
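To make the formula concrete, here is a minimal Python sketch of a single application of it. The function name `page_rank_of` and the numbers for the second and third pages are made up for illustration; only d = 0.85 and the 0.50/100 case come from the text above.

```python
def page_rank_of(inbound, d=0.85):
    """Apply PR(A) = (1-d) + d * sum(PR(t)/C(t)) once.

    inbound: list of (PR(t), C(t)) pairs, one per page t that links to A.
    """
    return (1 - d) + d * sum(pr / c for pr, c in inbound)


# Hypothetical inbound pages for A:
#   B has PR 0.50 and 100 outgoing links -> contributes 0.50/100 = 0.005
#   C has PR 1.00 and   2 outgoing links -> contributes 0.5
#   D has PR 0.40 and   4 outgoing links -> contributes 0.1
inbound_to_a = [(0.50, 100), (1.00, 2), (0.40, 4)]

pr_a = page_rank_of(inbound_to_a)
print(round(pr_a, 5))  # 0.15 + 0.85 * 0.605 = 0.66425
```

Notice that C, with only 2 outgoing links, contributes far more than B with its 100 links, even though their ranks are of similar size.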
Pretty simple, right? But Google does not take only PageRank into account. It uses a number of more complex measures. For example, suppose you are searching for Bruce Lee. The text processing engine finds web pages containing the keyword (say E is one of the pages containing it). It also looks at the pages pointing to E and does a context-based search of those pages as well.
So, in order to increase the PageRank of a page, one can manually make a lot of links point to it. This is called link farming. Google is intelligent enough to identify such link-farming websites and ignores their links. It also treats such websites as harmful, and if your website points to such penalized pages your PageRank will be decreased as well. One way to increase the rank of your site is to increase the number of pages in your website, since by the formula every page brings at least (1-d) of rank with it.
A major problem when implementing the algorithm is the "problem of circularity". Suppose page A points to B and B points to A. To calculate PR(A) we need PR(B), and vice versa. So we start from a guess, and the values of PR(A) and PR(B) calculated in the first pass are inaccurate. The procedure is therefore iterated a number of times (I had about 5 passes in mind; in practice you repeat until the values stop changing much) so that the calculated values get close to the real ones. This is the reason why updating the PR of a web page takes a lot of time. Google recalculates these values almost every month (I do not know this for sure).
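The iteration can be sketched in a few lines of Python. This is only my own minimal version under some assumptions: the graph is a plain dict (`links`, a hypothetical name) mapping each page to the pages it points to, every page starts with a guess of 1.0, and the loop stops when no value changes by more than a small tolerance rather than after a fixed number of passes.

```python
def pagerank(links, d=0.85, tol=1e-9, max_iterations=1000):
    """Repeat PR(A) = (1-d) + d * sum(PR(t)/C(t)) until the values settle.

    links: dict mapping each page to the list of pages it links to.
    """
    pr = {page: 1.0 for page in links}  # initial guess for every page
    for _ in range(max_iterations):
        new_pr = {}
        for page in links:
            # Sum PR(t)/C(t) over every page t that links to `page`.
            incoming = sum(
                pr[src] / len(out) for src, out in links.items() if page in out
            )
            new_pr[page] = (1 - d) + d * incoming
        change = max(abs(new_pr[p] - pr[p]) for p in links)
        pr = new_pr
        if change < tol:  # values have stopped moving, so we are done
            break
    return pr


# A and B point to each other (the circular case); C also points to A.
links = {"A": ["B"], "B": ["A"], "C": ["A"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Here A ends up on top, B is close behind because it gets all of A's vote, and C stays at the base value (1-d) because nothing points to it.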
There is one more reason I am writing this article. I wanted to do some sort of analysis using the emails in my inbox (hobby work), and I thought that the PageRank algorithm would do the job. So this is what I wanted to do: read all the messages in my Google inbox and make each and every address a node in a graph. I would add a directed edge A -> B if A has received a mail from B. Then, by applying the PageRank algorithm, I would have information on the important nodes in my social network graph. If you have used the PageRank algorithm for any such problem, kindly post in the comments section.
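Here is a rough Python sketch of that idea. It does not read a real inbox; the `messages` list of (recipient, sender) pairs and the example.com addresses are invented for illustration, and `build_graph` and `pagerank` are just names I picked. Addresses that never received a mail have no outgoing edges and simply pass on no rank here, which is a simplification.

```python
def build_graph(messages):
    """Build edges recipient -> sender from (recipient, sender) pairs."""
    links = {}
    for recipient, sender in messages:
        links.setdefault(sender, set())      # make sure both ends are nodes
        links.setdefault(recipient, set())
        if recipient != sender:              # ignore mails sent to oneself
            links[recipient].add(sender)
    return links


def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1-d) + d * sum(PR(t)/C(t)) a fixed number of times."""
    pr = {node: 1.0 for node in links}
    for _ in range(iterations):
        new_pr = {node: 1 - d for node in links}
        for node, targets in links.items():
            for target in targets:
                # Each outgoing edge carries an equal share of the node's rank.
                new_pr[target] += d * pr[node] / len(targets)
        pr = new_pr
    return pr


# Hypothetical mail log: (recipient, sender)
messages = [
    ("me@example.com", "alice@example.com"),
    ("me@example.com", "bob@example.com"),
    ("bob@example.com", "alice@example.com"),
    ("carol@example.com", "alice@example.com"),
    ("alice@example.com", "me@example.com"),
]

ranks = pagerank(build_graph(messages))
most_important = max(ranks, key=ranks.get)
print(most_important)  # alice@example.com
```

With edges pointing from recipient to sender, an address ranks high when many people (and especially high-ranked people) receive mail from it, so alice, who mails three different addresses, comes out on top.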