- Thursday, June 30, 2011 at 10:18 PM
A White Paper on changes to the PageRank algorithms.
Reg Charie NBS-SEO & DotCom-Productions
If you are a developer or SEOer, hobbyist or professional, you cannot help but notice the changes being brought about in search.
Panda is the latest to stir the pot, but the current basic changes have roots that reach back to the beginning, with major changes implemented over the past few years.
Let me quickly define the state of PageRank back in October '09.
Before this point the effect of PageRank
on SERPs was still under debate. Google had posted in 2007 how PR was being
devalued, but the community did not listen.
Google's Susan Moskwa has come out again and stated that the PageRank metric is not a good choice.
As Susan pointed out, Udi Manber, VP of engineering at Google, wrote in his blog in 2008:
“The most famous part of our ranking algorithm is PageRank, an algorithm developed by Larry Page and Sergey Brin, who founded Google. PageRank is still in use today, but it is now a part of a much larger system.”
Let's look at the problems with PageRank.
The basic problem is that those doing SEO ignore the "organic" requirements by building links in an effort to influence the search results.
The rank value indicates an importance of a particular page. A hyperlink to a page counts as a vote of support. The PageRank of a page is defined recursively and depends on the number and PageRank metric of all pages that link to it ("incoming links"). A page that is linked to by many pages with high PageRank receives a high rank itself. (My bold).
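The recursive definition above can be sketched with the standard power-iteration method. The tiny three-page graph and the damping factor below are illustrative values for demonstration, not Google's actual data:

```python
# A minimal sketch of the classic PageRank iteration described above.
# The graph and damping factor are illustrative, not Google's real values.

def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if not outgoing:
                continue
            # Each outgoing link passes an equal share of this page's rank.
            share = damping * rank[page] / len(outgoing)
            for target in outgoing:
                new_rank[target] += share
        rank = new_rank
    return rank

# Three pages: A and C both link to B, B links back to A.
ranks = pagerank({"A": ["B"], "B": ["A"], "C": ["B"]})
# B ends up with the highest rank: it has the most incoming links.
```

Note that nothing in this calculation asks whether the linking page is topically related to the target, which is exactly the weakness discussed below.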
Number of PR algorithm changes made from '04 to '09.
This is the crux of the problem.
Links were assumed to have relevance, as they would in the academic world.
Without the influence of relevance the authority becomes blurred.
That a linking page has a high value does not mean the value bears relevance to the topic of the linked page.
In the old PR system, all things being equal, a link on a high PR page means more PR for the linked page.
They go on to confirm this by stating:
In practice, the PageRank concept has proven to be vulnerable to manipulation, and extensive research has been devoted to identifying falsely inflated PageRank and ways to ignore links from documents with falsely inflated PageRank.
But how about links from documents with genuinely acquired PR, but which are not authorities on the subject?
How would you value a link to a page about knitting a sweater on a PR8 site that is concerned with IT Security?
The PR value assigned would not be accurate.
The old PR had two factors that caused problems.
If the primary calculation is switched from one based on the PR of the linking page, to one calculated on the relevance between linking and linked pages, all sorts of previous problems disappear.
Previous Linking Problems
Because of the calculation method (using the linking page's PR as a base), all of the above give PR if not found to be spam by Google.
By calculating PageRank using relevance as a metric, numbers 1 through 5 no longer matter.
If a link is based on relevance, it does not matter whether the link is reciprocal, whether it sits on a paid link page, whether it uses three-way linking, or whether it is on a high or low PR page.
It is the degree of relevance that counts.
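The argument above can be illustrated with a toy relevance score. Cosine similarity over raw word counts is an assumed stand-in for demonstration, not Google's actual relevance metric, and the page texts below are invented:

```python
# A hypothetical sketch of valuing a link by topical relevance instead of
# the linking page's PR. The similarity measure (cosine over simple word
# counts) is an illustrative stand-in, not Google's method.
import math
from collections import Counter

def relevance(linking_text, linked_text):
    """Cosine similarity between word-count vectors of the two pages."""
    a = Counter(linking_text.lower().split())
    b = Counter(linked_text.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

knitting_page = "knitting a sweater with wool yarn and circular needles"
it_security_page = "firewall intrusion detection and network security audits"
craft_blog_page = "yarn weights needles and sweater knitting patterns"

# A link from a low-PR craft blog scores higher for the knitting page
# than a link from the hypothetical PR8 IT Security site would.
assert relevance(craft_blog_page, knitting_page) > relevance(it_security_page, knitting_page)
```

Under a scheme like this, the PR8 security site's link to the sweater page carries little weight, matching the intuition in the example above.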
The follow/nofollow tags are rendered unnecessary. Google just has to follow the links that have relevance; it can ignore all others.
This alone will save Google a ton of time and CPU cycles, and improve the information silos at the same time.
Original anatomy of Google as presented pre-production by Page and Brin.
Note the URL resolver flows to "Links", which in turn goes to PageRank, then to the searcher.
In fact, as of November 1997, only one of the top four
commercial search engines finds itself (returns its own search page in
response to its name in the top ten results).
People are still only willing to look at the first few tens of
results. Because of this, as the collection size grows, we need tools
that have very high precision (number of relevant documents returned,
say in the top tens of results). Indeed, we want our notion of
"relevant" to only include the very best documents since there may
be tens of thousands of slightly relevant documents. This very high
precision is important even at the expense of recall (the total number
of relevant documents the system is able to return). There is quite a
bit of recent optimism that the use of more hypertextual information can
help improve search and other applications [Marchiori 97].
2. System Features
The Google search engine has two important features that help it produce high precision results. First, it makes use of the link structure of the Web to calculate a quality ranking for each web page. This ranking is called PageRank and is described in detail in [Page 98]. Second, Google utilizes link to improve search results.
2.2 Anchor Text
The text of links is treated in a special way in our search engine. Most search engines associate the text of a link with the page that the link is on. In addition, we associate it with the page the link points to. This has several advantages. First, anchors often provide more accurate descriptions of web pages than the pages themselves.
We use anchor propagation mostly because anchor text can help provide better quality results
This idea of propagating anchor text to the page it refers to was implemented in the World Wide Web Worm [McBryan 94], especially because it helps search non-text information.
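The propagation the paper describes can be sketched as a toy index that credits anchor words to the target page rather than (only) the source. The page names and anchors below are invented for illustration:

```python
# Toy sketch of anchor-text propagation: the text of each link is indexed
# against the page the link points TO. Pages and anchors are invented.
from collections import defaultdict

# (source_page, anchor_text, target_page)
links = [
    ("blog.example", "great knitting patterns", "craft.example/patterns"),
    ("forum.example", "knitting patterns I use", "craft.example/patterns"),
    ("news.example", "company logo", "logo.example/logo.png"),  # image target
]

index = defaultdict(set)  # word -> set of pages it describes
for source, anchor, target in links:
    for word in anchor.lower().split():
        index[word].add(target)  # credit the target, not just the source

# The image is now findable by text even though it contains none itself.
print(index["logo"])      # {'logo.example/logo.png'}
print(index["knitting"])  # {'craft.example/patterns'}
```

This is why, as the paper notes, anchors let Google return non-text results such as images and programs.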
Aside from PageRank and the use of anchor text, Google has several other features. First, it has location information for all hits and so it makes extensive use of proximity in search. Second, Google keeps track of some visual presentation details such as font size of words. Words in a larger or bolder font are weighted higher than other words.
This refers to the position of words in the content and in the code.
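A rough sketch of this visual and positional weighting follows. The weight values are invented for illustration; the paper does not publish Google's actual numbers:

```python
# Hypothetical sketch of weighting hits by presentation and position:
# words in larger/bolder markup, or earlier in the page, count for more.
# The weights below are invented, not Google's.

TAG_WEIGHTS = {"title": 5.0, "h1": 3.0, "b": 2.0, "p": 1.0}

def score_term(term, hits):
    """hits: list of (tag, character_position) occurrences of `term`."""
    total = 0.0
    for tag, position in hits:
        weight = TAG_WEIGHTS.get(tag, 1.0)
        total += weight / (1 + position / 100)  # earlier hits count a bit more
    return total

# "knitting" in the title and first paragraph vs. buried late in the body.
page_a = [("title", 0), ("p", 10)]
page_b = [("p", 900)]
assert score_term("knitting", page_a) > score_term("knitting", page_b)
```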
3.1 Information Retrieval
It goes beyond exact match and the number of repetitions: