“PAGE RANKING”
ALGORITHM
INTRODUCTION
• Finding useful information on the World Wide Web is something many of us take for
granted. According to the Internet research firm Netcraft, there are nearly 150,000,000
active Web sites on the Internet today.
• Google's algorithm does the work for you by searching out Web pages that contain
the keywords you used to search, then assigning a rank to each page based on several
factors, including how many times the keywords appear on the page. Higher-ranked
pages appear further up in Google's search engine results page (SERP), meaning that
the best links relating to your search query are theoretically the first ones Google lists.
• Automated programs called spiders or crawlers travel the Web, moving from link to link
and building up an index of the keywords found on each page. Google references this
index when a user enters a search query. The search engine lists the pages that contain
the same keywords that were in the user's search terms.
• Like other search engines, Google has a large index of keywords and where those words can be found.
What sets Google apart is how it ranks search results, which in turn determines the order in which Google displays results
on its search engine results page (SERP). Google uses a trademarked algorithm called PageRank, which assigns
each Web page a relevancy score.
• Keyword placement plays a part in how Google finds sites. Google looks for keywords throughout each Web
page, but some sections are more important than others. Including the keyword in the Web page's title is a
good idea, for example. Google also searches for keywords in headings.
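The keyword index described above can be sketched as a small inverted index: a mapping from each word to the pages that contain it. This is only an illustration under simple assumptions (whitespace tokenizing, exact word matches); the page names, the `build_index` and `search` helpers, and the sample text are hypothetical and say nothing about how Google's real index is built.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index


def search(index, query):
    """Return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical three-page "web", used only for illustration.
pages = {
    "earth.html": "planet earth documentary",
    "mars.html": "planet mars rover",
    "tea.html": "earl grey tea",
}
index = build_index(pages)
print(sorted(search(index, "planet earth")))
```

The search step only finds matching pages; it does not order them. Ordering is where ranking comes in.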
How does Google decide which pages to select and which to leave out?
It does this by asking 200 questions about each page. A few important ones are:
i. How many times does the keyword appear in the page, i.e. what is the
frequency of the word in the page?
ii. Do the words appear in the title, the URL, or a meta tag, and are they directly adjacent to each other?
iii. Does the page include synonyms of the keyword?
iv. Is the page from a high-quality or a low-quality website?
v. What is the page's PageRank?
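The first two questions can be illustrated with a few lines of Python. This is a rough sketch, not Google's method: the `keyword_signals` function, its field names, and the sample page are all hypothetical, and it handles a single-word keyword only.

```python
import re


def keyword_signals(keyword, title, url, body):
    """Collect a few simple on-page signals for a single-word keyword."""
    kw = keyword.lower()
    # Split the body into lowercase alphanumeric words.
    words = re.findall(r"[a-z0-9]+", body.lower())
    return {
        "frequency": words.count(kw),   # question i
        "in_title": kw in title.lower(),  # question ii
        "in_url": kw in url.lower(),      # question ii
    }


# Hypothetical page, used only for illustration.
signals = keyword_signals(
    "earth",
    title="Planet Earth",
    url="http://example.com/planet-earth",
    body="Earth is the third planet from the Sun. Earth has one moon.",
)
print(signals)
```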
PAGERANKING ALGORITHM
• Google’s PageRank algorithm has become one of the most famous in
computer science. It was originally designed to rank websites according
to their importance by assuming that a site is important if it is linked to by
other important sites. It follows a real-life philosophy:
“How does a product or an individual get popular? When people other
than the individual know about that individual or product.”
This is similar to the ranking of a page, which rises when other web pages
link to that specific web page.
• The algorithm works by counting the links to a website and the
importance of the sites these come from. It then uses this to work out the
importance of the original site. Through a process of iteration, the
algorithm comes up with a ranking.
• PageRank assigns a rank or score to every search result. The higher the page's
score, the further up the search results list it will appear.
• Scores are partially determined by the number of other Web pages that link to
the target page. Each link is counted as a vote for the target. The logic behind
this is that pages with high-quality content will be linked to more often than
mediocre pages.
• Not all votes are equal. Votes from a high-ranking Web page count more than
votes from low-ranking sites. You can't really boost one Web page's rank by
making a bunch of empty Web sites linking back to the target page.
• The more links a Web page sends out, the more diluted its voting power
becomes. In other words, if a high-ranking page links to hundreds of other pages,
each individual vote won't count as much as it would if the page only linked to a
few sites.
• Other factors that might affect scoring include how long the site has been
around, the strength of the domain name, how and where the keywords appear
on the site, and the age of the links going to and from the site. Google tends to
place more value on sites that have been around for a while.
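The dilution of voting power can be made concrete with the per-link term of the PageRank formula, d * PR(T) / C(T), where PR(T) is the score of the linking page, C(T) is its number of outgoing links, and d is the damping factor. The sketch below is illustrative only: `vote_value` is a hypothetical helper and the scores are made-up numbers.

```python
def vote_value(page_rank, outgoing_links, d=0.85):
    """Share of PageRank that one link passes on: d * PR(T) / C(T)."""
    if outgoing_links < 1:
        raise ValueError("a page that casts a vote has at least one link")
    return d * page_rank / outgoing_links


# Hypothetical scores, chosen only to illustrate the dilution effect.
few_links = vote_value(page_rank=8.0, outgoing_links=4)
many_links = vote_value(page_rank=8.0, outgoing_links=400)
print(few_links, many_links)
```

With the same score, the page that links to 4 sites passes on a hundred times more per link than the page that links to 400.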
A Web page's ranking depends on a few factors:
• The frequency and location of keywords within the Web page: If the
keyword only appears once within the body of a page, it will receive
a low score for that keyword.
• How long the Web page has existed: People create new Web pages
every day, and not all of them stick around for long. Google places
more value on pages with an established history.
• The number of other Web pages that link to the page in question:
Google looks at how many Web pages link to a particular site to
determine its relevance.
• Out of these three factors, the third is the most important. It's easier to understand it with an example.
• Let's look at a search for the terms "Planet Earth."
• As more Web pages link to Discovery's Planet Earth page, the Discovery page's rank increases. When Discovery's page ranks higher than other pages, it shows up at the top of the Google search results page.
PageRank description
We assume page A has pages T1...Tn which point to it.
The parameter d is a damping factor which can be set between 0 and 1. We usually set d
to 0.85.
The PageRank theory holds that an imaginary surfer who is randomly clicking on links will
eventually stop clicking.
The probability, at any step, that the surfer will continue is the damping factor d.
Various studies have tested different damping factors, but it is generally assumed that the
damping factor will be set around 0.85.
Also C(A) is defined as the number of links going out of page A.
The PageRank of a page A is given as follows:
PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
The original paper states that the PageRanks form a probability distribution over web pages,
“so the sum of all web pages' PageRanks will be one”. With the formula exactly as written here,
the values instead sum to the number of pages (an average of 1.0 per page), provided every page
has at least one outgoing link. Dividing each value by the number of pages gives a distribution
that sums to one.
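The formula for a single page translates almost directly into code. The sketch below assumes we already know PR(T) and C(T) for every page T that links to A; the function name `page_rank_of` and the sample inputs are hypothetical.

```python
def page_rank_of(backlinks, d=0.85):
    """PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over pages T linking to A.

    backlinks: list of (PR(T), C(T)) pairs, one per page T that links to A.
    """
    return (1 - d) + d * sum(pr / c for pr, c in backlinks)


# Hypothetical inputs: three pages link to A.
print(page_rank_of([(1.0, 1), (1.0, 2), (2.0, 4)]))

# A page with no backlinks keeps only the (1 - d) term.
print(page_rank_of([]))
```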
How is PageRank Calculated?
• The PR of each page depends on the PR of the pages pointing to it. But
we won’t know what PR those pages have until the pages pointing
to them have their PR calculated and so on… And when you consider that
page links can form circles it seems impossible to do this calculation!
• The Google paper says:
“PageRank or PR(A) can be calculated using a simple iterative algorithm,
and corresponds to the principal eigenvector of the normalized link matrix of
the web.”
What that means to us is that we can just go ahead and calculate a page’s
PR without knowing the final value of the PR of the other pages. That seems
strange but, basically, each time we run the calculation we’re getting a
closer estimate of the final value. So all we need to do is remember
each value we calculate and repeat the calculations lots of times until the
numbers stop changing much.
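The "repeat until the numbers stop changing much" idea can be sketched as a short loop. This is a minimal illustration, not Google's implementation: the `pagerank` function, its `tol` and `max_iter` parameters, and the three-page site are assumptions made for the example. It also assumes every link target appears as a key, no page lists the same target twice, and each round is computed from the previous round's values.

```python
def pagerank(links, d=0.85, tol=1e-9, max_iter=1000, start=1.0):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) until values settle.

    links: dict mapping each page to the list of pages it links to.
    """
    if not links:
        return {}
    pr = {page: start for page in links}
    for _ in range(max_iter):
        new_pr = {}
        for page in links:
            # Sum PR(T) / C(T) over every page T that links to this page.
            incoming = sum(
                pr[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_pr[page] = (1 - d) + d * incoming
        # Largest change of any page in this round.
        change = max(abs(new_pr[p] - pr[p]) for p in links)
        pr = new_pr
        if change < tol:
            break
    return pr


# Hypothetical three-page site: a home page and two child pages.
links = {"home": ["a", "b"], "a": ["home"], "b": ["home"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items()):
    print(page, round(score, 3))
```

The starting guess is a parameter on purpose: changing `start` should change how many rounds are needed, not where the values settle.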
Let's take the simplest example network: two pages, each pointing to the
other.
Each page has one outgoing link (the outgoing count is 1, i.e. C(A) = 1 and
C(B) = 1).
1. GUESS 1
We don’t know what their PR should be to begin with, so let’s take a guess at 1.0 and do some calculations:
d = 0.85
PR(A)= (1 – d) + d(PR(B)/1)
PR(B)= (1 – d) + d(PR(A)/1)
i.e.
PR(A)= 0.15 + 0.85 * 1
= 1
PR(B)= 0.15 + 0.85 * 1
= 1
The numbers do not change, so a guess of 1.0 is already a settled value.
2. GUESS 2
Ok, let’s start the guess at 0 instead and re-calculate:
PR(A)= 0.15 + 0.85 * 0
= 0.15
PR(B)= 0.15 + 0.85 * 0.15
= 0.2775
And again:
PR(A)= 0.15 + 0.85 * 0.2775
= 0.385875
PR(B)= 0.15 + 0.85 * 0.385875
= 0.47799375
And again:
PR(A)= 0.15 + 0.85 * 0.47799375
= 0.5562946875
PR(B)= 0.15 + 0.85 * 0.5562946875
= 0.622850484375
and so on. The numbers just keep going up. But will the numbers stop increasing when they get to 1.0? What if a calculation
over-shoots and goes above 1.0?
3. GUESS 3
Let’s start the guess at 40 each and do a few cycles:
PR(A) = 40
PR(B) = 40
First calculation
PR(A)= 0.15 + 0.85 * 40
= 34.25
PR(B)= 0.15 + 0.85 * 34.25
= 29.2625
And again
PR(A)= 0.15 + 0.85 * 29.2625
= 25.023125
PR(B)= 0.15 + 0.85 * 25.023125
= 21.41965625
This time the numbers keep going down, heading toward 1.0.
• Principle: it doesn’t matter where you start your guess. Once the PageRank calculations
have settled down, the “normalized probability distribution” (the average PageRank for
all pages) will be 1.0.
No backlinks means the equation looks like this:
PR(D)= (1-d) + d * (0)
= 0.15
no matter what else is going on or how many times you do it.
Observation: every page has at least a PR of 0.15 to share out.
• Our home page has 2 and a half times as much PR as the child pages! Excellent!
• This is what we’d expect. All the pages have the same number of incoming links, all pages are of equal importance to each other, all pages get the same PR of 1.0 (i.e. the “average” probability).
EXAMPLES
• Because Google looks at links to a Web page as a vote, it's not easy to cheat the system. The best way to make sure
your Web page is high up on Google's search results is to provide great content so that people will link back to your
page. The more links your page gets, the higher its PageRank score will be. If you attract the attention of sites with a
high PageRank score, your score will grow faster.
• Mega-sites, like http://news.bbc.co.uk, have tens or hundreds of editors writing new content (i.e. new pages) all day
long! Each one of those pages has rich, worthwhile content of its own and a link back to its parent or the home page!
That’s why the Home page Toolbar PR of these sites is 9/10 and the rest of us just get pushed lower and lower by
comparison…
• Principle: Content Is King! There really is no substitute for lots of good content…
Steps to enhance your PAGERANK
1. Give visitors the information they're looking for
• Provide high-quality content on your pages, especially your homepage. This is the single most
important thing to do. If your pages contain useful information, their content will attract many
visitors and entice webmasters to link to your site. Think about the words users would type to
find your pages and include those words on your site.
2. Make sure that other sites link to yours
• Links help Google's crawlers find your site and can give your site greater visibility in Google's search results.
When returning results for a search, Google uses sophisticated text-matching techniques to
display pages that are both important and relevant to each search. Google interprets a link
from page A to page B as a vote by page A for page B.
3. Make your site easily accessible
• Build your site with a logical link structure. Every page should be reachable from at least one
static text link.
BIBLIOGRAPHY
• http://www.google.com/googlebot
• http://www.wikipedia.org
• http://infolab.stanford.edu/~backrub/google.html
THANK YOU

