PageRank is a link analysis algorithm developed by Larry Page and Sergey Brin, the founders of Google.
It evaluates the importance of web pages based on the number and quality of links pointing to them.
The core idea is:
- A page is more important if many important pages link to it.
PageRank assigns a numerical weight to each webpage. The higher the value, the more important the page is considered.
- A page’s importance is determined by the importance of pages linking to it.
- A page's rank is divided equally among its outgoing links.
- A damping factor d (typically 0.85) models a surfer who follows a link with probability d and jumps to a random page with probability 1 - d.
PageRank can be computed using two main approaches:
- Iterative/Numerical Method (Basic Definition)
- Matrix Multiplication (Power Iteration Method)
\[ PR_{t+1}(i) = \sum_{j \in B_i} \frac{PR_t(j)}{C(j)} \]
Where:
- PR_{t+1}(i) → PageRank of page i at iteration t+1.
- B_i → Set of pages linking to page i.
- C(j) → Number of outgoing links from page j.
- Start with an equal initial PageRank for all pages.
- Update each page’s rank using the formula.
- Repeat until convergence (when changes are minimal).
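The steps above can be sketched in plain Python. This is a minimal illustration rather than the contents of pagerank.py: the function name `pagerank_basic`, the `tol` and `max_iter` parameters, and the three-page graph are all assumptions made for the example, and the sketch assumes every page has at least one outgoing link.

```python
def pagerank_basic(links, tol=1e-10, max_iter=1000):
    """Iterate PR_{t+1}(i) = sum over j in B_i of PR_t(j) / C(j).

    `links` maps each page to the list of pages it links to.
    Assumption: every page appears as a key and has at least one outgoing link.
    """
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}  # equal initial PageRank

    for _ in range(max_iter):
        new_rank = {p: 0.0 for p in pages}
        for j, targets in links.items():
            share = rank[j] / len(targets)  # PR_t(j) / C(j)
            for i in targets:
                new_rank[i] += share  # page j contributes to each page it links to
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:  # convergence: total change is minimal
            break
    return rank


# Hypothetical three-page graph: A -> B, A -> C, B -> C, C -> A
toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank_basic(toy_links)
print({page: round(value, 3) for page, value in ranks.items()})
```

On this graph the ranks settle at 0.4 for A, 0.2 for B, and 0.4 for C. Without a damping factor the iteration is only guaranteed to converge on well-behaved graphs (connected, with no dead ends or traps), which is why the toy graph was chosen that way.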
A directed web graph can be represented as a stochastic transition matrix M, where:
- Each entry M[i, j] represents the probability of moving from page j to page i.
- Columns sum to 1, ensuring a probability distribution.
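As a rough sketch of this representation, the matrix can be built from an edge list with NumPy. The helper name `transition_matrix`, the integer page indices, and the example edges are assumptions for illustration; the sketch rejects graphs with dead ends, since a page with no outgoing links would give a column that cannot sum to 1.

```python
import numpy as np


def transition_matrix(edges, n):
    """Build M where M[i, j] is the probability of moving from page j to page i.

    `edges` is a list of (source, target) pairs over pages 0..n-1.
    Assumption: each edge is listed once.
    """
    M = np.zeros((n, n))
    for src, dst in edges:
        M[dst, src] += 1.0  # column = source page, row = target page
    out_degree = M.sum(axis=0)  # number of outgoing links per page
    if np.any(out_degree == 0):
        raise ValueError("page with no outgoing links; its column cannot sum to 1")
    return M / out_degree  # broadcasting divides each column by its out-degree


# Hypothetical graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
M = transition_matrix(edges, 3)
print(M)
print(M.sum(axis=0))  # each column sums to 1
```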
The PageRank vector PR is computed using repeated matrix multiplications:
- Construct the transition matrix M.
- Initialize PageRank vector with equal values.
- Multiply repeatedly, PR_{t+1} = M · PR_t, until the values stabilize.
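A minimal NumPy sketch of these three steps follows. The function name `power_iteration`, the tolerance, and the hard-coded 3×3 matrix are illustrative assumptions; the matrix is column-stochastic and has no dead ends.

```python
import numpy as np


def power_iteration(M, tol=1e-10, max_iter=1000):
    """Repeatedly apply PR <- M @ PR until the vector stops changing."""
    n = M.shape[0]
    pr = np.full(n, 1.0 / n)  # equal initial values
    for _ in range(max_iter):
        new_pr = M @ pr
        if np.abs(new_pr - pr).sum() < tol:  # values have stabilized
            return new_pr
        pr = new_pr
    return pr


# Hypothetical transition matrix for the graph 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0
M = np.array([
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0],
])
pr = power_iteration(M)
print(pr.round(3))
```

For this matrix the vector approaches (0.4, 0.2, 0.4), matching the result of applying the per-page formula directly to the same graph.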
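Dead ends (pages with no outgoing links) and spider traps (groups of pages that link only to each other) break the basic iteration: dead ends leak rank out of the system, and spider traps absorb it. To handle both, a damping factor d (typically 0.85) is introduced.
\[ PR = d \, M' \, PR + (1 - d) \, v \]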
Where:
- d → Damping factor.
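- M' → Modified transition matrix: M with each dead-end column replaced by a probability distribution (typically uniform), so that every column sums to 1.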
- v → Teleportation vector (typically uniform).
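The damped update can be sketched as below. This is one common reading of the formula, not necessarily what pagerank.py does: it assumes M' is M with each dead-end column replaced by the uniform distribution, and that v is uniform. The name `pagerank_damped` and the example graph are hypothetical.

```python
import numpy as np


def pagerank_damped(M, d=0.85, tol=1e-10, max_iter=1000):
    """Iterate PR <- d * M' @ PR + (1 - d) * v with a uniform v."""
    n = M.shape[0]
    v = np.full(n, 1.0 / n)  # uniform teleportation vector

    # Assumption: M' is M with dead-end (all-zero) columns made uniform.
    M_fixed = M.astype(float).copy()
    dead_ends = M_fixed.sum(axis=0) == 0
    M_fixed[:, dead_ends] = 1.0 / n

    pr = v.copy()
    for _ in range(max_iter):
        new_pr = d * (M_fixed @ pr) + (1 - d) * v
        if np.abs(new_pr - pr).sum() < tol:
            return new_pr
        pr = new_pr
    return pr


# Hypothetical graph with a dead end: 0 -> 1, 0 -> 2, 1 -> 2; page 2 has no outgoing links
M_dead_end = np.array([
    [0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0],
])
pr_damped = pagerank_damped(M_dead_end)
print(pr_damped.round(3), pr_damped.sum())
```

Because every column of the fixed matrix sums to 1 and v sums to 1, the ranks keep summing to 1 at every step, and every page ends up with a nonzero rank even if nothing links to it.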
To run the Python implementation:
python pagerank.py
To run the code, ensure you have the following Python libraries installed:
- numpy
- networkx
- matplotlib
You can install them using:
pip install numpy networkx matplotlib
PageRank remains one of the most influential algorithms in search ranking. Understanding both its numerical and matrix-based formulations provides insights into its efficiency and real-world applications.
