Visualizing the phenomenon "Getting to Philosophy" that clicking the first link in the main text of a Wikipedia article, and then repeating the process, will lead to the 'Philosophy' article.

Tom-stack3/wikiGraph

Wiki Graph - Getting to Philosophy

A script created to check the "Getting to Philosophy" phenomenon:
Clicking on the first link in the main text of an English Wikipedia article, and then repeating the process for subsequent articles, leads to the Philosophy article.

You are welcome to read more about this interesting phenomenon here: https://en.wikipedia.org/wiki/Wikipedia:Getting_to_Philosophy

I created a Python script which "plays" this "game" and generates a graph showing the paths created by clicking the first link in each Wikipedia article.
For example, the following graph was generated starting from these pages:
"Russia", "Space", "Coronavirus", "Art", "LeBron James", "Real Madrid" and "Formula One".
(to view raw: 7_0.svg, 7_0.pdf)

Installations

  1. pip install -r requirements.txt
  2. Install Graphviz from here: https://www.graphviz.org/download/ and make sure that the directory containing the dot executable is on your system's PATH.

Libraries used:

  • bs4 to scrape information from web pages easily.
  • wikipedia to access and parse data from Wikipedia.
  • Graphviz to create and render graphs.

Running the script

There are three scripts you can run:

draw_pages.py:

Gets a list of Wikipedia article names to draw. The script finds the closest article to each name entered and draws the graph for the pages chosen.

e.g:
python draw_pages.py formula 1, Nervous system, Road Bicycle Racing, minerals, baseball, cafe

Results in the following graph: (to view raw: 6_1.svg , 6_1.pdf)

6 Formula One+Nervous system+Road bicycle racing.svg
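The article names in the command above contain spaces and are separated by commas, so the shell passes them to the script as many small arguments. Below is a minimal sketch of how they could be put back together. The function name parse_article_names is hypothetical and this is not necessarily how draw_pages.py does it:

```python
def parse_article_names(argv):
    """Join the raw shell arguments and split them on commas into names."""
    joined = " ".join(argv)
    # Drop empty pieces, e.g. the one produced by a trailing comma.
    return [name.strip() for name in joined.split(",") if name.strip()]


# The arguments as the shell would pass them for the example command above.
names = parse_article_names(
    ["formula", "1,", "Nervous", "system,", "Road", "Bicycle", "Racing,",
     "minerals,", "baseball,", "cafe"]
)
print(names)
```

Each resulting name would then be handed to the article search, which picks the closest matching page.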

draw_random.py:

Gets a list of integers. For each number in the arguments, the script generates and draws a graph with randomly chosen articles. Each integer is the number of random articles in one drawing.

e.g:
python draw_random.py 10 18

Results in two graphs. One with 10 randomly chosen Wikipedia articles to start with, and one with 18 randomly chosen Wikipedia articles to start with.

draw_handpicked_pages.py:

Gets the number of pages to draw. The script then lets the user choose each Wikipedia article manually (using console I/O). After all the pages are chosen, the script generates the graph from the chosen Wikipedia articles.

e.g:
python draw_handpicked_pages.py 8

Gives the user 8 articles to choose, and then draws the graph for the 8 Wikipedia articles chosen.

How do we decide what to click on?

Following the chain consists of:

  • Clicking on the first non-parenthesized, non-italicized link.
  • Ignoring external links, links to the current page, or red links (links to non-existent pages).
  • Stopping when reaching "Philosophy", a page with no links or a page that does not exist, or when a loop occurs.
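The stopping rules above can be sketched as a small loop. This is only an illustration: follow_chain and the toy link table are hypothetical, and first_link stands in for whatever function returns the first valid link of a page (or None when there is none).

```python
def follow_chain(start, first_link, target="Philosophy"):
    """Follow first links from `start`; return (path, reason for stopping)."""
    path = [start]
    visited = {start}
    current = start
    while current != target:
        nxt = first_link(current)
        if nxt is None:
            # No valid link on the page, or the page does not exist.
            return path, "dead end"
        path.append(nxt)
        if nxt in visited:
            # We have been here before, so the chain would repeat forever.
            return path, "loop"
        visited.add(nxt)
        current = nxt
    return path, "reached target"


# Toy link table (made-up pages, not real Wikipedia data).
toy_links = {
    "Start": "Middle",
    "Middle": "Philosophy",
    "Ping": "Pong",
    "Pong": "Ping",
    "Island": None,
}

print(follow_chain("Start", toy_links.get))
print(follow_chain("Ping", toy_links.get))
print(follow_chain("Island", toy_links.get))
```

Keeping the set of visited pages is what makes loop detection cheap: a single membership check per step.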

The function that decides what we should click on is is_href_valid(), located in WikiPage.py, in the class WikiPage. It gets an href HTML tag, parsed with bs4 (BeautifulSoup), and decides whether it is valid to click on. If the link is valid it returns True, otherwise False.
You can take a look at the checks it does, but in general we check the following:

  1. It is indeed a link to a Wikipedia article. Meaning it is not an external link to somewhere outside Wikipedia.

  2. It is not a link enclosed in parentheses.
    For example, in Epistemology the first link that is clicked shouldn't be (🔊listen), Greek or ἐπιστήμη. The right link to click on is branch of philosophy instead.

  3. It is not a side-comment, meaning the link is not in the following tags:

    1. italicized (<i>)
    2. smaller text (<small>)
    3. superscript text (<sup>)
  4. It is not a link to a disambiguation page.
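To make the first three checks more concrete, here is a simplified sketch that uses only the standard library. It is not the code of is_href_valid(): the real function works on a bs4 tag, while the hypothetical helpers below work on plain strings and lists. The rule that a colon in the title means a non-article namespace is a simplifying assumption.

```python
SIDE_COMMENT_TAGS = {"i", "small", "sup"}


def is_article_href(href, current_title):
    """Check 1: the href points to another Wikipedia article."""
    if not href.startswith("/wiki/"):
        # External links and red links (/w/index.php?...&redlink=1) end up here.
        return False
    title = href[len("/wiki/"):].split("#")[0]
    if ":" in title:
        # Assumption: a colon marks a namespace such as File: or Help:.
        return False
    # Reject links back to the current page.
    return title != current_title.replace(" ", "_")


def inside_parentheses(text_before_link):
    """Check 2: an opening parenthesis before the link is still unclosed."""
    depth = 0
    for ch in text_before_link:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
    return depth > 0


def in_side_comment(ancestor_tag_names):
    """Check 3: the link sits inside <i>, <small> or <sup>."""
    return any(name in SIDE_COMMENT_TAGS for name in ancestor_tag_names)


print(is_article_href("/wiki/Branch_of_philosophy", "Epistemology"))
print(inside_parentheses("Some term (from "))
print(in_side_comment(["p", "i"]))
```

A link would be clicked only if the first helper returns True and the other two return False.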

How is the graph generated?

To generate the graph, I used a very convenient open-source library I found called Graphviz.

Output formats

The Graphviz library supports many output formats (see their documentation). In this project I preferred to use .SVG and .PDF files. Both are vector formats, so they stay sharp when zooming in.

One advantage of .SVG over .PDF files is that it allows adding URL links onto nodes, a feature which I found very useful. Consequently, the nodes of the graphs in the .SVG files are clickable and lead to the Wikipedia page they represent.
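As an illustration of clickable nodes, the sketch below writes Graphviz DOT text by hand and sets the URL attribute on each node. The project itself uses the Graphviz library, so this is only an approximation of the idea. The function path_to_dot and the page titles are hypothetical, and titles containing double quotes are not handled.

```python
from urllib.parse import quote


def path_to_dot(path):
    """Build DOT source for one chain of pages, with a URL on every node."""
    lines = ["digraph wiki {"]
    for title in path:
        url = "https://en.wikipedia.org/wiki/" + quote(title.replace(" ", "_"))
        # In SVG output, Graphviz turns the URL attribute into a hyperlink.
        lines.append(f'    "{title}" [URL="{url}"];')
    for src, dst in zip(path, path[1:]):
        lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


dot_source = path_to_dot(["Start page", "Philosophy"])
print(dot_source)
```

If the output is saved as graph.dot, running dot -Tsvg graph.dot -o graph.svg should render it with clickable nodes.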

Loops found 😯

Of course, the "Getting to Philosophy" phenomenon doesn't happen in 100% of the cases, and there are some loopholes in it. Some interesting loops of Wikipedia articles I found:

So.. what does Philosophy lead to?

As surprising as it sounds, Philosophy also leads to Philosophy 🥳🥳
You can see its path here: philosophy path
(to view raw: philosophy.svg , philosophy.pdf)

Examples:

Check out the folder output_examples for some examples of generated graphs.

Created by Tommy Zaft
