Mapping the distraction that is Wikipedia
Posted by Ben Fri, 25 Apr 2008 22:02:00 GMT
It happens to us all. We go to Wikipedia with the intention of finding some specific information on some specific topic. Several hours later we realize that we are reading about Sexual Abuse on Pitcairn Island or something equally unrelated to High-Energy Particle Physics or whatever the initial topic was. Oftentimes the articles in between are forgotten and only revealed when one transforms into the smarmy smart ass at the next game of Trivial Pursuit. Much like waking up in a bathtub full of ice, missing a kidney, this loss of time and memory raises unsettling questions about recent events.
A rather old XKCD confirmed that I am not the only person experiencing this most curious malady.
I’m tired of the contents of the “3 hours of fascinated clicking” time block being unknown. I think I am reasonably sound of mind and the connections that get me from point A to point Z on the Wiki would make some sense in context. I might be wrong, but I want to find out with evidence.
I’ve hacked together a simple hacktempt at graphing this problem. Basically, I have an extremely simple Greasemonkey script that runs on en.wikipedia.org and captures the current page and the referer. It then fires off some AJAX that tells a local Mongrel hackjob to update the database of connections. The local Mongrel server also has an HTTPHandler (localhost:9999/show) that uses Graphviz to render a fresh, hot PNG to be delivered to the web browser. This handler also takes a query string with start and end parameters to set the date range of interest.
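For anyone who wants the gist without reading the real thing, here is a rough Python sketch of the idea: record each (referer, page) pair with a timestamp, then turn the visits in a date range into Graphviz DOT text. This is not the actual code (that is a Greasemonkey script plus a Mongrel server); the names `article_title`, `ClickLog`, `record`, and `to_dot` are made up for illustration, and an in-memory list stands in for the database.

```python
from datetime import datetime
from urllib.parse import urlparse, unquote


def article_title(url):
    """Return the article title from an en.wikipedia.org/wiki/... URL, else None."""
    if not url:
        return None
    parsed = urlparse(url)
    if parsed.netloc != "en.wikipedia.org" or not parsed.path.startswith("/wiki/"):
        return None
    return unquote(parsed.path[len("/wiki/"):]).replace("_", " ")


def dot_quote(name):
    """Quote a title as a DOT identifier, escaping backslashes and double quotes."""
    return '"' + name.replace("\\", "\\\\").replace('"', '\\"') + '"'


class ClickLog:
    """Hypothetical in-memory stand-in for the database of connections."""

    def __init__(self):
        self.visits = []  # (timestamp, source title or None, target title)

    def record(self, page_url, referer_url, when):
        """Store one page view; views of non-article pages are ignored."""
        target = article_title(page_url)
        if target is None:
            return
        self.visits.append((when, article_title(referer_url), target))

    def to_dot(self, start=None, end=None):
        """Build DOT text for visits with start <= timestamp <= end."""
        lines = ["digraph wikipedia {"]
        seen = set()
        for when, source, target in self.visits:
            if start is not None and when < start:
                continue
            if end is not None and when > end:
                continue
            if source is None:
                # No Wikipedia referer: the page is a lone entry point.
                stmt = dot_quote(target) + ";"
            else:
                stmt = dot_quote(source) + " -> " + dot_quote(target) + ";"
            if stmt not in seen:  # draw each connection only once
                seen.add(stmt)
                lines.append("  " + stmt)
        lines.append("}")
        return "\n".join(lines)


if __name__ == "__main__":
    log = ClickLog()
    log.record("http://en.wikipedia.org/wiki/HMS_Bounty", None,
               datetime(2008, 4, 24, 20, 0))
    log.record("http://en.wikipedia.org/wiki/Pitcairn_Islands",
               "http://en.wikipedia.org/wiki/HMS_Bounty",
               datetime(2008, 4, 24, 20, 5))
    print(log.to_dot())
```

Saving the output to a file and running `dot -Tpng graph.dot -o graph.png` should produce the picture, assuming Graphviz is installed.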
The code is uglier than Fergie on a rainy day, but it works and I find the results to be pretty fascinating.
My actual usage for yesterday evening is also available.
The code is available on GitHub.
If it amuses you or you have any suggestions let me know.

Times would be fun – i.e. to show how long you spent reading about, say, “Batman” vs. “Wet T-Shirt Contest”. You could weight the size/color of the nodes of the graph by the length of time you spent on each page.
—Simon
There’s a nifty Mac app called Pathway that does this: http://osx.iusethis.com/app/pathway
I… need… this… program… really… really… badly
Of all the topics on Wikipedia! I actually did end up reading about sexual abuse in the Pitcairn Islands!
Glad to know I was not the only one.
@simon g: Times would be pretty awesome, but it gets kind of tricky to determine how long a page is actually being viewed versus just sitting in a tab. It’s really hard to measure view times in such a parallel, nonlinear system. If I get any insight into a good approach I might try to implement it, but I’m drawing blanks right now.
@jan: As seemingly random as “Sexual abuse on the Pitcairn Islands” is, it kind of makes sense. All one has to do is get to the article on the Pitcairn Islands (HMS Bounty, Mutiny, British Dependencies, Pacific Islands) and then curiosity will quickly lead to the article on the abuses. I think the most fascinating thing about working with Wikipedia is how even the most random of topics make sense when the link chain is examined. I am also glad to know that I’m not the only one who stumbled across the article on the islanders’ perversions.
Producing an image map with graphviz that links to the actual articles would be a nice addition—if you don’t mind adding to the problem ;)
Is it just Wikipedia, or is it the WWW? I mean, c’mon, the web is all about hyper (hyperlinks, hypermedia).
I see you looked up the God of Special Teams and Defense.
Wow, cool :) I’d like to try that out.
A great game is the Wikipedia game. You start on a random page, then try to get to another one (works best against another human).
Could you make something like this for StumbleUpon? I don’t get into Wikipedia like you guys, but I can spend hours upon hours on StumbleUpon. Something like this would be neat for it.
I’m thinking that if you have time to write programs doing silly stuff for Wikipedia, you can probably afford to waste those hours there anyway? So does it really matter? :P
I tried it out and had to add ; to the end of the lines in the Greasemonkey script in order to get it to generate the referrer correctly, otherwise it always submitted it as a “single” only.