This project is actually my master’s summer paper/thesis. While the backstory isn’t as intriguing as the grades paper, there’s still a story. Back in 2022 I was exploring conspiracy theory rabbit holes because, to be quite frank, they’re interesting: it’s fascinating to see how people construct their arguments and why they find them convincing. One video claimed that Wikipedia is a far-left, biased website and that nobody should trust it. After watching, I had to admit that some of the instances mentioned were not a good look. However, after further research, I realized these were isolated cases, and that Wikipedia’s continuous editing process is designed to fix such biases and inaccuracies.
Fast forward to the summer of 2024, when I had to choose a thesis topic. My preferred topic was more aligned with typical economics papers and with my advisor, Thomas Lemieux, a very well-known labour economist. I originally wanted to use a regression discontinuity design to determine the causal effect of enrolling in quantitative versus qualitative majors, to see whether there’s actual value in the education or whether it’s just self-selection. Unfortunately, the data sources either refused to provide data or would have taken too long to deliver it for me to finish my thesis on time. So, I had to pivot.
After brainstorming, I discussed my options with Thomas, and he gravitated toward this Wikipedia idea I’d noted years ago. After figuring out how to conduct the study, I began implementing my ideas. The main roadblock was scraping Wikipedia pages, which ran into some technical issues. I got help from my friend and cohort member, Sein Jone, who used a web scraper he’d built for the Wayback Machine. Due to the large amount of data, this had to be done on a cloud server, and I ended up spending around $50 USD, making me the only one in my cohort with a positive budget for a project 😓.
Eventually, I had a clean dataset. The results were surprising and a bit more politically charged than I’d have liked. They showed not a bias in favour of left-wing politicians but rather a bias against right-wing politicians, which aligns with other academic work on similar topics.
In the end, the paper tied for the highest grade in my class (93.3%), making it my second research paper during my master’s to be rated at the top of my cohort. I was quite proud of the work, even though the results are a bit polarizing. I received very positive comments from my advisor and believe it’s some of the highest-quality work I’ve ever done.
Righting the Writers
Abstract: This paper investigates the presence of political bias in Wikipedia through a causal inference framework. Utilizing a dataset of 1,399 politicians from the US, UK, and Canada and 271,400 historical snapshots of their Wikipedia pages, I employ an event study/staggered Difference-in-Differences (DiD) research design combined with a Large Language Model (LLM) for sentiment analysis. The analysis estimates the impact of being affiliated with right-wing versus left-wing parties on the sentiment of politicians’ Wikipedia pages. The findings reveal a statistically significant decrease in the sentiment of these pages following a switch to a more right-wing party, an effect that is not observed with switches to more left-wing parties. These results highlight Wikipedia’s potential ideological biases and continue the discussion on how media platforms influence public perception and discourse.
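To make the difference-in-differences idea in the abstract a little more concrete, here is a minimal sketch in Python. It is not the estimator used in the paper: the data are made up, the names `panel`, `switch_period`, `pre_post_change`, and `did_estimate` are hypothetical, and the comparison is a simple pre/post average against politicians who never switch parties. The real study uses an event study/staggered DiD design on LLM-scored page snapshots, which is considerably more involved.

```python
from statistics import mean

# Hypothetical toy data: one sentiment score per period for each politician.
# All series share a +0.1 per-period trend; switchers also drop by 0.2 after switching.
panel = {
    "A": [0.3, 0.4, 0.3, 0.4],  # switches at period 2
    "B": [0.6, 0.5, 0.6, 0.7],  # switches at period 1
    "C": [0.5, 0.6, 0.7, 0.8],  # never switches (control)
    "D": [0.1, 0.2, 0.3, 0.4],  # never switches (control)
}
# Period index of the party switch, or None for politicians who never switch.
switch_period = {"A": 2, "B": 1, "C": None, "D": None}


def pre_post_change(series, switch):
    """Average sentiment from the switch period onward minus the average before it."""
    return mean(series[switch:]) - mean(series[:switch])


def did_estimate(panel, switch_period):
    """Average, over switchers, of their pre/post change minus the controls' change
    measured over the same pre/post windows."""
    controls = [u for u, s in switch_period.items() if s is None]
    effects = []
    for unit, s in switch_period.items():
        if s is None:
            continue
        treated = pre_post_change(panel[unit], s)
        control = mean(pre_post_change(panel[c], s) for c in controls)
        effects.append(treated - control)
    return mean(effects)


estimate = did_estimate(panel, switch_period)
print(round(estimate, 3))  # -0.2
```

Subtracting the controls' change over the same window removes the shared trend, so what remains is the drop built into the toy data for switchers. With real data that comparison is only credible if switchers and non-switchers would otherwise have followed parallel trends, which is the assumption an event-study plot of pre-switch periods is meant to probe.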