Regression, Twitter, and #Ferguson

Emma Pierson's analysis of hashtags that have appeared in tweets about Ferguson, Missouri. Click for the image on her website. Image: Emma Pierson.

Like many people, I have been following news about the events in Ferguson, Missouri with shock and sorrow for almost two weeks. I have been following these events as a human, not as a mathematician. But there’s a mathematical side to this story, too. I’m not just talking about the statistics on how many people are killed by the police each year (which we don’t even know for sure) and the racial composition of the Ferguson police force versus the people they stop and arrest, although those are both important. I’m talking about Twitter. It’s been a crucial part of how the Ferguson story has become international news, but it’s also a useful source of data about how people are responding to the tragedy.

Emma Pierson is a computational biologist currently working for 23andMe, and her blog, Obsession with Regression, focuses on data analysis, often with Twitter’s data. She writes,

“I am very excited about Twitter because it combines two qualities.

“1. People actually use it. Famous people — it’s become standard for celebrities to say “Follow me on Twitter!” — and more importantly, lots of people.

“2. It makes massive amounts of data available in a way you can process with a computer. 500,000,000 tweets are sent every day and Twitter will give you up to 1% of those. And if I know what 1% I want — for example, only Tweets containing the word “Spock” — it will give me all of them, which means I can actually hear everything that’s being said on a topic by millions of people worldwide. And not just what’s being said, but who’s saying it — how they describe themselves, where they live, who their friends are, and the last few thousand things they said.”

She has been blogging about Twitter data since December 2013, when 23andMe was ordered to stop providing disease risk information to its customers. She wrote a post about who was reacting to the news on Twitter and how they felt about it. Of course, being an employee of the company is an obvious potential source of bias, so she also included a link to the tweets she analyzed so others could study them. She’s done several other data analyses as well. Earlier this summer, she wrote an interesting analysis of tweets about LeBron James’ most recent career move, and her post about gender in the symphony is of personal interest to me. (Her analysis seems to match my experience. In my four years in the orchestra in college, I think we only had two men in the viola section.)

On Tuesday, Pierson wrote a post about using Twitter to study people’s reactions to current events, focusing on Ferguson. She mined a few hundred thousand tweets about Ferguson and analyzed the different hashtags that appeared in tweets with #Ferguson. (Part of the visualization she made is at the top of this post.) She also put her mineTweets program up on GitHub so others can use it to collect tweets about any topic in real time. She has some ideas for further analysis, particularly about whether the day/night-peace/violence pattern is apparent in tweets, and she’s invited others to contribute either ideas or analyses of their own.
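
To give a feel for what counting hashtags that appear alongside #Ferguson might involve, here is a minimal sketch in Python. It is not Pierson’s mineTweets program and makes no claim about how her analysis works. The function name `cooccurring_hashtags`, the simple hashtag pattern, and the four sample tweets are all my own assumptions, made up purely for illustration.

```python
import re
from collections import Counter

# Simplified hashtag pattern (an assumption): '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def cooccurring_hashtags(tweets, anchor="ferguson"):
    """Count hashtags that share a tweet with the anchor hashtag.

    Hashtags are compared case-insensitively, and each hashtag is
    counted at most once per tweet.
    """
    counts = Counter()
    for text in tweets:
        tags = {tag.lower() for tag in HASHTAG.findall(text)}
        if anchor in tags:
            counts.update(tags - {anchor})
    return counts


# Invented sample tweets, for illustration only (not real data).
sample_tweets = [
    "Watching the news tonight #Ferguson #MikeBrown",
    "#ferguson #mikebrown #justice",
    "Unrelated post #Spock",
    "Thoughts with everyone in #Ferguson",
]

counts = cooccurring_hashtags(sample_tweets)
for tag, n in counts.most_common():
    print(f"#{tag}: {n}")
```

With a real collection of tweets in place of the invented list, the most common entries would be the kind of hashtags a visualization like Pierson’s puts on display.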

The events in Ferguson have also highlighted the difference between the way Twitter and Facebook work. I’m not the only one whose Twitter feed has been saturated with #Ferguson, while Facebook has been nearly silent on the topic. In a Medium article, Zeynep Tufekci explains how Facebook’s algorithm for deciding what to show us caused this discrepancy and wonders what would have happened to Ferguson without Twitter. “It’s a clear example why net neutrality is a human rights issue; a free speech issue; and an issue of the voiceless being heard, on their own terms,” she writes. “Algorithms have consequences.”
