Computing Reviews
Don’t turn social media into another Literary Digest poll
Gayo-Avello D. Communications of the ACM 54(10): 121-128, 2011. Type: Article
Date Reviewed: Dec 2 2011

Twitter, with its rapid growth and ever-expanding user base, has become a gold mine for analysts who treat tweet content as a data source for gauging public opinion. Even The New York Times is discussing this phenomenon [1]. But what does a Literary Digest poll, conducted in 1936, have to do with the current practice of mining Twitter data to make projections? Gayo-Avello discusses the dangers of ignoring negative results in Twitter-based research, warning that “current research risks turning social media analytics into the next Literary Digest poll.”

For those unfamiliar with the reference, it is the classic demonstration of how sampling bias can seriously skew poll results. The Literary Digest conducted the poll ahead of the 1936 US presidential election, drawing its sample from its own readership, nationwide telephone directories, and automobile registration lists, and asking which candidate respondents preferred: New Deal incumbent Franklin Roosevelt or Republican Alf Landon. The poll concluded that Landon would win in a landslide; Roosevelt ultimately won with 61 percent of the popular vote. The poll became the quintessential example of why reliable projections require unbiased samples.

In the 2008 US presidential election, projections based on tweet data strongly favored Barack Obama--even in states that he ultimately lost to John McCain by wide margins. Gayo-Avello’s study looks for the reasons behind the faulty projections. He states:

My aim was not to compare Twitter data with pre-election polls or with the popular vote, as had been done previously, but to obtain predictions on a state-by-state basis. Additionally, unlike the other studies, my predictions were not to be derived from aggregating Twitter data but by detecting voting intention for every single user from their individual tweets.

Gayo-Avello “applied four different sentiment-analysis methods described in the most recent literature and carefully evaluated their performance.” He then demonstrated that the results for the 2008 US presidential election “could not have been predicted from Twitter data alone through commonly applied methods.” A substantial bibliography lists all of the references Gayo-Avello used in his research.

Relying heavily on statistical research and evaluation, Gayo-Avello, in the “Election Twitter Data Set” section, first describes an analysis in which raw Twitter counts substantially overestimated Obama’s victory in the 2008 US presidential election. He then hypothesizes that Twitter users form a sample of the electorate--and, most likely, a biased one. The article continues by examining whether data extracted from Twitter can reliably predict outcomes, both current and future. Gayo-Avello provides a series of statistical analyses, described through detailed text and tables, and highlights both errors and corrections in his research.

The data used in the study came from users’ unprotected tweets in Twitter’s public timeline, which are easily accessed and collected through Twitter’s own application programming interface (API). The tweets were collected shortly after the 2008 election using the API’s search function, with one query per ticket--Obama or Biden for the Democrats, McCain or Palin for the Republicans--limited to tweets published by US residents within a specified time interval, and capped at 100 tweets per candidate, per county, per day. This first approach simply counted how often each candidate appeared in a user’s tweets, assuming that the candidate mentioned more often would be the one the user would later vote for. That assumption would ultimately prove wrong.
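
The counting step itself is straightforward. Below is a minimal sketch of that naive mention-count approach, assuming the tweets have already been collected (the 2008-era search API is long retired); the keywords and sample tweets are illustrative, not taken from the study.

```python
# Minimal sketch of the naive mention-count approach: credit each user to
# the ticket they mention most often. Keywords and tweets are illustrative.
from collections import Counter

CANDIDATES = {
    "democrat": ("obama", "biden"),
    "republican": ("mccain", "palin"),
}

def predict_votes(tweets_by_user):
    """Predict each user's vote as the ticket mentioned most often."""
    predictions = {}
    for user, tweets in tweets_by_user.items():
        counts = Counter()
        for text in tweets:
            lowered = text.lower()
            for ticket, names in CANDIDATES.items():
                if any(name in lowered for name in names):
                    counts[ticket] += 1
        if counts:  # skip users who never mention a candidate
            predictions[user] = counts.most_common(1)[0][0]
    return predictions

tweets = {
    "user_a": ["Obama's speech tonight was great", "Biden is in town today"],
    "user_b": ["McCain on the economy", "Palin rally downtown", "Obama again"],
}
print(predict_votes(tweets))  # {'user_a': 'democrat', 'user_b': 'republican'}
```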

In the next section, “Inferring Voter Intention,” Gayo-Avello presents a second method that uses terms labeled either positive or negative: a tweet containing more positive terms than negative ones was labeled positive, and vice versa. Since each tweet in the collection referred to just one ticket, it was possible to count, for each user, the number of positive and negative tweets about each pair of candidates. Three more elaborate procedures were then tested: vote and flip, semantic orientation, and polarity lexicon. The polarity lexicon method, being the best at estimating McCain support and overall accuracy, was ultimately used to infer votes for all users in the data set. Even so, the polling results obtained from Twitter data were still far less accurate than those achieved through traditional polling methods: selection bias had tainted the sample.
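
To make the term-counting step concrete, here is a minimal sketch of a lexicon-based labeler of the kind the article describes; the toy lexicon and the net-sentiment vote rule are illustrative assumptions, not the article’s actual polarity lexicon or scoring.

```python
# Minimal sketch of the positive/negative term-counting step. The toy
# lexicon below stands in for the article's actual polarity lexicon.
POSITIVE = {"great", "hope", "win", "support", "love"}
NEGATIVE = {"fail", "lies", "worst", "against", "corrupt"}

def label_tweet(text):
    """Label a tweet by comparing counts of positive and negative terms."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

def infer_vote(tallies):
    """Credit a user to the ticket with the best net sentiment.

    `tallies` maps each ticket to this user's (positive, negative) tweet
    counts, which works because each tweet refers to only one ticket.
    """
    return max(tallies, key=lambda t: tallies[t][0] - tallies[t][1])

print(label_tweet("great hope for the economy"))               # positive
print(infer_vote({"democrat": (3, 1), "republican": (1, 2)}))  # democrat
```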

Gayo-Avello then tested for Twitter bias. The first test counted users per county, on the premise that city dwellers and young adults are more likely to use Twitter and lean toward more liberal political opinions. Using the actual election results for each county, he looked for correlations between the percentage of Twitter users per county and population density. In every state, population density correlated positively with the Democratic vote in the 2008 US presidential election, and in every state except Missouri and Texas it also correlated positively with Twitter use. Gayo-Avello also found that younger people were clearly overrepresented on Twitter, which explains part of the faulty prediction. Finally, Republican voters--or at least McCain supporters--either used Twitter less than Democratic voters during the 2008 election or were more reluctant to express their political opinions publicly.
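
A correlation test of this kind is easy to reproduce in outline. The sketch below computes Pearson correlations between county-level population density, Democratic vote share, and Twitter use; all figures are invented for illustration, whereas the study used actual county-level returns.

```python
# Sketch of the bias check: Pearson correlations between county-level
# population density, Democratic vote share, and Twitter use. All figures
# are invented for illustration.
import numpy as np

density = np.array([50.0, 120.0, 800.0, 2500.0, 9000.0])  # people per sq. mile
dem_share = np.array([0.38, 0.44, 0.55, 0.61, 0.72])      # Democratic vote share
twitter_use = np.array([0.2, 0.5, 1.4, 2.9, 6.1])         # users per 1,000 people

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    return float(np.corrcoef(x, y)[0, 1])

print(f"density vs. Democratic vote: r = {pearson(density, dem_share):+.2f}")
print(f"density vs. Twitter use:     r = {pearson(density, twitter_use):+.2f}")
```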

In conclusion, the outcome of the 2008 US presidential election could not have been predicted from user content published through Twitter by applying the most common sentiment-analysis methods. According to the article, “Due to the prevalence of younger users and their tilt toward Democrats and Obama, ‘Democrats and Obama backers are more in evidence on the Internet than backers of other candidates or parties’ [2].” The possible biases in the data are consistent with the conclusions drawn by Lenhart and Fox [3], and Rainie and Smith [2].

The article ends with “Lessons Learned.” The problem with trying to predict the outcome of the 2008 US presidential election was not data collection per se, but two other things: researchers have yet to learn how to minimize the impact of bias in social media data, and they tend to ignore how such data differs from the actual voting population.

Four lessons can be learned from this study. First, the fact that researchers can assemble very large data sets for mining does not make those sets statistically representative of the overall population. Second, the relative youth of social networking users introduces bias; researchers need to correct for it by knowing the ages of the users in their samples. Third, a topic that appears frequently within a given sample, or is repeated often, can skew results within Twitter. Finally, nonresponse within Twitter may play a more important role than is realized, especially if the lack of information mainly affects one group in particular.

Gayo-Avello concludes:

Until social media is used regularly by a broad segment of the voting population, its users cannot be considered a representative sample, and forecasts from the data will be of questionable value at best and incorrect in many cases. Until then, researchers using such data should identify the various strata of users--based on, say, age, income, gender, and race--to properly weigh their opinions according to the proportion of each of them in the population.
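
A minimal sketch of the post-stratification weighting Gayo-Avello recommends appears below; the age strata, population shares, and sample preferences are illustrative assumptions, not data from the article.

```python
# Minimal post-stratification sketch: weight each user's stated preference
# by the true population share of that user's stratum. Strata and shares
# below are illustrative assumptions.
from collections import defaultdict

# (stratum, preference) for each sampled user; strata here are age bands.
sample = [("18-29", "dem"), ("18-29", "dem"), ("18-29", "rep"),
          ("30-49", "dem"), ("30-49", "rep"),
          ("50+", "rep")]

# Assumed share of each stratum in the voting population.
population_share = {"18-29": 0.22, "30-49": 0.35, "50+": 0.43}

def weighted_support(sample, population_share):
    """Re-weight preferences so each stratum counts at its population share."""
    by_stratum = defaultdict(list)
    for stratum, pref in sample:
        by_stratum[stratum].append(pref)
    support = defaultdict(float)
    for stratum, prefs in by_stratum.items():
        weight = population_share[stratum]
        for pref in set(prefs):
            support[pref] += weight * prefs.count(pref) / len(prefs)
    return dict(support)

print(weighted_support(sample, population_share))
# The overrepresented young users no longer dominate: the 50+ respondents
# now carry 43 percent of the total weight.
```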

Statisticians, pollsters, media, and others interested in social behavior will find this paper enlightening. Follow-ups on the topic might investigate whether instinctive responses when tweeting, herd behavior, or the lack of critical evaluation by tweeters skew the results of a Twitter survey.

Reviewer: Bernice Glenn. Review #: CR139634 (1204-0395)
1) Zimmer, B. “Twitterology: a new science?” New York Times, Oct. 29, 2011, http://www.nytimes.com/2011/10/30/opinion/sunday/twitterology-a-new-science.html (accessed Dec. 1, 2011).
2) Rainie, L., and Smith, A. The Internet and the 2008 election. Pew Internet and American Life Project, Washington, D.C., 2008, http://www.pewinternet.org/Reports/2008/The-Internet-and-the-2008-Election.aspx (accessed Dec. 1, 2011).
3) Lenhart, A., and Fox, S. Twitter and status updating. Pew Internet and American Life Project, Washington, D.C., 2009, http://www.pewinternet.org/Reports/2009/Twitter-and-status-updating.aspx (accessed Dec. 1, 2011).
Categories: General (H.3.0); Communications Applications (H.4.3); Electronic Commerce (K.4.4); Group and Organization Interfaces (H.5.3); Information Search and Retrieval (H.3.3)
Other reviews under “General”:
Dictionary of information science and technology. Watters C., Academic Press Prof., Inc., San Diego, CA, 1992. Type: Book (9780127385105). Jul 1 1993
Information retrieval. Frakes W., Baeza-Yates R., Prentice-Hall, Inc., Upper Saddle River, NJ, 1992. Type: Book (9780134638379). Jul 1 1993
Organizing information: principles of data base and retrieval systems. Soergel D., Academic Press Prof., Inc., San Diego, CA, 1985. Type: Book (9789780126542608). Aug 1 1986
more...
