Dartmouth College Computer Science Technical Report series

Twitter Bot Detection in the Context of the 2018 US Senate Elections
Wes Kendrick
Dartmouth TR2019-865

Abstract: A growing percentage of public political communication takes place on social media sites such as Twitter, and not all of it is posted by humans. If citizens are to have the final say online, we must be able to detect and weed out bot accounts. The objective of this thesis is threefold: 1) expand the pool of Twitter election data available for analysis, 2) evaluate how well humans detect bots on a ground-truth dataset, and 3) learn which features humans associate with accounts they believe to be bots. In this thesis, we build a large database of over 120 million tweets from over 900,000 Twitter accounts that tweeted about candidates running for the US Senate during the 2018 American midterm elections. Tweet-level data were collected in real time during the two-month period surrounding the elections; account-level data were collected retrospectively in the months following the elections. Using this original dataset, we design and launch a bot detection study built on a novel combination of Amazon SageMaker and Qualtrics. For ground truth, we include in the study sample 39 known bot accounts from a separate dataset, the 2015 Bot Challenge Dataset (BCD 2015). Of these 39 known bots, only 11 accounts (28.2%) were correctly identified as bots by a two-thirds or unanimous annotator vote, and just 5 accounts (12.8%) were identified as bots unanimously, highlighting the difficulty of building accurate training sets for bot detection. In the study results for the Senate dataset accounts, we observe that accounts that 1) post frequently and 2) retweet frequently were more likely to be labeled as bots. The Senate dataset and the associated study results offer significant opportunities for further analysis and research.
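The vote thresholds in the abstract can be made concrete with a small sketch. It assumes each account was shown to three annotators, which the "two-thirds or unanimous" wording suggests but the abstract does not state, and the function names are hypothetical. This is an illustration of threshold-based vote aggregation, not the thesis's own code. The last two lines recompute the reported percentages from the stated counts (11 and 5 of 39 known bots).

```python
from fractions import Fraction


def label_as_bot(votes: list[bool], threshold: Fraction = Fraction(2, 3)) -> bool:
    """Return True if the share of annotators voting 'bot' meets the threshold.

    votes: one boolean per annotator (True = annotator labeled the account a bot).
    threshold: Fraction(2, 3) for a two-thirds vote (which also admits unanimous
    votes), Fraction(1) for unanimity only.
    Exact fractions avoid float rounding at the 2/3 boundary.
    """
    if not votes:
        raise ValueError("need at least one annotator vote")
    return Fraction(sum(votes), len(votes)) >= threshold


def detection_rate(accounts: list[list[bool]],
                   threshold: Fraction = Fraction(2, 3)) -> tuple[int, float]:
    """Given the votes for each known bot, return (hits, percent rounded to 0.1)."""
    hits = sum(label_as_bot(votes, threshold) for votes in accounts)
    return hits, round(100 * hits / len(accounts), 1)


# Percentages reported in the abstract, recomputed from the stated counts.
print(round(100 * 11 / 39, 1))  # two-thirds or unanimous vote
print(round(100 * 5 / 39, 1))   # unanimous vote only
```

Because a unanimous vote also satisfies the two-thirds threshold, the unanimous count (5) is necessarily a subset of the two-thirds-or-unanimous count (11).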

Note: Senior Thesis. Advisors: V.S. Subrahmanian and Benjamin Valentino.

PDF (13563KB)

Bibliographic citation for this report:
   Wes Kendrick, "Twitter Bot Detection in the Context of the 2018 US Senate Elections." Dartmouth Computer Science Technical Report TR2019-865, May 2019.

To receive paper copy of a report, by mail, send your address and the TR number to reports AT cs.dartmouth.edu

Copyright notice: The documents contained in this server are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

Technical reports collection maintained by David Kotz.