The 2016 U.S. Presidential Elections: A Timeline Analysis
It is currently under investigation whether, during the 2016 U.S. elections, a Russian ‘troll factory’, the Internet Research Agency (IRA) based in St. Petersburg, released a number of tweets from fraudulent Twitter accounts. These tweets potentially influenced voters by spreading divisive statements, some of which were fake news. The aim of this analysis is to better understand the strategy of the trolls by retrieving the main subjects of these tweets over time and categorizing them according to their targeted social, geographical, or political group. The objectives are to quantify the potential impact of those tweets and to link them to specific events that were occurring in the United States at the time. All analyses are accompanied by interactive plots to stimulate future research.
On October 7th 2016, the Department of Homeland Security and the Office of the Director of National Intelligence (ODNI) stated that the American Intelligence Community was certain that the Russian Government had interfered with the U.S. election process through a number of strategies with the intent of damaging Hillary Clinton’s presidential campaign. Examples of such strategies include, but are not limited to, directed hacking of Hillary Clinton’s personal Google email account and the broadcasting of fake news via social media accounts. In early 2017, the ODNI stated that the Russian president Vladimir Putin personally ordered this ‘influence campaign’ to harm Clinton’s chances and thereby increase the chance of the election of a president more favorable to Russia.
The Russian internet trolls targeted a number of social media services such as Facebook and Twitter. This analysis considers data from Twitter only. The dataset is provided by FiveThirtyEight and comprises a number of features, including but not limited to author name, content (the tweet itself), language, date, followers and account type. A second dataset is used as supplementary material and includes other features that are incorporated into the analysis, such as the number of likes for a given tweet. We expect the types of accounts to differ in activity and subject matter depending on the timeframe. The analysis attempts to dig deeper into the strategy of these Russian trolls.
An analysis of the languages used in the tweets reveals that Russian and English are the main languages. Interestingly, Spanish appears to be negligible even though it is the second most prominent language in the United States. The reason for this remains unclear. Considering that English is spoken by 72% of individuals in the U.S. and that Russian tweets would generally not get through to the American population, our project focuses only on the tweets written in English. The following plot shows the most common languages. The legend is ordered from the most to the least frequent language.
Following the work of Linvill and Warren, the Russian Trolls can be clustered into a few different account categories:
The plot below shows the distribution of followers for each account category. Notably, many authors (especially left and right trolls) did not reach a high number of followers compared with the NewsFeed category. We must also consider that the trolls could follow each other to lend credibility to their accounts. Since the total number of unique trolls appearing in the entire FiveThirtyEight dataset is 2848, an account with more than 2848 followers must have followers from outside this set, which strongly suggests that it was also followed by real American citizens. In support of the mutual-following hypothesis, the boxplot shows higher densities of accounts at around 1000 and around 100 followers, suggesting that this log-bimodal distribution might not be due to real American Twitter accounts, but rather to an organized entity.
To gain a better understanding of what the Trolls were posting about, a neural network was trained to recognize the 10 topics that we deemed most dominant based on the most common hashtags. This is therefore a supervised learning approach. The neural net was implemented in PyTorch and trained on a Tesla K80 GPU provided by Google Colab. The net has three hidden layers: the first, second and third layers have 6000, 1000 and 100 neurons, respectively. The model reached an accuracy of 85% when evaluated on a test set. The list of topics is given below. Try hovering over the topic description for a graphical representation of the most tweeted words in a given topic.
Trump. Includes any reference to Trump.
Trump Adversaries. Includes references to any political figure opposed to Trump. Tweets in this category are commonly against Hillary Clinton or Obama.
Black. Includes any reference to the African-American population. It is mainly related to the #BlackLivesMatter movement.
Patriot. Includes tweets related to support of the NRA, the army and conservative movements.
Crime. Involves a type of criminal offense.
Sports. Includes any sports-related activity, including events related to games, players and coaches.
Entertainment. Includes events related to celebrities, music and sometimes controversial topics.
Health. Includes tweets related to health in general, such as health insurance, going to the gym and food.
Islam. Involves anything against Islam, notably ISIS and bombings that occurred internationally.
Foreign Countries. Includes tweets about international affairs and world news.
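The topic classifier described above can be sketched in PyTorch as follows. This is a minimal illustration, not the exact model used in this analysis: only the three hidden layer sizes (6000, 1000, 100) and the 10 output topics come from the text. The input feature size, the ReLU activations and the class name `TopicClassifier` are assumptions made for the example.

```python
import torch
from torch import nn


class TopicClassifier(nn.Module):
    """Feed-forward net with three hidden layers and one output per topic.

    Assumptions (not stated in the article): the input is a fixed-length
    feature vector per tweet, and hidden layers use ReLU activations.
    """

    def __init__(self, input_dim, hidden_sizes=(6000, 1000, 100), num_topics=10):
        super().__init__()
        layers = []
        prev = input_dim
        for size in hidden_sizes:
            layers += [nn.Linear(prev, size), nn.ReLU()]
            prev = size
        # Final layer maps the last hidden layer to one score per topic.
        layers.append(nn.Linear(prev, num_topics))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # Returns raw logits; nn.CrossEntropyLoss applies softmax internally.
        return self.net(x)


# Hypothetical input size of 300 features per tweet.
model = TopicClassifier(input_dim=300)
logits = model(torch.zeros(2, 300))   # a batch of two dummy tweets
predicted_topic = logits.argmax(dim=1)  # index of the highest-scoring topic
```

Returning logits rather than probabilities keeps the model compatible with `nn.CrossEntropyLoss`, a common choice for single-label classification over a fixed set of topics.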
From the following plot, it is clear that the general increase in followers cannot be attributed to an increase in tweets, nor to an increase in active authors. What could the increase in followers be attributed to? Are they real people? Or perhaps they are the Trolls themselves, including bots that may have been set up? To help answer this question, the statistics on the number of ‘likes’ were retrieved. ‘Likes’ can be considered an indicator of Troll success, i.e. of whether the Trolls were able to get through to the general public. Here we see that the number of ‘likes’ skyrockets from 50,000 to 150,000 at the start of September, two months before the election. These numbers make sense, as there is an increase of approximately 200,000-400,000 followers in total during this period.
To better understand what topics were being ‘liked’, the model for topic prediction was applied. The graph below shows the proportions of the most dominant topics over time. We are most interested in the period from September 2016 to the date of the election, since this is when ‘like’ counts reach their highest numbers. The majority of ‘likes’ in this period went to topics related to the election: Trump-related tweets, Black-related tweets and, finally, tweets attacking Trump adversaries (including Clinton and Obama). Interestingly, the news-related tweets did not receive any significant increase in ‘likes’. Had these ‘likes’ been caused by bots, we would expect to see an increase spread across all Troll tweets rather than concentrated on specific topics. Since practically no ‘likes’ were given to News-Feed topics, it may be less likely that the ‘likes’ are fake, and more likely that they come from real Americans turning their interest towards politically-related topics.
The Trolls may have been able to gain the attention of the masses via various strategies.
You may not be at war with Islam, but Islam is at war with you #ISIS -JENN_ABRAMS
#OscarsSoWhite REALLY?? #Oscars -JENN_ABRAMS
No equality. No freedom. Just violence, civil war and terrorism… Do we need this to happen in America? -PIGEONTODAY
This is not a misspell on Hofstra University’s debate tickets It’s the name of Hillary’s body double #debatenight -JENN_ABRAMS
A Pearson correlation coefficient of 0.52 is obtained between the percentage of African-Americans in a state and the percentage of Black-support topics in that state. This indicates a moderate positive relationship between the two variables, and suggests that the Trolls may very well have considered the African-American population of states when releasing tweets. Furthermore, it is consistent with our topic categorization model classifying the Black-related topic accurately.
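For readers who want to reproduce this kind of figure, the Pearson coefficient can be computed as sketched below. The function name `pearson_r` and the per-state values are hypothetical and invented for illustration only; they are not taken from the dataset and will not reproduce the 0.52 reported above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up example values, one entry per state (NOT real data).
pct_african_american = [3.0, 12.5, 26.0, 31.5, 8.0]
pct_black_topic_tweets = [4.0, 9.0, 15.0, 22.0, 10.0]

r = pearson_r(pct_african_american, pct_black_topic_tweets)
print(f"Pearson r = {r:.2f}")
```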
One approach that racked up a large number of followers was the creation of Twitter accounts that impersonated a seemingly real individual, rather than accounts named in a news-related fashion. A notable example is the author JENN_ABRAMS, a Troll account depicted as a woman in her 30s who supported Trump. This account attained a maximum follower count of 61759 during the period under inspection.
Another potential strategy that we investigated was replying to popular Twitter authors, such as @midnight, that are followed by a significant proportion of the American people. However, given that only individuals who follow both the Troll who replied and the author to whom the reply was made can see the reply, this tactic may not have been the best for gaining an audience. This may explain why the number of replies remains relatively low. The analysis was conducted by extracting the unique users mentioned in tweets and counting their occurrences. It also appears that trolls do not tend to mention each other: only 13 trolls were found to name a fellow troll. The accounts belonging to these 13 trolls are categorized as HashtagGamer. Additionally, apart from the three most-mentioned trolls, each is mentioned fewer than ten times.
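The mention extraction described above can be sketched with a regular expression and a counter. This is an illustrative sketch rather than the code used for the analysis: the function name `count_mentions`, the choice to lower-case handles and the choice to count a user at most once per tweet are all assumptions.

```python
import re
from collections import Counter

# "@" followed by word characters, but not when preceded by a word
# character (which avoids matching e-mail addresses such as a@b.com).
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")


def count_mentions(tweets):
    """Count in how many tweets each @user is mentioned (case-insensitive)."""
    counts = Counter()
    for text in tweets:
        # A set ensures a user mentioned twice in one tweet counts once.
        counts.update({name.lower() for name in MENTION_RE.findall(text)})
    return counts


# Made-up example tweets for illustration.
example_tweets = [
    "@midnight #IHatePokemonGoBecause There will be more distracted drivers",
    "Thanks @Midnight and @realDonaldTrump",
    "No mention in this one",
]
for user, n in count_mentions(example_tweets).most_common(3):
    print(user, n)
```

Intersecting the keys of the resulting counter with the set of known troll handles would then give the trolls that were mentioned by other trolls.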
For the top 3 users, the 10 tweets with the most replies, likes and retweets were analyzed. These tweets did not have a particular “trolling” message. Additionally, some of the tweets were identical for all users, with tweets such as: “#ThingsIWontBelieve this church sign (link to picture)” and “#IHatePokemonGoBecause There will be more distracted drivers”.
Only in the 10 tweets with the most replies did we see some tweets with a political side such as:
Obama is elected the 3rd time #MakeMeMadIn5Words
and
Why? And when will my people learn? Whites can’t be trusted #IStartCryingWhen
and
#GrowingUpWithObama watching his ugly daughter in all networks
It should be noted that when the tweet ‘Obama being elected for the 3rd time’ was investigated, two of the users were found to have tweeted it at exactly the same time. This led to further inquiries, which are discussed in the following section. Plotting the histogram of all the users mentioned by the trolls makes it evident that the most frequently mentioned user is @midnight, a late-night, internet-themed panel game show. Users marked in red are those that one would expect to see mentioned (Donald Trump, Hillary Clinton) and the 3 trolls mentioned by other trolls. Contrary to our expectation, the trolls did not mention Hillary Clinton or Donald Trump all that much (around 200 tweets out of more than a million).
We were interested to see if tweet contents are repeated several times. In order to do so, the data was filtered to only contain tweets that were not labeled as a retweet. The findings were that there are 16,707 tweets that appear more than once in the entire dataset, while there are 27 tweets that appear more than 15 times. The following analysis focuses on these 27 tweets.
We were interested in quantifying the difference between politically-related and non-politically-related topics with respect to duplicate tweets from distinct authors. We show that politically-related tweets tend to be posted by a larger number of distinct authors, while non-politically-related tweets tend to stick with a single author. We also tracked the distinct dates on which each tweet was posted.
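The duplicate analysis described above can be sketched as follows: retweets are filtered out, and for each repeated tweet text the number of occurrences, distinct authors and distinct dates is recorded. The function name `duplicate_stats` and the field names `content`, `author`, `date` and `is_retweet` are hypothetical and chosen for the example; they may not match the column names of the actual dataset.

```python
from collections import defaultdict


def duplicate_stats(rows, min_count=2):
    """Map each repeated tweet text to (occurrences, distinct authors, distinct dates).

    `rows` is an iterable of dicts with the (assumed) keys
    'content', 'author', 'date' and 'is_retweet'.
    """
    stats = defaultdict(lambda: {"count": 0, "authors": set(), "dates": set()})
    for row in rows:
        if row["is_retweet"]:
            continue  # only original tweets are considered
        entry = stats[row["content"]]
        entry["count"] += 1
        entry["authors"].add(row["author"])
        entry["dates"].add(row["date"])
    return {
        text: (e["count"], len(e["authors"]), len(e["dates"]))
        for text, e in stats.items()
        if e["count"] >= min_count
    }


# Made-up rows for illustration.
rows = [
    {"content": "a", "author": "x", "date": "2016-01-01", "is_retweet": False},
    {"content": "a", "author": "y", "date": "2016-01-01", "is_retweet": False},
    {"content": "a", "author": "z", "date": "2016-01-02", "is_retweet": True},
    {"content": "b", "author": "x", "date": "2016-01-03", "is_retweet": False},
]
repeated = duplicate_stats(rows)                  # tweets appearing more than once
very_frequent = duplicate_stats(rows, min_count=16)  # more than 15 times
```

The second and third elements of each tuple are the two quantities plotted against each other in the scatter plot discussed below.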
As the tweets are too long to show on any figure we created the following mapping:
A scatter plot visualizing the number of distinct authors as a function of number of distinct dates for each tweet was generated. By extracting tweets which appear at least 5 times (869 tweets) and by applying the topic categorization model to classify political (Trump-related, Trump adversaries, patriot, Black-related, Islam and foreign countries) versus non-political (sports, entertainment, health, crime) tweets we were able to increase the number of samples in the scatter plot.
The scatter plot indicates that there is a difference between political and non-political tweets with respect to the number of distinct authors. A test for statistical significance using the Wilcoxon signed-rank test on the difference between the two classes, based on the number of distinct authors, resulted in a p-value of 1.50e-09.
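A comparison of this kind can be sketched as below. Note the assumption: because the political and non-political tweets form two unpaired groups, the sketch uses the Wilcoxon rank-sum (Mann-Whitney U) test with a normal approximation, which is a related but different test from the signed-rank test named above, and it is not necessarily the procedure used in this analysis. The function names and sample values are invented for illustration, and no continuity correction is applied.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(a, b):
    """Two-sided rank-sum test; returns (U statistic of `a`, approximate p-value)."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    combined = list(a) + list(b)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction for the variance of U under the null hypothesis.
    tie_term = sum(t ** 3 - t for t in Counter(combined).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        raise ValueError("all observations are identical")
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return u1, p_value


# Made-up numbers of distinct authors per tweet (NOT real data).
political = [5, 6, 7, 8, 9, 10, 11, 12]
non_political = [1, 1, 1, 2, 1, 1, 2, 1]
u_stat, p = rank_sum_test(political, non_political)
print(f"U = {u_stat}, p = {p:.2e}")
```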
A potential explanation for the fact that politically-related tweets tend to have multiple authors is that the trolls may have been instructed to release a given tweet. As all political tweets have fewer than 20 distinct dates, we believe they are coordinated to some extent.
My heart goes out to the victims who were not so lucky #Prayers4California
as well as:
#SanBernardinoShooting displayes how inept and clueless anti-gun lawmakers are. #Prayers4California
and:
Guns are our friends because in a country without guns, I’m what’s known as “prey.” All females are. #Prayers4California
Under Obama administration mass shootings happen every month! He wants to cover his ass with gun control! #Prayers4California
and:
#ObamaLogic: we can’t defeat ISIS? We definitely must ban guns #Prayers4California
#GunControl but not Muslim control? Jihad a part of Islam when not raping 6 yr old girls #Prayers4California
Chicago Police Admit To Killing Innocent 55-Year-Old Woman By Accident #BlackMatters
and:
LIVE FEED FROM CHICAGO ANTI-MAYOR PRTOTEST #Chicago #Rahm #BlackLivesMatter #LaquanMcDonald #PoliceBrutality
NewsOne Now Audio Podcast: Bishop E.W. Jackson Calls #BlackLivesMatter Is Movement “Disgraceful”
I proud to be black!!! #SelmaToMontgomery1965
and:
I’m with Martin Luther king. Support everything he proposed #SelmaToMontgomery1965
This analysis of the Twitter dataset provides insights into many potential strategies that may have been employed by the Russian trolls during the 2016 U.S. presidential election. We show that strategies such as artificially increasing follower counts to gain credibility, exploiting particular events to attract viewers, targeting the African-American population, directing larger quantities of tweets at swing states and re-posting identical tweets were all practiced tactics. Furthermore, our research suggests that the trolls may have been more dedicated to some tactics than to others. Mentioning popular authors, for example, may not have been an efficient way to gain followers, especially since it was not pursued to any significant extent. This may imply that the Russian trolls did not have a strict plan to follow throughout the election campaigns, but rather tried out a variety of strategies and favored those that appeared to work well.