Want to make data analysis fast for everyone?Join Us!
There were many unique aspects of the 2016 U.S. presidential election, with one of the most interesting being Donald Trump’s use of Twitter. As luck would have it, tweets happen to be very easy to extract and put into relational database format. Being the data junkies that we are, we couldn’t resist downloading President Trump’s entire tweet history and loading it into Periscope Data. It’s an excellent opportunity to slice, analyze, and quantify presidential communications while also exploring methods for analyzing text in SQL at the same time.
So let’s get started.
The Data Set
First, we formatted Trump’s tweets into 5 columns of data:
- Tweet Text
- Tweet Date
- Tweet ID
With these columns it was easy to do some quick exploration. @realDonaldTrump tweeted for the first time on May 4, 2009 with a tweet he almost certainly didn’t write.
Around July of 2011 Trump’s tweet volume quickly escalated. It’s not clear what triggered this increase in activity, but based on his first tweet a reasonable guess is that Trump began tweeting himself, either in addition to or lieu of staff. In total, Donald Trump has tweeted over 34,000 times. At his peak he was tweeting over 30 times per day. A remarkable number by any measure.
A look at Trump’s tweet frequency over time looks like this:
Keyword Frequency: A History of MAGA
Trump first used the phrase ‘Make America Great Again’ long before his announcement to run for President in the 2016 election as a Republican candidate. He filed papers in Iowa in 2012 for the “Make America Great Again” Party to begin a run as an independent candidate. His first use of the phrase on Twitter:
His use of the phrase has obviously increased since. If we sum the number of tweets containing the words ‘Make America Great Again’ or ‘MAGA’, and divide that number by the count of all tweets, we can easily define Trump’s MAGA rate. With a simple query we can look at the MAGA rate over time. Trump’s MAGA rate has a slightly bimodal distribution centered first around the announcement of his candidacy in 2015, then more strongly after his victory in in the presidential election. Trump held an astounding peak monthly MAGA rate of 14% in September 2016.
Sentiment: The Anger and Joy of Donald J. Trump
Given Trump’s prolific use of Twitter leading up to and throughout the campaign, we can apply a quantitative approach to measuring the tone and sentiment of his campaign with text analysis.
SQL is not the first tool people think of when doing text analytics, but it has some interesting advantages. It’s fast, well integrated with existing ETL processes, and easy to build into regular reports. We will use a simple but effective approach, by relating individual words to sentiment scores to get a quick read on the emotional state of DJT.
This technique we’re using was explored in the paper “A new ANEW: Evaluation of a word list for sentiment analysis in microblogs.” Using Twitter as a foundation, Finn Årup Nielsen from the Technical University of Denmark (DTU) scored 2,477 words from -5 to 5 based on their perceived sentiment, with -5 being a word linked to a strongly negative emotional content. By joining the words in Trump’s tweets to this list, we can quickly get a per-tweet sentiment score. The data set for this topic is published by the department of Informatics and Mathematical Modelling at DTU.
The data is in the form:
In order to begin this analysis, the tweets must be broken down into individuals words to join to our sentiment table. The method to do this is outlined in our blog post about NPS and survey analytics.
Once we have a resulting table of individual words, we can begin joining our tweets to our sentiment scores.
With a few simple queries we can find Trump’s most celebratory and antagonistic tweets. First we sum the sentiment scores by tweet.
His most positive:
And his most negative:
These are Trump’s most emotionally charged tweets, but since we have summed the sentiment scores we are biased towards longer phrases. Simple declarations such as “Great!” or “Sad!” convey a lot of emotional content without summing to large numbers. As we dig a little deeper we will divide the summed scores by the number of words in the Tweet to gauge the net sentiment.
Let’s look at Trump’s sentiment over time. This has been normalized to his average sentiment over the data history so that it is centered on 0. This is done because the sentiment data has little absolute meaning without a reference point.
As can be seen, Trump had some serious highs and lows before 2013. The highs are often associated with the Miss USA and Miss Universe pageants, or were tweets written by people (we suspect) other than Trump. The very low scores in 2011 - 2012 are mostly political in nature. Let’s take a closer look at some of these:
Most worrying though, Trump has only become more negative over the past 4 years, especially from 2015 and onwards. Zooming in on the period of the election we can see that Trump generally trended downwards, exhibiting increasing negativity over the course of his Presidential campaign. There were some high points, such as the week of his victory in November, but the overall sentiment of Trump’s tweets continues to decline. We can easily summarize Trump’s tweets with a single word: Sad!
Sentiment by Time of Day: Angry, Early Morning Tweets
Another interesting perspective is Trump’s sentiment as a function of the hour of day. Much has been made of his proclivity for early morning tweeting and what the ramifications of this might be. With a bit of SQL, we can look at this quickly and easily. We use the same query as above, continuing to limit our timeframe to the presidential campaign, but we use a 3 hour rolling average and use an extract, like so:
He’s at his most negative in the morning. On average, between the hours of 4 am and 7 am, Trump unleashes a high volume of negative tweets. Our data does reveal a counterbalance to this finding. Trump’s tweets become more positive throughout the day and this positive sentiment peaks at night, along with his overall tweet volume.
Many companies have an abundance of text data, but don’t have a strategy for fully making use of it. After a few simple joins in SQL, keyword frequency and sentiment analysis are two excellent methods for gaining context and insight into text data. Both are broadly applicable beyond tweets. Survey data, comments, competitive information, or web copy can all be explored with the methods discussed above.
We hope our advice conducting text analysis in SQL was helpful. If you’d like a CSV file containing the tweet or sentiment dataset we used for this post, please email us at email@example.com and we’ll send it your way.