Do Republicans really have a smaller vocabulary than Democrats?
And what about Trump and Biden?
Lingo Ninja Research Team
October 3, 2020 · 4 min read
We analyzed the top 100 political Twitter accounts in the US. Here are the (surprising) results:
- Republicans' average vocabulary is 14.0% larger than Democrats'.
- Donald Trump's vocabulary is 17.6% larger than Joe Biden's.
- Both Donald Trump and Joe Biden have below average vocabulary sizes.
Why and how did we do this analysis?
We recently released an article series based on a vocabulary analysis of Twitter accounts. Our goal was to identify the most important words for language learners and the most effective language learning strategies. This new series focused on analyzing Donald Trump's tweets, since it's more fun to read about someone famous. In our latest article we compared the vocabularies of Donald Trump, Joe Biden and Kim Kardashian. A surprising outcome (for us) was that Trump's vocabulary on Twitter is larger than Biden's. But wait: doesn't Donald Trump have a reputation for having a small vocabulary?
To test our own analysis, we doubled down and expanded our sample from Trump, Biden, and Kim Kardashian to now include the top 100 US political accounts on Twitter. To identify the accounts for analysis, we used the ranking from Socialbakers, a marketing agency that tracks the largest social media data set in the industry. We then filtered out the following account types:
- accounts that could not be associated with a US political party (e.g., Boris Johnson, NASA);
- US Government accounts tied to positions, not people (e.g., @POTUS, @FLOTUS, @VP)—they are handed from one person to the next, and therefore cannot be clearly assigned to a party; and
- accounts that tweeted fewer than 30,000 words.
In the end, 76 accounts remained. We split them into Republican (32) and Democrat (44), and also into male (53), female (20), and neutral (3: @GOP, @TheDemocrats, and @HouseDemocrats). We then analyzed the vocabulary of the last 30,000 words tweeted per account (in English, without retweets or replies). We averaged each subcategory and tested the differences for statistical significance. See Appendix B for the list of analyzed Twitter accounts and Appendix C for dropped accounts.
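To make the counting step concrete, here is a minimal sketch of how a vocabulary size over the last 30,000 words could be computed. It is not the tooling used for this analysis: the tokenizer (lowercased runs of letters, with URLs and @mentions removed) and the function names are assumptions for illustration, since the article does not specify them.

```python
import re
from typing import List, Optional

# Assumption: a "word" is a lowercased run of letters, optionally with
# internal apostrophes (e.g. "don't"). The article does not define its tokenizer.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def tokenize(tweet: str) -> List[str]:
    """Lowercase a tweet, drop URLs and @mentions, and return its words."""
    cleaned = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
    return WORD_RE.findall(cleaned)


def vocabulary_size(tweets: List[str], word_limit: int = 30_000) -> Optional[int]:
    """Count distinct words among the most recent `word_limit` words.

    `tweets` must be ordered newest first and already stripped of
    retweets and replies. Returns None when the account has fewer than
    `word_limit` words, mirroring the filter described above.
    """
    words: List[str] = []
    for tweet in tweets:
        words.extend(tokenize(tweet))
        if len(words) >= word_limit:
            break
    if len(words) < word_limit:
        return None
    # Every distinct spelling counts as its own word (no lemmatization).
    return len(set(words[:word_limit]))
```

Because every distinct spelling is a separate entry in the set, typos and slang increase the count under this sketch.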
Before presenting our findings, here is a short remark: our analysis was never meant to be political. We happened upon this topic by focusing on Donald Trump in our previous articles due to his immense Twitter following, and because of the global media attention garnered by his politics and eccentricities. In the following section, we will reveal our results but keep our interpretations to a minimum.
In total we analyzed 90,946 tweets and 2.28 million words. The largest vocabulary has 6,213 words (@SarahPalinUSA) and the smallest has 2,809 words (@HouseDemocrats). The average vocabulary size is 4,301 words. See Appendix D for detailed results per account.
Republican vs. Democrat
There is a clear distinction between the average vocabulary sizes. Republicans have an average of 4,630 words, which is 14.0% larger than the Democrats' average (4,060 words). This difference is statistically significant with t(74) = 3.97, p = .000164.
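The reported 74 degrees of freedom match a two-sample t-test with pooled variance on 32 + 44 accounts (32 + 44 - 2 = 74), although the article does not name the exact test. Under that assumption, the comparison could be sketched as follows. The function names are hypothetical, and the per-account values would come from Appendix D.

```python
from math import sqrt
from statistics import mean, variance


def relative_difference(a: float, b: float) -> float:
    """Return how much larger a is than b, in percent."""
    return (a - b) / b * 100


def pooled_t(sample_a, sample_b):
    """Student's two-sample t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes both groups share the same
    variance, which is what df = n_a + n_b - 2 implies.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


# Check the headline figure from the two reported group averages.
print(round(relative_difference(4630, 4060), 1))  # 14.0
```

Calling `pooled_t` with the 32 Republican and 44 Democrat vocabulary sizes would give the t statistic and the 74 degrees of freedom. Turning that into a p-value needs the t distribution, for example from a statistics library.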
This difference is also visible in a simple ranking of the vocabulary sizes (see following chart): Republicans dominate the higher end of the spectrum, while Democrats are more dominant on the lower end.
Male vs. Female
As mentioned previously, we also split the accounts into male and female categories. Within these categories, the averages are nearly identical: 4,317 words (male) vs. 4,377 words (female). There is no statistically significant difference in vocabulary sizes between male and female accounts.
Donald Trump vs. Joe Biden
This analysis confirmed the results of our previous study: Donald Trump's vocabulary on Twitter (3,965 words) is 17.6% larger than Joe Biden's (3,371 words). (The numbers differ slightly from those listed in our previous articles because we’ve now analyzed the tweets as of September 22, 2020.)
It is often said that Donald Trump has a very small vocabulary (e.g., here and here). In our analysis, Trump's vocabulary is indeed smaller than average, but far from the smallest. In contrast, Joe Biden's vocabulary is the fifth smallest of the 76 analyzed.
Our analysis examines only the size of the vocabulary; it does not assess the quality or sophistication of the words. Variations of the same word (e.g., typos, slang, misspellings) were counted as separate words. There are multiple possible reasons for differences in vocabulary size, including age, education, range of topics, and intentional use of simpler or more complicated language. Those reasons are outside the scope of our current analysis and would require detailed examination in a future study.
If you have questions or comments on this work or you would like us to further analyze this topic, please let us know on Twitter or Facebook.
Appendix A: Method
Counting words or lemmas
Some researchers count word roots, or lemmas, rather than individual word forms: for example, is, be, are, was, and been would all be grouped under the root to be. In our articles, however, each word form is counted separately.
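The difference between the two counting conventions can be illustrated with a short sketch. The lemma table below is a tiny, hand-written example for illustration only. A real lemma count would need a full lemmatizer, and none was used in this analysis.

```python
# Hypothetical, hand-written lemma table covering only the example above.
LEMMAS = {"is": "be", "are": "be", "was": "be", "been": "be"}


def count_word_forms(words):
    """Count every distinct word form separately (the convention used here)."""
    return len(set(words))


def count_lemmas(words, lemmas=LEMMAS):
    """Count distinct lemmas, mapping each word to its root when known."""
    return len({lemmas.get(word, word) for word in words})


sample = ["is", "be", "are", "was", "been", "here"]
print(count_word_forms(sample))  # 6
print(count_lemmas(sample))      # 2
```

For the same text, a word-form count is always at least as large as a lemma count, so the vocabulary sizes reported here would be lower under a lemma-based method.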
For each account, we analyzed the tweets containing the last 30,000 words tweeted on or before September 22, 2020. The number of analyzed tweets can be found in Appendix D: Detailed Results.