Comparing the vocabularies of Donald Trump, Kim Kardashian and Joe Biden
How many words do you need to know to understand their tweets?
5 min read · first published September 11, 2020 · last update November 17, 2020
Here are the highlights:
- Donald Trump's vocabulary in the analyzed tweets is about 19% bigger than Joe Biden's vocabulary (4030 vs 3396 words). Kim Kardashian's vocabulary is 3713 words.
- Biden and Trump's vocabularies are more similar to each other than they are to Kardashian's vocabulary.
Ok, let's start.
Why compare the vocabulary of the presidential candidates and Kim Kardashian?
In our previous article we analyzed Donald Trump's vocabulary to learn how many words we need to know to roughly understand his tweets. Now we wanted to expand this question to more people.
And since it's always more fun to read about celebrities, we planned to compare the 3 presidential candidates Donald Trump, Joe Biden and Kanye West. But Kanye West did not post enough on Twitter, so it was impossible to extract 30,000 words from his tweets. As a replacement we chose West's wife, Kim Kardashian. This makes an interesting trio: Biden and Trump are politicians and more similar to each other than to Kardashian. Kardashian is an American celebrity and media personality and distinctly different. How will their backgrounds be reflected in their vocabulary?
This analysis is meant to illustrate language learning strategies. We are not getting into political debates.
The most used words
We analyzed the word frequency in the last 30,000 English words each person tweeted themselves (retweets excluded). These 90,000 words were collected from 3,710 tweets in total: 1098 tweets of Trump, 1669 of Kardashian and 943 of Biden. When we refer to someone's top or most used words in the rest of this article, we always mean the top words according to our analysis.
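The counting step described above can be sketched in a few lines of Python. This is only a minimal illustration, not the code used for the article: the function names `last_n_words` and `top_words`, the simple regex tokenizer and the sample tweets are all our assumptions.

```python
import re
from collections import Counter

def last_n_words(tweets, n=30_000):
    """Collect the first n words from a list of tweets (newest first).

    The tokenizer is an assumption: lowercase letters and apostrophes only.
    """
    words = []
    for text in tweets:
        words.extend(re.findall(r"[a-z']+", text.lower()))
        if len(words) >= n:
            break
    return words[:n]

def top_words(words, k=20):
    """Return the k most frequent words with their counts."""
    return Counter(words).most_common(k)

# Hypothetical sample data, not real tweets.
sample = ["We will win this", "We will make this great", "this is great"]
words = last_n_words(sample, n=10)
print(top_words(words, k=3))
```

With real data, `tweets` would hold one person's own tweets sorted from newest to oldest, and `top_words(words, k=20)` would give a list like the ones in this article.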
Here are each person's 20 most used words:
We highlighted the 12 words which appear in all three lists. Another 3 words are shared by Trump & Biden (we, or, that) and 1 word (this) is shared by Kardashian and Biden.
From the same number of analyzed words we can determine the vocabulary size. Trump's vocabulary is the biggest (4030 words) and Biden's the smallest (3396 words).
What are possible reasons for the smaller vocabulary sizes in Kardashian and Biden's tweets?
- age and education (although the impact is quite small. See http://archive.sciendo.com/LIFIJSAL/lifijsal.2016.2.issue-2/lifijsal-2016-0008/lifijsal-2016-0008.pdf)
- use of simpler language. This can be intentional, if you want to be more easily understood.
- less diverse range of topics. E.g. Elon Musk, who talks about electric cars, photovoltaics, rockets, brain implants, etc. has a bigger vocabulary (5366 words).
Finding the reason for the differences in vocabulary size in our case is outside the scope of this article. But it would be interesting to follow up at some point in the future.
Comparing the words in the vocabularies, we can analyze how much they overlap. In total, all three used 7609 unique words. We see that 1032 words were shared by all three persons.
Comparing Trump and Biden, there are fewer words used only by Biden (1311 words vs 1878 words for Trump). This makes up most of the difference in their vocabulary size.
We can also check our assumption from before, whether different backgrounds affect vocabularies:
We see that Trump and Biden's vocabularies overlap more than they do with Kardashian's vocabulary. So the similarity in their backgrounds is reflected in their vocabularies.
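The numbers behind such a comparison are plain set operations. The following is a rough sketch under our own assumptions: the function name `overlap_summary` and the tiny example vocabularies are hypothetical and only show the idea.

```python
from itertools import combinations

def overlap_summary(vocabs):
    """vocabs maps a person's name to the set of words they used."""
    summary = {
        "total_unique": len(set().union(*vocabs.values())),
        "shared_by_all": len(set.intersection(*vocabs.values())),
    }
    # Pairwise overlap, e.g. how many words two people have in common.
    for a, b in combinations(vocabs, 2):
        summary[f"{a} & {b}"] = len(vocabs[a] & vocabs[b])
    # Words used by exactly one person.
    for name, vocab in vocabs.items():
        others = set().union(*(v for n, v in vocabs.items() if n != name))
        summary[f"only {name}"] = len(vocab - others)
    return summary

# Hypothetical toy vocabularies, not the real data.
vocabs = {
    "trump": {"the", "we", "great", "wall"},
    "biden": {"the", "we", "folks"},
    "kardashian": {"the", "shop", "great"},
}
summary = overlap_summary(vocabs)
print(summary)
```

The pairwise entries are the ones to compare when asking whose vocabularies are most similar.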
Overlapping of word usage
The diagram above shows how often the unique words appeared in the 90,000 analyzed words. What we see here is quite surprising: The 1,032 words shared by all three persons were used 66 thousand times, each word about 64 times on average! In contrast, the other 6,577 words were only used 3.6 times each on average.
So we see that shared words are used much more frequently.
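To make the averages concrete, here is a small sketch of how usage per group could be computed. The function name `usage_by_group` and the sample word lists are our assumptions, not the original analysis code.

```python
from collections import Counter

def usage_by_group(word_lists):
    """word_lists maps a person's name to the list of words they tweeted.

    Returns the average number of uses per word for words shared by
    everyone and for all remaining words.
    """
    shared = set.intersection(*(set(w) for w in word_lists.values()))
    counts = Counter()
    for words in word_lists.values():
        counts.update(words)
    shared_uses = sum(c for w, c in counts.items() if w in shared)
    other_uses = sum(counts.values()) - shared_uses
    n_other = len(counts) - len(shared)
    return {
        "shared_avg": shared_uses / len(shared) if shared else 0.0,
        "other_avg": other_uses / n_other if n_other else 0.0,
    }

# Hypothetical toy data.
word_lists = {
    "a": ["the", "the", "we", "wall"],
    "b": ["the", "we", "we", "folks"],
    "c": ["the", "we", "shop"],
}
averages = usage_by_group(word_lists)
print(averages)
```

In this toy example the two shared words are used 4 times each on average, while the remaining words are used once each.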
How many words it takes to understand them
In the previous article we built curves showing what percentage of the words in Trump's tweets we understand when we know Trump's most used words. We can add Joe Biden and Kim Kardashian to this chart:
To understand any of them completely, we need to know between 3,400 and 4,030 words, but to understand 75% of the words in their tweets, it is enough to know their 360 to 530 most used words.
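Such a curve is a running sum over the word counts, sorted from most to least used. The sketch below shows one way to build it and to read off how many words are needed for a given target. The names `coverage_curve` and `words_needed` and the sample words are hypothetical.

```python
from collections import Counter
from itertools import accumulate

def coverage_curve(words):
    """Share of all words understood after learning the top 1, 2, 3, ... words."""
    counts = [c for _, c in Counter(words).most_common()]
    total = len(words)
    return [running / total for running in accumulate(counts)]

def words_needed(words, target=0.75):
    """Smallest number of top words that covers at least `target` of the text."""
    for k, coverage in enumerate(coverage_curve(words), start=1):
        if coverage >= target:
            return k
    return None

# Hypothetical toy data: 10 words, 4 of them unique.
words = ["the"] * 5 + ["we"] * 3 + ["win", "wall"]
print(coverage_curve(words))
print(words_needed(words, target=0.75))
```

Here the two most used words already cover 8 of the 10 words, so two words are enough to pass the 75% mark.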
Does Kardashian's vocabulary work with Trump's tweets?
Let's try an experiment. Assume you learned English using Kim Kardashian's or Joe Biden's most used words. How well would you understand Donald Trump? The answer is in the following chart. (Similar charts for Kim Kardashian and Joe Biden's tweets can be found in the appendix.)
Why are Biden's and Kardashian's curves so jittery?
- Donald Trump doesn't use some of the words in the others' vocabularies. E.g. Kardashian uses words like shop, sizes, body, shipping, xxs, cotton, … Each of these words makes the curve jittery.
- Donald Trump uses some words more or less frequently than the others do. You can observe this for the top 20 words in the table above. This explains why the curve sometimes gets steeper or flattens out.
The reason not all curves reach 100% is that some words Donald Trump uses are not used by the others. This can also be seen in the Venn diagram above. These words are "missing" in the others' vocabularies and leave a gap to the 100%.
Donald Trump's curve uses Donald Trump's own vocabulary to understand his tweets, so it is smooth and reaches 100%.
The two curves of Biden and Kardashian make it clear that the most used words are not easily transferable between people. Joe Biden's vocabulary is also clearly more similar to Donald Trump's. His curve is higher than Kardashian's.
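The experiment can be expressed as: learn one person's k most used words, then count how many words in another person's tweets are among them. This is a minimal sketch with a hypothetical function name `cross_coverage` and made-up word lists.

```python
from collections import Counter

def cross_coverage(learner_words, target_words, k):
    """Share of target_words understood after learning the k most used
    words from learner_words."""
    known = {w for w, _ in Counter(learner_words).most_common(k)}
    understood = sum(1 for w in target_words if w in known)
    return understood / len(target_words)

# Hypothetical toy data, not real tweets.
kardashian = ["the", "the", "shop", "shop", "sizes", "love"]
trump = ["the", "great", "the", "wall", "love", "the"]

print(cross_coverage(kardashian, trump, k=2))
print(cross_coverage(trump, trump, k=10))
```

Words like "shop" use up one of the k learned words without helping with the target tweets, and words like "great" or "wall" can never be covered. That is why the curve built from someone else's vocabulary stays below 100%.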
- Each person has a different vocabulary size. In our example, Joe Biden's vocabulary is 3396 words, Kim Kardashian's is 3713 words and Trump's is 4030 words.
- To roughly understand a single person (that means to understand 75% of the words in their tweets) we should learn that person's most used words. That's 364 words for Biden, 451 words for Kardashian and 528 words for Trump. But that vocabulary is not easily transferable between people.
- When you learn English, it helps to practice first with someone who uses simpler words and a smaller range of vocabulary. You will understand that person faster than someone using a large vocabulary.
Counting words or lemmas
Some researchers count word roots, or lemmas, and not single words. E.g. is, be, are, was, been would all be grouped under the root to be. In our articles, all words are considered separately.
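To illustrate the difference, the sketch below counts the same words once separately and once grouped by lemma. The tiny hand-written `LEMMAS` table is purely an assumption for this example. A real analysis would use a proper lemmatizer.

```python
from collections import Counter

# Hypothetical mini lemma table, only for illustration.
LEMMAS = {"is": "be", "are": "be", "was": "be", "been": "be", "be": "be"}

def count_lemmas(words, lemma_map=LEMMAS):
    """Count words after mapping each one to its lemma (if known)."""
    return Counter(lemma_map.get(w, w) for w in words)

words = ["is", "are", "was", "great", "be"]
word_counts = Counter(words)
lemma_counts = count_lemmas(words)

print(len(word_counts), "separate words")
print(len(lemma_counts), "lemmas")
```

Counting lemmas gives a smaller vocabulary size than counting every word form separately, so numbers from the two methods cannot be compared directly.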
The analyzed tweets were the last 1098 tweets of Trump, 1669 tweets of Kardashian and 943 tweets of Biden posted on or before Aug 31, 2020. Donald Trump's vocabulary is (slightly) different from the one in the previous article because the previous article analyzed the tweets before July 20, 2020.