We made a list of the most important words in English
How we analyzed 160,000 tweets to build a list of the 10,000 most frequently used words in modern English
Lingo Ninja Research Team
3 min read · published August 24, 2020 · last updated November 17, 2020
If you just want the list, you can download it here. You can use it freely. If you build something based on this list, please provide a link to https://www.Lingo-Ninja.com.

Here are the highlights:
  • Extracted 2.3 million words from the tweets of the top 100 Twitter accounts
  • Boiled them down to the 10,000 most frequent unique words
  • Includes modern slang and abbreviations from various English-speaking countries
  • Covers 94.7% of all analyzed words

Why did we make this list?

We made this list to have a base vocabulary against which to compare Twitter users.
But, as we show in other articles, learning the most frequently used words first is the fastest way to learn a language. If you want to use this list to improve your English, please go ahead. It covers far more words than most language students have in their vocabulary.

How does this list differ from other lists?

There are many lists of top English words out there, but all the lists we found were either too short, were not made up of single words, or did not contain common slang words (such as y'all, peeps, xoxoxo, …). So we decided to build a list of the 10,000 most frequently used words that anyone learning English will encounter in real life.

The data source - the top 100 Twitter accounts

To build the word list, we extracted the tweets of the top 100 Twitter accounts. Because the accounts are ranked by number of followers, this selection reflects what people are interested in and which topics are currently popular. The accounts fall into two main groups:
  • entrepreneurs, politicians and newspapers dealing with current issues 
  • actors, celebrities and musicians dealing with cultural topics

Some accounts use formal language, some use informal language. They include users from various countries, but are dominated by the US, UK and India. 

For every account we extracted the 30,000 most recently posted English words, excluding all retweets and non-English tweets. For some accounts, however, we hit Twitter's download limit of 3,200 tweets per user before reaching 30,000 words. For example, @POTUS contributed no tweets at all, because its most recent 3,200 tweets were all retweets. Other accounts simply did not have that many words; @danieltosh, for instance, had deleted most of his tweets. As a result, some accounts contributed fewer than 30,000 words. In total we analyzed approximately 2.3 million words, which gives a good overview of contemporary English.
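The per-account extraction rules above (skip retweets and non-English tweets, stop at 30,000 words, never look past the 3,200 most recent tweets) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code we actually ran: the `Tweet` record, its `lang` and `is_retweet` fields, the `tokenize` rule and the function names are all hypothetical, and the timeline is assumed to be ordered newest first.

```python
import re
from dataclasses import dataclass

WORD_LIMIT = 30_000   # words kept per account (from the article)
TWEET_LIMIT = 3_200   # Twitter's per-user download limit (from the article)

# Hypothetical tokenizer: lowercase runs of letters and digits, optionally joined
# by inner apostrophes, so slang such as "b4" or "y'all" stays a single word.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z0-9]+)*")


@dataclass
class Tweet:
    # Hypothetical, simplified tweet record (not the Twitter API's schema).
    text: str
    lang: str
    is_retweet: bool


def tokenize(text: str) -> list[str]:
    """Split a tweet into lowercase words."""
    return TOKEN_RE.findall(text.lower())


def words_for_account(timeline: list[Tweet]) -> list[str]:
    """Collect up to WORD_LIMIT English words from a newest-first timeline."""
    words: list[str] = []
    # Only the most recent TWEET_LIMIT tweets can be downloaded.
    for tweet in timeline[:TWEET_LIMIT]:
        if tweet.is_retweet or tweet.lang != "en":
            continue  # retweets and non-English tweets are excluded
        words.extend(tokenize(tweet.text))
        if len(words) >= WORD_LIMIT:
            return words[:WORD_LIMIT]
    # Fewer than WORD_LIMIT words if the account ran out of usable tweets.
    return words
```

An account whose downloadable tweets are all retweets, as with @POTUS, ends up with an empty word list, and an account with few remaining tweets returns whatever it has.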

A detailed table of the top accounts and the tweets and words analyzed per account can be found at the end of this article.

Boiling 2.3 million words down into the top 10,000

Out of our 2.3 million words, we extracted the unique words and sorted them by descending frequency. The most frequently used word is "the", which appears 92,958 times in the words we analyzed.
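Extracting the unique words and sorting them by descending frequency is a plain frequency count. A minimal sketch, assuming the words have already been tokenized into one flat list; the function name `rank_words` is hypothetical:

```python
from collections import Counter


def rank_words(all_words: list[str], top_n: int = 10_000) -> list[tuple[str, int]]:
    """Return the top_n unique words with their counts, most frequent first."""
    counts = Counter(all_words)
    # most_common() sorts by descending count; words with equal counts
    # keep the order in which they were first encountered.
    return counts.most_common(top_n)


print(rank_words("the cat and the hat and the bat".split(), top_n=2))
# [('the', 3), ('and', 2)]
```

With `top_n=10_000` on the full corpus, this produces a ranked list of the same shape as ours, topped by "the".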

When we plot the cumulative frequency, as in the chart below, we can see that the top 100 words alone make up 46.2% of all analyzed words. In other words, almost half of all the words we analyzed come from just 100 unique words!

When we add up the frequencies of the top 10,000 words, we see that they account for 94.7% of our 2.3 million words.
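Both coverage figures come from the same calculation. If f_1 ≥ f_2 ≥ … ≥ f_N are the counts of the N unique words in descending order, the share covered by the top n words is (f_1 + … + f_n) / (f_1 + … + f_N). A minimal sketch of that cumulative curve; the function names are hypothetical:

```python
from collections import Counter
from itertools import accumulate


def cumulative_coverage(all_words: list[str]) -> list[float]:
    """Entry k-1 is the share of all word occurrences covered by the top k words."""
    counts = [count for _, count in Counter(all_words).most_common()]
    total = sum(counts)
    # Running sums of the descending counts, divided by the total word count.
    return [running / total for running in accumulate(counts)]


def coverage_of_top(all_words: list[str], n: int) -> float:
    """Share of all word occurrences covered by the n most frequent words."""
    curve = cumulative_coverage(all_words)
    if n <= 0 or not curve:
        return 0.0
    # Asking for more words than exist simply covers everything.
    return curve[min(n, len(curve)) - 1]
```

Plotting `cumulative_coverage` gives a curve like the one described above. For our 2.3 million words we report 46.2% coverage at n = 100 and 94.7% at n = 10,000.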


What are the slang and contemporary words in our list? 

Unlike conventional lists, ours includes words like peeps (for people), betta (for better), b4 (for before), cali (for California), xoxoxo (for hugs and kisses) and yum (for delicious), but also paji (from the Hindi word for elder brother).

Can you use our list?

If you want to incorporate this list into your language learning or education, please go ahead. We make this list available under the CC 4.0 license. This means you are free to use, modify and redistribute it, as long as you provide a link to us (https://www.Lingo-Ninja.com).

Appendix: Which accounts did we analyze?

We analyzed the top 100 accounts ranked by number of followers. For the ranking, we used the data provided by https://friendorfollow.com/twitter/most-followers/ (extracted on Aug 20, 2020).







