And The Most Commonly Used English Word Is...

Have you ever wondered what the most common word in the English language is? Well, good news: not only has that research already been conducted, but the YouTube channel Abacaba has also assembled it into a snazzy visualization exploring some of its more interesting trends. The video, titled “The Frequency of English Words: Bubbled (Part 1),” has a sequel forthcoming, but it's still worth checking out on its own.

The data set used in the video came from an article by Peter Norvig, who was inspired by previous work done by Mark Mayzner in the 1960s studying the frequency of certain letter combinations in English words. Mayzner wrote to Norvig asking if he'd be interested in using the technology now available to further the research — and the results are definitely worth looking at. Because of all that new technology (hi there, Google!), Norvig's research was much larger in scope: His data set included over 743 billion words, in comparison to Mayzner's 20,000; furthermore, Norvig found 97,565 distinct words within that sample set.
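Those two figures measure different things: the 743 billion counts every occurrence of every word, while the 97,565 counts each different word only once. Here's a minimal Python sketch of that distinction on a toy sentence. The sample text and the `count_words` helper are made up for illustration, and this is not a reconstruction of Norvig's actual method.

```python
import re
from collections import Counter

def count_words(text):
    """Return a Counter mapping each lowercase word to how often it occurs."""
    # Treat runs of letters and apostrophes as words (a simplifying assumption).
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# Hypothetical sample text, not taken from Norvig's data.
sample = "The cat sat on the mat, and the dog sat by the door."
counts = count_words(sample)

total_words = sum(counts.values())  # every occurrence counts
distinct_words = len(counts)        # each different word counts once

print(total_words, distinct_words)
print(counts.most_common(1))
```

Even in a single sentence, the total runs well ahead of the distinct count, because a handful of words keep repeating.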

That's where the Abacaba video comes in. It presents all that data in an easy-to-parse visual format, with each word appearing in a colorful bubble. The size of each bubble tells us about the frequency with which the word inside it is used, while the color indicates the word's starting letter. Here's the full visualization:

Are you surprised that the most commonly used word in the English language is “the”? I'm not; the word appears in this 633-word post 64 times, so… well, you do the math. According to Abacaba's visualization, “the” shows up over 53 billion times in Norvig's data set, and there's a huge margin between the number of times “the” appears and the number of times the runner-up, “of,” appears.
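If you'd rather not do the math by hand, here's a quick Python sketch. It uses only the rounded figures quoted in this post (64 of 633 words here, roughly 53 billion of roughly 743 billion in Norvig's data), so the results are approximate; `share` is just a helper name chosen for this example.

```python
def share(occurrences, total):
    """Fraction of all words accounted for by a single word."""
    return occurrences / total

# Rounded figures as quoted in this post, so treat the results as approximate.
post_share = share(64, 633)
norvig_share = share(53e9, 743e9)

print(f"this post: {post_share:.1%}")
print(f"Norvig's data: {norvig_share:.1%}")
```

By these numbers, “the” makes up about a tenth of this post and about 7 percent of Norvig's sample.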

It's worth noting that the top 10 consists of four prepositions, two articles, two pronouns, one conjunction, and one verb. All but one of them (“that”) are also three letters or shorter. Why does the top 10 consist almost entirely of tiny little words? Because most shorter words are more general, which enables them to be used in a variety of different ways and contexts. The longer the word, the more specific it tends to be.
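One way to see how word length and frequency interact is to group a frequency table by length and pick the winner for each length. Here's a minimal Python sketch of that idea. The counts below are invented for illustration, not Norvig's numbers, and `top_word_by_length` is a hypothetical helper, not anything from the video.

```python
from collections import Counter

def top_word_by_length(counts):
    """Map each word length to the most frequent word of that length."""
    best = {}
    for word, n in counts.items():
        length = len(word)
        # Keep this word if it's the first of its length or beats the current best.
        if length not in best or n > counts[best[length]]:
            best[length] = word
    return best

# Made-up toy counts, for illustration only.
toy_counts = Counter({
    "the": 50, "of": 30, "and": 25, "a": 20,
    "i": 12, "that": 10, "with": 8, "should": 3,
})

winners = top_word_by_length(toy_counts)
for length in sorted(winners):
    print(length, winners[length], toy_counts[winners[length]])
```

In this toy table, the winning counts shrink as the words get longer, which is the same pattern the bubbles show.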

Next, the video takes a look at the most commonly used word when we narrow the data set by the number of letters in each word. The most commonly used one-letter word probably won't surprise you at all:

Although the most popular two-letter word might:

The playing field starts to even out when we get to the four-letter words:

I find it interesting that the most commonly used six-letter word is “should”:

What we don't know here (or for any of these words, really) is the context. Are we using it to tell people what they should or shouldn't do? And if so, why do we feel so compelled to do that?

“Children” and “American” take the top two slots for eight-letter words:

But you'll notice the bubbles are getting smaller… until the words themselves start getting longer. Here are the most commonly used 14-letter words:

And here's the visualization for words of 20 letters:

But what about gender? Does it matter? In a word, yes. “He” appears a substantially larger number of times than “she”:

The video points out that this may be due to the fact that “he” is often used as a placeholder when the gender of what it's referring to isn't known; I would argue, though, that even that is yet another indication of our society's lack of gender equality expressed through language. The default is male; the “other” is female. We're assumed male until proven otherwise. (Unless you're in Sweden.)

Our use of language tells us a lot about who we are — so maybe we'd all do well to pay closer attention to what we're saying.

Watch the full video below:

Images: Trevor Pritchard/Flickr; Abacaba/YouTube (9)