If you tried to build your vocabulary in a language just by reading books, how many books would you have to read to achieve the vocabulary of a native speaker?
Here is the key finding, which is explained in more detail below.
Some of the (personal) assumptions made in the analysis:
- To have ‘learned’ a word, you must see it at least 3 times.
- A ‘book’ is estimated to be roughly 50,000 words: 200 pages of 25 lines, each with 10 words.
- The data comes from a series of books by the Hungarian author Rejtő Jenő, at least 20 of which can be found online.
According to one linguist who has devoted his life to language learning, different levels of language knowledge can be distinguished by vocabulary size. His table is summarized below:
| Level | Description | Number of words |
|---|---|---|
| Core | Minimum necessary to construct sentences | 250 |
| Everyday | Words used every day by native speakers | 750 |
| Conversational | Amount to express almost anything (sometimes in awkward circumlocutions) | 2,500 |
| Native - low | Active vocab of native speaker without higher education | 5,000 |
| Native - high | Active vocab of native speaker with higher education | 10,000 |
| Literature | Passive vocab necessary to read/understand novel by notable author | 20,000 |
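To make the table easier to use alongside a word count, here is a minimal sketch that maps a vocabulary size to the highest level it reaches. It assumes the numbers in the table are minimum thresholds. The names `LEVELS` and `vocabulary_level` and the label "Below core" are my own, purely for illustration.

```python
from bisect import bisect_right

# Thresholds copied from the table above: (minimum words, level name).
# Treating each number as a minimum is an assumption.
LEVELS = [
    (250, "Core"),
    (750, "Everyday"),
    (2500, "Conversational"),
    (5000, "Native - low"),
    (10000, "Native - high"),
    (20000, "Literature"),
]


def vocabulary_level(word_count):
    """Return the highest level whose threshold word_count has reached."""
    thresholds = [minimum for minimum, _ in LEVELS]
    index = bisect_right(thresholds, word_count)
    # "Below core" is a hypothetical label for counts under the first row.
    return LEVELS[index - 1][1] if index > 0 else "Below core"
```

For example, `vocabulary_level(3000)` returns `"Conversational"`, since 3,000 words is past the 2,500 threshold but short of 5,000.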
Using these (very rough) estimates of vocabulary sizes, I broke up the 20 books written by Hungarian author Rejtő Jenő into words and counted the number of times each word is seen as I progress through the books in an arbitrary order. Once a word has been seen 3 times, I count it as a ‘learned’ word. The number of words learned versus the number of books read is plotted in the first graphic above. Surprisingly, even by the 20th book, you are still adding quite a lot of new words to your vocabulary.
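The counting procedure described above could be sketched roughly as follows. This is not the exact script behind the graphic. It assumes each book is available as a plain-text string and that a ‘word’ is any lowercase run of letters, so every inflected form counts as its own word (no stemming). The names `tokenize`, `learning_curve`, and `min_sightings` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters (Unicode-aware)."""
    # [^\W\d_] matches any word character that is not a digit or underscore,
    # so accented letters such as "ő" are kept inside words.
    return re.findall(r"[^\W\d_]+", text.lower())


def learning_curve(books, min_sightings=3):
    """Return the cumulative number of learned words after each book.

    books is a list of book texts, read in the given order. A word counts
    as learned at the moment it has been seen min_sightings times.
    """
    seen = Counter()
    learned = 0
    curve = []
    for text in books:
        for word in tokenize(text):
            seen[word] += 1
            if seen[word] == min_sightings:
                learned += 1
        curve.append(learned)
    return curve
```

Calling `learning_curve(texts)` on the list of book texts gives one number per book, which can be plotted against the book index. Passing a different `min_sightings` shows how sensitive the curve is to the 3-sightings assumption.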
- What does this graphic look like for other authors? Other languages?
- What if you used news articles instead?
- Is 3 times per word a reasonable estimate for when a word enters our passive vocabulary?
- More than enough for even the most avid Rejtő Jenő fans.
- Often when I’m reading, I’ll write down the translation of a word in the margin of the book. 3 is a rough estimate of the number of times I have probably seen and processed a word by the time I really get it into at least my passive vocabulary.
- According to Amazon, 64,000 words is a reasonable estimate for median book length.
- This is not meant to be an adequate way of learning a language. It is a thought experiment I came up with because I live thousands of miles away from the country whose language I’m learning.