Wiktionary:Frequency lists/Esperanto/Tekstaro 2023

From Wiktionary, the free dictionary

Based on the words found in the Tekstaro dump of 2023-03-15. All words are reduced to their base form (plural -j and accusative -n are stripped, the verb endings -as/-is/-os/-us/-u are changed to the infinitive -i). Each word is listed in the most typical case form (lower-case, capitalized, or all-caps). Non-Esperanto-ified proper names are mostly omitted (unless listed in common dictionaries). The total size of the corpus is more than 10 million words.
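The reduction described above can be approximated with a few suffix rules. The following Python function is a minimal sketch, not the script used to build this list. The regular expressions, the hypothetical `INVARIABLE` exception set, and the name `base_form` are illustrative assumptions. For simplicity, the sketch lowercases every token instead of choosing the most typical case form.

```python
import re

# Hypothetical, deliberately incomplete exception set: function words whose
# endings only look inflectional ("kaj" is not a plural, "ĝis" is not a past
# tense, "unu" is not a volitive verb).
INVARIABLE = {"kaj", "ajn", "ĝis", "plus", "minus", "ĉu", "nu", "du", "unu", "plu", "ju"}

# -o nouns, -a adjectives and -iu correlatives, followed by an optional
# plural -j and an optional accusative -n; group 1 is the form without them.
NOMINAL = re.compile(r"(.*?(?:o|a|iu))j?n?")

# Finite verb endings -as, -is, -os, -us and the volitive -u; group 1 is the stem.
VERBAL = re.compile(r"(.+)(?:as|is|os|us|u)")


def base_form(token: str) -> str:
    """Reduce one Esperanto token to a dictionary-style base form (sketch only)."""
    word = token.lower()  # simplification: the list itself keeps the typical case
    if word in INVARIABLE:
        return word
    match = NOMINAL.fullmatch(word)
    if match:
        return match.group(1)  # strip plural -j and accusative -n
    match = VERBAL.fullmatch(word)
    if match:
        return match.group(1) + "i"  # replace the finite ending with infinitive -i
    return word  # prepositions, adverbs, pronouns, infinitives, etc.


if __name__ == "__main__":
    for token in ["domojn", "estas", "kiujn", "venu", "kaj"]:
        print(token, "->", base_form(token))
```

Pronoun accusatives such as "min" and "ilin" are left unchanged by this sketch, and any short function word ending in -u or -is that is missing from the exception set would be mangled, so a real pipeline would need a dictionary lookup rather than suffix rules alone.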

A version easier to copy and to parse is available on GitHub. Two further variants exist: one from which words not present in ESPDIC (Esperanto-English Dictionary) are filtered out, and one in which each Esperanto word is directly followed by its English translations.


First hundred by frequency

Together, these 100 words cover 52.70% of the whole corpus.
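Each coverage figure on this page appears to be cumulative: the summed occurrence counts of the N most frequent base forms divided by the total number of tokens in the corpus. The Python sketch below assumes that interpretation and assumes the base-form counts are already held in a `collections.Counter`. The function name `cumulative_coverage` and the toy counts are hypothetical, not Tekstaro data.

```python
from collections import Counter


def cumulative_coverage(counts: Counter, n: int) -> float:
    """Percentage of all tokens accounted for by the n most frequent base forms."""
    total = sum(counts.values())
    if total == 0:
        return 0.0  # an empty corpus covers nothing
    # most_common(n) yields (word, count) pairs in descending order of count.
    top = sum(freq for _, freq in counts.most_common(n))
    return 100.0 * top / total


if __name__ == "__main__":
    toy = Counter({"la": 50, "esti": 30, "kaj": 20})  # invented counts
    print(cumulative_coverage(toy, 2))  # 80.0
```

Because only the counts are summed, the order in which tied words at the cut-off are ranked does not affect the result.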

Second hundred

Together, the 200 most frequent words cover 59.60% of the whole corpus.

Third hundred

Together, the 300 most frequent words cover 63.66% of the whole corpus.

Fourth hundred

Together, the 400 most frequent words cover 66.56% of the whole corpus.

Fifth hundred

Together, the 500 most frequent words cover 68.83% of the whole corpus.

Frequency rank of 501–1000

Together, the 1000 most frequent words cover 76.15% of the whole corpus.

Frequency rank of 1001–2000

Together, the 2000 most frequent words cover 83.46% of the whole corpus.

Frequency rank of 2001–3000

Together, the 3000 most frequent words cover 87.37% of the whole corpus.

Frequency rank of 3001–4000

Together, the 4000 most frequent words cover 89.89% of the whole corpus.

Frequency rank of 4001–5000

Together, the 5000 most frequent words cover 91.66% of the whole corpus.

Frequency rank of 5001–6000

Together, the 6000 most frequent words cover 92.98% of the whole corpus.

Frequency rank of 6001–7000

Together, the 7000 most frequent words cover 94.01% of the whole corpus.

Frequency rank of 7001–8000

Together, the 8000 most frequent words cover 94.84% of the whole corpus.

Frequency rank of 8001–9000

Together, the 9000 most frequent words cover 95.51% of the whole corpus.

Frequency rank of 9001–10000

Together, the 10000 most frequent words cover 96.08% of the whole corpus.

Frequency rank of 10001–11000

Together, the 11000 most frequent words cover 96.55% of the whole corpus.

Frequency rank of 11001–12000

Together, the 12000 most frequent words cover 96.96% of the whole corpus.

Frequency rank of 12001–13000

Together, the 13000 most frequent words cover 97.31% of the whole corpus.

Frequency rank of 13001–14000

Together, the 14000 most frequent words cover 97.61% of the whole corpus.

Frequency rank of 14001–15000

Together, the 15000 most frequent words cover 97.88% of the whole corpus.