Wiktionary:Frequency lists
Counting words and lemmas: The following frequency lists count distinct orthographic words, including inflected and some capitalised forms. For example, the verb "to be" is represented by "is", "are", "were", and so on.
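As a rough illustration of the difference between counting orthographic words and counting lemmas, the sketch below tallies word forms and then regroups them under a lemma. The sample text and the tiny `LEMMAS` table are hypothetical, made up for this example; the lists on this page were not necessarily produced this way.

```python
import re
from collections import Counter

# Hypothetical sample text and lemma table, for illustration only.
TEXT = "She is here. They are here. We were there, and it is late."
LEMMAS = {"is": "be", "are": "be", "were": "be"}

def count_forms(text):
    """Count distinct orthographic words.

    Case is folded here for simplicity; the lists on this page
    keep some capitalised forms as separate entries.
    """
    return Counter(re.findall(r"[a-z']+", text.lower()))

def count_lemmas(form_counts, lemmas):
    """Regroup form counts under lemmas; unknown forms stand for themselves."""
    merged = Counter()
    for form, n in form_counts.items():
        merged[lemmas.get(form, form)] += n
    return merged

forms = count_forms(TEXT)
lemma_counts = count_lemmas(forms, LEMMAS)
print(forms["is"], forms["are"], forms["were"])  # 2 1 1
print(lemma_counts["be"])                        # 4
```

A form-based list therefore spreads the verb "to be" over several ranks, while a lemma-based list reports it once with the combined count.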
English
TV and movie scripts
Most common words in TV and movie scripts: Here are frequency lists comparable to the Gutenberg ones, but based on 29,213,800 words from TV and movie scripts and transcripts.
Here's a fuller explanation of how the list was generated and its limitations: Wiktionary:Frequency lists/TV/2006/explanation.
Here are the top hundred words (from TV scripts) in alphabetical order:
- a · about · all · and · are · as · at · back · be · because · been · but · can · can't · come · could · did · didn't · do · don't · for · from · get · go · going · good · got · had · have · he · her · here · he's · hey · him · his · how · I · if · I'll · I'm · in · is · it · it's · just · know · like · look · me · mean · my · no · not · now · of · oh · OK · okay · on · one · or · out · really · right · say · see · she · so · some · something · tell · that · that's · the · then · there · they · think · this · time · to · up · want · was · we · well · were · what · when · who · why · will · with · would · yeah · yes · you · your · you're
Here they are in frequency order:
- 1-1000 · 1001-2000 · 2001-3000 · 3001-4000 · 4001-5000 · 5001-6000 · 6001-7000 · 7001-8000 · 8001-9000 · 9001-10000
- Top 1,000 words cover 85.5% of all words (24,981,922/29,213,800).
- Top 10,000 words cover 97.2% of all words (28,398,152/29,213,800).
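The coverage figures above are simply the summed counts of the top-ranked words divided by the total number of words. A minimal sketch that reproduces the two percentages and shows the same calculation on a hypothetical toy list (the `coverage` helper and the `toy` counts are assumptions for illustration):

```python
def coverage(counts, top_n):
    """Share of all tokens accounted for by the top_n most frequent words."""
    ordered = sorted(counts, reverse=True)
    return sum(ordered[:top_n]) / sum(ordered)

# Figures quoted above for the TV and movie script corpus.
TOTAL = 29_213_800
print(f"{24_981_922 / TOTAL:.1%}")  # 85.5%
print(f"{28_398_152 / TOTAL:.1%}")  # 97.2%

# Hypothetical toy counts, to show the helper on a complete list.
toy = [50, 30, 10, 5, 3, 2]
print(coverage(toy, 2))  # 0.8
```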
From the 10,000th to the 40,000th:
- 10001-12000 · 12001-14000 · 14001-16000 · 16001-18000 · 18001-20000 · 20001-22000 · 22001-24000 · 24001-26000 · 26001-28000 · 28001-30000 · 30001-32000 · 32001-34000 · 34001-36000 · 36001-38000 · 38001-40000
- 40001-41284 (the dregs that were tied for the final place)
The words listed above make up a third of all the unique words in the corpus. Each of the remaining words was used five or fewer times.
Specific TV series dictionaries
Project Gutenberg
Most common words in Project Gutenberg:
These lists give the most frequent words from a simple, straightforward frequency count of all the books found on Project Gutenberg. The list of books was downloaded in July 2005 and "rsynced" monthly thereafter. The words are mostly English, with some other languages represented to a lesser extent. Many Project Gutenberg books are scanned once their copyright expires, typically editions published before 1923, so the language does not necessarily represent current usage. For example, "thy" is listed as the 280th most common word. Also, the Project Gutenberg boilerplate warning appears in each of the 24,000+ books, so its text is included in the counts.
Here are the top 100 words from Project Gutenberg texts in alphabetical order:
- a · about · after · all · an · and · any · are · as · at · be · been · before · but · by · can · could · did · do · down · first · for · from · good · great · had · has · have · he · her · him · his · I · if · in · into · is · it · its · know · like · little · made · man · may · me · men · more · Mr · much · must · my · no · not · now · of · on · one · only · or · other · our · out · over · said · see · she · should · so · some · such · than · that · the · their · them · then · there · these · they · this · time · to · two · up · upon · us · very · was · we · were · what · when · which · who · will · with · would · you · your
These wikified terms can be copied to other language wiktionaries; this is what they are intended for. If you do, please add an interwiki link onto the page here.
Frequency lists as of 2006-04-16:
- Wiktionary:Frequency lists/PG/2006/04/1-10000
- Wiktionary:Frequency lists/PG/2006/04/10001-20000
- Wiktionary:Frequency lists/PG/2006/04/20001-30000
- Wiktionary:Frequency lists/PG/2006/04/30001-40000
Frequency lists as of 2005-10-10:
- Wiktionary:Frequency lists/PG/2005/10/1-10000
- The list divided by thousand words: 1-1000 · 1001-2000 · 2001-3000 · 3001-4000 · 4001-5000 · 5001-6000 · 6001-7000 · 7001-8000 · 8001-9000 · 9001-10000
Frequency lists as of 2005-08-16:
- Wiktionary:Frequency lists/PG/2005/08/1-10000
- Wiktionary:Frequency lists/PG/2005/08/10001-20000
- Wiktionary:Frequency lists/PG/2005/08/20001-30000
- Wiktionary:Frequency lists/PG/2005/08/30001-40000
- Wiktionary:Frequency lists/PG/2005/08/40001-50000
- Wiktionary:Frequency lists/PG/2005/08/50001-60000
- Wiktionary:Frequency lists/PG/2005/08/60001-70000
- Wiktionary:Frequency lists/PG/2005/08/70001-80000
- Wiktionary:Frequency lists/PG/2005/08/80001-90000
- Wiktionary:Frequency lists/PG/2005/08/90001-100000
- Approximately 24,197 files, 1,712,082,956 words, 70,756.0 average words per file, from which were gleaned about 9,053,310 unique "words".
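The summary figures above can be cross-checked with a few lines of arithmetic. This is only a sketch using the numbers quoted in the list; the variable names are illustrative.

```python
# Figures quoted above for the 2005-08-16 Project Gutenberg count.
FILES = 24_197
WORDS = 1_712_082_956
UNIQUE = 9_053_310

avg_words_per_file = WORDS / FILES
print(f"{avg_words_per_file:,.1f}")  # 70,756.0

# Share of distinct "words" among all running words.
print(f"{UNIQUE / WORDS:.2%}")       # 0.53%
```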
Contemporary fiction
The 2,000 most common words in contemporary fiction can be found here:
The 2,000 most common words in contemporary fiction can be found here divided into 60 subject categories.
Unlike most of the lists on this page, this one groups the regular inflected forms of a word under a single lemma.
Contemporary poetry
The 2,000 most common words in contemporary poetry can be found here:
Another lemma-based list.
Music
Top English words lists
- A Frequency Dictionary of Contemporary American English (Routledge, 2010)
- Complete Shakespeare wordlist (with modernised spellings) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
- Frequency list of lemmas derived from an internet corpus assembled by Leeds University's Centre for Translation Studies
- Appendix:Basic English word list
- Simple:Wikipedia:Basic English combined wordlist
Word families
- British National Corpus - most frequent word families: see the simple:Wiktionary:BNC spoken freq on Simple English Wiktionary.
- Kilgarriff's lists.
- Academic Word List by word family: see the simple:Wiktionary:Academic word list on Simple English Wiktionary.
- 50K and larger word lists based on www.opensubtitles.org
Wikipedia
Adnyamathanha
Albanian
Arabic
- A Frequency Dictionary of Arabic (Routledge)
- Appendix:Arabic Frequency List from Quran
- Appendix:Arabic Frequency List from Quran/Arabic Frequency List from Quran 1-1000
- Appendix:Arabic Frequency List from Quran/Arabic Frequency List from Quran 1001-2000
- Appendix:Arabic Frequency List from Quran/Arabic Frequency List from Quran 2001-3000
- Appendix:Arabic Frequency List from Quran/Arabic Frequency List from Quran 3001-3680
- Appendix:Arabic Quranic Verbs
- 50K and larger word lists based on www.opensubtitles.org
Bulgarian
- Top 5000 Bulgarian words based on www.opensubtitles.org
- 50K and larger word lists based on www.opensubtitles.org
Catalan
Croatian
- 50K and larger word lists based on www.opensubtitles.org
- Top 50000 Croatian words based on www.opensubtitles.org
Czech
- A Frequency Dictionary of Czech
- Frequency lists of Czech National Corpus ("Srovnávací frekvenční seznamy", SYN2000, SYN2005, SYN2010), without a license suitable for republishing in Wiktionary
- new link for Frequency lists of Czech National Corpus (SYN2000 .. SYN2015)
- Top 5000 Czech words based on www.opensubtitles.org
- 50K and larger word lists based on www.opensubtitles.org
Danish
- Top 5000 Danish words based on www.opensubtitles.org
- 50K and larger word lists based on www.opensubtitles.org
- Password-protected wordlists from DSL are also available
Dutch
- A Frequency Dictionary of Dutch (Routledge)
- Top 5000 Dutch words based on www.opensubtitles.org
- 50K and larger word lists based on www.opensubtitles.org
- 5000 Dutch sentences sorted by the frequency of their words, that is, from the easiest (simplest) to the hardest.
Frequent words with example sentences:
The thirteen most popular Dutch words:
From Max Havelaar (numbers between parentheses denote occurrences):
- de (4770)
- en (2709)
- het, 't (2469)
- van (2259)
- ik (1999)
- te (1935)
- dat (1875)
- die (1807)
- in (1639)
- een (1637)
- hij (1328)
- niet (1162)
- zijn (1049)
University of Leipzig Frequency Lists:
- Main Page
- 100 most frequent Dutch words
- 1000 most frequent Dutch words
- 10000 most frequent Dutch words
Frequency of diacritic characters in Dutch:
From diacritical marks in the Dutch language. A list of almost 250,000 Dutch words contained a total of 3538 diacritics:
| Character | Frequency |
|---|---|
| ë | 1762 |
| ï | 599 |
| é | 468 |
| è | 248 |
| ö | 171 |
| ê | 71 |
| ü | 61 |
| ó | 35 |
| ç | 30 |
| á | 24 |
| à | 17 |
| ä | 16 |
| û | 8 |
| î | 7 |
| í | 5 |
| ô | 4 |
| ú | 4 |
| ñ | 4 |
| â | 3 |
| Å | 1 |
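A table like the one above can be produced by counting every character that decomposes into a base letter plus a combining mark. The sketch below is one way to do that with the standard `unicodedata` module; the `sample` words are hypothetical and are not the word list used for the table.

```python
import unicodedata
from collections import Counter

def count_diacritics(words):
    """Count characters that decompose into a base letter plus a combining mark."""
    counts = Counter()
    for word in words:
        # Normalise to NFC first so each accented letter is a single character.
        for ch in unicodedata.normalize("NFC", word):
            decomposed = unicodedata.normalize("NFD", ch)
            if len(decomposed) > 1 and any(unicodedata.combining(c) for c in decomposed):
                counts[ch] += 1
    return counts

# Hypothetical sample, not the word list behind the table.
sample = ["coëfficiënt", "café", "ruïne", "één", "huis"]
print(count_diacritics(sample).most_common())

# Share of the most frequent character, from the figures in the table.
TOTAL_DIACRITICS = 3538
print(f"{1762 / TOTAL_DIACRITICS:.1%}")  # 49.8%
```

On the table's figures, ë alone accounts for roughly half of all diacritics in the list.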
Esperanto
- 10,000 most common words from Esperanto Wikipedia
- Esperanto corpus built from Project Gutenberg works
Frequent words with example sentences:
Estonian
- Top 5000 Estonian words based on www.opensubtitles.org
- 50K and larger word lists based on www.opensubtitles.org
Finnish
- CSC IT Center for Science - 9996 most common Finnish words (text download) Creative Commons Attribution-NoDerivs-NonCommercial 1.0 Finland (CC BY-ND-NC 1.0)
- Word frequency based on the press
- Top 5000 words based on www.opensubtitles.org
- 50K and larger word lists based on www.opensubtitles.org
Frequent words with example sentences:
Comprehensive list of available Finnish online corpora and downloads of corpora:
French
- A Frequency Dictionary of French: Core Vocabulary for Learners (Routledge frequency dictionaries)
Frequency lists with English translation:
- The 500 most frequently used French words
- 5000 word frequency dictionary of French based on a 23-million-word corpus of French which includes written and spoken material both from France and overseas
Sentences sorted by their words' frequency:
Frequency lists from http://wortschatz.uni-leipzig.de/html/wliste.html with authorization from the laboratory. The list is based on Belgian written sources, with a clear financial bias.
- top 2000 words
- Wiktionary:French frequency lists (Belgium, finance)/2001-4000
- Wiktionary:French frequency lists (Belgium, finance)/4001-6000
- Wiktionary:French frequency lists (Belgium, finance)/6001-8000
- Wiktionary:French frequency lists (Belgium, finance)/8001-10000
- Top French words from subtitles based on www.opensubtitles.org
- 1-5000
- 5001-10000
- 10001-15000
- 15001-20000
- 50K and larger word lists based on www.opensubtitles.org
- 100 most frequently used French words with example sentences based on www.opensubtitles.org
- Top 2000 Most Frequent French Words Found in Subtitles
Note: these indicative lists still require some cleanup, because:
- they don't unify common words that are normally not capitalized in the dictionary but can be capitalized at the beginning of sentences or in titles;
- they do not correctly split off words that are contracted with an apostrophe before the following word, such as the very common articles (l'), prepositions (d'), the negation adverb (n') or pronouns (c', j', l', m', s', t'); nor verbal liaison particles (-t-, -z-, which are not really words, as they have no meaning and are written for phonetic reasons only); nor subject pronouns placed just after the verb (the mandatory linking hyphen does not make a compound word but marks the inversion of the subject rather than the normal occurrence of an object). All these words should be counted separately;
- the source is certainly Belgian French written papers only, with occurrences that are typical of that country and have no equivalent in France or in other French-speaking countries, where these words are much more rarely used (such as currency abbreviations and Belgian toponyms for regions and cities), and with many terms missing for very common specialties in France;
- the list contains isolated letters that are not words per se (except a few genuine words: a, à, y);
- there are also acronyms and symbols that occur only in written documents and are not part of the spoken language;
- frequent proper names are included but are not very specific to any of the 4 studied languages.
This list does not unify inflected words (nouns or adjectives with a plural or feminine mark, or conjugated verbs), and it does not recognize auxiliaries in compound tenses as part of the conjugated verb but treats each inflected form of an auxiliary separately. Alphabetising this list can be very helpful for spotting redundancies.
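Several of the cleanup points above (case unification, elided words, the liaison particle -t- and inverted subject pronouns) can be handled at tokenisation time. The following is only a sketch under stated assumptions, not the method used for these lists: the `tokenize` function, the set of elided forms (including "qu", which the notes do not mention) and the sample sentence are all hypothetical.

```python
import re
from collections import Counter

# Elided forms named in the notes above; "qu" is an extra assumption.
ELISION_RE = re.compile(r"\b(c|d|j|l|m|n|s|t|qu)'(?=\w)")
# Subject pronouns that may follow the verb after a linking hyphen.
INVERSION_RE = re.compile(r"-(je|tu|il|elle|on|nous|vous|ils|elles)\b")
# A token is a run of letters, optionally with internal apostrophes or
# hyphens (aujourd'hui, peut-être) and an optional trailing apostrophe (l').
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*'?")

def tokenize(text):
    """Lower-case the text, detach elided words and split inverted subjects."""
    text = text.lower().replace("\u2019", "'")  # typographic apostrophe -> '
    text = ELISION_RE.sub(r"\1' ", text)        # l'ami -> l' ami
    text = text.replace("-t-", " ")             # a-t-il -> a il (particle dropped)
    text = INVERSION_RE.sub(r" \1", text)       # dit-il -> dit il
    return TOKEN_RE.findall(text)

sample = "L'ami n'a pas d'argent. Y a-t-il un problème ? Que dit-il aujourd'hui ?"
counts = Counter(tokenize(sample))
print(counts["il"], counts["a"], counts["l'"])  # 2 2 1
```

Lower-casing everything is the bluntest way to merge sentence-initial capitals; it also folds proper names into common words, so a real cleanup would need a more careful rule.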
Frequent nouns:
Frequent words with example sentences:
Galician
Georgian
German
- A Frequency Dictionary of German: Core Vocabulary for Learners (Routledge frequency dictionaries), Second ed., 2020
Frequency lists with English translation:
- The Most Frequent German Words - German Department, University of Michigan
- Top 500 German words - The German Professor, according to University of Michigan's German Department, a rigorously researched list of the 500 most frequent German words
- The 500 most frequently used German words (with anglicised spellings ignoring German capitalisation)
Sentences sorted by their words' frequency:
German words in Wikipedia: the 100, 1000, or 10 000 most frequent words (2001).
Top 300,000 German words from the web.
Top 2000 German words from subtitles:
User:Matthias Buchmeier's Unformatted German frequency list. This list was generated in 2009 from TV and movie subtitles with a total of 25,399,099 words. It can be used under the terms of the cc-by-sa, GFDL or LGPL licenses.
- -5000 -10000 -15000 -20000 -25000 -30000 -35000 -40000 -45000 -50000 -55000 -60000 -65000 -70000 -75000 -80000 -85000 -90000 -95000 -100000 -105000 -110000 -115000 -120000 -125000 -130000 -135000 -140000 -145000 -150000 -155000 -160000 -165000 -170000 -175000 -180000 -185000 -190000 -195000 -200000 -205000 -210000 -215000 -220000 -225000 -230000 -235000 -240000 -245000 -250000 -255000 -260000 -265000 -270000 -275000 -280000 -285000 -290000 -295000 -300000 -305000 -310000 -315000 -320000 -325000 -330000 -335000 -340000 -345000 -350000 -355000 -360000 -365000 -370000 -375000 -380000
Top 10000 German words:
- 50K and larger word lists based on www.opensubtitles.org
- Github source code for extraction from Hermitdave
- User:Bigbossfarin/10000 German words
Frequent nouns:
Frequent words with example sentences:
Leeds University German frequency list of 5000 lemmas from internet corpus
Greek
- Top 5000 Greek words based on www.opensubtitles.org
- 50K and larger word lists based on www.opensubtitles.org
Hebrew
- Wiktionary:Frequency lists/Hebrew (From http://invokeit.wordpress.com/frequency-word-lists/: 50K and larger word lists based on www.opensubtitles.org)
- -10000 -20000 -30000 -40000 -50000 -60000 -70000 -80000 -90000 -100000 -110000 -120000 -130000 -140000 -150000 -160000 -170000 -180000 -190000 -200000 -210000 -220000 -230000 -240000 -250000 -260000 -270000 -280000 -290000 -300000 -310000 -320000 -330000 -340000 -350000 -360000 -370000 -380000 -390000 -400000 -400682
Hindi
Hungarian
- Top 100,000 words in Hungarian text: http://mokk.bme.hu/resources/webcorpus
- Wiktionary:Frequency lists/Hungarian webcorpus frequency list
- Hungarian frequency list 1-10000
- Top 5000 Hungarian words based on www.opensubtitles.org
- 50K and larger word lists based on www.opensubtitles.org
- Searchable frequency list based on the Hungarian National Corpus with 185 million words
Icelandic
- Top 5000 Icelandic Words based on www.opensubtitles.org
- 50K and larger word lists based on www.opensubtitles.org
- Icelandic verbs
- The 100 most frequent Icelandic verbs according to the verb webpage.
- Icelandic verb frequency list 1-100
- Most frequent lemmas in spoken Icelandic
- Most frequent lemmas in written Icelandic
Indonesian
Irish
Italian
Top 1000 Italian words from subtitles:
- 1-1000
- 50K and larger word lists based on www.opensubtitles.org: Wiktionary:Frequency lists/Italian50k
Frequent words with example sentences:
Sentences sorted by their words' frequency:
Japanese
- A Frequency Dictionary of Japanese (Routledge)
- Frequency lists
- Japanese Wikipedia (2013) via the JUMAN analyzer
- Japanese Wikipedia (2015) via the mecab analyzer
Frequent nouns:
- 1000 Japanese basic words:
- Appendix:JLPT (JLPT word lists - Japanese-Language Proficiency Test)
National Institute for Japanese Language and Linguistics word frequency lists
Khmer
The online dictionary http://kheng.info has by far the best frequency list. It also has a box where you can paste items from the list, and it links each word to its recorded pronunciation. Just copy and paste from the "Frequencies" page to the "Read" page. The site contains pronunciations for almost all of the first 1000 words and most of the first 2000. The entire frequency list can be downloaded at the bottom of the "Frequencies" page. The anonymous creators of the site have done an enormous job to advance Khmer learning.
Korean
- 현대 국어 사용 빈도 조사 2 (2005) Modern Korean Use Frequency Rate Survey Result (National Institute of Korean Language)
- Korean 5800
- 5897 entries from the Basic Korean Vocabulary List (한국어 학습용 어휘목록)
- 15K word list based on www.opensubtitles.org
- A Frequency Dictionary of Korean: Core Vocabulary for Learners (Routledge frequency dictionaries)
Frequent nouns:
Latvian
- Top 5000 Latvian words based on www.opensubtitles.org
- 50K and larger word lists based on www.opensubtitles.org
Lithuanian
- Dictionary of the Written Lithuanian Language based on Frequency (Dažninis rašytinės lietuvių kalbos žodynas)
- 50K and larger word lists based on www.opensubtitles.org
Lü
Macedonian
- 50K and larger word lists based on www.opensubtitles.org
- Top 5,000 words based on 26,000 newspaper articles over a thirty-year period from Sakam da Kazam
Malay
Mandarin
- Appendix:HSK list of Mandarin words:
- Appendix:Mandarin Frequency lists:
- Appendix:Mandarin Frequency lists/1-1000
- Appendix:Mandarin Frequency lists/1001-2000
- Appendix:Mandarin Frequency lists/2001-3000
- Appendix:Mandarin Frequency lists/3001-4000
- Appendix:Mandarin Frequency lists/4001-5000
- Appendix:Mandarin Frequency lists/5001-6000
- Appendix:Mandarin Frequency lists/6001-7000
- Appendix:Mandarin Frequency lists/7001-8000
- Appendix:Mandarin Frequency lists/8001-9000
- Appendix:Mandarin Frequency lists/9001-10000
Frequent nouns:
Manx
Māori
- 1,000 most frequent Māori words (in alphabetical order, document)
- 1,000 most frequent Māori words (in alphabetical order, PDF)
- 1,000 most frequent Māori words (in frequency order, document)
- 1,000 most frequent Māori words (in frequency order, PDF)
Marshallese
Nepali
Norwegian
Bokmål and Nynorsk
Bokmål
- Top 5000 Norwegian Bokmål words based on www.opensubtitles.org
- 50K and larger word lists based on www.opensubtitles.org
Nynorsk
Odia
Palauan
3,000 most common Palauan words
Persian
- Persian frequency list 5000
- Appendix:Common Persian verbs
- 50K and larger word lists based on Tehran Monolingual Corpus (first 10k words)
Polish
Top 200 Polish words:
- List of top 200 Polish words
- Top 5000 Polish words based on www.opensubtitles.org
- 50K and larger word lists based on www.opensubtitles.org
- 8459 most frequent lemmas according to the Kelly Project [1] (archived page)
Frequent words with example sentences:
Sentences sorted by their words' frequency:
Portuguese
- A Frequency Dictionary of Portuguese: Core Vocabulary for Learners (Routledge frequency dictionaries), 2008
European Portuguese
Unidades e palavras em português europeu: frequência e ordem
- Top 5000 European Portuguese words based on www.opensubtitles.org
- 50K and larger word lists based on www.opensubtitles.org
Brazilian Portuguese
- Top 5000 Brazilian Portuguese words based on www.opensubtitles.org
- 50K and larger word lists based on www.opensubtitles.org
Frequent words with example sentences:
Romanian
Russian
- A Frequency Dictionary of Russian: Core Vocabulary for Learners (Routledge frequency dictionaries)
- Appendix:Frequency dictionary of the modern Russian language (the Russian National Corpus) - 20,000 words
- List of top 1000 Russian words
- Appendix:Russian Frequency lists
- Appendix:Common Russian verbs
- On the Russian wiktionary there is a more complete list.
- 50K and larger word lists based on www.opensubtitles.org
Frequent nouns:
Frequent words with example sentences:
Sentences sorted by their words' frequency:
- 5000 Russian sentences sorted by the frequency of their words, that is, from the easiest (simplest) to the hardest.[2]
Sanskrit
Serbian
- 50K and larger word lists based on www.opensubtitles.org
- Top 50000 Serbian words based on www.opensubtitles.org
Slovak
- Word and 2-, 3- and 4-gram frequency lists from the Slovak National Corpus
- 50K and larger word lists based on www.opensubtitles.org
Slovene
50 most frequent Slovene words, Primož Jakopin research:
je, in, se, v, da, na, so, ne, pa, ki, bi, za, z, ni, sem, ga, še, po, s, tako, ko, tudi, to, bil, ali, si, mu, od, bilo, kot, že, iz, kaj, bo, če, vse, bila, kakor, mi, pri, jo, kar, jih, sta, o, do, ti, kako, samo, me
Spanish
- A Frequency Dictionary of Spanish (Routledge)
Royal Spanish Academy (Real Academia Española):
- Most common forms in the corpus of contemporary Spanish:[3]
- Top 1000
- Top 5000
- Top 10000
- Complete list (zip format).
Frequency lists with English translation:
Sentences sorted by their words' frequency:
Top 10000 Spanish words from subtitles:
Frequent nouns:
Frequent words with example sentences:
Swahili
Swedish
- Top 100,000 words in the Parole corpus
- Wiktionary:Frequency lists/top 2000 Swedish Wikipedia words
- /Swedish (similar, but not identical)
- 50K and larger word lists based on www.opensubtitles.org
- Swedish Kelly-list containing 8425 lemmas
Frequent words with example sentences:
Sentences sorted by their words' frequency:
Tagalog
Here are the letter frequencies for Tagalog [1] (excluding the two letters Ññ and NGng):
| Character | Frequency |
|---|---|
| A | 24.25% (≈1/4) |
| N | 11.77% (≈105/900) |
| G | 8.51% (≈85/1000) |
| I | 7.89% (≈79/1000) |
| S | 5.6% (≈56/1000) |
| T | 4.87% (≈49/1000) |
| M | 4.27% |
| O | 4.19% |
| L | 3.77% |
| K | 3.61% |
| Y | 3.08% |
| U | 2.98% |
| P | 2.84% |
| R | 2.23% |
| E | 2.22% |
| H | 2.08% |
| D | 2% |
| B | 1.9% |
| W | 0.93% |
An easy way to remember the letters in this order is the mnemonic ANGIST-MOLKY-UPREH-D-B-W. N and G rank quite high because every occurrence of the letter NGng adds to the counts of both.
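A table of this kind can be computed by counting letters and dividing by the total number of letters. The sketch below assumes plain A-Z counting, so that NG contributes to both N and G and Ñ is ignored; the `letter_frequencies` helper and the sample sentence are hypothetical, and the table above was derived from a different and much larger text.

```python
from collections import Counter

def letter_frequencies(text):
    """Return (letter, percentage) pairs for A-Z, most frequent first."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    total = len(letters)
    return [(ch, 100 * n / total) for ch, n in Counter(letters).most_common()]

# Hypothetical sample sentence, far too short to be representative.
sample = "Ang bata ay kumakain ng mangga."
for letter, pct in letter_frequencies(sample)[:3]:
    print(f"{letter}: {pct:.1f}%")
```

Even on this one sentence, A, N and G come out on top, in line with the start of the mnemonic.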
Telugu
A list generated from the most common words in the Telugu Wikipedia in July 2017.
Thai
- Appendix:100 basic Thai words
- /Thai Chula: Chula university top 5000 Thai words
Turkish
- A Frequency Dictionary of Turkish (Routledge)
- List of top 1000 Turkish words
- /Turkish WordList 10K
- /Turkish WordList 20K
- /Turkish WordList 30K
- /Turkish WordList 40K
- Top 5000 Turkish words based on www.opensubtitles.org
- 50K and larger word lists based on www.opensubtitles.org
Frequent words with example sentences:
Sentences sorted by their words' frequency:
Ukrainian
- 50K and larger word lists based on www.opensubtitles.org
- General (92K) and thematic lists in CSV format from site mova.info
- 98K word list based on a corpus of over 1,500 books by Ukrainian authors containing over 40,000,000 words
Frequent words with example sentences:
Uyghur
Vietnamese
Welsh
Yiddish
See also
- Appendix:Swadesh lists
- Appendix:Vocabulary lists
- Wiktionary:List of languages
- Wiktionary:Multilingual statistics
External links
English
- Top 5,000 lemmas and the top 60,000 lemmas (sampled every 7th word) from the COCA corpus (the largest and most up-to-date corpus on American English based on written and spoken English): http://www.wordfrequency.info/
- A Common English Lexical Framework, aligned to the Common European Framework of Reference for Languages (A1, A2, B1, B2, C1, C2) in a CLIL context at http://www.scribd.com/doc/20386024/Common-English-Lexical-Framework See http://www.icrj.eu/13-75 for research base.
- Vocabulary profiler using the 2,709 most commonly used word families, covering 90% of most English texts (excluding proper nouns) at http://lextutor.ca/vp/bnl See http://dx.doi.org/10.1016/j.esp.2008.08.001 for research base.
Russian
- 1000 most common Russian words - with English translations
- 1000 most common Russian words - with German translations
Spanish
- Word Frequency List of Chilean Spanish - (Lifcach), Scott Sadowsky & Ricardo Martínez Gamboa
- The Word Frequency List of Chilean Spanish (Lifcach) is a set of 102 frequency lists derived from the sub-corpora of the Corpus Dinámico del Castellano de Chile (Dynamic Corpus of Chilean Spanish, Codicach), a corpus of contemporary written Chilean Spanish developed by Sadowsky between 1997 and 2002; this corpus contained approximately 450 million words when the Lifcach was created (it currently contains some 800 million words). The Lifcach also contains a non-weighted list of total frequencies (the Total Occurrences column), which is simply the sum of the frequencies of the 102 individual lists (in other words, the list of frequencies of the entire Codicach corpus.)
- ^ [1] The Tagalog letter frequencies are from the website http://www.sttmedia.com/characterfrequency-filipino