Vocabulary list appendices:
- Appendix:Vocabulary lists of African languages
- Appendix:Vocabulary lists of Amerindian languages
- Appendix:Vocabulary lists of Southeast Asian languages
- Appendix:Vocabulary lists of North Eurasian languages
- Appendix:Vocabulary lists of Indo-Pacific languages
- Appendix:Swadesh lists
Welcome to Wiktionary's vocabulary lists series. This series aims to provide representative word lists for all language families of the world.
- Purpose: As linguistic lexicographical works, the vocabulary lists are designed with historical-comparative linguistics research goals in mind, such as classifying languages, reconstructing proto-languages, and identifying loanwords. Frequency lists and pedagogical resources are not included.
- Glosses: Each list keeps the glosses (definitions, meanings) as found in the original source. If the original glosses are not in English, translated glosses are sometimes added in additional columns. Translations that are not in the original source are marked as such in the lists and do not replace the original glosses. Unlike Swadesh lists and other standardized lexicostatistical word lists, the vocabulary lists here are not built around predetermined glosses. Instead, they can serve as "raw building blocks" for compiling Swadesh lists.
- Content: The lists typically contain between 50 and 1,000 lexical entries. Definitions are typically concise and focus on basic vocabulary concepts such as numerals, body parts, and natural phenomena.
- Scope: Emphasis is placed on divergent language isolates, families, and branches that would likely be crucial for etymological reconstruction and classification. Proto-languages are included whenever possible. Many of these language groups are sparsely documented and/or extinct. As a result, some of these lists may actually be the only extant documentation of a language or even language group.
- Sources: The word lists are adapted from academic sources published by linguists. Thus, all lists must be properly referenced with adequate notes and metadata. Many of these sources are out of print, with highly limited distribution and accessibility.
- Digitization: As with Wikisource texts, the lists are individually and painstakingly digitized using a variety of methods, such as optical character recognition (OCR), manual typing, and document conversion.
- Encoding: Unicode.
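As a rough illustration of how a raw list with original glosses might feed into a Swadesh-style list, here is a minimal Python sketch. It assumes a list stored as (form, gloss) pairs; the concept set, the function names, and the sample entries are all hypothetical and invented for this example, and the matching rule (exact match on one comma- or semicolon-separated sense) is only one of many possible choices.

```python
import unicodedata

# Hypothetical target concepts; a real Swadesh list has 100 or 207 items.
TARGET_CONCEPTS = ["water", "fire", "eye", "two"]


def normalize(text):
    """Return text in Unicode NFC form with surrounding whitespace removed."""
    return unicodedata.normalize("NFC", text).strip()


def compile_concept_list(entries, concepts):
    """Map each target concept to the forms whose original gloss matches it.

    entries: iterable of (form, gloss) pairs from a raw vocabulary list.
    A gloss matches when one of its comma- or semicolon-separated senses
    equals the concept, ignoring case. Concepts are expected in lowercase.
    """
    result = {concept: [] for concept in concepts}
    for form, gloss in entries:
        senses = [s.strip().lower() for s in gloss.replace(";", ",").split(",")]
        for concept in concepts:
            if concept in senses:
                result[concept].append(normalize(form))
    return result


# Invented placeholder entries, not taken from any real language or source.
raw_entries = [
    ("zuk", "water, river"),
    ("bem", "fire"),
    ("tilo", "Eye"),
    ("rasha", "hand; five"),
]

swadesh_like = compile_concept_list(raw_entries, TARGET_CONCEPTS)
for concept, forms in swadesh_like.items():
    print(concept, forms)
```

Concepts with no matching gloss keep an empty list, which makes gaps in the source visible instead of hiding them. In practice, matching glosses to concepts usually needs manual review, since original glosses vary widely between sources.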
Vocabulary lists of North Eurasian languages:
European • Balkan • Hurro-Urartian • Hattic • Sumerian (Swadesh) • Elamite • Etruscan • Burushaski • Uralo-Altaic • Paleosiberian • p-Japanese • p-Ainu • p-Nivkh • p-Chukotko-Kamchatkan • p-Yukaghir • p-Yeniseian
Vocabulary lists of Southeast Asian languages:
p-Tibeto-Burman • Old Chinese (basic) • p-Southern Min • Greater Bai • p-Tujia • p-Naish • p-Ersuic • Guiqiong • p-Lalo • Akha • Kathu • Gong • p-Karenic • p-Luish • p-Bodo-Garo • Kuki-Chin • Mru • p-W. Tibetan • Zakhring • Tshangla • Kho-Bwa • Mey • p-Puroik • p-Hrusish • Koro • Greater Siangic • Raji-Raute • Dhimalish • Baram-Thangmi • Bhujel • p-Kham • Dura • Bunan • (Nepal)
Vocabulary lists of Indo-Pacific languages:
p-Trans-New Guinea • Bayono-Awbono • Paniai Lakes • Kolopom • Bulaka River • Pauwasi • p-South Bougainville • p-Lower Sepik • p-Watam-Awar-Gamay • p-Lakes Plain • p-North Halmahera • p-Timor-Alor-Pantar • p-Alor-Pantar • Tayap • Massep
A selection of comparative lexical databases currently available online:
- NorthEuraLex (North Eurasia)
- IELex (Indo-European)
- Uralic Etymological Database (Uralic)
- RefLex (Africa)
- TransNewGuinea.org (Melanesia)
- Chirila (Australia)
- Southeast Asia