Appendix:Vocabulary lists

From Wiktionary, the free dictionary

Vocabulary list appendices:

Introduction

Welcome to Wiktionary's vocabulary lists series. This series aims to have representative word lists for all language families of the world.

  • Purpose: As lexicographical works, the vocabulary lists are designed for historical-comparative linguistics research, such as classifying languages, reconstructing proto-languages, and identifying loanwords. Frequency lists and pedagogical resources are not included.
  • Glosses: Each list keeps the glosses (definitions, meanings) exactly as found in the original source. If the original glosses are not in English, translated glosses are sometimes added as additional columns. Translations that do not appear in the original source are marked as such in the lists and do not replace the original glosses. Unlike Swadesh lists and other standardized lexicostatistical word lists, the vocabulary lists here are not built around predetermined glosses. Instead, they can serve as "raw building blocks" for compiling Swadesh lists.
  • Content: The lists typically contain 50 to 1,000 lexical entries. Definitions are usually concise and focus on basic vocabulary concepts such as numerals, body parts, and natural phenomena.
  • Scope: Emphasis is placed on divergent language isolates, families, and branches that are likely to be crucial for etymological reconstruction and classification. Proto-languages are included whenever possible. Many of these language groups are sparsely documented, extinct, or both. As a result, some of these lists may be the only extant documentation of a language or even of an entire language group.
  • Sources: The word lists are adapted from academic sources published by linguists. Accordingly, every list must be properly referenced, with adequate notes and metadata. Many of these sources are out of print and have had very limited distribution and accessibility.
  • Digitization: As with Wikisource texts, the lists are individually and painstakingly digitized using a variety of methods, such as optical character recognition (OCR), manual typing, and document conversion.
  • Encoding: Unicode.
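The gloss policy above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the `Entry` layout, its field names, and the `pick_for_concepts` helper are hypothetical and do not describe how the lists are actually stored on Wiktionary, and the forms are placeholders rather than real data. The sketch keeps the original gloss untouched, stores an added English translation in a separate field flagged as not from the source, and then selects entries for a few predetermined concepts, roughly as one might when compiling a Swadesh list from a raw vocabulary list.

```python
import unicodedata
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """One row of a vocabulary list (hypothetical layout, for illustration only)."""
    form: str                        # attested or reconstructed form (placeholder here)
    gloss: str                       # gloss exactly as given in the source
    gloss_lang: str                  # language code of the original gloss
    gloss_en: Optional[str] = None   # English translation, if one is available
    en_in_source: bool = False       # False means the translation was added later


def english_gloss(entry):
    """Return the English gloss (original or added), or None if there is none."""
    if entry.gloss_lang == "en":
        return entry.gloss
    return entry.gloss_en


def pick_for_concepts(entries, concepts):
    """Map each concept to the entries whose English gloss matches it exactly."""
    picked = {concept: [] for concept in concepts}
    for entry in entries:
        gloss = english_gloss(entry)
        if gloss is None:
            continue  # no English gloss, so the entry cannot be matched here
        key = unicodedata.normalize("NFC", gloss).strip().casefold()
        for concept in concepts:
            if key == concept.casefold():
                picked[concept].append(entry)
    return picked


# Placeholder forms; the French glosses stand in for a non-English source.
sample = [
    Entry("form-a", "eau", "fr", gloss_en="water"),
    Entry("form-b", "main", "fr", gloss_en="hand"),
    Entry("form-c", "tree", "en"),
    Entry("form-d", "feu", "fr"),  # no translation added yet
]

picked = pick_for_concepts(sample, ["water", "tree", "fire"])
for concept, rows in picked.items():
    print(concept, [(row.form, row.gloss) for row in rows])
```

Exact matching on the English gloss is a deliberate simplification. Real glosses are often phrases or carry several senses, so any serious compilation would need manual review against the original gloss, which is why the sketch never overwrites it.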

Open-access online lexical databases that are similar in design, content, and research goals include STEDT, MKED, RefLex, Chirila, and Starling.

Glosses

Not all of the vocabulary lists are glossed in English. Glosses are always preserved in the original language, both for reference and, where English translations are given, to make unintentional translation errors easier to spot. The language of the original glosses varies by region:

  • Africa - often in French; sometimes in German
  • Americas - often in Spanish and Portuguese; sometimes in French
  • Southeast Asia - often in Chinese; sometimes in Vietnamese, Russian, and French
  • Indo-Pacific region - sometimes in Dutch and Indonesian

Navigational templates

Vocabulary lists of North Eurasian languages

European • Balkan • Hurro-Urartian • Hattic • Sumerian (Swadesh) • Elamite • Etruscan • Burushaski • Ural-Altaic • Paleosiberian • p-Japanese • p-Ainu • p-Nivkh • p-Chukotko-Kamchatkan • p-Yukaghir • p-Yeniseian

Caucasian

Caucasian • p-Northwest Caucasian • p-Nakh-Daghestanian • p-Kartvelian

Uralic

p-Uralic (stable roots) • Finnic • Saami • Mordvinic • Mari • Permic • Ugric • Samoyedic

"Altaic" linguistic area

Altaic • p-Altaic • Turkic • Mongolic • Common Mongolic • Khitan • Taghbach

Indo-European

Germanic • Celtic • Romance • Baltic • East Slavic • West Slavic • South Slavic


Vocabulary lists of African languages

Nilo-Saharan

Nilo-Saharan • p-Nilo-Saharan • p-Nilotic (p-E. Nilotic • p-S. Nilotic) • p-Surmic • NE Sudanic • p-Nubian • Nara • p-Daju • p-Jebel • Temein • Central Sudanic (p-Central Sudanic • p-Sara-Bongo-Bagirmi • Sinyar • Birri • p-Mangbetu) • p-Kuliak (Ik) • Kadu • Berta • Kunama • Gumuz • p-Koman • Gule • Amdang • Mimi-D • p-Maban • Mimi-N • Kanuri • p-Songhay • Tadaksahak

Niger-Congo

p-Niger-Congo • p-Atlantic-Congo • p-Benue-Congo • p-Grassfields • p-Ring • Momo • Tivoid • Ekoid • Beboid • Bendi • p-Bantu (Swadesh list) • p-Kongo • p-Jukunoid • p-Plateau (p-Tarokoid) • p-N. Jos • p-Fali • p-Yoruboid • Olukumi • p-Edoid • p-Akokoid • p-Igboid • Akpes • Ayere-Ahan • p-Upper Cross River • p-Lower Cross River • Anaang • p-Ogoni • p-Ukaan • p-Nupoid • Oko • p-Idomoid • p-Ijaw • Defaka • p-Gbe (Fongbe) • p-Potou-Akanic • p-Mumuye • p-Jen • Yendang • Tula-Waja • p-Lakka • p-Bua • Kim • p-Central Togo • p-Guang • p-Gurunsi • p-Oti-Volta (p-E. Oti-Volta • p-C. Oti-Volta) • Tiefo • Natioro • Bariba • p-Gbaya • Dogon • p-Mande (p-W. Mande • p-Mandekan • p-Niger-Volta • p-S. Mande) • Atlantic (Guinea) • p-Fula-Sereer • p-Cangin • p-Manjaku • Bijogo • p-Talodi • p-Heiban • p-Katloid • Rashad • Lafofa

Afroasiatic

p-Afroasiatic • p-Chadic • p-Ron • p-North Bauchi • p-South Bauchi • South Bauchi • p-Central Chadic • p-Masa • Kujarge • p-Cushitic • p-Agaw • p-Omotic • p-Aroid • p-Maji • Mao • p-Semitic

Khoisan

Khoisan • p-Khoe • p-Central Khoisan • p-Tuu • !Kung

Language isolates

Bangime • Jalaa • Laal • Ongota • Shabo • Sandawe • Hadza

Others

p-Niger-Saharan


Vocabulary lists of Southeast Asian languages

Sino-Tibetan

p-Tibeto-Burman • Old Chinese (basic) • p-Southern Min • Macro-Bai • p-Tujia • p-Naish • p-Ersuic • Guiqiong • Horpa • p-Lolo-Burmese • p-Lalo • Lalo • Akha • Woni • Axi • Nesu • Yi (Mihei) • Kathu • Gong • p-Karenic • p-Luish • p-Bodo-Garo • Kuki-Chin • Suansu • Mru • p-W. Tibetan • Tibetan (Lajiao) • Amdo Tibetan • Zakhring • Tshangla • Kho-Bwa • Mey • p-Puroik • p-Hrusish • Koro • Greater Siangic • Raji-Raute • Dhimalish • Baram-Thangmi • Bhujel • p-Kham • Dura • Bunan • (Nepal)

Austroasiatic

p-Austroasiatic • p-Munda • p-Khasian • p-Palaungic • Quang Lam • p-Khmuic • p-Pakanic • p-Vietic • p-Katuic • p-Bahnaric • p-Pearic • p-Khmeric • p-Monic • p-Aslian • p-Nicobarese

Hmong-Mien

p-Hmong-Mien • Hmong-Mien • p-Hmongic • Pa-Hng • Xong • Pana • She • p-Mienic • Mienic • Mien (Gongcheng) • Biao Min (Shikou)

Kra-Dai

p-Kra-Dai • p-Kra • Laha • Qabiao • Gelao • p-Kam-Sui • Kam-Sui (Hunan) • p-Lakkia • Biao • p-Tai • Zhuang (Tiandeng) • Bouyei • p-Be • Jizhao • p-Hlai • Jiamao

Austronesian

p-Austronesian


Vocabulary lists of Indo-Pacific languages

Papuan

p-Trans-New Guinea • p-Northern Adelbert • p-Sogeram • Bayono-Awbono • Paniai Lakes • Kolopom • Bulaka River • Pauwasi • p-South Bougainville • p-Lower Sepik • p-Watam-Awar-Gamay • p-Lakes Plain • p-North Halmahera • p-Timor-Alor-Pantar • p-Alor-Pantar • Tayap • Massep

Australian

North Australian (basic) • p-Nyulnyulan • p-Mirndi • p-Gunwinyguan • Limilngan • Umbugarla • Minkin • Tiwi • Malak-Malak • Pama-Nyungan • p-Pama-Nyungan • p-Arandic • p-Thura-Yura • p-Ngayarda

Others

p-Dravidian • Dravidian • Kusunda • Nihali • Kenaboi • p-Ongan


Vocabulary lists of Amerindian languages

North America

Amerindian • p-Amerind • p-Eskimo • p-Na-Dene • p-Athabaskan • p-Algonquian • Beothuk • p-Iroquoian • p-Siouan • Caddoan • Yuchi • Kutenai • Chinook • p-Sahaptian • p-Takelman • p-Kalapuyan • Alsea • p-Wintun • Klamath • Molala • Cayuse • Coos • Lower Umpqua • p-Utian • p-Yokuts • p-Maidun • p-Salishan • p-Wakashan • p-Chimakuan • p-Hokan • p-Palaihnihan • Chimariko • Shasta • Yana • p-Pomo • Esselen • Salinan • p-Chumash • Waikuri • p-Yuman • p-Yukian • Washo • p-Kiowa-Tanoan • p-Keresan • Coahuilteco • Comecrudo • Cotoname • Karankawa • Tonkawa • Maratino • Quinigua • Naolan • p-Muskogean • Natchez (Swadesh) • Atakapa • Adai • Timucua

Central America

p-Oto-Manguean • p-Oto-Pamean • p-Central Otomian • p-Otomi • p-Popolocan (p-Mazatec) • p-Chinantec • p-Mixtec • p-Zapotec • p-Uto-Aztecan • p-Aztecan • Purépecha (Swadesh) • Cuitlatec • p-Totozoquean • p-Totonacan • p-Mixe-Zoquean • Highland Chontal • Huamelultec • Tequistlatec • p-Huave • p-Mayan (Swadesh) • Xinca • p-Jicaque • p-Lencan • Lenca • p-Misumalpan

South America

p-Cariban • p-Taranoan • p-Chibchan • p-Barbacoan • Páez • p-Pano-Takanan • p-Panoan • p-Makú • Hupda • p-Tukanoan • p-Arawan • Harákmbut–Katukinan • p-Cahuapanan • p-Choco • p-Guahiban • p-Shuar • Candoshi • p-Shuar-Candoshi • Achuar • p-Nambikwaran • Tinigua • Timote • p-Lule-Vilela • Vilela • Chamacoco • Allentiac • Chaná • Arutani-Sape • p-Bora-Muinane • Bora • p-Witotoan • Witoto • p-Macro-Daha • Sáliba • Piaroa • Ticuna • Yuri • Caraballo • Andoque • p-Mataguayo • p-Guaicurú • Guachi • Payagua • Mura • Pirahã • Matanawi • Quechumaran • Quechuan • p-Zaparoan • p-Peba-Yagua • Iquito • p-Chapacuran • Andaqui • Guamo • Betoi • Kamsá • Otomacoan • Jirajaran • Hibito-Cholon • Cholón • Sechura-Catacao • Sechura • Culli • Mochica • Esmeralda • Taushiro • Urarina • Aiwa • Canichana • Guató • Irantxe • Aikanã • Kanoé (Swadesh) • Kwaza • Mato Grosso Arára • Munichi • Omurano • Puinave • Leco • Puquina • Ramanos • Warao • Yaruro • Yuracaré • Yurumangui

South America (NE Brazil)

Katembri • Taruma • Yatê • Xukurú • Natú • Pankararú • Tuxá • Atikum • Kambiwá • Xokó • Baenan • Kaimbé • Tarairiú • Gamela

South America (Arawakan)

p-Arawakan • p-Japurá-Colombia • p-Lokono-Guajiro • Wayuu • p-Mamoré-Guaporé • p-Bolivia • p-Mojeño • p-Purus

South America (Macro-Jê)

p-Macro-Jê • Rikbaktsa • p-Jê • Jeikó • p-Jabuti • p-Kamakã • Kamakã • Maxakali • Chiquitano • Dzubukua • Oti • p-Puri • p-Bororo

South America (Tupian)

p-Tupian • Puruborá • Karo • p-Tupari • p-Maweti-Guarani • p-Tupi-Guarani • Guaraní

External links

A selection of comparative lexical databases currently available online:

General
Regional
Southeast Asia
  • STEDT (Sino-Tibetan)
  • MKED (Austroasiatic)
  • ABVD (Austronesian)
  • ACD (Austronesian)