Wiktionary:Language treatment

This is a Wiktionary policy, guideline or common practices page. Specifically, it is a policy think tank, working to develop a formal policy.

The distinction between languages and dialects is not clear-cut. A lect that some regard as a dialect of a certain language may be regarded as a full, separate language by others. This page contains a list of languages and their (ISO-code-having) dialects, with notes on whether or not the dialects are treated as separate languages on Wiktionary. If there is no note about the status of a particular language+dialect group, the situation is not yet regulated. If multiple dialects are treated as a single language on Wiktionary, but there is no ISO code that represents all of them, the code of one of the dialects is used as the code for the whole language, or an exceptional code is created (for more, see Wiktionary:Languages).

For the most part, this page documents cases where Wiktionary's treatment of lects deviates from that of the ISO/SIL, e.g. cases where we have merged lects that they have not. Cases where an ISO code has been excluded from Wiktionary altogether (typically because it was too vague to be meaningful) are also documented. Cases where the ISO/SIL itself has merged lects which they formerly granted separate codes, and we have followed suit, are not necessarily documented here.
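
The merging described above amounts to a lookup from ISO codes to the codes Wiktionary uses in their place. The following is a minimal Python sketch of that idea, not Wiktionary's actual implementation: the names MERGED_INTO and wiktionary_code are hypothetical, and the sample entries are taken from rows of the list on this page.

```python
# Hypothetical lookup table built from a few rows of the list on this page.
# Keys are ISO codes that Wiktionary does not use on their own;
# values are the codes Wiktionary uses instead.
MERGED_INTO = {
    "acf": "gcf",  # "Saint Lucian Creole French" -> Antillean Creole
    "doc": "kmc",  # "Northern Dong" -> Gam
    "hlu": "xlu",  # "Hieroglyphic Luwian" -> Luwian
    "mkw": "ktu",  # Kituba of the Republic of the Congo -> Kituba
    "olk": "kjn",  # "Uw Olkola" -> Kunjen
    "nkv": "nkt",  # Nyika of Malawi and Zambia -> Nyika
    "tsz": "pua",  # "Eastern Purepecha" -> Purepecha
    "arb": "ar",   # ISO code for the standard variety of Arabic
    "ekk": "et",   # ISO code for the standard variety of Estonian
    "lvs": "lv",   # ISO code for the standard variety of Latvian
}


def wiktionary_code(iso_code: str) -> str:
    """Return the code used in place of iso_code, or iso_code itself if it is not merged."""
    return MERGED_INTO.get(iso_code, iso_code)
```

Codes that are absent from the table, including codes that are already the merged code (such as gcf), pass through unchanged.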

Discussions about splitting, merging, deleting, adding or renaming lects may be archived to Wiktionary:Language treatment/Discussions. (Do not start or continue discussions on that page; do that in an appropriate community forum, such as WT:BP or WT:RFM.)

List of languages and dialects

Macrolanguage Subdivisions Treatment
Akan (ak) fat (Fanti), tw (Twi) Both the macrolanguage and its subdivisions are treated as languages.
Albanian (sq) aae (Arbëreshë Albanian), aat (Arvanitika Albanian), aln (Gheg Albanian), als (Tosk Albanian) In practice, the subdivision aln is treated as a language, but the macrolanguage code is used in place of the code (als) which the ISO gave the standard variety of the language. The status of aae and aat is unclear. (inconclusive discussion)
Antillean Creole (gcf) gcf, acf Antillean Creole is treated as a single language with the code gcf. The ISO had coded two dialects separately, using acf for "Saint Lucian Creole French" and gcf for "Guadeloupean Creole French". (discussion, permalink)
Arabic (ar) aao (Algerian Saharan Arabic), abh (Tajiki Arabic), abv (Baharna Arabic), acm (Iraqi Arabic), acq (Ta'izzi-Adeni Arabic), acw (Hijazi Arabic), acx (Omani Arabic), acy (Cypriot Arabic), adf (Dhofari Arabic), aeb (Tunisian Arabic), aec (Saidi Arabic), afb (Gulf Arabic), ajp (South Levantine Arabic), apc (North Levantine Arabic), apd (Sudanese Arabic), arb (Standard Arabic), arq (Algerian Arabic), ars (Najdi Arabic), ary (Moroccan Arabic), arz (Egyptian Arabic), auz (Uzbeki Arabic), avl (Eastern Egyptian Bedawi Arabic), ayh (Hadrami Arabic), ayl (Libyan Arabic), ayn (Sanaani Arabic), ayp (North Mesopotamian Arabic), bbz (Babalia Creole Arabic), pga (Juba Arabic), shu (Chadian Arabic), ssh (Shihhi Arabic) Both the macrolanguage and its subdivisions are treated as languages, but the macrolanguage code is used in place of the code (arb) which the ISO gave the standard variety of the language.
Aramaic arc (Imperial Aramaic), oar (Old Aramaic), aii (Assyrian Neo-Aramaic), aij (Lishanid Noshan), amw (Western Neo-Aramaic), bhn (Bohtan Neo-Aramaic), bjf (Barzani Jewish Neo-Aramaic), cld (Chaldean Neo-Aramaic), hrt (Hértevin), huy (Hulaulá), jpa (Jewish-Palestinian Aramaic), kqd (Koy Sanjaq Surat), lhs (Mlahsô), lsd (Lishana Deni), mid (Modern Mandaic), myz (Classical Mandaic), sam (Samaritan Aramaic), syc (Syriac; Classical Syriac), syn (Senaya), tmr (Jewish Babylonian Aramaic), trg (Lishán Didán), tru (Turoyo), xrm (Armazic) Some varieties are treated as languages, others are not:
The code oar for "Old Aramaic" (up to 700 BCE) is not used; it has been superseded by arc and syc.
"Jewish Babylonian Aramaic" (circa 200-1200 CE) is also not allowed an L2 header, as it has been superseded by arc, but its code tmr is allowed in etymologies.
Assyrian Neo-Aramaic (aii) and Chaldean Neo-Aramaic (cld) are currently treated as languages, as are aij, amw, bhn, bjf, hrt, huy, kqd, lhs, lsd, mid, myz, sam, syn, trg, tru and xrm, and, of course, arc and syc. jpa is not currently treated as a language.
Aymara (ay) ayc (Southern Aymara), ayr (Central Aymara) Only the macrolanguage is treated as a language. (discussion)
Azeri (az) azb (South Azerbaijani), azj (North Azerbaijani) Only the macrolanguage is treated as a language. (discussion)
Baluchi (bal) bcc (Southern Baluchi/Balochi), bgp (Eastern Baluchi/Balochi), bgn (Western Baluchi/Balochi) Only the macrolanguage is treated as a language.
Berber (ber) auj (Awjilah), swn (Sawknah), siz (Siwi), cnu (Chenoua), jbe (Judeo-Berber), shi (Tashelhit), tzm (Central Atlas Tamazight), zgh (Standard Moroccan Tamazight), kab (Kabyle), gha (Ghadamès), jbn (Nafusi), sds (Sened), gho (Ghomara), oua (Tagargrent), tjo (Temacine Tamazight), grr (Taznatit), mzb (Tumzabt), sjs (Senhaja Berber), rif (Tarifit), shy (Tachawit), tia (Tidikelt Tamazight), thv (Tahaggart Tamahaq), ttq (Tawallammat Tamajaq), thz (Tayart Tamajeq), taq (Tamasheq), zen (Zenaga) Only the subdivisions are treated as languages. (discussion 1 and discussion 2, discussion 3, discussion 4)
Bikol (bik) agk, agz, atl, bcl (Central Bikol), bln, bto, cts, fbl, lbl, rbl, ubl Only the subdivisions are treated as languages; the macrolanguage is not. (discussion)
Bontoc (bnc) rbk, vbk, lbk, ebk, obk
Buryat (bua) bxm (Mongolian Buriat), bxr (Russian Buriat), bxu (Chinese Buriat) Only the macrolanguage is treated as a language. (discussion 1, discussion 2, discussion 3)
Chinese (zh) cdo, cjy, cmn, cpx, czh, czo, gan, hak, hsn, lzh, mnp, nan, wuu, yue Only the macrolanguage is treated as a language. (superseded discussion; vote)
Cree (cr) atj (Atikamekw), crj (Southern East Cree), crk (Plains Cree), crl (Northern East Cree), crm (Moose Cree), csw (Swampy Cree), cwd (Woods Cree), moe (Montagnais), nsk (Naskapi)
Dieri (dif) dif, dit (Dirari) Only the macrolanguage is treated as a language.
Dinka (din) dib, dik, dip, diw, dks
Dogri (doi) dgo, xnr
Dongolawi (also called "Kenuzi-Dongola") (kzh) dgl (Andaandi / Dongolawi), xnz (Kenzi / Mattoki) Only the macrolanguage is treated as a language.
Dhudhuroa (ddr) ddr (Dhudhuroa), xjt (Yaitmathang) Only ddr is treated as a language. (discussion)
English (en) hwc (Hawai'ian Creole English), pld (Polari) Only the macrolanguage is treated as a language.
Estonian (et) vro (Võro) Both the macrolanguage and its subdivision vro are treated as languages, but the macrolanguage code is used in place of the code (ekk) which the ISO gave the standard variety of the language.
Fula (ff) ffm, fub, fuc, fue, fuf, fuh, fui, fuq, fuv Only the macrolanguage is treated as a language.
French (fr) roa-gal (Gallo), roa-grn (Guernésiais), roa-jer (Jèrriais), roa-nor (Norman), frc (Cajun French / Louisiana French) Both the macrolanguage fr and its subdivisions roa-gal, roa-grn and roa-jer are treated as languages. frc is an etymology-only language (and is not to be confused with Louisiana Creole French, which is treated as a full language).
Gam (kmc) doc (Northern Dong), kmc (Southern Dong) Gam is treated as a single language with the code kmc. The ISO had coded two dialects separately, using doc for "Northern Dong" and kmc for "Southern Dong". "Cao Miao" (cov) is tentatively still treated as a separate language, pending further discussion. (discussion)
Gaulish (cel-gau) xtg (Transalpine Gaulish), xcg (Cisalpine Gaulish) The two varieties, Transalpine and Cisalpine Gaulish, have been merged under the code cel-gau, though they may still be separated in etymologies. (discussion)
Gbaya (gba) bdt, gbp, gbq, gmm, gso, gya
Gondi (gon) ggo, gno Only the macrolanguage is treated as a language.
Grebo (grb) gbo, gec, grj, grv, gry, ktj, oub, pye, ted
Greek (el) grc (Ancient Greek), grk-cal, gkm, cpg, gmy, pnt Both the macrolanguage and its subdivisions are treated as languages.
Guaraní (gn) gnw, gug, gui, gun, nhd
Haida (hai) hax (Southern Haida), hdn (Northern Haida)
Hebrew (he) hbo (Biblical Hebrew) Biblical Hebrew does not have an L2 separate from Hebrew; the code he is used for both. In etymologies, however, the two may be distinguished. (See WT:AHE.)
Hmong (hmn) cqd (Chuanqiandian-cluster Miao), hea (Northern Qiandong Miao), hma (Southern Mashan Hmong), hmc (Central Huishui Hmong), hmd (A-Hmao / Large Flowery Miao), hme (Eastern Huishui Hmong), hmf (Hmong Don), hmg (Southwestern Guiyang Hmong), hmh (Southwestern Huishui Hmong), hmi (Northern Huishui Hmong), hmj (Ge), hml (Luopohe Hmong), hmm (Central Mashan Hmong), hmp (Northern Mashan Hmong), hmq (Eastern Qiandong Miao), hms (Southern Qiandong Miao), hmv (Hmong Do), hmw (Western Mashan Hmong), hmy (Southern Guiyang Hmong), hmz (Hmong Shua), hnj (Hmong Njua), hrm (Horned Miao), huj (Northern Guiyang Hmong), mmr (Western Xiangxi Miao), muq (Eastern Xiangxi Miao), mww (White Hmong), sfm (Small Flowery Miao) Tentatively, all subvarieties are accepted.
The old macrolanguage code blu and the newer macrolanguage code hmn are not used.
cqd (Chuanqiandian-cluster Miao) is an umbrella term for various varieties of Hmong in China. (discussion 1, discussion 2)
Inuktitut (iu) ike (Eastern Canadian Inuktitut), ikt (Western Canadian Inuktitut) Only the macrolanguage is treated as a language.
Inupiak (ik) esi (North Alaskan Inupiatun), esk (Northwest Alaska Inupiatun) Only the macrolanguage is treated as a language.
Jeru (akj) akj (Jeru), gac (Mixed Great Andamanese) Only Jeru (akj) is treated as a language. Mixed Great Andamanese (gac) is excluded because it is merely a nonstandard variety of akj used by seven people. (discussion)
Judeo-Arabic (jrb) ajt, aju, jye, yhd, yud
Kado (kdv) zkd (Kadu proper), zkn (Kanan) In 2012, the ISO split kdv into zkd and zkn. Wiktionary has not made this split at this time. (discussion)
Kalenjin (kln) enb, eyo, niq, oki, pko, sgc, spy, tec, tuy Only the macrolanguage is treated as a language.
Kanuri (kr) bms (Bilma Kanuri), kau, kbl, kby, knc, krt Only the macrolanguage is treated as a language. (discussion)
Khmer (km) khm, kxm
Kituba (ktu) ktu, mkw Kituba is treated as a single language with the code ktu. The ISO had coded two dialects separately, using ktu for the variety spoken in the Democratic Republic of the Congo and mkw for the variety spoken in the Republic of the Congo. (discussion)
Komi-Zyrian (kv) koi, kpv Only the subdivisions are treated as languages.
Kongo (kg) kng (Koongo), kwy (San Salvador Kongo), ldi (Laari), yom (Yombe) Only the macrolanguage is treated as a language. (discussion)
Konkani (kok) gom, knn Only the macrolanguage is treated as a language.
Kpelle (kpe) gkp, xpe
ǃKung (khi-kun) mwj (Sekele / Maligo), knw (Ekoka ǃKung), oun (ǃOǃKung), gfx (Mangetti Dune ǃXung) Only the macrolanguage is treated as a language. (discussion 1, discussion 2, discussion 3)
Kunjen (kjn) kjn, olk Kunjen is treated as a single language with the code kjn. The ISO had coded two dialects separately, using kjn for "Uw Oykangand" and olk for "Uw Olkola".
Kurdish (ku) ckb, kmr, sdh
Lahnda (lah) hnd, hno, jat, phr, pmu, pnb, skr, xhe
Latvian (lv) ltg (Latgalian) Both the macrolanguage and its subdivision ltg are treated as languages, but the macrolanguage code is used in place of the code (lvs) which the ISO gave the standard variety of the language.
Lenape (also called "Delaware") (del) umu (Munsee), unm (Unami) Only the subdivisions are treated as languages. (discussion)
Lithuanian (lt) sgs (Samogitian) Both the macrolanguage and its subdivision sgs are treated as languages.
Low German The code nds is deprecated. nds-de is used for German Low German varieties (including Westphalian, which the ISO gives the code wep). nds-nl is used for Dutch Low Saxon varieties (including Achterhoeks = act, Drents = drt, Gronings = gos, Sallands = sdz, Stellingwerfs = stl, Twents = twd, Veluws = vel). Plautdietsch (pdt) is a separate lect. (discussion of language name, discussion of drt, gos, twd, general discussion 1, general discussion 2, discussion of nds-de, general discussion 3 (permalink))
Luhya (luy) bxk, ida, lkb, lko, lks, lri, lrm, lsm, lto, lts, lwg, nle, nyd, rag
Luwian (xlu) xlu, hlu Luwian is treated as a single language with the code xlu. Luwian is written in two scripts, and the ISO had coded each separately, using xlu for "Cuneiform Luwian" and hlu for "Hieroglyphic Luwian". (discussion, permalink)
Malagasy (mg) bhr (Bara Malagasy), bjq, bmm (Northern Betsimisaraka Malagasy), bzc (Southern Betsimisaraka Malagasy), mlg, msh, plt, skg, tdx, tkg, txy, xmv, xmw Only the macrolanguage is treated as an individual language. (discussion)
Malay (ms) bjn, btj, bve, bvu, coa, dup, hji, id, jak, jax, kvb, kvr, kxd, lce, lcf, liw, max, meo, mfa, mfb, min, mqg, msi, mui, orn, ors, pel, pse, tmw, urk, vkk, vkt, xmm, zlm, zmi The code zsm is not used; ms is used instead. The status of the remaining lects is unclear. (discussion, vote)
Mandingo (man) emk, mku, mlq, mnk, msc, mwk, myq
Mantharta djl (Jiwarli (macrolanguage code)), dze (Jiwarli (proper)), iin (Thiin), dhr (Tharrgari), wri (Warriyangga) Jiwarli is treated as a single language with the code djl; the ISO's split of that code into dze for Jiwarli proper and iin for Thiin has not been followed. However, dhr (Tharrgari) and wri (Warriyangga) have tentatively been retained as languages, rather than being merged, with Jiwarli, into the single language Mantharta. (discussion)
Mari (chm) mrj (Western Mari) Both the macrolanguage and its subdivision mrj are treated as languages, but the macrolanguage code is used in place of the code (mhr) which the ISO gave the standard variety of the language. (discussion 1, discussion 2)
Marwari (mwr) dhd, mtr, mve, rwr, swv, wry Only the macrolanguage is treated as a language. (discussion)
Maykulan (mnt) wnn (Wunumara), xyj (Mayi-Yapi), xyk (Mayi-Kulan), xyt (Mayi-Thakurti) Only the macrolanguage is treated as a language. (discussion)
Mongolian (mn) khk (Khalkha Mongolian), mvf (Peripheral Mongolian) Only the code mn is used for Mongolian; khk is redundant to it and mvf is not usable. Note that Kalmyk (code: xal) and Buryat (see its entry in this table), which some scholars consider dialects of Mongolian, are treated as independent languages on Wiktionary. (discussion)
Na (nbf) nru (Narua), nxq (Naxi) Only the subdivisions are treated as languages; the macrolanguage is not.
Nahuatl (nah) azd, azn, azz, naz, nch, nci, ncj, ncl, ncx, ngu, nhc, nhe, nhg, nhi, nhk, nhm, nhn, nhp, nhq, nht, nhv, nhw, nhx, nhy, nhz, nln, nlv, nuz, ppl, xpo Both the macrolanguage and its subdivisions are treated as languages.
Naki (mff) mff, jms (Mashi), buz (Bukwen) Only the macrolanguage is treated as a language. (discussion)
Nepali (ne) ne, npi Only the code ne is used.
Norman (roa-nor) roa-grn, roa-jer Both the macrolanguage and its subdivisions are treated as languages. Compare fr.
Norwegian (no) nb, nn In practice, both the macrolanguage and its subdivisions are treated as languages. There has been discussion of either treating only the macrolanguage as a language, or of only treating the subdivisions as languages, but there is no consensus about which of these to do. (discussion 1, discussion 2, discussion 3, discussion 4, discussion 5, discussion 6, stalemated vote)
Nyika (nkt) nkt, nkv Nyika is treated as a single language with the code nkt. The ISO had coded two regional varieties separately, using nkt for the Nyika of Tanzania and nkv for the Nyika of Malawi and Zambia.
Occitan (oc) prv (Provençal) Only the macrolanguage is treated as a language. prv is an etymology-only language.
Ojibwe (oj) ciw, ojb, ojc, ojg, ojs, ojw, otw Both the macrolanguage and its subdivisions are treated as languages, except that the code ciw is not used, having been merged into oj.
Old French (fro) xno (Anglo-Norman) Only the macrolanguage is treated as a language. xno is an etymology-only language. (discussion)
Oromo (om) gax, gaz, hae, orc Only the macrolanguage is treated as a language. (discussion)
Paku Karen (kpp) jkp (Paku Karen), jkm (Mobwa Karen) In 2012, the ISO split kpp into jkp and jkm. Wiktionary has not made this split at this time. (discussion)
Pashto (ps) pbt (Southern Pashto), pbu (Northern Pashto), pst (Central Pashto), wne (Waneci) Only the macrolanguage ps and the variety wne (Waneci) are treated as languages. (discussion of pbt, pbu, pst)
Persian (also called "Farsi") (fa) aiq, bhh, deh, haz, jdt, jpr, pes, phv, prs, tg, ttt Persian (fa), Tajik (tg), Judeo-Persian (jpr), Bukhari (bhh), Judeo-Tat (jdt) and Tat (ttt) are treated as separate languages.
Western Persian (pes) and Eastern Persian / Dari (prs) are subsumed into fa. The status of aiq, deh, haz and phv is unresolved. (discussion of Tajik, of Judeo-Persian and Bukhari, and of Tat)
Pitcairn-Norfolk (pih) cpe-pit (Pitcairn), cpe-nor (Norfolk) For a time, Wiktionary split Pitcairn-Norfolk into two varieties, granting each an exceptional code: Pitcairn was cpe-pit, Norfolk was cpe-nor. That split has been undone; only pih is now treated as a language. (discussion 1, discussion 2)
Polish (pl) csb (Kashubian), zlw-pom (Pomeranian), zlw-slv (Slovincian) In practice, both the macrolanguage and its subdivisions are treated as languages.
Pomeranian (zlw-pom) See the entry for "Polish".
Purepecha (pua) pua, tsz Purepecha is treated as a single language with the code pua. The ISO had coded two dialects separately, using pua for "Western Purepecha" and tsz for "Eastern Purepecha". (discussion)
Quechua (qu) cqu (Chilean Quechua), qub, qud, quf, qug, quh, quk, qul, qup, qur, qus, quw, qux, quy, quz, qva, qvc, qve, qvh, qvi, qvj, qvl, qvm, qvn, qvo, qvp, qvs, qvw, qvz, qwa, qwc, qwh, qws, qxa, qxc, qxh, qxl, qxn, qxo, qxp, qxr, qxt, qxu, qxw Only the macrolanguage is treated as a language.
Rajasthani (raj) bgq, gda, gju, hoj, mup, wbr
Romani (rom) rmc, rmf, rml, rmn, rmo, rmw, rmy Only the macrolanguage is allowed an L2 header, but the subdivisions are allowed nested lines in translations tables. (discussion)
Romanian (ro) mo (Moldavian) Only the macrolanguage (ro) is treated as a language. (vote)
Sahaptin (qot) uma, waa, yak, tqn Both the macrolanguage and its subdivisions are treated as languages.
Sardinian (sc) sdc, sdn, src, sro
Serbo-Croatian (sh) bs, hr, sr, zls-mon Only the macrolanguage is treated as a language. (See discussion 1, discussion 2, discussion 3, discussion 4, discussion 5, more discussion, this old vote and many other discussions.)
Slavey (den) scs, xsl
Swahili (sw) swc, swh Only the macrolanguage is treated as a language.
Syriac (syr) See the entry for "Aramaic".
Tagalog (tl) fil (Filipino) Only the macrolanguage (tl) is treated as a language. (vote)
Tamashek (tmh) taq, thv, thz, ttq
Uzbek (uz) uzn, uzs Only the macrolanguage is treated as a language.
Wangkumara (xwk) xwk, xpt (Punthamara), eaa (Karenggapa) Only the macrolanguage (Wangkumara, xwk) is treated as a language, and it is treated as a language in its own right rather than as a dialect of Ngura (which the ISO used to consider a single language with the code nbx).
Wemba-Wemba (xww) rbp (Baraba-Baraba), rnr (Nari-Nari), weg (Wergaia), xwt (Wotjobaluk) Only the macrolanguage is treated as a language. (discussion)
Wintu (wnw) nol (Nomlaki), pwi (Patwin), wnw (Wintu) Both the macrolanguage and its subdivisions are treated as languages. (discussion)
Yarli (yxl) wdk, yga, yxl Only yxl is treated as a language.
Yendang (yen) ynq (Yendang proper), yot (Yotti) Only the macrolanguage is treated as a language. (discussion)
Yiddish (yi) ydd, yih Only the macrolanguage is treated as a language. (discussion)
Yir-Yoront (yiy) yyr (Yir-Yoront), yrm (Yirrk-Mel / Yirrk-Thangalkl) Only the macrolanguage is treated as a language. (discussion)
Zapotec (zap) zaa, zab, zac, zad, zae, zaf, zai, zam, zao, zaq, zar, zas, zat, zav, zaw, zax, zca, zoo, zpa, zpb, zpc, zpd, zpe, zpf, zpg, zph, zpi, zpj, zpk, zpl, zpm, zpn, zpo, zpp, zpq, zpr, zps, zpt, zpu, zpv, zpw, zpx, zpy, zpz, zsr, zte, ztg, ztl, ztm, ztn, ztp, ztq, zts, ztt, ztu, ztx, zty
Zazaki (zza) diq, kiu Only the macrolanguage is treated as a language.
Zhuang (za) zch, zeh, zgb, zgm, zgn, zhd, zhn, zlj, zln, zlq, zqe, zyb, zyg, zyj, zyn, zzj Only the macrolanguage is treated as a language.
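
The three most common notes in the Treatment column can be modelled as a small enumeration. This is only an illustrative Python sketch under that assumption: the names Treatment, SAMPLE_ROWS and treated_as_language are hypothetical, only three rows of the list are included, and rows with special cases (such as Aramaic or Persian) would need more than this.

```python
from enum import Enum
from typing import Optional


class Treatment(Enum):
    MACRO_ONLY = "only the macrolanguage is treated as a language"
    SUBDIVISIONS_ONLY = "only the subdivisions are treated as languages"
    BOTH = "both the macrolanguage and its subdivisions are treated as languages"


# macrolanguage code -> (subdivision codes, treatment); sample rows only
SAMPLE_ROWS = {
    "ak": (("fat", "tw"), Treatment.BOTH),                # Akan
    "ay": (("ayc", "ayr"), Treatment.MACRO_ONLY),         # Aymara
    "del": (("umu", "unm"), Treatment.SUBDIVISIONS_ONLY), # Lenape
}


def treated_as_language(code: str) -> Optional[bool]:
    """Return True/False for codes in SAMPLE_ROWS, or None if the code is not covered."""
    for macro, (subdivisions, treatment) in SAMPLE_ROWS.items():
        if code == macro:
            return treatment is not Treatment.SUBDIVISIONS_ONLY
        if code in subdivisions:
            return treatment is not Treatment.MACRO_ONLY
    return None
```

A result of None signals only that the sample does not cover the code, which mirrors the page's statement that groups without a note are not yet regulated.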

Excluded codes

The following codes have been excluded without being subsumed into other codes:

  • vmf, called "Mainfränkisch" by the ISO, is excluded from Wiktionary because it is too vague to be usable. (discussion)
  • bpw ("Bo", "Po", "Sorimi") is excluded for now because its existence as a distinct language is unconfirmed and undocumented, and if it were included, a naming conflict would exist with bgl.
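
A tool that accepts language codes could reject the excluded codes up front. The sketch below assumes a hypothetical EXCLUDED table and exclusion_reason function; the two entries and their reasons paraphrase the list above.

```python
from typing import Optional

# Codes excluded without being subsumed into other codes, with the stated reason.
EXCLUDED = {
    "vmf": 'too vague to be usable (called "Mainfränkisch" by the ISO)',
    "bpw": "existence as a distinct language is unconfirmed; its name would conflict with bgl",
}


def exclusion_reason(code: str) -> Optional[str]:
    """Return why code is excluded, or None if it is not on the excluded list."""
    return EXCLUDED.get(code)
```

Returning the reason rather than a bare boolean lets a caller show an editor why a code was refused.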