Wiktionary:Languages
This is a Wiktionary policy, guideline or common practices page. Specifically, it is a policy think tank, working to develop a formal policy.
Wiktionary includes many words in many languages.
To distinguish languages, Wiktionary gives each a unique name and a unique code, which identify it.
- See Wiktionary:Dialects and Wiktionary:Families for discussions of dialects and of language families, respectively.
Language names
Wiktionary calls each language by a different name; these language names are used in headers, translation tables, lexical categories, appendices, and some other places. Language names are chosen by consensus. Whenever possible, common English names of languages are used, and diacritics are avoided. Attested names (names which meet CFI) are strongly preferred.
When a single language is known by multiple names, only one is used. For a list of languages which are known by multiple names, see the section "List of languages with multiple names".
When two languages are commonly known by the same name, Wiktionary distinguishes them by using synonyms for one or both, or (rarely) by using appended identifiers. For example, the Ghanaian language commonly called "Buli" is referred to as "Buli (Ghana)" on Wiktionary and represented by the code "bwu"; the Indonesian language commonly called "Buli" is referred to as "Buli (Indonesia)" on Wiktionary and represented by the code "bzq". The Indonesian language commonly called "Maba" is referred to as "Maba" on Wiktionary and represented by the code "mqa"; the Chadian language commonly called "Maba" is referred to on Wiktionary as "Bura Mabang" and represented by the code "mde".
Language codes
- Wiktionary's bots maintain Wiktionary:Index to templates/languages, a list of all used language codes; all can also be found in Category:Language code templates.
Wiktionary has an intricate system for determining which string of letters (code) represents each language and language family, and where (at which URL) that information is stored, that is, where templates look when they are given a code and must display or otherwise use the name it signifies. Language codes are used in naming some categories, and are called by many templates. When a code's template is called directly, the result is the language name: calling {{vot}}, for example, displays Votic; in this way, you can determine a language's name if you know its code. If you know its name, you can determine its code by using {{langrev}} with the language's name as a parameter: the template returns the language's code if it can find it. (Type {{langrev|English}}, for example, in the Sandbox or Special:ExpandTemplates, and it will return "en".)
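The two lookups described above (code to name, and name to code) can be modelled as a pair of dictionaries. The following is only an illustrative sketch in Python, not how the templates are implemented: the table `CODE_TO_NAME` and the functions `name_for` and `code_for` are hypothetical names, and the table holds only pairs mentioned on this page.

```python
# Hypothetical stand-in for the code templates: each key plays the role
# of a template such as Template:vot, each value the name it displays.
CODE_TO_NAME = {
    "en": "English",
    "de": "German",
    "eo": "Esperanto",
    "vot": "Votic",
}

# Reverse index, playing the role that {{langrev}} plays on the wiki.
NAME_TO_CODE = {name: code for code, name in CODE_TO_NAME.items()}


def name_for(code):
    """Return the language name for a code, or None if the code is unknown."""
    return CODE_TO_NAME.get(code)


def code_for(name):
    """Return the code for a language name, or None if the name is unknown."""
    return NAME_TO_CODE.get(name)
```

The reverse index only works because each language has exactly one name and one code, which is the uniqueness rule stated at the top of this page.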
Wiktionary also has a simple system for recording which family individual languages belong to, and which scripts they are written in.
Wiktionary represents individual languages as follows:
- Languages which were assigned two-letter codes in the international standard ISO 639-1 are generally represented on Wiktionary by those codes. The individual codes are stored in the Template: namespace without any prefix. English, for example, is represented by en, as recorded in Template:en. German is represented by de (Template:de). Esperanto is represented by eo (Template:eo). Wiktionary has a list of ISO 639-1 codes here.
- A few languages are represented on Wiktionary by 639-1 codes the ISO has deprecated. (This is generally the case when the ISO has come to consider a lect a group of languages, but Wiktionary still considers it a single language.) Serbo-Croatian, for example, is represented by sh (Template:sh).
- Languages which were not assigned codes by ISO 639-1, but which were assigned three-letter codes (based on Ethnologue codes) in the international standard ISO 639-3 are generally represented on Wiktionary by those codes. Abenaki, for example, is represented by abe (Template:abe). Wiktionary has a list of ISO 639-3 codes here.
- A few languages are represented by other, "exceptional" codes. (A complete list of these is in the section "List of languages with exceptional codes".) Exceptional codes are chosen as follows:
- A few are ISO 639-2 codes. (This is the case, for example, for languages which were not assigned specific, single codes by either ISO 639-1 or ISO 639-3.) Nahuatl, for example, is represented by nah (Template:nah).
- A few are codes devised by the Wikimedia Foundation Language Committee. (This is the case when a Wikimedia project is begun in a language which was not assigned a code by any ISO standard.) Zamboanga Chavacano, for example, is represented by cbk-zam (Template:cbk-zam). Wiktionary has a list of such codes in its Appendix:Wikimedia language codes.
- Any language which does not have an ISO or specially-devised Wikimedia code, but which is to be included in Wiktionary, is given a two-part exceptional code. The first part of this code is a relevant ISO 639-5 family code (see Wiktionary's appendix); after a hyphen, the second part of the code is a series of three lowercase letters which generally approximate the language name. (No digits, uppercase letters, etc. are used: IANA tags allow these, case-independently, but the MediaWiki software is more restrictive.) For example, Samoan Plantation Pidgin is cpe-spp (Template:cpe-spp): "cpe" is the ISO 639-5 code for English-based creoles and pidgins, "spp" abbreviates "Samoan Plantation Pidgin". Gallo is roa-gal (Template:roa-gal): "roa" is the ISO 639-5 code for Romance languages, "gal" abbreviates "Gallo".
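The shape of the two-part exceptional codes described in the last item can be checked mechanically. The sketch below is an assumption-laden illustration in Python: `EXCEPTIONAL_CODE`, `make_exceptional_code` and `is_two_part_exceptional` are hypothetical names, and the pattern assumes the family part is a three-letter lowercase ISO 639-5 code. It checks the shape only; it does not verify that the family code actually exists.

```python
import re

# Assumed shape: three lowercase letters (family code), a hyphen,
# then three lowercase letters (abbreviation of the language name).
EXCEPTIONAL_CODE = re.compile(r"[a-z]{3}-[a-z]{3}")


def is_two_part_exceptional(code):
    """Return True if the code has the family-abbreviation shape."""
    return EXCEPTIONAL_CODE.fullmatch(code) is not None


def make_exceptional_code(family_code, abbreviation):
    """Join a family code and an abbreviation, rejecting digits,
    uppercase letters and wrong lengths."""
    code = f"{family_code}-{abbreviation}"
    if not is_two_part_exceptional(code):
        raise ValueError(f"not a valid two-part code: {code!r}")
    return code
```

For example, `make_exceptional_code("cpe", "spp")` yields `cpe-spp`, while `Roa-Gal` or `roa-ga1` would be rejected. Note that a shape check cannot tell these codes apart from Wikimedia-devised codes of the same form, such as cbk-zam.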
Constructed languages which are not widely used but which have been assigned ISO 639-3 codes are sometimes accepted by Wiktionary for inclusion in dedicated Appendices. These languages are represented by their ISO 639-3 codes. Láadan, for example, is represented by the ISO 639-3 code ldn. This information is stored in the Template: namespace after a conl: prefix (constructed language): Template:conl:ldn. Some other constructed languages are also included in dedicated Appendices though they do not have ISO 639-3 codes: these languages are given codes which consist of "art-" followed by three letters, and which are stored in the Template: namespace after a conl: prefix.
Reconstructed languages are assigned special codes. Proto-Germanic, for example, is represented by the code gem-pro. This information is stored in the Template: namespace after a proto: prefix: Template:proto:gem-pro.
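The storage rules above (no prefix for ordinary languages, conl: for appendix-only constructed languages, proto: for reconstructed languages) can be summarised in a few lines. This is a minimal sketch in Python; the names `PREFIXES` and `template_page` and the three kind labels are hypothetical and chosen only for illustration.

```python
# Prefix placed between "Template:" and the code, by kind of language.
# The kind labels are illustrative, not Wiktionary terminology.
PREFIXES = {
    "regular": "",             # e.g. Template:en
    "constructed": "conl:",    # appendix-only constructed languages
    "reconstructed": "proto:", # reconstructed (proto-) languages
}


def template_page(code, kind="regular"):
    """Return the page where the code's language name is stored."""
    if kind not in PREFIXES:
        raise ValueError(f"unknown kind: {kind!r}")
    return f"Template:{PREFIXES[kind]}{code}"
```

With these assumptions, `template_page("ldn", "constructed")` gives `Template:conl:ldn` and `template_page("gem-pro", "reconstructed")` gives `Template:proto:gem-pro`.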
Not all lects which have been assigned codes by the ISO are assigned codes or included by Wiktionary.
- The ISO has assigned codes to some constructed languages which Wiktionary excludes.
- The ISO has assigned codes to some lects which Wiktionary treats as dialects of other languages and thus of other codes. (This is the case, for example, with Moldovan/Moldavian: the ISO assigned the lect the 639-1 code mo, but Wiktionary regards it as a form of Romanian and represents it and Romanian by the same code ro.)
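The Moldovan case amounts to a small remapping from an ISO code to the code Wiktionary actually uses. A minimal sketch in Python, assuming a hypothetical table `MERGED_INTO` that holds only the one merger mentioned on this page:

```python
# ISO codes that Wiktionary folds into another language's code.
# Only the example from this page is listed; the table is illustrative.
MERGED_INTO = {
    "mo": "ro",  # Moldovan/Moldavian is treated as a form of Romanian
}


def wiktionary_code(iso_code):
    """Return the code Wiktionary uses for a lect with the given ISO code."""
    return MERGED_INTO.get(iso_code, iso_code)
```

Codes without an entry in the table pass through unchanged, so `wiktionary_code("ro")` is still `ro`.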
List of languages with exceptional codes
| Name | Wikipedia article | Wiktionary code | Comments |
|---|---|---|---|
| ǃKung | w:!Kung language | khi-kun {{khi-kun}} | |
| Ammonite | w:Ammonite language | sem-amm {{sem-amm}} | |
| Banyumasan | w:Banyumasan language | map-bms {{map-bms}} | |
| Bunurong | w:Bunurong language | aus-bun {{aus-bun}} | |
| Crimean Gothic | w:Crimean Gothic language | gme-cgo {{gme-cgo}} | |
| Dutch Low Saxon | w:Dutch Low Saxon | nds-nl {{nds-nl}} | |
| Gabi | w:Pama-Nyungan languages#Classification and Languages | aus-gab {{aus-gab}} | |
| Gallo | w:Gallo language | roa-gal {{roa-gal}} | |
| Gaulish | w:Gaulish language | cel-gau {{cel-gau}} | |
| German Low German | w:Low German | nds-de {{nds-de}} | Wiktionary uses the exceptional code nds-de because nds is ambiguous and could include Dutch Low Saxon. |
| Greenlandic Eskimo Pidgin | w:Indigenous languages of the Americas#Pidgins, mixed languages and trade languages | crp-gep {{crp-gep}} | |
| Guernésiais | w:Guernésiais | roa-grn {{roa-grn}} | |
| Gunai | w:Gunai language | aus-gun {{aus-gun}} | |
| Gutnish | w:Modern Gutnish | gmq-gut {{gmq-gut}} | |
| Jèrriais | w:Jèrriais | roa-jer {{roa-jer}} | |
| Leonese | w:Leonese language | roa-leo {{roa-leo}} | |
| Maroon Spirit Language | w:Jamaican Maroon Spirit Possession Language | cpe-mar {{cpe-mar}} | |
| Middle Chinese | w:Middle Chinese | zhx-mid {{zhx-mid}} | |
| Middle Norwegian | w:Norwegian language#From Old Norse to distinct Scandinavian languages | gmq-mno {{gmq-mno}} | |
| Mingo | w:Mingo | iro-min {{iro-min}} | |
| Nahuatl | w:Nahuatl | nah {{nah}} | |
| Norman | w:Norman language | roa-nor {{roa-nor}} | |
| Old Danish | w:Old Danish | gmq-oda {{gmq-oda}} | |
| Old Polish | w:Old Polish language | zlw-opl {{zlw-opl}} | |
| Old Portuguese | w:Galician Portuguese | roa-ptg {{roa-ptg}} | |
| Old Swedish | w:Swedish language#Old Swedish | gmq-osw {{gmq-osw}} | |
| Phuthi | w:Phuthi language | bnt-phu {{bnt-phu}} | |
| Picuris | w:Picuris language | nai-pic {{nai-pic}} | |
| Pomeranian | w:Pomeranian language | zlw-pom {{zlw-pom}} | |
| Russenorsk | w:Russenorsk | crp-rsn {{crp-rsn}} | |
| Samoan Plantation Pidgin | w:Samoan Plantation Pidgin | cpe-spp {{cpe-spp}} | |
| Serbo-Croatian | w:Serbo-Croatian language | sh {{sh}} | |
| Slovincian | w:Slovincian | zlw-slv {{zlw-slv}} | |
| Syrian Arabic | w:Syrian Arabic | sem-syr {{sem-syr}} | |
| Taimyr Pidgin Russian | | crp-tpr {{crp-tpr}} | |
| Tarantino | w:Tarantino language | roa-tar {{roa-tar}} | |
| Zamboanga Chavacano | w:Chavacano language#Zamboangueño | cbk-zam {{cbk-zam}} | |
List of appendix-only constructed languages
| Name | Wikipedia article | Wiktionary code | Comments |
|---|---|---|---|
| Bolak | w:Bolak language | art-blk {{conl:art-blk}} | |
| Communicationssprache | w:Communicationssprache | art-com {{conl:art-com}} | |
| Eloi | w:Eloi language | art-elo {{conl:art-elo}} | |
| Go'uld | w:Go'uld | art-gld {{conl:art-gld}} | |
| Klingon | w:Klingon language | tlh {{conl:tlh}} | |
| Láadan | w:Láadan | ldn {{conl:ldn}} | |
| Lapine | w:Lapine language | art-lap {{conl:art-lap}} | |
| Mandalorian | w:Mandalorian#Language | art-man {{conl:art-man}} | |
| Mundolinco | w:Mundolinco | art-mun {{conl:art-mun}} | |
| Na'vi | w:Na'vi language | art-nav {{conl:art-nav}} | |
| Neo | w:Neo (constructed language) | neu {{conl:neu}} | |
| Noxilo | w:Noxilo | art-nox {{conl:art-nox}} | |
| Quenya | w:Quenya | qya {{conl:qya}} | |
| Sindarin | w:Sindarin | sjn {{conl:sjn}} | |
| Toki Pona | w:Toki Pona | art-top {{conl:art-top}} | |
Languages' family and script information
Wiktionary sorts languages into families. Most families are related through descent from a common ancestor, but a few are merely categories, such as "creoles and pidgins". Wiktionary records which family a language belongs to on a subpage of the language's template, /family. Each family is represented by a code; the family codes are explained in Wiktionary:Families.
- English belongs to the family of West Germanic languages; this information is recorded in Template:en/family. German is also a West Germanic language, as recorded in Template:de/family. Serbo-Croatian is a South Slavic language, as recorded in Template:sh/family. Abenaki is an Algonquian language, as recorded in Template:abe/family. Nahuatl is a Nahuan language, as recorded in Template:nah/family.
- The widely-used constructed language Esperanto has its membership in the category "Artificial languages" recorded in Template:eo/family.
- Zamboanga Chavacano has its membership in the category "Creole or pidgin languages" recorded in Template:cbk-zam/family.
- Wiktionary even records information about appendix-only constructed languages in this way: Láadan has its membership in the category "Artificial languages" recorded in Template:conl:ldn/family.
Wiktionary records which script(s) a language uses on another subpage of the language's template, /script. Each script is represented by a code, which is stored in the Template: namespace without any prefix. The script codes are explained in Wiktionary:Scripts.
- English is written in the Latin script; this is recorded in Template:en/script. Esperanto is written in the Latin script; this is recorded in Template:eo/script.
- Serbo-Croatian is written in both the Latin and the Cyrillic scripts; this is recorded in Template:sh/script.
- Wiktionary even records information about appendix-only constructed languages in this way: the information that Láadan is written in the Latin script is recorded in Template:conl:ldn/script.
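The subpage convention described in this section (a language's template page followed by /family or /script) can be expressed as a small helper. This is only an illustrative sketch in Python; `SUBPAGES` and `info_subpage` are hypothetical names, and the function simply builds page titles without checking that the pages exist.

```python
# The two kinds of per-language information described in this section.
SUBPAGES = ("family", "script")


def info_subpage(template, kind):
    """Return the subpage title that records a language's family or script.

    `template` is the language's template page, e.g. "Template:en"
    or "Template:conl:ldn".
    """
    if kind not in SUBPAGES:
        raise ValueError(f"unknown subpage kind: {kind!r}")
    return f"{template}/{kind}"
```

Under these assumptions, `info_subpage("Template:sh", "script")` gives `Template:sh/script`, and the same call works unchanged for prefixed templates such as `Template:conl:ldn`.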