Wiktionary talk:Language treatment

moving dialect codes to the etyl namespace

For cases where we consider the macrolanguage to be the individual language and the subdivisions to be dialects, I think we should move the subdivision language code templates to the etyl: namespace. Similar to language families & other dialects, this is where we house codes that should only be used in Etymologies and not as valid L2 languages. Sound fine? --Bequw¢τ 21:59, 20 January 2010 (UTC)

Treatment by SIL

I thought it may be interesting to post what SIL's (the Registration Authority for ISO 639-3) criteria are for determining if language varieties are dialects or distinct languages. It can be found on their Change Request Form (page 3).

For this part of ISO 639, judgments regarding when two varieties are considered to be the same or different languages are based on a number of factors, including linguistic similarity, intelligibility, a common literature (traditional or written), a common writing system, the views of users concerning the relationship between language and identity, and other factors. The following basic criteria are followed:
  • Two related varieties are normally considered varieties of the same language if users of each variety have inherent understanding of the other variety (that is, can understand based on knowledge of their own variety without needing to learn the other variety) at a functional level.
  • Where intelligibility between varieties is marginal, the existence of a common literature or of a common ethnolinguistic identity with a central variety that both understand can be strong indicators that they should nevertheless be considered varieties of the same language.
  • Where there is enough intelligibility between varieties to enable communication, the existence of well-established distinct ethnolinguistic identities can be a strong indicator that they should nevertheless be considered to be different languages.

We are of course independent of these, but they may be useful nonetheless. --Bequw¢τ 21:58, 21 January 2010 (UTC)

Allowing macro and non-standard dialects

Sometimes (as with Latvian and Estonian) we treat the subdivisions of a macrolanguage as individual languages, but we use the macrolanguage name/code in place of the "standard" dialect name/code. I just added this option to the table. Are there other macrolanguages where this is the case (possibly Arabic and Malay)? --Bequw¢τ 17:09, 23 January 2010 (UTC)


Apparently some have been treating "Jewish Babylonian Aramaic" (aka "Talmudic Aramaic", code=tmr) as a variety of Aramaic. Does anyone know if this is standard, or if this is true of other ISO 639-3 coded Aramaic varieties? --Bequwτ 18:34, 8 February 2010 (UTC)

"Apparently" should now link to an archive.​—msh210 18:52, 15 February 2010 (UTC)


I'd like to update the Chinese entry. Is there any way to just write in plain English, without passing through a template? Mglovesfun (talk) 16:41, 25 March 2010 (UTC)

Use the templates, please; because they standardize the possible texts, and standardization is good. Another way to contribute to the page is typing here what you need, so I may update the table. --Daniel. 14:15, 22 April 2010 (UTC)
I've changed the table to a regular wikitable so that anyone can edit it and so that it can handle more complex situations and the presence of deleted codes. Cheers, - -sche (discuss) 21:05, 23 May 2013 (UTC)

Aramaic redux

Because at least one RFM is ongoing(?), I'll list this here rather than on the main page: oar (Old Aramaic, up to 700 BCE) is not used, as it has been superseded by arc and syc. tmr (Jewish Babylonian Aramaic, circa 200-1200 CE) is not used, as it has been superseded by arc and etyl:tmr. - -sche (discuss) 00:11, 16 July 2013 (UTC)


Currently, some main-namespace pages use Montagnais/Innu's language code (probably mostly in translations tables) while a few use other Cree dialects' language codes. Innu is different enough from Cree that Innu is regularly considered side-by-side with (rather than subordinated under) Cree; e.g. the Linguistic Atlas of Canada speaks of "different Cree and Innu dialects". OTOH, they're not that different, and splitting them at the L2 level would raise questions of what to do with e.g. Naskapi. I'm curious whether we should (a) allow Innu its own L2, (b) merge it completely into Cree, or (c) leave it subordinated under / merge into cr at the L2 level, but let it keep its code (it currently still has one, as no-one ever deleted it) so that it can be used in translations tables (like the Romani lects' codes). The translations could be nested under Cree/cr, or could be separate, sorted under M or I depending on which name we end up using for the lect. - -sche (discuss) 22:26, 20 July 2013 (UTC)

East Frisian: frs, stq

RFM 1


The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits.

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, though feel free to discuss its conclusions.

Template:frs - Template:stq

This is an old, old mistake in ISO. Both codes refer to the very same language, namely the Frisian dialect spoken in Saterland, which is an Eastern Frisian dialect. I have no idea how that was overlooked, but it means the two codes should be merged somehow. I'd prefer {{frs}}, since that one is in 639-2. -- Liliana 14:24, 17 October 2011 (UTC)

Should the language name be "East Frisian" or "Saterland Frisian"? I'd prefer to use the code "frs", but the name "Saterland Frisian". - -sche (discuss) 19:08, 17 October 2011 (UTC)
To me it seems like Saterland Frisian is the most common name, so we should probably use that. -- Liliana 19:14, 17 October 2011 (UTC)
Alright, frs = Saterland Frisian it is, but {{stq}} is in fact widely used — someone will need to replace it by bot. - -sche (discuss) 23:50, 23 October 2011 (UTC)
Or a really bored person like me needs to spend an hour or two. -- Liliana 00:12, 24 October 2011 (UTC)
But what about etymologies involving Eastern Frisian, at the time it still existed? With no code, how should they be entered? Or even, how should the etymologies that already exist be fixed? —CodeCat 11:23, 24 October 2011 (UTC)
If it warrants a distinction, it should get one of these constructed codes. It isn't covered by the code frs anyway, which ISO classifies as a "living" language, not an extinct one. -- Liliana 12:13, 24 October 2011 (UTC)
East Frisian isn't really extinct strictly, but the only surviving instance of it is now called Saterland Frisian. —CodeCat 16:11, 24 October 2011 (UTC)
This doesn't explain why ISO assigned two codes to one language. We do not have that for any other language of the world. Using frs for a different language than what ISO intended would make a precedent case, and almost certainly require a vote.
Another problem is that the current name "East Frisian" is really confusing, since there's an (unrelated) Low German dialect which is also called East Frisian. So in any case, you would have to sort out the erroneous uses. -- Liliana 16:15, 24 October 2011 (UTC)
I agree with Liliana, we need a separate code of our own for non-Saterland varieties of East Frisian (or we need to clearly indicate that we are using "frs" to refer to a language other than the one the ISO refers to as "frs"). If a word is derived from a variety of East Frisian other than the one the ISO calls "stq", it cannot be derived from what the ISO calls "frs", because "frs" is living, and the only living East Frisian lect is "stq". - -sche (discuss) 00:34, 26 October 2011 (UTC)

Proposed additions / clarifications

These are all from translation tables, which I will edit to reflect consensus for any of these cases:

  • Macro languages:
      • Chinese: dng, ltc, och
      • Sorbian: dsb, hsb
      • Apache: apw, apm, apj, apl, apk
      • Sami: smn, smj, sms, sma, se
      • Frisian: fy, ofs, frr
      • Berber: shi
      • Marquesan: mrq, mrm
  • Dialects / script group:
      • sq: als does not exist any more, change to just Tosk
      • cop: Bohairic, Sahidic, Fayyumic
      • lt: Aukštaitian
      • ms: Rumi, Jawa
      • sc: Nugorese
      • tly: Asalemi, Anbarani, Masali
      • sh: Cyrillic, Roman, Arebica, Latin
      • arc: Hebrew, Syriac
      • ks: Arabic, Devanagari
      • cu: Cyrillic, Glagolitic
      • ro: mo no longer exists; Latin, Cyrillic
      • os: Digor, Iron
      • kea: Badiu, São Vicente, ALUPEC, Sotavento, Barlavento, Santo Antão
      • az: Cyrillic, Roman, Perso-Arabic, Arabic, Persic
      • avd: Vidari
      • egy: Archaic Egyptian, Old Egyptian, Middle Egyptian, Late Egyptian
      • tt: Cyrillic, Roman
      • lad: Roman, Hebrew, Latin
      • pa: Gurmukhi, Shahmukhi (has its own code?)
      • nso: Sepedi
      • vot: Roman, Cyrillic
      • rom: table says that rmc, rmf, rml, rmn, rmo, rmw, rmy are deprecated but they still exist in the languages module
      • kw: Kernewek Kemmyn
      • be: Cyrillic, Roman, Narkamaŭka, Taraškievica, Tarashkevitsa
      • tg: Cyrillic, Persic, Roman
      • ug: Persic, Roman, Cyrillic, Perso-Arabic
      • uz: Cyrillic, Roman; table says that uzn and uzs are deprecated but they still exist in the languages module
      • zza: Persic, Roman
      • ko: South, North
      • fia: Fadicca, Kenzi
      • cr: some codes are deprecated but still in languages module
      • lmo: Eastern, Western, Milanese
      • ms: Rumi, Jawi, Latin, Arabic
      • la: New Latin
      • pi: Burmese, Devanagari, Latin
  • Other:
      • ar: xaa, mey
      • fr: frm, fro
      • de: ksh, gsw
      • nds: deprecated but still in languages module, add nds-de, nds-nl
      • mn: cmg
      • es: osp
      • hy: xcl
      • pnb: pa
      • id: ace, ban, bjn, bug, jv, mad, mak, min, nia, sas, su
      • ga: sga, pgl
      • fy: stq
      • arc: syc
      • tt: crh
      • ko: oko, okm
      • rom: rmq
      • pl: zlw-opl

I apologize if this is in an inconvenient format; rearrange it as you like. DTLHS (talk) 00:44, 20 August 2013 (UTC)

Nice. Some additional things that I noticed after a quick read: okm should be under ko, pgl should be under ga, zlw-opl should be under pl, there are tons of missing Arabic sublects that should be under ar, and grc (and possibly some other lects) should be under el. —Μετάknowledgediscuss/deeds 02:40, 20 August 2013 (UTC)
grc is already under el on the page. What Arabic sublects aren't in my list or the existing table? DTLHS (talk) 03:08, 20 August 2013 (UTC)
Never mind. Only mt, which shouldn't be under ar anyway (well, linguistically it should, but not sociopolitically). —Μετάknowledgediscuss/deeds 04:04, 20 August 2013 (UTC)

Use title text for the language names?

A lot of the language codes in the table don't have a name next to them, but if we added the name it would become very hard to see. Would it be useful to turn it into title text, so that the name is shown when you hover the mouse over the code? —CodeCat 19:36, 25 August 2013 (UTC)

Hmm. One downside to that is that it would no longer be possible (would it?) to hit Ctrl+F and search the page for a particular dialect's name. Given that one of the reasons this page exists is so that people can see if the reason we don't have a code is because we've merged it into something else (vs we just haven't added it yet), that's a significant downside. - -sche (discuss) 05:15, 26 February 2014 (UTC)

RFC discussion: May 2013

The following discussion has been moved from Wiktionary:Requests for cleanup.

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, though feel free to discuss its conclusions.

Wiktionary:Language treatment

This is causing some script errors because some of the codes have since been deleted. I'm not sure what to do about that. —CodeCat 13:34, 23 May 2013 (UTC)

It needs to be redesigned so that the table can contain/mention codes that have been deleted, for the reason you mention and several other reasons. - -sche (discuss) 19:03, 23 May 2013 (UTC)
I've started redoing the table. - -sche (discuss) 19:30, 23 May 2013 (UTC)

List of codes the ISO has retired

This was previously at User:-sche/retired codes, but I think it is useful to have it in the Wiktionary: namespace. - -sche (discuss) 23:53, 21 December 2014 (UTC)

Retired codes which were not used on Wiktionary in February 2014

This is a list of codes which were retired from the ISO and which were not used on Wiktionary as of February 2014. Since then, several other codes which the ISO had retired by that date have also been retired on Wiktionary; see the following sections.

Retired codes which have been discussed since February 2014

Please see Wiktionary:Language treatment/Discussions#Various code retirements, part one (Wiktionary:Beer parlour/2014/February#Codes the ISO has split or merged (first batch)) and Wiktionary:Language treatment/Discussions#Various code retirements, part two (Wiktionary:Beer parlour/2014/March#Codes the ISO has split or merged (second batch)).

Retired codes which are still used on Wiktionary

This is a list of codes which were retired from the ISO but which are still used on Wiktionary; it is not necessarily comprehensive. Some codes in the list have been discussed; others have not. Specifically, these have been intentionally retained: sh "Serbo-Croatian", gio "Gelao", kzh "Dongolawi" / "Kenuzi-Dongola", mnt "Maykulan".

Meanwhile, these have not yet been discussed.