Wiktionary talk:Language treatment

moving dialect codes to the etyl namespace

For cases where we consider the macrolanguage to be the individual language and the subdivisions to be dialects, I think we should move the subdivision language code templates to the etyl: namespace. Similar to language families & other dialects, this is where we house codes that should only be used in Etymologies and not as valid L2 languages. Sound fine? --Bequw¢τ 21:59, 20 January 2010 (UTC)

Treatment by SIL

I thought it might be interesting to post the criteria that SIL (the Registration Authority for ISO 639-3) uses to determine whether language varieties are dialects or distinct languages. They can be found on its Change Request Form (page 3).

For this part of ISO 639, judgments regarding when two varieties are considered to be the same or different languages are based on a number of factors, including linguistic similarity, intelligibility, a common literature (traditional or written), a common writing system, the views of users concerning the relationship between language and identity, and other factors. The following basic criteria are followed:
  • Two related varieties are normally considered varieties of the same language if users of each variety have inherent understanding of the other variety (that is, can understand based on knowledge of their own variety without needing to learn the other variety) at a functional level.
  • Where intelligibility between varieties is marginal, the existence of a common literature or of a common ethnolinguistic identity with a central variety that both understand can be strong indicators that they should nevertheless be considered varieties of the same language.
  • Where there is enough intelligibility between varieties to enable communication, the existence of well-established distinct ethnolinguistic identities can be a strong indicator that they should nevertheless be considered to be different languages.

We are of course independent of these, but they may be useful nonetheless. --Bequw¢τ 21:58, 21 January 2010 (UTC)
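
The three criteria quoted above can be read as a rough decision rule. Below is a minimal Python sketch of that reading. The names `Intelligibility` and `same_language` are hypothetical, and so is the reduction of each criterion to a yes/no flag: SIL's wording is hedged ("normally", "can be strong indicators"), so this is a simplification and not SIL's actual procedure. The case of no intelligibility at all is not spelled out in the quoted criteria; treating it as "different languages" is an assumption.

```python
from enum import Enum


class Intelligibility(Enum):
    NONE = 0        # assumption: this case is not covered by the quoted criteria
    MARGINAL = 1    # second criterion
    FUNCTIONAL = 2  # first and third criteria


def same_language(intelligibility,
                  shared_literature_or_identity=False,
                  distinct_identities=False):
    """Return True if two varieties would be treated as one language."""
    if intelligibility is Intelligibility.FUNCTIONAL:
        # Criterion 1, unless criterion 3 (distinct identities) overrides it.
        return not distinct_identities
    if intelligibility is Intelligibility.MARGINAL:
        # Criterion 2: a common literature or identity tips the balance.
        return shared_literature_or_identity
    # Assumption: without any intelligibility, treat as different languages.
    return False


print(same_language(Intelligibility.FUNCTIONAL))
print(same_language(Intelligibility.FUNCTIONAL, distinct_identities=True))
print(same_language(Intelligibility.MARGINAL, shared_literature_or_identity=True))
```

The sketch makes one point visible: the third criterion can override the first, so intelligibility alone does not settle the question.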

Allowing macro and non-standard dialects

Sometimes (as with Latvian and Estonian) we treat the subdivisions of a macrolanguage as individual languages, but we use the macrolanguage name/code in place of the "standard" dialect name/code. I just added this option to the table. Are there other macrolanguages where this is the case (possibly Arabic and Malay)? --Bequw¢τ 17:09, 23 January 2010 (UTC)

Aramaic

Apparently some have been treating "Jewish Babylonian Aramaic" (aka "Talmudic Aramaic", code=tmr) as a variety of Aramaic. Does anyone know if this is standard, or if this is true of other ISO 639-3 coded Aramaic varieties? --Bequwτ 18:34, 8 February 2010 (UTC)

"Apparently" should now link to an archive.​—msh210 18:52, 15 February 2010 (UTC)

Chinese

I'd like to update the Chinese entry. Is there any way to just write in plain English, without passing through a template? Mglovesfun (talk) 16:41, 25 March 2010 (UTC)

Use the templates, please, because they standardize the possible texts, and standardization is good. Another way to contribute to the page is to type what you need here, so that I can update the table. --Daniel. 14:15, 22 April 2010 (UTC)
I've changed the table to a regular wikitable so that anyone can edit it and so that it can handle more complex situations and the presence of deleted codes. Cheers, - -sche (discuss) 21:05, 23 May 2013 (UTC)

Aramaic redux

Because at least one RFM is ongoing(?), I'll list this here rather than on the main page: oar (Old Aramaic, up to 700 BCE) is not used, as it has been superseded by arc and syc. tmr (Jewish Babylonian Aramaic, circa 200-1200 CE) is not used, as it has been superseded by arc and etyl:tmr. - -sche (discuss) 00:11, 16 July 2013 (UTC)

Montagnais/Innu

Currently, some main-namespace pages use Montagnais/Innu's language code (probably mostly in translations tables) while a few use other Cree dialects' language codes. Innu is different enough from Cree that Innu is regularly considered side-by-side with (rather than subordinated under) Cree; e.g. the Linguistic Atlas of Canada speaks of "different Cree and Innu dialects". OTOH, they're not that different, and splitting them at the L2 level would raise questions of what to do with e.g. Naskapi. I'm curious whether we should (a) allow Innu its own L2, (b) merge it completely into Cree, or (c) leave it subordinated under / merge into cr at the L2 level, but let it keep its code (it currently still has one, as no-one ever deleted it) so that it can be used in translations tables (like the Romani lects' codes). The translations could be nested under Cree/cr, or could be separate, sorted under M or I depending on which name we end up using for the lect. - -sche (discuss) 22:26, 20 July 2013 (UTC)

East Frisian: frs, stq

RFM 1

The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits.

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.


Template:frs - Template:stq

This is an old, old mistake in ISO. Both codes refer to the very same language, namely the Frisian dialect spoken in Saterland, which is an Eastern Frisian dialect. I have no idea how that was overlooked, but it means the two codes should be merged somehow. I'd prefer {{frs}}, since that one is in 639-2. -- Liliana 14:24, 17 October 2011 (UTC)

Should the language name be "East Frisian" or "Saterland Frisian"? I'd prefer to use the code "frs", but the name "Saterland Frisian". - -sche (discuss) 19:08, 17 October 2011 (UTC)
To me it seems like Saterland Frisian is the most common name, so we should probably use that. -- Liliana 19:14, 17 October 2011 (UTC)
Alright, frs = Saterland Frisian it is, but {{stq}} is in fact widely used — someone will need to replace it by bot. - -sche (discuss) 23:50, 23 October 2011 (UTC)
Or a really bored person like me needs to spend an hour or two. -- Liliana 00:12, 24 October 2011 (UTC)
But what about etymologies involving Eastern Frisian, at the time it still existed? With no code, how should they be entered? Or even, how should the etymologies that already exist be fixed? —CodeCat 11:23, 24 October 2011 (UTC)
If it warrants a distinction, it should get one of these constructed codes. It isn't covered by the code frs anyway, which ISO classifies as a "living" language, not an extinct one. -- Liliana 12:13, 24 October 2011 (UTC)
East Frisian isn't really extinct strictly, but the only surviving instance of it is now called Saterland Frisian. —CodeCat 16:11, 24 October 2011 (UTC)
This doesn't explain why ISO assigned two codes to one language. We do not have that for any other language of the world. Using frs for a different language than what ISO intended would set a precedent, and would almost certainly require a vote.
Another problem is that the current name "East Frisian" is really confusing, since there's an (unrelated) Low German dialect which is also called East Frisian. So in any case, you would have to sort out the erroneous uses. -- Liliana 16:15, 24 October 2011 (UTC)
I agree with Liliana, we need a separate code of our own for non-Saterland varieties of East Frisian (or we need to clearly indicate that we are using "frs" to refer to a language other than the one the ISO refers to as "frs"). If a word is derived from a variety of East Frisian other than the one the ISO calls "stq", it cannot be derived from what the ISO calls "frs", because "frs" is living, and the only living East Frisian lect is "stq". - -sche (discuss) 00:34, 26 October 2011 (UTC)


Proposed additions / clarifications

These are all from translation tables, which I will edit to reflect consensus for any of these cases:

  • Macro languages:
      • Chinese: dng, ltc, och
      • Sorbian: dsb, hsb
      • Apache: apw, apm, apj, apl, apk
      • Sami: smn, smj, sms, sma, se
      • Frisian: fy, ofs, frr
      • Berber: shi
      • Marquesan: mrq, mrm
  • Dialects / script group:
      • sq: als does not exist any more, change to just Tosk
      • cop: Bohairic, Sahidic, Fayyumic
      • lt: Aukštaitian
      • ms: Rumi, Jawa
      • sc: Nugorese
      • tly: Asalemi, Anbarani, Masali
      • sh: Cyrillic, Roman, Arebica, Latin
      • arc: Hebrew, Syriac
      • ks: Arabic, Devanagari
      • cu: Cyrillic, Glagolitic
      • ro: mo no longer exists; Latin, Cyrillic
      • os: Digor, Iron
      • kea: Badiu, São Vicente, ALUPEC, Sotavento, Barlavento, Santo Antão
      • az: Cyrillic, Roman, Perso-Arabic, Arabic, Persic
      • avd: Vidari
      • egy: Archaic Egyptian, Old Egyptian, Middle Egyptian, Late Egyptian
      • tt: Cyrillic, Roman
      • lad: Roman, Hebrew, Latin
      • pa: Gurmukhi, Shahmukhi (has its own code?)
      • nso: Sepedi
      • vot: Roman, Cyrillic
      • rom: table says that rmc, rmf, rml, rmn, rmo, rmw, rmy are deprecated but they still exist in the languages module
      • kw: Kernewek Kemmyn
      • be: Cyrillic, Roman, Narkamaŭka, Taraškievica, Tarashkevitsa
      • tg: Cyrillic, Persic, Roman
      • ug: Persic, Roman, Cyrillic, Perso-Arabic
      • uz: Cyrillic, Roman, table says that uzn and uzs are deprecated but they still exist in the languages module
      • zza: Persic, Roman
      • ko: South, North
      • fia: Fadicca, Kenzi
      • cr: some codes are deprecated but still in languages module
      • lmo: Eastern, Western, Milanese
      • ms: Rumi, Jawi, Latin, Arabic
      • la: New Latin
      • pi: Burmese, Devanagari, Latin
  • Other:
      • ar: xaa, mey
      • fr: frm, fro
      • de: ksh, gsw
      • nds: deprecated but still in languages module, add nds-de, nds-nl
      • mn: cmg
      • es: osp
      • hy: xcl
      • pnb: pa
      • id: ace, ban, bjn, bug, jv, mad, mak, min, nia, sas, su
      • ga: sga, pgl
      • fy: stq
      • arc: syc
      • tt: crh
      • ko: oko, okm
      • rom: rmq
      • pl: zlw-opl

I apologize if this is in an inconvenient format- rearrange it as you like. DTLHS (talk) 00:44, 20 August 2013 (UTC)

Nice. Some additional things that I noticed after a quick read: okm should be under ko, pgl should be under ga, zlw-opl should be under pl, there are tons of missing Arabic sublects that should be under ar, and grc (and possibly some other lects) should be under el. —Μετάknowledgediscuss/deeds 02:40, 20 August 2013 (UTC)
grc is already under el on the page. What Arabic sublects aren't in my list or the existing table? DTLHS (talk) 03:08, 20 August 2013 (UTC)
Never mind. Only mt, which shouldn't be under ar anyway (well, linguistically it should, but not sociopolitically). —Μετάknowledgediscuss/deeds 04:04, 20 August 2013 (UTC)
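
Several items in the list above note that the table marks a code as deprecated while the code still exists in the languages module. That kind of mismatch could be found mechanically with a set intersection. Here is a minimal Python sketch, assuming both lists have already been extracted into plain collections of code strings. The function name `stale_deprecations` and the sample data are hypothetical, and the sample memberships are toy values for illustration only, not a snapshot of the real table or module.

```python
def stale_deprecations(deprecated_in_table, codes_in_module):
    """Return codes the table calls deprecated that the module still defines."""
    return sorted(set(deprecated_in_table) & set(codes_in_module))


# Toy data, for illustration only.
deprecated_in_table = {"rmc", "uzn", "nds", "mo"}
codes_in_module = {"rmc", "uzn", "nds", "ro", "en"}

print(stale_deprecations(deprecated_in_table, codes_in_module))
```

Each code the check reports still needs a human decision: either remove it from the module or correct the table.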

Use title text for the language names?

A lot of the language codes in the table don't have a name next to them, but if we added the name it would become very hard to see. Would it be useful to turn it into title text, so that the name is shown when you hover the mouse over the code? —CodeCat 19:36, 25 August 2013 (UTC)

Hmm. One downside to that is that it would no longer be possible (would it?) to hit Ctrl+F and search the page for a particular dialect's name. Given that one of the reasons this page exists is so that people can see whether we lack a code because we've merged it into something else (vs we just haven't added it yet), that's a significant downside. - -sche (discuss) 05:15, 26 February 2014 (UTC)

RFC discussion: May 2013

The following discussion has been moved from Wiktionary:Requests for cleanup.

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.


Wiktionary:Language treatment

This is causing some script errors because some of the codes have since been deleted. I'm not sure what to do about that. —CodeCat 13:34, 23 May 2013 (UTC)

It needs to be redesigned so that the table can contain/mention codes that have been deleted, for the reason you mention and several other reasons. - -sche (discuss) 19:03, 23 May 2013 (UTC)
I've started redoing the table. - -sche (discuss) 19:30, 23 May 2013 (UTC)


List of codes the ISO has retired

This was previously at User:-sche/retired codes, but I think it is useful to have it in the Wiktionary: namespace. - -sche (discuss) 23:53, 21 December 2014 (UTC)

Retired codes which were not used on Wiktionary in February 2014

Codes which were retired from the ISO and which were not used on Wiktionary as of February of 2014. (Since then, several other codes which were retired from the ISO by that date have also been retired on Wiktionary; see the following sections.)

Retired codes which have been discussed since February 2014

Please see Wiktionary:Language treatment/Discussions#Various_code_retirements.2C_part_one (Wiktionary:Beer parlour/2014/February#Codes_the_ISO_has_split_or_merged_.28first_batch.29) and Wiktionary:Language treatment/Discussions#Various_code_retirements.2C_part_two (Wiktionary:Beer parlour/2014/March#Codes_the_ISO_has_split_or_merged_.28second_batch.29).

Retired codes which are still used on Wiktionary

Some codes which were retired from the ISO but which are still used on Wiktionary. (This list is not necessarily comprehensive.) Some codes in the list have been discussed, and these have been intentionally retained: sh "Serbo-Croatian", gio "Gelao", kzh "Dongolawi" / "Kenuzi-Dongola", mnt "Maykulan". Meanwhile, these have not yet been discussed.

List of ISO 639 codes absent from Wiktionary

Most of the 7865 codes present in ISO 639 are present on Wiktionary; most of those which are not are recorded on WT:LT. The only ones which have slipped between those two cracks are these, which should be investigated and discussed in the coming weeks. In many cases, the exclusion is likely nothing more than an oversight; in some cases, it's clearly because a naming conflict prevented importation of the codes back when Wiktionary bot-imported ISO 639 en masse (something we can now solve with disambiguators):

  1. cek — Eastern Khumi Chin - a dialect of cnk (Khumi Chin)
  2. dda — Dadi Dadi
  3. dgw — Daungwurrung
  4. dja — Djadjawurrung
  5. deq — Dendi (Central African Republic) - presumably failed to be included because of the naming conflict with ddn — Dendi (Benin)
  6. dmd — Madhi Madhi (Muthimuthi)
  7. dth — Adithinngithigh - compare rrt, which is said to be a different language
  8. dty — Dotyali
  9. gku — ǂUngkue
  10. gll — Garlali
  11. gpe — Ghanaian Pidgin English - probably to be combined with other African Pidgin English (see RFM)
  12. gwm — Awngthim
  13. gmz - Mgbo
  14. hna — Mina (Cameroon) - presumably failed to be included because of the naming conflict with myi Mina (India), which, however, is spurious
  15. ihw — Bidhawal - a dialect of/with unn
  16. jan — Jandai
  17. jbi — Badjiri - possibly not even Karnic; cf my notes about ekc above and on User:-sche/retired codes
  18. jbk (Barikewa) and jmw (Mouwase) — varieties of {{mgx}} Omati/Mini, said to be quite divergent from each other: but we should either have mgx or have jbk+jmw, not all three
  19. jbw — Yawijibaya
  20. jgk — Gwak
  21. jjr — Bankal
  22. jms — Mashi (Nigeria)
  23. jog — Jogi
  24. jui — Ngadjuri
  25. kbn — Kare (Central African Republic)
  26. kmf — Kare (Papua New Guinea)
  27. kol — Kol (Papua New Guinea)
  28. myi — Mina (India) (see hna)
  29. nmx — Nama (Papua New Guinea)
  30. npg — Ponyo-Gongwang Naga
  31. nqy — Akyaung Ari Naga
  32. nsf — Northwestern Nisu
  33. ntx — Tangkhul Naga (Myanmar)
  34. nwg — Ngayawung
  35. nxk — Koki Naga
  36. oke — Okpe (Southwestern Edo)
  37. okx — Okpe (Northwestern Edo)
  38. olk — Olkol
  39. orc — Orma
  40. pnl — Paleni
  41. ptq — Pattapu
  42. sfe — Eastern Subanen
  43. sgj — Surgujia - Suraji, Surguja, Surgujia-Chhattisgarhi, Surjugia
  44. sim — Mende (Papua New Guinea)
  45. sng — Sanga (Democratic Republic of Congo)
  46. sox — Swo
  47. spb — Sepa (Indonesia)
  48. tcl — Taman (Myanmar) - (extinct)
  49. tgj — Tagin
  50. tgz — Tagalaka - (extinct)
  51. tjl — Tai Laing
  52. tmn — Taman (Indonesia)
  53. tnz — Tonga (Thailand)
  54. tst — Tondi Songway Kiini
  55. xsn — Sanga (Nigeria)
  56. xud — Umiida - (extinct)
  57. xun — Unggaranggu - (extinct)
  58. xyy — Yorta Yorta
  59. yhs — Yan-nhaŋu Sign Language - signed by 10 people, not that distinct from ygs (exclude?)
  60. ykn — Kua-nsi
  61. yku — Kuamasi
  62. ysg — Sonaga
  63. yxy — Yabula Yabula - (extinct)

(This list is complete as of August 2015, before the 2015 change requests were finalized. Notes and misc.) - -sche (discuss) 15:47, 11 August 2015 (UTC)

Codes in the above list which have been added to Module:languages or WT:LT or otherwise dealt with have been struck. - -sche (discuss) 03:11, 21 August 2016 (UTC)
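
The list above is, in effect, a set difference: codes in ISO 639 that are neither present on Wiktionary nor recorded on WT:LT. Below is a minimal Python sketch of that computation, assuming the three code lists are available as plain collections of strings. The function name `codes_needing_review` and the sample data are hypothetical, and the sample memberships are toy values for illustration only; this is not how the list above was actually compiled.

```python
def codes_needing_review(iso_codes, wiktionary_codes, recorded_on_lt):
    """Return ISO codes that are neither on Wiktionary nor recorded on WT:LT."""
    return sorted(set(iso_codes) - set(wiktionary_codes) - set(recorded_on_lt))


# Toy data, for illustration only.
iso_codes = {"cek", "dda", "en", "frs", "stq"}
wiktionary_codes = {"en", "stq"}
recorded_on_lt = {"frs"}

print(codes_needing_review(iso_codes, wiktionary_codes, recorded_on_lt))
```

Re-running such a check after each ISO update would show which newly added codes still need a decision.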

Bidhawal

The ISO added a code for Bidhawal, which we never got around to adding. That seems to be OK; Robert M. W. Dixon says in Australian Languages: Their Nature and Development (2002, ISBN 0521473780) that "Bidhawal appears not to constitute a separate language, but rather to be the most eastern dialect of Q, Muk-thang (or Kurnai). The grammatical forms given by Mathews for Bidhawal are almost identical to those for Muk-thang, as are most of the verbs and a good proportion of nouns." - -sche (discuss) 03:02, 21 August 2016 (UTC)

Treatment of reconstructed languages?

We merged Proto-Finno-Ugric and Proto-Finno-Permic into Proto-Uralic, and Proto-Baltic into Proto-Balto-Slavic. The original languages remain as etymology codes. Should this be mentioned here? —CodeCat 18:48, 21 August 2015 (UTC)

Sure. Maybe in a separate table, though? Since those aren't cases where we deprecated, split, or broadened an ISO code, but rather cases where we assigned a code of our own devising and then went "wait, on second thought, nah". - -sche (discuss) 19:10, 21 August 2015 (UTC)

Akan and its subdivisions

As for Akan, we can currently find that both the macrolanguage and its subdivisions are treated as languages, even though Category:Fanti language and Category:Twi language were merged previously. It seems that we have to modify the description. How's that? --Eryk Kij (talk) 22:53, 26 May 2016 (UTC)

Like so; thanks for pointing out that this page still needed to be updated. - -sche (discuss) 23:21, 26 May 2016 (UTC)