Wiktionary talk:Languages without ISO codes

Purpose of this page

I've copied this from lower down as many people seem to have overlooked it.

The point of this page is to serve four temporally separated purposes:

  1. Collect a list of languages, dialects, etc. that don't have codes but which we do or might want to generate codes for, so they work with templates that require them.
    Note that this is not just about etymologies, but also about every template that accepts a language code parameter.
  2. Use the list generated in phase 1 to determine which languages we actually need to give codes to, and work out how we want to deal with those we don't give codes to.
  3. Assign codes to those we decide to, and do whatever we decide to do with any others.
  4. Serve as the reference point for these codes.

I suspect that a change of name, possibly even a move to the Appendix namespace, will happen between stages 3 and 4.

As of 21:56, 13 August 2008 (UTC) we are moving from stage 1 to stage 2, although I'm not certain we are completely there yet.

Language families and proto-languages

Language families ("macrolanguages") and proto-languages appear only in etymologies, and are already gracefully handled with {{proto}}, family-specific templates or ISO 639-2/3 macrolanguage (or "other language") codes. They're not used as individual languages, hence don't appear as entries in mainspace, and there won't be any need for them in templates that support lang=.

Moreover, some of the "languages" on this list are already handled as one individual language, specifically the distinction between Old, Classical, Late, Mediaeval and New Latin. Also disputable is Biblical Aramaic: should it be handled under Aramaic (as Biblical Hebrew is under Hebrew), or as a separate language? (So far arc=Aramaic, but officially it is "Imperial/Official Aramaic", with its own script soon coming in Unicode 5.2.) --Ivan Štambuk 19:51, 7 May 2008 (UTC)

Actually they do. Arabic and Malay turn out to actually be macrolanguages so there are undoubtedly other cases too. — hippietrail 10:33, 18 June 2009 (UTC)

Some questions.

Some questions:

  1. What's the difference between "Biblical Aramaic" and "Judeo-Aramaic"?
  2. Why is American English listed here? Is there a problem with using en-US?
  3. Do we really want a language code for "African languages"? If we can't narrow it down past that, I think the appropriate template is {{rfe}}, not {{etyl}}.
  4. Might it help clarify the issues if, instead of listing these in alphabetical order, we grouped them by what they are? There seem to be three main kinds of items listed here — macrolanguages, proto-languages, and sub-languages — but with the current ordering it's hard to tell for sure.

RuakhTALK 00:08, 14 May 2008 (UTC)

Well, w:Judeo-Aramaic language and w:Biblical Aramaic, quoting "The early witness to this period of change is the Biblical Aramaic of the books of Daniel and Ezra. This language shows a number of Hebrew features have been taken into Jewish Aramaic: the letter He is often used instead of Aleph to mark a word-final long a vowel and the prefix of the causative verbal stem, and the masculine plural ending -īm often replaces -īn."
Apparently all the "dialects" of collective term "Judeo-Aramaic" have their own ISO codes, so this would be redundant.
We should discuss the addition of new entries to this table before they are added. Granting a pseudo-ISO code is not such a trivial thing. If somebody some day decides to add Biblical Aramaic words that are not attested anywhere else, I can imagine that he could present his arguments clearly and concisely enough as to why it shouldn't be treated as a dialect/subproject of XYZ Aramaic. We shouldn't even be discussing the addition of languages that don't even have their own category and that no one's interested in adding words in ^_^.
The question of Aramaic is really tricky though, similar to Arabic (really a macrolanguage, but used for MSA around here). I asked User:334a some time ago what dialect he is adding, and he explained that he's kind of adding dialect-neutral versions/pronunciations. Moreover, the 639-3 code 'arc' that is used here just for "Aramaic" refers to something called Imperial/Official Aramaic, which is apparently dated to 700-300 BCE ^_^. Also, Unicode 5.2 will support Imperial Aramaic script! (final-stage proposal here). The Semitic etymological dictionary by Militarev distinguishes Aramaic/Biblical Aramaic/Judaic Aramaic/Syrian Aramaic/Modern Aramaic/Mandaic Aramaic, which seems to me much more appropriate than using the generic term "Aramaic" to refer to "one language" with ~3100 years of written history. Granted, Aramaic today means "Modern Aramaic", just like Persian=New Persian (Tehrani dialect), but the other stages should be separated.
As I stated in the above section, this does not concern etymologies only. Macrolanguages, proto-languages and "sub-languages" (en-US, en-UK) don't get listed in translations tables and don't have NS:0 entries (the latter two are just under ==English==); referring to them in etymology should be handled with appropriately specialized etymology templates. This table should primarily be used to overcome deficiencies in SIL's ISO code assignment scheme, which is sometimes not adequate (e.g. they give separate codes for a plethora of languages with 99% mutual intelligibility, but treat e.g. "Crimean Gothic" as a "Gothic dialect") --Ivan Štambuk 01:16, 14 May 2008 (UTC)
Most of the entries in this table were copied from WT:ETY/TEMP on the basis that these do not work with the {{etyl}} template, which AIUI was intended to deprecate the individual etymology templates (e.g. {{F.}}, {{OE.}}, etc).
My idea for this page was for it to serve four temporally separated purposes:
  1. Collect a list of languages, dialects, etc. that don't have codes but which we do or might want to generate codes for, so they work with templates that require them.
  2. Use the list generated in phase 1 to determine which languages we actually need to give codes to, and work out how we want to deal with those we don't give codes to.
  3. Assign codes to those we decide to, and do whatever we decide to do with any others.
  4. Serve as the reference point for these codes.
I suspect that a change of name, possibly even a move to the Appendix namespace, will happen between stages 3 and 4.
We are currently, obviously, still in stage 1. If you think that the table can be better organised to help with this then go for it - it's a wiki, I do not own this page. Thryduulf 01:36, 14 May 2008 (UTC)
I agree that it doesn't concern etymologies only, but just because a given macrolanguage/dialect/whatever is only mentioned in etymologies, that doesn't mean we don't want a code for it. True, we could just give it a specialized etymology template; but also, we could just use {{etyl}} with a Wiktionary-specific ISO-style language code. —RuakhTALK 01:40, 14 May 2008 (UTC)
Took a stab at splitting up the table. Hopefully now we can converse better about each section. And I added some suggestions from others and my own. --Bequw¢τ 09:21, 26 July 2008 (UTC)

Merger

Should this page be merged with Wiktionary:Language code extensions? They have codes assigned by the IETF or WMF. Some of those now have ISO codes but most don't. --Bequw¢τ 02:51, 26 July 2008 (UTC)

It seems reasonable to me. I move that Language code extensions be moved here, and left as a redirect. -Atelaes λάλει ἐμοί 03:43, 26 July 2008 (UTC)
It could easily be transcluded, but I think a separate page is better in this particular case. --EncycloPetey 03:44, 26 July 2008 (UTC)
Doesn't the other page, though, have exactly "Languages without ISO codes" (or at least it did at one point in time)? Just because we didn't give the codes doesn't mean it's not relevant here. Having the pages defined so narrowly that they couldn't be merged would seem a bit overkill. --Bequw¢τ 09:17, 26 July 2008 (UTC)
I wasn't aware of that page until now, but I think that it could very easily be merged into this one. The "code" and "template" columns there would be merged into the "Wiktionary code" column here. If nobody objects in the next few days I'll do just that. Thryduulf 10:56, 27 July 2008 (UTC)
I object, very strongly. Wiktionary:Language code extensions is current policy. If you want to replace this page with that, and move all the stuff presently on this page here to Talk, okay. But we need a list of settled, adopted codes on a policy page, not combined with lots of things that will never be language template codes (e.g. Proto languages, prohibited by CFI, etc.). —This unsigned comment was added by Robert Ullmann (talkcontribs) at 20:24, 27 July 2008 (UTC).
That's a very good point. So this would be the "Think Tank" and standardized items would get pushed out to the other page as policy. Works for me. --Bequw¢τ 10:37, 18 August 2008 (UTC)

ISO 639-5

So, ISO 639-5 was released in May 2008 and it deals with language families (it uses the same "pool" of 3-letter codes as 639-2/3). It is disjoint from 639-3 but a superset of the 639-2 "collective" codes. I think we should consider using these codes for our etymologies. With it I think we can clear up the whole "Collective/Macro-language or family" table. Specifically, we'd be able to:

  • Remove {{African.}}. It's too broad (I've heard there's more variation in Africa than outside it). The only entry that uses it, cola, appears to be descended from the w:Niger-Congo branch, which has the 639-5 code nic.
  • alg, aus, ber were 639-2 Collective codes and so are now 639-5 codes. I think we can use those for the 3 entries in the table as they match up perfectly.
  • Remove {{AmInd.}}. Not used, and we have at our disposal North/Central/South American Indian groups (nai/cai/sai).
  • Remove {{BFinn.}}. Not used. We also have its ancestor, the Finno-Ugric languages, coded as fiu.
  • Brythonic. Used on ~10 entries. Could be converted to Brythonic {{proto|Celtic|xxx}}. The ancestor family Celtic is cel (639-2 & 5).
  • Common Turkic was already cleared out to use {{proto}}
  • Judeo-Aramaic. Used on ~4 entries. Not sure, but we could use arc for Official Aramaic (700-300 BCE) which is a 639-3 code. Though from above Ivan seems to have a better handle on this.
  • Turkic is coded as trk in 639-5
  • Indo-Iranian is coded as iir in 639-5
  • Uralic is coded as urj in 639-5

I'd imagine if we created these code templates, we'd leave them in the same pool as the other 3-letter codes, though we could give them a prefix:. Thoughts?
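To make the proposal above concrete, here is a rough Python sketch of the family-to-code table it implies. The codes are the ones named in the list; the dictionary, the function name and the exact family labels are purely illustrative assumptions, not existing templates or an agreed scheme.

```python
# Illustrative only: ISO 639-5 family codes named in the proposal above.
# The family labels used as keys are an assumption made for this sketch.
FAMILY_CODES = {
    "Niger-Congo": "nic",
    "Algonquian": "alg",
    "Australian": "aus",
    "Berber": "ber",
    "North American Indian": "nai",
    "Central American Indian": "cai",
    "South American Indian": "sai",
    "Finno-Ugric": "fiu",
    "Celtic": "cel",
    "Turkic": "trk",
    "Indo-Iranian": "iir",
    "Uralic": "urj",
}


def family_code(name):
    """Return the 639-5 code for a family name, or None if it is not listed."""
    return FAMILY_CODES.get(name)
```

A name that is too broad to have a code, such as "African", simply has no entry here, which is roughly the argument for removing {{African.}}.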

I agree that these could be useful. However, I feel very strongly that they must be prefixed. I was thinking perhaps {{macro:nic}}, etc. This prevents a lot of templates which shouldn't use them (e.g. {{term}}, {{infl}}, {{etc.}}) from using them. Obviously, someone could enter {{term|lang=macro:nic}}, but at least it couldn't happen accidentally. This would also prevent confusion by bots which collect data from the language templates, such as Robert's bots. We could then decide on a case by case basis which templates should be allowed to use them (e.g. {{etyl}}?, {{proto}}?). But yeah, I think these would be useful. I do think that we really ought to have a BP convo on this before implementing them, however. -Atelaes λάλει ἐμοί 05:51, 27 January 2009 (UTC)
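A minimal Python sketch of the prefix idea, assuming a "macro:" prefix as suggested above. The allow-list below is hypothetical: which templates (if any) should accept family codes is exactly what would need to be decided, and none of these names correspond to real template code.

```python
FAMILY_PREFIX = "macro:"

# Hypothetical allow-list; {{etyl}} and {{proto}} are only candidates.
FAMILY_AWARE_TEMPLATES = {"etyl", "proto"}


def accepts_code(template, code):
    """Return True if `template` should accept the language code `code`.

    Prefixed family codes are accepted only by templates on the allow-list;
    ordinary (unprefixed) language codes are accepted everywhere.
    """
    if code.startswith(FAMILY_PREFIX):
        return template in FAMILY_AWARE_TEMPLATES
    return True


def strip_prefix(code):
    """Return the bare 3-letter code, dropping the family prefix if present."""
    if code.startswith(FAMILY_PREFIX):
        return code[len(FAMILY_PREFIX):]
    return code
```

With this arrangement a bare nic can never be picked up as a family code by accident; someone has to type macro:nic deliberately, and a template such as {{term}} can still reject it.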

Ohlone

There's some concern about how to handle the Ohlone languages. ISO 639-3 breaks it up into just North and South. Goldenrowley, who has entered most of the entries, says we should have a separate code/category/etc for each one, which is how Ohlone deals with it. Going that route, we'd have to come up with our own codes. Any thoughts? --Bequw¢τ 22:58, 9 February 2009 (UTC)

Although the Wikipedia article divides the group up by regional tribes, it also refers to their speech as "dialects" in most cases rather than as separate languages. The "languages" are not given separate pages, either. The Ohlone article notes: "Neighboring divisions however could understand and speak to each other, only having colloquial differences." It may be that the regional "dialects" deserve separate recognition, but neither Wikipedia nor Ethnologue has done so. All the Costanoan (Ohlone) language group is considered extinct at this point, although there are attempts to revive it from records. --EncycloPetey 19:54, 21 February 2009 (UTC)

Vulgar/Late Latin

Going by the info from w:Vulgar Latin, when people use {{VL.}} do they mean "Vulgar Latin" as synonymous with "Late Latin" or do they mean vernacular Latin speech from any period? If it's the former, should we merge the categories? If it's the latter, why would we categorize by that instead of by the time period (assuming that information is available)? Am I the only one confused by this? --Bequw¢τ 04:11, 16 February 2009 (UTC)

Sorry, I can't answer any of your questions except the last one: No, you're not the only one confused by this. —RuakhTALK 01:34, 18 February 2009 (UTC)
Apparently both terms can mean different things depending on the context. Perhaps the distinction could be among the vernacular ("vulgar") Latin words in the Classical period, post-Classical Proto-Romance dialects, and those rare glosses of early Romance words attested when one can no longer speak of Proto-Romance but not yet of individual Romance languages; but I think it would be best to keep them in one place where the etymologies and categorisations are concerned, using the term Vulgar Latin in the broadest possible sense (deviation from Classical Latin in word structure, inflection and pronunciation). --Ivan Štambuk 02:14, 18 February 2009 (UTC)
I can't answer for all uses, but I only use {{VL.}} when I don't know the period, but do know it isn't mainstream Classical Latin. Or, when I can only find it documented as from "Vulgar Latin" without finding information about the period. I try to avoid using it in most situations, however, because it is ambiguous. --EncycloPetey 19:47, 21 February 2009 (UTC)

Norman dialects

The Norman dialects (Jèrriais, Guernésiais, Norman [used to refer to all continental dialects together]) are absolutely and emphatically not French dialects. If a currently existing ISO code MUST be used, "roa" would be the most applicable. Failing this, xno (the code for Anglo-Norman) makes a poor substitute, but is much better than French. Jade Knight 10:35, 16 May 2009 (UTC)

brabantian?

--史凡/Sven - Pl also use MSN/skype as I suffer RSI and so cannot type very well! 07:45, 25 July 2009 (UTC)

Sydney / Dharuk

I just thought I'd let you all know that I filed a request for an ISO language code for this language. Oddly, two of its less influential but equally extinct neighbours do have ISO language codes. When I hear back from them I'll post any URLs here so others can contribute to or observe the process.

One point I'd like to make though is that "Sydney" is not the best term to use here. Firstly, it's confusing, as people know the city Sydney but have never heard of the language. Secondly, the dictionaries I have seen that give a specific language in their etymologies of words such as "boomerang" go with "Dharuk". I have also seen "Daruk" in at least one news article from the BBC. — hippietrail 08:05, 25 July 2009 (UTC)