Wiktionary:Votes/2012-03/CFI for Endangered Languages

CFI for Endangered Languages

  • Voting on: Two changes to the criteria for inclusion page to address the difficulties in finding adequate source material to attest endangered languages:
At Wiktionary:CFI#Attestation, replace
"For terms in extinct languages: usage in at least one contemporaneous source."
with
"For terms in languages with sparse documentation, usage in at least one source."
Under "Languages to include," add the following section:
Languages with sparse documentation
The criterion for inclusion for languages that are extinct, endangered or without a strong written tradition is generally a contextually appropriate citation in at least one source. An endangered language is one listed by an institution such as UNESCO Interactive Atlas of the World’s Languages in Danger or the Living Tongues Institute for Endangered Languages, or a dialect of those languages.

  • Vote starts: 00:01, 30 March 2012 (UTC)
  • Vote ends: 23:59, 28 April 2012 (UTC)


  1. Support BenjaminBarrett12 (talk) 02:34, 30 March 2012 (UTC)
  2. Support --Μετάknowledgediscuss/deeds 04:20, 30 March 2012 (UTC)
  3. Support. This will be helpful for languages without a written tradition. This is the case of Hunsrik, which I created some entries for last year. Last time I checked, the only published material in Hunsrik were two short tales, two chapters of the Bible and a comic book (despite having, literally, millions of speakers). Ungoliant MMDCCLXIV 04:49, 30 March 2012 (UTC)
  4. Support -Atelaes λάλει ἐμοί 11:37, 30 March 2012 (UTC) This hasn't really come up yet, as we're really only working on "big" languages, but it will definitely come in handy further down the road. Some of the problems which Dan Polansky notes are real problems, and will have to be addressed at some point, but I still think this is a change for the better.
  5. Support, although it will be hard to monitor what goes in. Who will check and how? I'd like to request similar treatment for languages seldom appearing on the web with scarce resources, even if they are not considered "endangered". --Anatoli (обсудить) 03:18, 4 April 2012 (UTC)


  1. Oppose EncycloPetey (talk) 02:41, 30 March 2012 (UTC) This will be problematic if we remove the criterion for extinct languages like Bactrian, Avestan, and other extinct languages where scholarship has changed significantly over time. --EncycloPetey (talk) 02:41, 30 March 2012 (UTC)
    I don't know if this will change your mind, but this has only a small impact on extinct languages. The current criterion for extinct languages is "usage in at least one contemporaneous source" and the new language would be "usage in at least one source" and "generally a contextually appropriate citation in at least one source." BenjaminBarrett12 (talk) 03:32, 30 March 2012 (UTC)
    No, it won't. Ancient Greek is an extinct language, so the criterion currently applies to it. However, it does not have "sparse documentation", so the effect of this proposal is to eliminate an important criterion for Ancient Greek; the new version would not be applicable. --EncycloPetey (talk) 01:06, 5 April 2012 (UTC)
    That makes sense. I was wondering if Ancient Greek/Latin or something else would be adversely affected by the "sparse documentation" criterion. Is there a reason to keep the "contemporaneous" and the "usage" criteria as opposed to non-contemporaneous usage and/or mention (such as in a scholar's dissertation on some word)? BenjaminBarrett12 (talk) 01:11, 5 April 2012 (UTC)
    I suppose that depends on how you define 'sparse'. The corpus size of Ancient Greek, while enormous compared to most extinct languages, is minuscule compared to any major living language. -Atelaes λάλει ἐμοί 01:23, 5 April 2012 (UTC)
  2. Oppose Dan Polansky (talk) 07:05, 30 March 2012 (UTC) Absolutely. This does not even require a durably archived source or the like; a single post anywhere on the web is going to suffice. No problem has been demonstrated that this proposal solves. No list of examples of terms of "poorly documented" languages that are excluded by current CFI while they should not has been provided. Furthermore, it is unclear that citing in use as opposed to mention is really going to help these "poorly documented" languages. --Dan Polansky (talk) 07:10, 30 March 2012 (UTC)
    I appreciate the vote and the feedback. I agree the durable requirement needs to go in. If this vote passes, I will immediately draft a new vote adding that in. If not, I will re-draft with that included. For endangered languages, I often forget that most people don't see them because I see them everywhere: in the paper, on the Internet, in my community. Depending on who you ask, there are about 3000 of them in the world. Off the top of my head: Ainu (northern Japan), w:Jeju dialect (South Korea), Ditidaht (mentioned in the Beer Parlour), Makah (mentioned on the discussion page), Salish and w:Lushootseed (Washington State), Crow (US), Cornish (England), Irish, Quechuan (South America), dozens of languages in Australia, Maori (New Zealand) and Hawaiian. BenjaminBarrett12 (talk) 01:43, 31 March 2012 (UTC)
    There is no list of terms currently excluded because editors simply add terms from endangered languages and languages without strong written traditions without citations. If they were all sent to RFV, few would survive, but as a community we choose to ignore them instead of removing legitimate words, which would run counter to "all words in all languages". That is the problem that this vote can solve. --Μετάknowledgediscuss/deeds 02:54, 31 March 2012 (UTC)
    My complaint is that I have not a single term that would newly be included, and not a single quotation that would make the term includable. Examples of languages do not cut it for me; I need examples of terms and their prospective attestations according to the new rule. Discussing a rule in the abstract without dealing with specific examples is less than advisable.

    If a real problem with endangered languages is demonstrated, the likely solution for it would be a restoration of something like "Appearance in a refereed academic journal, or" in WT:ATTEST, a bullet point that was removed in diff. But again, in order to restore the bullet point, we need a list of example terms with their academic references to see what we are talking about. --Dan Polansky (talk) 07:20, 31 March 2012 (UTC)

    Here are two examples; would these work?
    1996 — ed. by Crisca Bierwert, Lushootseed Texts, pp. 124-125: ʔux̌áx̌ƛʼil - "they screeched" - I do not find this word on Google at all or in my Lushootseed Dictionary by Dawn Bates, Thom Hess and Vi Hilbert
    1987 — Allis Pakki Chipps-Sawyer, Standing on the Edge of Yesterday: A Dilemma of Oral Knowledge Survival in a West Coast Family [[1]], p. 106: "t’abuuk’ʷ" - I do not find this word on Google at all even though it's on the Internet. BenjaminBarrett12 (talk) 08:04, 31 March 2012 (UTC)
    You have not provided any sentences in which these terms are used. Are these citations of the terms in use to convey meaning? As the would-be quotations that you have provided are from works with English titles, they could be mere mentions. Maybe you should check WT:QUOTE and Citations:szökőár, a page that shows Hungarian sentences that use the cited term. After you provide complete sentences from your sources, we may quickly see whether these are uses or mentions. --Dan Polansky (talk) 15:24, 3 April 2012 (UTC)
    The first citation is from a sentence; however, the second is not. As noted in the Beer Parlour, the revised proposal will allow usage or mention, so sentences will not be required. BenjaminBarrett12 (talk) 03:35, 4 April 2012 (UTC)
    Citations should always have sentences, whether the sentences use the term or mention it. The citations that you have provided are therefore inadequate. Later: Maybe not; use-attestation should feature sentences, sure, but I am not so sure about each mention-attestation, such as from dictionaries. --Dan Polansky (talk) 06:42, 5 April 2012 (UTC)
    This proposal is because it is difficult in many cases to come up with any citation, and sentences are no different. My thought to address this is to say something like "sentences should be provided whenever possible" or something like that. I don't think the language could be made stronger than that. BenjaminBarrett12 (talk) 06:47, 5 April 2012 (UTC)
    It was always possible to enter a whole sentence or sentence-like thing such as a bullet point from the source that you are citing, unless you cannot actually see the page of the source, in which case you should not be entering the source anyway. --Dan Polansky (talk) 06:54, 5 April 2012 (UTC)
    Okay, I see what you're saying. Give whatever context there is if it's only "cat - gato" in a Spanish-English dictionary, for example. BenjaminBarrett12 (talk) 07:02, 5 April 2012 (UTC)
    I've seen some modern Burmese words in an English-Burmese phrasebook, which I had a hard time finding on the web. Also some simple web pages where words were written graphically. Later I was able to confirm the correctness from native speakers and also saw some video on YouTube - the words appeared written by a marker on a whiteboard. I trust native speakers' judgement. I think the proposal is good. You can always RFV any terms that don't appear right. --Anatoli (обсудить) 03:25, 4 April 2012 (UTC)
  3. Oppose, for drafting reasons. It removes the clause for extinct languages which I like. If there were a vote to add an exception for poorly attested languages without removing the exception for extinct languages, I'd support it, if the wording were unambiguous enough. This wording isn't unambiguous enough for me, and it fails my first criterion anyway. Mglovesfun (talk) 11:43, 30 March 2012 (UTC)
    Thank you for the vote and the feedback. The issue of extinct languages is complex. As mentioned on the discussion page, Makah recently became extinct according to the definition of having no native speakers. There are many other such languages, such as Manx, for example. Such languages fall somewhere between endangered and extinct, so I was trying to make sure they did not fall through the cracks. Also, the contemporaneous requirement for extinct language attestation puzzled me because it seems to rule out citing scholars for attestation. Nobody on the discussion page could say why it was there, so I tried to incorporate extinct and endangered languages together. I would welcome a reason to try to separate out long-extinct languages and keep the contemporaneous requirement if a way could be found that doesn't exclude languages that have recently gone dormant. BenjaminBarrett12 (talk) 01:43, 31 March 2012 (UTC)
    Re: "Also, the contemporaneous requirement for extinct language attestation puzzled me because it seems to rule out citing scholars for attestation": Funnily enough, one thing that I dislike about this proposal is that it seems to rule out citing scholars for attestation, since it restricts itself to uses rather than mentions, and I was thinking that I'd rather allow scholarly mentions than non-durably-archived uses. But I now take it that you did not mean to exclude scholarly mentions? —RuakhTALK 01:52, 31 March 2012 (UTC)
    That's correct. The word "usage" simply comes from the criterion as it currently stands. The aim of English Wiktionary is "to describe all words of all languages using definitions and descriptions in English" and the core of this proposal was to live within that spirit and allow languages without a lot of documentation in. BenjaminBarrett12 (talk) 03:02, 31 March 2012 (UTC)
  4. Oppose - -sche (discuss) 07:28, 31 March 2012 (UTC) Better wording can be ironed out and more widely agreed-upon before a new vote (which we should certainly hold). - -sche (discuss) 07:28, 31 March 2012 (UTC)
    I'm inclined to agree. The feedback here in this short time has been far more constructive than the first week was :) BenjaminBarrett12 (talk) 08:08, 31 March 2012 (UTC)
    Basically don't give up; just modify and try again. Mglovesfun (talk) 15:26, 3 April 2012 (UTC)


  1. Abstain I think some rules are best left unwritten. But I really don't care which way this'll go. -- Liliana 04:28, 30 March 2012 (UTC)
  2. Abstain per Mglovesfun and Dan Polansky. This is not drafted very well; for one thing, I don't think it makes any sense at all without the IMHO-tenuous supposition that the phrases "languages with sparse documentation" and "languages that are extinct, endangered or without a strong written tradition" are synonymous, or at least coextensive. —RuakhTALK 23:27, 30 March 2012 (UTC)
    Thank you for the feedback. This was also a difficult area. It is possible (either now or in the near future) for a language to be endangered or extinct, and yet have an abundance of written materials. In the nineteenth century, both the Hawaiian and Cherokee populations had near-universal literacy rates (with newspapers and other publications) and today are seeing a resurgence in speaking populations (however tenuous). Last year, two PhDs taken completely within the language of Hawaiian were granted, demonstrating that perhaps Hawaiian is not a good candidate for the "only one usage" criterion even though it is listed in the UNESCO Atlas. I wanted to provide a means to exclude such languages. The problem is that "sparse documentation" cannot be meaningfully defined; yet without a term like that, a conversation cannot begin. The way I read this is that if a language is not "with sparse documentation," you cannot refer to that section for that language. So mention of "languages that are extinct, endangered or without a strong written tradition" refers only to languages "with sparse documentation." My hope is that, like words, languages will be challenged to demonstrate they have only "sparse documentation" so that the "only one usage" criterion is not abused. This would apply to ancient Latin and Greek as well (if anyone cares to challenge).
    The words "extinct" and "endangered" were included as specific sorts of examples. Perhaps the insertion of "otherwise" would be better: "languages that are extinct, endangered or otherwise without a strong written tradition..." BenjaminBarrett12 (talk) 08:06, 31 March 2012 (UTC)
  3. Abstain Ƿidsiþ 07:24, 31 March 2012 (UTC) per Gloves.


  • No consensus, 5-4-3. --Yair rand (talk) 10:43, 1 May 2012 (UTC)