Wiktionary talk:Votes/pl-2011-05/Attestation of extinct languages

From Wiktionary, the free dictionary

Renaming this vote

I suggest renaming this vote to something more explanatory, for example from Wiktionary:Votes/pl-2010-12/CFI amendment (2) to Wiktionary:Votes/pl-2010-12/Attestation of extinct languages. --Daniel. 22:57, 10 December 2010 (UTC)

Agreed. --Dan Polansky 23:03, 10 December 2010 (UTC)
Okay. -- Prince Kassad 23:15, 10 December 2010 (UTC)

Mentions

Should we allow mentions? There's more than one case where a language is only attested through words quoted in Roman and Greek sources. -- Prince Kassad 23:24, 10 December 2010 (UTC)

"Usage in a well-known work"

Doesn't the "Usage in a well-known work" attestation possibility usually cover this? In languages with a small number of durably archived texts, I would figure that most or all of the works in the language would qualify as "well-known works". --Yair rand (talk) 05:39, 12 December 2010 (UTC)

Dunno. Some think yes, but I'm not sure if this view is universally accepted. -- Prince Kassad 10:40, 12 December 2010 (UTC)

extinct

Should the proposal name who decides what's extinct? Or shall we decide on a case-by-case basis at [[WT:RFV]]? —msh210 (talk) 18:45, 14 December 2010 (UTC)

If anything, we should include all languages classified by the ISO as "Ancient" or "Historical". The third group "Extinct" also covers languages which have become extinct only very recently (like w:Ubykh) - if we want to apply the same rules to those, we can just change it to "any natural language with no native or second-language speakers". But it's debatable how to treat these. -- Prince Kassad 18:53, 14 December 2010 (UTC)
Just native speakers, not second-language speakers – there are people who converse in Old English and Ancient Greek, but these are not living languages. — lexicógrafa | háblame 19:36, 14 December 2010 (UTC)
The problem is that without the "second-language speakers" part, it would also include pidgins, which we definitely don't want. -- Prince Kassad 19:46, 14 December 2010 (UTC)

one vs. three

Really, just one instance would be enough? How do you know what a word means from the only use of it that ever existed? We require three cites of actual use, but in most cases there are more than that in durable media, or there are more in blogs and forums, or if nothing else there are mentions that at least give a hint in establishing meaning. For an extinct language, I understand that potentially someone could run across this and want to know what it means, but shouldn't we know what it means before we tell them? DAVilla 15:20, 21 December 2010 (UTC)

I'm not sure how to deal with words whose meaning is unknown, and whether we should include them or not. It may seem best to not include words whose meaning is not known at the present time. -- Prince Kassad 15:26, 21 December 2010 (UTC)

Recent cites?

Should we add some sort of restriction that the cite can't be recent? Like, Gothic is classified as "extinct", but that doesn't prevent me from writing a book in it; and if I were to do so, we shouldn't automatically include any words that I would be forced to reconstruct and/or invent. (Arguably, since Gothic is extinct and I am alive, a book that I wrote in Gothic would not actually be in Gothic; but we don't have a general practice of excluding non-native speaker cites.) —RuakhTALK 19:23, 9 May 2011 (UTC)

An interesting thought. Maybe the "contemporary" restriction should encompass the whole paragraph. -- Prince Kassad 19:41, 9 May 2011 (UTC)

Integration with Attestation section

This needs to be integrated with WT:CFI#Attestation section. One option is to add this bullet point to attestation section:

(a) "Single usage or mention in contemporary sources for a term of an extinct language"

That is more lenient than the following bullet, which excludes mentions from dictionaries and other mentions, and adds the requirement for permanently recorded media:

(b) "Single usage in permanently recorded media, conveying meaning, for a term of an extinct language"

For context, this is the wording of the 3-quotations bullet point:

"Usage in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year."

The point (a) can be formulated as two points, for better clarity via avoidance of disjunctive bullets:

(c) "Single usage for a term of an extinct language"

(d) "Single mention in contemporary sources for a term of an extinct language"

--Dan Polansky 09:02, 10 May 2011 (UTC)

We've discussed this a bit on IRC. The problem with allowing usage in any sources is that if someone decided to write a book in Gothic, it would suddenly count as a source. Of course, that isn't good at all, so I extended the contemporary requirement to both mentions and usages.
While it may at first seem good to integrate this proposal into the attestation requirement, I decided against it. One of the old complaints (see above) was that the rule didn't define what an extinct language is for the purpose of CFI, and this information would be lost if this is integrated as a mere bullet point. -- Prince Kassad 10:04, 10 May 2011 (UTC)
There needs to be a bullet point or two nonetheless. There can be a separate section that defines "extinct language"; that is okay, but there still needs to be a bullet in the Attestation section. Compare how the "Attestation" section defines what "attested" means, while the word "attested" itself is used in the lead paragraph of the section "General rule". The separate section that defines "extinct language" should do only that: it should define what is meant by "extinct language" without stating any attestation criteria for extinct languages. When you look at "General rule", this is exactly the pattern of CFI: it first uses some terms in the lead paragraph and then defines them in subsequent sections. The terms "term", "attested", and "idiomatic" used in the lead paragraph are defined in dedicated sections that come later. --Dan Polansky 10:18, 10 May 2011 (UTC)
Looking again at the CFI, this isn't as bad of an idea as I thought. It requires rewriting the proposal, which I'll do later today. -- Prince Kassad 14:02, 10 May 2011 (UTC)
An afterthought: You can even make it clear that a term is defined elsewhere on the page by using a hyperlink to the defining section, like this: "Single mention in a contemporary source for a term of an [[#Extinct language|extinct language]]". CFI currently does not do this, but IMHO it should. --Dan Polansky 14:29, 10 May 2011 (UTC)

Use or mention

I don't oppose allowing only one durably archived citation, but I hate the idea of allowing mentions. Problem is, etymologies aren't always correct, or they give transliterations. The New Oxford Dictionary of English only gives etyma in the Latin script, for Greek, Arabic, Gothic, etc. Mglovesfun (talk) 09:46, 10 May 2011 (UTC)

Hence the phrase in contemporary sources: e.g. if a language died out in the 2nd century BC, we'd only be allowing citations up to that time (roughly, give or take one or two centuries). -- Prince Kassad 09:53, 10 May 2011 (UTC)
Here's a specific example from Laüstic, Marie de France:
Ceo est russignol en franceis
nihtegale en dreit engleis.
This is 'rossignol' in French
And nightingale in good English
Does this mean that nihtegale meets CFI for Middle English under these criteria? --Mglovesfun (talk) 12:54, 10 May 2011 (UTC)
I'd say yes. Using this citation, nihtegale would be permissible for inclusion by the proposed rules. -- Prince Kassad 14:00, 10 May 2011 (UTC)

In this revision, "use" or "usage" is gone; only "mention" remains. Is this intentional? Single uses should count as well, shouldn't they? See also #Integration with Attestation section, and proposed points (c) and (d). --Dan Polansky 09:13, 11 May 2011 (UTC)

I just copied the wording you provided, but it made me a bit suspicious as well. I fixed it. -- Prince Kassad 09:41, 11 May 2011 (UTC)
Then I suppose I feel uneasy about this; nihtegale seems to be attested anyway (examples), but I think there will be some mentions that aren't even in the right script. Why lump the two together? Why not just allow 'one use' but exclude mentions? Would any genuine words be excluded under these criteria, and how can we be sure that they're genuine if they're not attested directly in the language in question? --Mglovesfun (talk) 09:58, 11 May 2011 (UTC)
A butting-in in a less intrusive place: I tend to agree with Mglovesfun that "mention" should better go, but I am sitting on the fence and watching the arguments unfold. --Dan Polansky 15:57, 11 May 2011 (UTC)
The entire Category:Dacian language would be excluded were we to disallow mentions. The words are only attested in Roman sources. -- Prince Kassad 11:30, 11 May 2011 (UTC)
I take that point, but in that case why not go a step further and allow a word to be included with no usages and no mentions; then we can include whatever we like. If you allow mentions, you have to allow words/terms that are totally wrong, and even when they're totally wrong, they'd still meet CFI. --Mglovesfun (talk) 12:20, 11 May 2011 (UTC)
I don't understand your logic. How can you assume that the people who lived when such a language was spoken were "totally wrong"? -- Prince Kassad 15:22, 11 May 2011 (UTC)
As you say, "assume". Let's put it another way: how could this (including terms that are only mentioned and never, or rather perhaps never, used) even be a good idea? --Mglovesfun (talk) 15:28, 11 May 2011 (UTC)
Why do you think this is a problem? Do you know of any terms that are totally wrong (as you say), but would meet the new CFI? -- Prince Kassad 16:38, 11 May 2011 (UTC)
No, and nor does anyone. That's the point, it's guesswork, it's good faith, it's shut your eyes and hope for the best stuff. The Dacian language wouldn't be 'excluded' per se, it just so happens that right now, it isn't attested at all. On the other hand, what we would include is mentions of attested languages like Ancient Greek in the Latin script when mentioned in the Latin script in any Latin document or whatever. We'll have Ancient Greek entries in the Latin script automatically meeting CFI. Mglovesfun (talk) 22:45, 11 May 2011 (UTC)
There's still {{wrongscript}}. This overrides CFI, believe it or not. -- Prince Kassad 22:48, 11 May 2011 (UTC)
To turn the issue around, how many definitely valid entries would this produce? Quite simply, none. How many possibly valid entries? A few; I think we are talking dozens, or perhaps a hundred or more in a couple of years' time. That's one reason I wouldn't want this to be a sticking point of this vote; I'd say deal with the attested languages first by allowing a single 'durably archived' use, then propose an amendment later. If the amendment fails, the main part of the policy is still there. At this point we'll need more people's opinions to work out if this could possibly be supported. --Mglovesfun (talk) 10:30, 12 May 2011 (UTC)

I'm coming in late to this conversation, and I'm not sure what to indent this under, so I'll just start over on the left side. I agree with Mg that allowing mentions will let in too much. In particular: I don't agree with PK that {{wrongscript}} overrides the CFI: if something is attested (and idiomatic) per the CFI, then it's in, no matter its script. Thus, as Mg notes, if we allow mentions, there will be many inclusible wrong-script entries in dead languages. As an example, Rashi, in his commentaries on Tanach and on the Talmud, frequently writes the Old French translation of a difficult word used in the text. A number of other medieval rabbis did the same, in Old French and other languages. That translation is written in Hebrew transliteration, and we'd include all those. (For example, in his commentary to Ezekiel 13:18, he writes (according to Wikisource) אצילי — איישיל״ש בלע״ז, that the Hebrew אצילי means איישיל״ש in Old French (presumably something derived from Latin axilla).) —msh210 (talk) 19:56, 12 May 2011 (UTC)

I don't have the discussion at hand, but there was something about Tatar Latin terms. Some people have opted, in modern times, to write Tatar using a self-invented Latin script. These are no doubt attestable and would meet CFI - but {{wrongscript}} invalidates them since they're not written in the correct script, which is Cyrillic only. -- Prince Kassad 19:59, 12 May 2011 (UTC)
That doesn't make sense. {{wrongscript}} does not explain which scripts may be used for which languages, so by your reasoning, I could start adding {{wrongscript}} to entries for English words in the Latin script, and suddenly those entries would become deleteable regardless of what WT:CFI might imply.
You could claim that there exists a policy, not documented anywhere, of excluding words in "wrong" scripts, and that {{wrongscript}} is used to enforce that policy; but such a claim would require much more support than you have provided. (Note that the existence and use of {{wrongscript}} makes perfect sense even if we assume that there is no such policy, since the regular CFI do exclude most words-in-wrong-scripts, especially if we take such to be misspellings. The point then would not be that wrong-script-ness itself means the entry must be deleted, but that deleteable wrong-script entries warrant special handling.)
RuakhTALK 20:21, 12 May 2011 (UTC)
(To make this a bit less abstract: if someone added a ==Hebrew== section at [[shalom]], that would warrant {{wrongscript}}. If someone added a ==Hebrew== section at [[SMS]], that would not warrant {{wrongscript}}. {{wrongscript}} is a tool, for fixing language/script mismatches, not a policy, for forbidding them. —RuakhTALK 20:26, 12 May 2011 (UTC))
Now, assuming that shalom would be attestable as Hebrew in running text as per the current CFI, would that suddenly warrant a ==Hebrew== section at shalom (and any other Latin-script Hebrew entries appearing with it)? -- Prince Kassad 20:33, 12 May 2011 (UTC)
Yes: it would be just like Hebrew SMS, then, or Russian b-кварк. But that's unlikely to occur: the hits at google books:+ואת +shalom -intitle:shalom don't seem actually to have shalom in the text. —msh210 (talk) 20:38, 12 May 2011 (UTC)

Contemporary

"contemporary" seems to be ambiguous, per WT and other dictionaries. You mean contemporary--from the same time period, coexistent in time--rather than contemporary--modern, of the present age. At first, I understood your "contemporary" as "modern" or "of recent date", given that you used the word in the phrase "mention in contemporary sources", which contains "mention", and I had understood the intention of including "mention" to be the inclusion of recent lexicographical sources.

A native speaker could suggest a less ambiguous word, or help to make the phrasing unambiguous. --Dan Polansky 14:39, 10 May 2011 (UTC)

I think the new wording has resolved that problem. -- Prince Kassad 15:56, 10 May 2011 (UTC)
I think I have found a less ambiguous word: "contemporaneous". It is less ambiguous per WT, MWO and Encarta. --Dan Polansky 12:14, 11 May 2011 (UTC)
I'm not quite positive we should resort to obscure terms that may not be understood by everyone. -- Prince Kassad 16:37, 11 May 2011 (UTC)
The term is not as obscure as you may think: just try this search in Google Ngram Viewer for "contemporaneous", "contemporary", "attestation", and "idiomatic". What I see there is that "attestation" and "idiomatic" have lower frequency than "contemporaneous", and that over the whole time period considered, "contemporaneous" is approximately ten times less common than "contemporary". At the same time, the hits for "contemporary" include those for both senses, as in "contemporary fiction", which boosts the overall number of hits above those that use the requisite sense. Taking my initial confusion into account, I prefer a term that is unambiguous even if less common. --Dan Polansky 17:11, 11 May 2011 (UTC)
I don't really know; other searches should be taken into account, such as those for "contemporary sources" and "contemporaneous sources" in Google web and Google books. In some of these, "contemporaneous" is beaten badly. An impression from a native speaker would be welcome. --Dan Polansky 17:18, 11 May 2011 (UTC)
I agree with Dan Polansky. I find contemporaneous much clearer than this sense of contemporary, and no more obscure. (I'm familiar with this sense of contemporary, and even so, every time I hear it used, I have to think for a second to try to decide how it's meant.) —RuakhTALK 17:16, 11 May 2011 (UTC)
Fine with me. -- Prince Kassad 20:03, 11 May 2011 (UTC)

Note to be better removed

This note should IMHO better be removed:

As the vocabulary of an extinct language is by definition fixed and not subject to change, and the language itself may only be fragmentarily attested, lesser requirements for attestation are made for terms in any of these languages.

This comment could appear in the justification or rationale for the vote (something two editors seem to frown upon, but which I support and which Mglovesfun seems to support). The note does not serve to define or regulate, so IMHO it does not belong directly in CFI. --Dan Polansky 08:17, 11 May 2011 (UTC)

I modeled it after similar rationales listed in the other headers, such as ====Independence==== or ====Conveying meaning==== (read them to see what I mean). It might be useful for others to understand why this rule exists. We could also use it to add a ref to this vote, as per Wiktionary:Votes/2011-04/Sourced policies. -- Prince Kassad 15:28, 11 May 2011 (UTC)
I for one think the rationales should not be in CFI, while I admit they often are. "Conveying meaning" and "Independence" are some of the less fortunately written CFI sections. CFI should not explain why its rules are there, IMHO anyway; that should be explained in the vote that has introduced the rule. --Dan Polansky 15:49, 11 May 2011 (UTC)
I agree (with Dan). —msh210 (talk) 19:39, 12 May 2011 (UTC)

Superfluous hyperlinks

I would remove the hyperlinks from the terms "Extinct languages" and "contemporary" in the section "Extinct language". The reason: the section defines and clarifies both terms, so it should not rely upon the definitions of these two terms provided in Wiktionary. Like, the passage should not refer the reader to two definitions of "extinct language", one provided in the section of CFI, another provided in the mainspace.

This mistake (IMHO anyway) is now there in CFI when it refers the reader of section "General rule" to the mainspace definitions of "attested" and "idiomatic" by hyperlinking to the mainspace, only to redefine these terms in dedicated sections of CFI. --Dan Polansky 08:23, 11 May 2011 (UTC)

I removed the link to extinct language, as that entry is not helpful at all. Not sure about contemporary, though (foreigners might use it to get a translation in their language, you know). -- Prince Kassad 09:56, 11 May 2011 (UTC)
Makes sense a bit, yet I still think the hyperlink should better go. A non-native speaker can still use the copy-and-paste function to find the definition and the translations of the word. Not a big deal, though.
On a closely related note, "contemporaneous" seems less ambiguous than "contemporary"; see my remark at #Contemporary. --Dan Polansky 12:14, 11 May 2011 (UTC)
I unlinked it for now. -- Prince Kassad 16:37, 11 May 2011 (UTC)

Single

After Ruakh's edits, "single" is gone from the proposed bullet point. I find the current phrasing slightly ambiguous, although unambiguous on a second reading:

  1. ...: usage or mention in any contemporaneous source.

I prefer one of the following over the current wording:

  1. ...: usage or mention in any single contemporaneous source.
  2. ...: usage or mention in at least one contemporaneous source.

On another note, I prefer "use" over "usage", but I admit that current CFI uses "usage" in other bullet points, and that a new bullet could do well to stick with that. --Dan Polansky 06:17, 12 May 2011 (UTC)

Yes, I went with "usage", and dropped "single", so as to better match the phrasing of the existing well-known work rule. I agree that "use" would otherwise be better. —RuakhTALK 19:22, 12 May 2011 (UTC)
The point "Usage in a well-known work" is going to go if this vote passes, I hope. Nonetheless, the point uses "a well-known work" rather than "any well-known work", so what would really match it is "...: usage or mention in a contemporaneous source." I still think "at least one" would make it more explicit and do no harm, but if no one agrees, let it be. --Dan Polansky 19:26, 12 May 2011 (UTC)
I added an emphasis on one. -- Prince Kassad 12:18, 14 May 2011 (UTC)
Adding "at least" would be clearer. Like, does "one" mean "at least one", or "exactly one"? No one is going to read this as "exactly one", of course; "at least" is clearer, nonetheless. "at least" is used in the 3-attestation point: "... in at least three independent instances spanning at least a year." --Dan Polansky 11:15, 16 May 2011 (UTC)

Maybe devolve to language-considerations pages?

Maybe, instead of making a blanket statement that all extinct languages can be attested in these ways, we should simply say that individual extinct languages may allow these attestation methods? Next year, when Modern English evolves into English 2.0, I don't think we'll need to start accepting single, non-durably-archived cites for ==Modern English== words. And many languages have small enough corpora to justify a single-use rule, but not so small as to justify a single-mention rule. (I'm all on board with accepting mentions for Dacian words, but I share Mglovesfun's and msh210's concerns above about allowing Middle English words and Old French words on the basis of Middle French and Medieval Hebrew mentions.) In fact, we might make this a bit more general — even certain non-extinct languages might warrant such treatment. (For example, if a reliable linguist compiles a glossary of an endangered Amerindian language and offers it to us under an appropriate copyleft license, I think we should snap it up.) In all cases, [[Wiktionary:About ____]] is the place for it, and the CFI just need to allow such variation. —RuakhTALK 20:44, 12 May 2011 (UTC)

This is an idea to consider, and something that is already on WT:AGRC. I see multiple problems with it though, one of them being control. About pages can be freely edited by anyone, including anonymous users; having these directly affect the CFI creates a dangerous loophole that could allow lots of bad entries to be added to Wiktionary. Unless there's some way to prevent this from happening, I don't think it is a viable idea. -- Prince Kassad 21:23, 12 May 2011 (UTC)
I don't think that's an overriding concern. RFV will still catch the bad edits, and defense based on an About page can be countered by an edit to the About page; if the community agrees with that edit, a reversion thereof won't last. —msh210 (talk) 21:35, 12 May 2011 (UTC)
My thought is that chaos would ensue if anyone could edit the criteria for inclusion at will. But I may be too careful, as always... -- Prince Kassad 21:43, 12 May 2011 (UTC)

I don't like the idea of distributing CFI across several about pages. If we want to accept mentions, we need to figure out what sorts of languages warrant such a criterion. In the worst case, such languages can be listed directly in CFI, like this:

For Dacian, Language2, and similarly poorly attested extinct languages: mention in a contemporaneous source.

Thus, we define a class of languages by listing examples and then rounding it up using a similarity closure. If the closure is opposed, a mere list of all cases does the job:

For Dacian, Language2, and Language3: mention in a contemporaneous source.

--Dan Polansky 10:04, 19 May 2011 (UTC)

I previously suggested a subpage giving a list of which languages require only one citation, which require only two. Something like Wiktionary:Criteria for inclusion/Poorly attested languages. --Mglovesfun (talk) 10:45, 19 May 2011 (UTC)
We do not need any subpage, IMHO; we should not distribute the actual CFI into several pages. CFI needs to be controlled by votes. "Attestation" section of CFI can be expanded to host any further rules. There should be one location that provides the answer to the question "When is a term considered attested in Wiktionary?". --Dan Polansky 10:53, 19 May 2011 (UTC)
I like the idea of a page where all the language-specific rules would be listed. Not sure if it should be a subpage of CFI, but it should exist in one form or another. -- Prince Kassad 11:52, 19 May 2011 (UTC)

Start of vote.

I'm just letting you know that the vote has started, so you can cast your opinion now. -- Prince Kassad 14:03, 16 May 2011 (UTC)

Follow-up

For reference, two follow-up votes:

--Dan Polansky 10:26, 8 August 2011 (UTC)