RFV discussion: September 2015–February 2016


The following discussion has been moved from Wiktionary:Requests for verification

This discussion is no longer live and is left here as an archive.

Latin. I don't think this word occurs at all, be it as a noun form or a verb form. — I.S.M.E.T.A. 17:26, 21 September 2015 (UTC)

Normally we don't require citations for every inflected form of a word, but rather include entries for inflected forms if the word is attested at all. Is there any reason not to do so in this case? —Mr. Granger (talkcontribs) 18:47, 21 September 2015 (UTC)
@Mr. Granger: Well, we have a convention of treating them as "innocent until proven guilty", but surely they're not exempt from WT:CFI, right? — I.S.M.E.T.A. 00:15, 22 September 2015 (UTC)
My understanding was that all inflected forms of a word are considered one unit, and that three citations (or one citation for LDLs) of any inflected form of a word are considered to attest the word. If we instead require three (or one) citations for each inflected form, then many words that have passed RFV should have failed: genophobe, for example, doesn't have any citations of its lemma form, but its RFV discussion was closed as passed.
I have seen this stated explicitly before, for instance by User:Ruakh in Wiktionary:Tea room/2012/June#o dinosaur! when he explained why we have an entry for internacionalizabais even though it is probably unattested. —Mr. Granger (talkcontribs) 00:49, 22 September 2015 (UTC)
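The two readings of the attestation rule discussed above can be contrasted in a minimal sketch. The function names and the citation counts are hypothetical, chosen only for illustration; the thresholds (three citations, or one for LDLs) are the ones stated in the comment above.

```python
# Thresholds as described in the comment above: three citations normally,
# one for a limited-documentation language (LDL).
DEFAULT_THRESHOLD = 3
LDL_THRESHOLD = 1


def threshold(is_ldl=False):
    """Return the number of citations required (hypothetical helper)."""
    return LDL_THRESHOLD if is_ldl else DEFAULT_THRESHOLD


def lexeme_attested(citations_by_form, is_ldl=False):
    """Pooled reading: citations of any inflected form count toward the word."""
    return sum(citations_by_form.values()) >= threshold(is_ldl)


def forms_attested_individually(citations_by_form, is_ldl=False):
    """Per-form reading: each form needs its own citations."""
    need = threshold(is_ldl)
    return sorted(form for form, n in citations_by_form.items() if n >= need)


# Hypothetical counts: a lemma with no citations and a plural with three.
counts = {"foobar": 0, "foobars": 3}
print(lexeme_attested(counts))              # True
print(forms_attested_individually(counts))  # ['foobars']
```

Under the pooled reading the word passes; under the per-form reading the lemma itself would fail, which is the consequence described above for words whose lemma form has no citations.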
The only reason I can see for RFVing an inflected form is if there's reason to doubt it's correct, based on the existing forms, e.g., if the form in question has a first declension ending even though the -a in the nominative is due to the word being a borrowing from an Ancient Greek third declension neuter, or if it's impossible to tell from the existing forms how the word is actually inflected in that part of the paradigm. While the letter of CFI doesn't prohibit rfving inflected forms for other reasons, it would be a really bad idea: with few exceptions, just about every Latin or Ancient Greek word has unattested forms somewhere in its paradigm, and we use automated templates/modules for inflection tables. Do you really want to have a redlink or an empty cell in the inflection table because it just happens that there are no texts that use the ablative plural for a given word? Chuck Entz (talk) 01:27, 22 September 2015 (UTC)
As Mr Granger says, most of us consider inflected forms to constitute one lexeme, although 1-3 users oppose this and suggest infeasible alternatives. As I wrote in 2013 about German, and as Ruakh agreed: if an adjective isn't attested in the comparative, we should say it's incomparable, but if it's merely that I can only find 2 citations of mitternachtsblauen as the neuter mixed genitive form of mitternachtsblau, I'd still list it in the inflection table. It'd be prohibitively hard to do otherwise; a user would need 156 citations to attest one table, an understanding of German grammar to know which of the 26 homographic slots each citation of mitternachtsblauen supported, and different tables for every combination of missing slots. And as Chuck Entz pointed out and I agreed, marking individual forms of valid paradigms as invalid would mislead more readers in a more harmful way than not marking such forms. Someone who was learning German and was about to use an adjective foobar could turn to Wiktionary to double-check that the ending on a neuter adjective in the nominative after ein is indeed -es and not -e (as it would be after das). If we told them foobares was not attested, I think the odds are slim that they would grasp that that signified only that at the time some Wiktionary users checked, insufficiently many books using the word in that case had been digitised by Google. I think the odds are better that they would conclude that they had to use some other form, and thus they would end up writing something ungrammatical.
Further up this page you can find Angr expressing the same view re kar. Unless dulco is only used actively and the entire passive conjugation doesn't exist, or there's a reason another form would be expected as the second-person plural present, I'd close this (since the lexeme is attested). - -sche (discuss) 03:39, 23 September 2015 (UTC)
On one hand, when I cite an Esperanto word, I'm glad I can worry about citing one lexeme instead of several different predictably-inflected forms. On the other hand, Esperanto has the neatness of an artificial language; can we trust that Latin or Ancient Greek were really that predictable? For example, with English, the plural form is almost always predictable, but there's a substantial body of words where the predictable form is not the one used. I don't know that one rule covers everything; there are cases where we have complete certainty and there are cases where we have a strong pattern that has a number of clear exceptions.--Prosfilaes (talk) 06:15, 23 September 2015 (UTC)
AFAIK, there is no consensus as to whether to exempt inflected forms from attestation; maybe we should add that statement to CFI so it accurately reports the state of discussion. It is certainly easier to ignore facts and evidence and rely on regularities of inflection to automatically create pages. Applied to Czech and extended to archaic sections of the inflection tables, this would lead to the creation of loads of forms that, to a modern speaker, look bizarre, since they combine a modern lemma with an archaic inflection. I hope that this approach will not be extended to derivation using highly productive suffixes such as -ness. As an option, I proposed to keep unattested forms but mark them as hypothetical, but that was claimed to be impractical. I do not deem it impractical: one could take a particular comprehensive corpus, collect all forms from it, and then mark every inflected form that is not in that collection as hypothetical. The hypothetical label could then be removed once it is confirmed that the form is attested; the attesting quotations would not need to be placed in Wiktionary, only a reasonably unique identification of the locations where the attesting quotations are actually found. Let me note that the result of this discussion does not change the fact that we pool attestations of inflected forms to support a lemma entry, even if the lemma form itself is unattested; that is a fairly separate issue. --Dan Polansky (talk) 09:09, 26 September 2015 (UTC)
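The labelling procedure proposed above could look roughly like the following sketch. The function names, the sample forms and the location string are hypothetical, and the case-insensitive matching is a simplifying assumption rather than part of the proposal; a real corpus would also need tokenisation and homograph handling.

```python
def label_forms(paradigm_forms, corpus_tokens):
    """Label each generated form by whether it occurs in the corpus at all."""
    # Assumption: case-insensitive matching is good enough for a first pass.
    seen = {token.lower() for token in corpus_tokens}
    return {
        form: "attested" if form.lower() in seen else "hypothetical"
        for form in paradigm_forms
    }


def confirm_form(labels, locations, form, location):
    """Drop the hypothetical label, recording only where the form was found."""
    labels[form] = "attested"
    locations.setdefault(form, []).append(location)


# Toy data: three generated forms, one of which is missing from the corpus.
forms = ["foobar", "foobares", "foobarem"]
corpus = ["Ein", "foobares", "Haus", "ist", "foobar"]

labels = label_forms(forms, corpus)
print(labels["foobarem"])  # hypothetical

# Later, someone finds the form in a source outside the corpus.
locations = {}
confirm_form(labels, locations, "foobarem", "hypothetical-source, p. 12")
print(labels["foobarem"])  # attested
```

Only the location identifier is stored on confirmation, in line with the suggestion that the quotations themselves need not be added.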
De facto we tend not to RFV these, for a couple of reasons that I can think of. One, it would leave inflection tables with unexplained red links. Two, the amount of time you'd need to even attempt to cite all of these. Dulco on its own has about 100 forms; imagine trying to cite all of these, and that's just one verb. And of course right now all our attesting is done by humans (not bots), so the number of person-hours would be enormous. It would run into the thousands very quickly. Imagine how long it would take to tag all these entries by hand, never mind the actual citing. Renard Migrant (talk) 14:28, 26 September 2015 (UTC)
Regarding old inflections + modern stems, one thing we could do is identify entire sections of conjugation tables that are in general no longer used, and create tables that don't include those entire sections. This would be in line with how we use tables that only show singular forms on words that aren't used in the plural, and how only older Latinate German words which have vocative + ablative forms (and not newer words which don't have such forms) list such forms. Striking individual forms, e.g. striking only the masculine singular mixed declension dative superlative but not the masculine singular mixed declension accusative superlative of some German adjective because the former only got 2 (or 0) BGC hits while the latter got 3, would be a different matter, one which I think would be impractical and inappropriate for reasons already outlined. - -sche (discuss) 18:48, 26 September 2015 (UTC)
Closed per my comments above, in accordance with our usual practice. If there is a concern that the verb is only active and that no passive forms exist, or that the noun is indeclinable and doesn't change from case to case, that's a separate matter from concern that insufficiently many books using a certain expected slot (out of hundreds) have been digitized by Google Books. - -sche (discuss) 17:52, 13 February 2016 (UTC)