Wiktionary talk:Translations/Noting lemma forms in WT:ELE

from WT:BP#Noting lemma forms in WT:ELE Rod (A. Smith) 22:20, 3 November 2007 (UTC)

With the recent update to WT:AJ, we documented our common practice of differentiating the full format used for lemma entries from the abbreviated format used for non-lemma entries. We should do something similar in WT:ELE. I think the following wording reflects how we currently select English lemmata and belongs in the ==Basics== section of WT:ELE:

===Lemma entries===
Each language may have its own traditional choice of lemma forms for various parts of speech. For English entries, the lemma form is usually the “bare” form: the singular form for nouns (e.g. word but not words), the bare infinitive form for verbs (e.g. talk but not talks, talking, or talked), and the positive form for adjectives (e.g. easy but not easier or easiest). With some types of words, an alternate form is the preferred lemma, e.g. the plural entry for a plurale tantum. For other types of words, there may be no distinction between lemma and non-lemma forms, e.g. for pronouns, articles, prepositions, and defective verbs like may (have permission to) that lack an infinitive form. With such terms, all forms are treated as lemmata. When the situation is unclear, editors are advised to use their best judgment on a case by case basis.
Following are guidelines for entries for the lemma form of terms. For non-lemma entries (e.g. for the plurals of most nouns), a more abbreviated format is used instead.

Is the above accurate? Is it too wordy for WT:ELE? Does it allow for enough or too much flexibility? Rod (A. Smith) 07:16, 24 September 2007 (UTC)

Sounds good to me. But ideally ELE would address the format for "form-of" entries as well. -- Visviva 07:56, 24 September 2007 (UTC)
I think it would be much better to use "base form" and avoid the term "lemma." The above doesn't read very cleanly. But the very last phrase is concerning. AFAIK, we welcome clarifications on the form-of entries. (That is, tags and a gloss, examples, pronunciation, etc.) The wording you have above suggests those be removed. Again, AFAIK, only the ===Translations=== section is discouraged in the "form-of" entries (and only for noun forms and verb forms.) Is there a better sub-page for this? Entry layout explained is far too long as it is. It obviously fails the TLDR test for most newcomers. --Connel MacKenzie 08:21, 24 September 2007 (UTC)
I'm not sure about "base form", because that seems easily confused with "stem" or "root". "Citation form" or "canonical form" are pretty clear, but I'm not sure either is better than "lemma". Is there a precise layman synonym? Rod (A. Smith) 22:15, 9 October 2007 (UTC)
Good suggestions, Visviva and Connel. We seem to need Wiktionary:Entry in a nutshell for the TLDR problem and the full WT:ELE for the specifics, including the details for the non-lemma formats. My understanding is that non-lemma entries are not supposed to include the following:
  • Etymology (e.g., the entry for speaking should not show the Old English etymons spēcan or sprēcan)
  • Other non-lemma inflections (e.g., the entry for speaking should not show the inflection spoke on the headword line)
  • Detailed definitions (e.g., the entry for speaking should show that it is the present participle of speak, but should not have separate definitions “communicating with one's voice”, “having a conversation”, “communicating by some means other than orally”, “delivering a message to a group”, or “being able to communicate in a language”)
  • Synonyms, antonyms, or other -onyms
  • Translations
Is anyone under the impression that non-lemma entries should contain the above information? Rod (A. Smith) 18:01, 24 September 2007 (UTC)
My understanding is that a non-lemma should include the following:
  • Alternative spellings (or whatever)
  • Pronunciation
  • POS header
  • Simple bold term inflection line (possibly with gender, number, etc. specific to that form but not any other inflectional forms)
  • A gloss/definition that links to the lemma and explains the relationship
  • Example sentences
  • Supporting citations
My understanding also is that a non-lemma should not contain other kinds of information, though I can imagine there may be exceptions in unusual situations, such as when a past tense form has an unusual etymology. Including synonyms & antonyms would get messy; consider that the antonym of whiter is less white, not blacker. Likewise you run into problems if you're going to have synonym listings for all the inflections of Latin nouns, adjectives, verbs, etc. --EncycloPetey 23:25, 24 September 2007 (UTC)
I don't think non-lemma entries usually need alternative spellings, example sentences, or supporting citations; all of those should ordinarily go in the lemma page. Conversely, I'm O.K. with a term derived from a non-lemma appearing at both the lemma entry and the non-lemma entry (say, double-dealing being listed both at deal and at dealing), and I'm O.K. with non-lemmata belonging to relevant lexical categories (like Category:English plurals). —RuakhTALK 20:03, 27 September 2007 (UTC)
I have to disagree strongly about alternative spellings and citations. The non-lemmata need to have separate citations listed because they are different spellings from the lemma! In Latin, for example, the reason we know that a particular word is irregular is through documented citations of the irregularity. It would be silly to burden the lemma page citations with all the dozens of inflected forms, and would be difficult for anyone trying to make use of the data to parse out the appropriate citation or two that supports, say, the irregular dative feminine plural form.
Likewise, an inflected form may have an alternative spelling that the lemma does not have. A non-lemma page should therefore have an alternative spellings section of its own. --EncycloPetey 22:15, 28 September 2007 (UTC)
Re: citations: Are you saying that citations for an inflected form should never appear at the lemma entry, or only that they shouldn't necessarily appear at the lemma entry? If the latter, then I agree completely; but if the former, then I beg to differ. If in the oldest known citation using a noun, it appears in the plural genitive, then I think that citation needs to be included in the entry for the noun, i.e. the entry for the noun's singular nominative form. The entry for the lemma is really the entry for the word as a whole; that's why we define dictionary as "A publication, usually in the form of a book, […]" and not "The singular form of a noun used to refer to a publication, usually in the form of a book, […]" or whatnot.
Re: alternative spellings: How can that be? Either the same word has multiple spellings for one form (in which case the lemma page should note that in the inflection line, inflection table, and/or usage notes), or it has multiple spellings for all forms (in which case the lemma page will have its own alternative-spellings section). Can you give an example of what you mean?
RuakhTALK 00:22, 30 September 2007 (UTC)
I do mean that they shouldn't necessarily appear in the lemma page. I imagine that the citations for a lemma page could include the lemma form or various inflected forms, and even alternative spellings (see the citations for parrot). However, I want to see the inflected forms backed up with at least some citations specific to that form and listed on the non-lemma page.
With regard to alternative spellings, I mean exactly what I say. Sometimes only one or two of the inflected forms have alternative spellings but the lemma does not. For example, a Latin word might have an alternative form for the accusative singular, but not for any of the other forms. For example ingēns has two forms of the ablative singular, but only a single form for each of the other inflections. I've set up the appropriate pages for ingēns, ingenti#Latin, and ingente#Latin to show how I would handle this. Putting the alternative spelling information onto only the lemma page would cause the information to be visually lost. In the case of some entries with multiple inflectional parts with multiple forms (such as deus), it can be downright confusing. --EncycloPetey 03:09, 30 September 2007 (UTC)
It seems like your approach would be less effective for deus, because suddenly deī would need to have a full-out explanation of its alternative spellings dīvi, , diī, and dii, (are those last two really different, or is that just a typo?) explaining which alternative spellings exist for which senses. If, instead, all the information is in the declension table at deus (as it is currently), and the entries for inflected forms all point there, then we don't need to worry about all these complicated explanations at the inflected forms; if we want to be explicit about it, we can even have dei#Alternative spellings say to see deus. —RuakhTALK 03:40, 30 September 2007 (UTC)
As I said, it gets really confusing in the case of deus, but that's an exceptional case. There are only a handful of those in the whole of the Latin language. For most alternative spelling issues, there are just one to three forms with alternatives (or all of them), and there's usually just one alternative. --EncycloPetey 04:17, 30 September 2007 (UTC)

I've been talking to Ruakh about including definitions etc. of non-lemma entries, and I disagree quite strongly with what appears to be the current practice of simply stating the grammatical form of a word. If a reader doesn't know these grammatical terms, it isn't helpful for them to see something like "first-person singular pluperfect form of X" instead of an actual definition. For example, I'm going to use the word citisem, the first-person singular pluperfect form of the verb citi. Assuming that the reader does not know what "pluperfect" and "citi" mean, if the entry for citisem only includes "first-person singular pluperfect form of citi" they need to look first at citisem, to see the form, then at citi, to see "to read", and then at pluperfect, to see that it means "pertaining to action completed before or at the same time as another," which honestly might not be the best definition. After all of this, without the definition "I had read," they still might not know what "citisem" means. Better, I say, to see as much as possible about the word "citisem" in its own entry than for it to be necessary to hop around wiktionary looking at definitions for other words. It might take more time to include all this information, but in the end it's more helpful for the readers. Sorry for being so wordy, but I feel strongly about this =) — [ ric | opiaterein ] — 18:25, 30 September 2007 (UTC)

The reason is that we want to avoid multiple unnecessary duplication of content. Consider that the Latin adjective albus has 3 definitions and 35 inflected forms. If we add definitions to the non-lemma pages, that's 105 additional definition lines that have to be added and maintained. Then there are the comparative and superlative forms of all Latin adjectives, which aren't included in the count above. Just for the forms of albus, that's an additional 42 forms with 3 base definitions. Now multiply all that by two, because the comparative of "white" in Latin can mean whiter, but can also mean "rather white", and the superlative can mean "whitest" but can also mean "very white". It makes more sense to explain this behavior of Latin adjectives once in a single location instead of needlessly repeating it on thousands of individual entries.
Likewise, consider that the Spanish verb tener has 7 definitions and 61 additional inflected forms. That makes 427 additional definition lines. And what happens when someone adds a new definition sense to the lemma? That sense has to be added to all 61 non-lemma pages. The same happens if an edit is made. It makes much more sense to give the definitions in a single central location, and provide a grammatical appendix for interpretation. --EncycloPetey 18:51, 30 September 2007 (UTC)
In such extreme cases as tener, where a word has that many definitions to begin with, with even one example of how the word is translated and possibly a "see tener for further definitions" that wouldn't be a huge problem at all. That, or maybe only give those definitions which are actually used frequently. Appendices are useful, if you know they're there (which even I didn't for a long time). But still, I think that words in a dictionary should be defined.
Still yet, not all languages have to be done the same way. I work mostly on Romanian entries, and being that I'm one of the few people who does so, I'm really not bothered much to go back and change things in multiple articles at once if I change something that affects multiple entries. — [ ric | opiaterein ] — 19:16, 30 September 2007 (UTC)
tener is hardly an extreme case. The example you gave—citi—has eight definitions in the Romanian entry and presumably about a hundred forms. To give non-lemma words full treatment in this dictionary, then, we would eventually need to maintain eight hundred definitions. As this dictionary becomes more complete, such maintenance concerns will become the norm, not the exception. Rod (A. Smith) 19:41, 1 October 2007 (UTC)
One option is to have the “form of” templates include a link to an appendix that explains what the grammar terms mean. Readers could then get a usable sense of the word by reading the appendix and clicking through to the lemma. We would do readers a significant disservice if we pretend that citi has eight distinct senses but citisem can only be used in the sense “I had read”. Rod (A. Smith) 19:52, 1 October 2007 (UTC)
I'm not sure where you got the hundred forms thing from. I'm not talking about a single entry for every form with auxiliary words (am citit, am să citesc). The definitions of citi on the Romanian wiktionary mostly mean "to read," whether it be music, a book or something else. Another says "to learn" or "to study" for which Romanian has other specific words, which are named directly in the definition. "A citi" means "to read" and has numerous definitions with almost the same meaning just as the entry read for English. Looking at the translation table of citi in the Romanian wiktionary entry, you'll see one word per language, with the exception of German. — [ ric | opiaterein ] — 16:39, 2 October 2007 (UTC)
OK, so it seems Romanian has not a hundred verb forms, but merely thirty-five or so. For the eight meanings, though, it doesn't make sense to discount them just because there are other words for those concepts. If citi can mean, "to learn" or "to study", those senses belong in our entry. So that means we would merely need to maintain 280 definitions for the various forms of that word if we give non-lemma entries full treatment. Regardless of the exact number, though, it seems wrong to use the number of forms and senses to decide whether to give full treatment to non-lemma forms. Doing so would seem to imply a rule along the lines of, “if a language has fewer than forty(?) inflected forms for a given word class, all the forms should be given semantic definitions, but if there are forty(?) or more forms, only the lemma should be given semantic definitions while other forms should show only the grammatical relationship with the lemma.” Rod (A. Smith) 16:30, 3 October 2007 (UTC)
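The maintenance figures quoted in this thread all come from multiplying a lemma's sense count by its number of inflected forms. A minimal sketch of that arithmetic follows; the function name `definition_lines` is hypothetical and not part of any Wiktionary tooling, and the inputs are only the counts the participants themselves cite above.

```python
def definition_lines(senses: int, forms: int) -> int:
    """Definition lines to maintain if every inflected form repeats every sense.

    Hypothetical helper for illustration only.
    """
    return senses * forms


# (label, senses, inflected forms) as quoted by participants in this discussion
examples = [
    ("albus (Latin adjective)", 3, 35),
    ("tener (Spanish verb)", 7, 61),
    ("citi (Romanian verb, first estimate of forms)", 8, 100),
    ("citi (Romanian verb, revised estimate of forms)", 8, 35),
]

for label, senses, forms in examples:
    total = definition_lines(senses, forms)
    print(f"{label}: {senses} senses x {forms} forms = {total} lines")
```

Adding one new sense to a lemma adds one line per inflected form under this model, which is the synchronization cost being debated.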
Even 35 is a bit high... verbs generally have about 26 forms (with one word, that is without auxiliaries). If I remember correctly, definitions for languages other than English aren't supposed to be the full definitions as they would be given in that language's own wiktionary. We wouldn't have "to interpret typographic indications of a map or plan and to reconstruct after them the conforms of the terrain" under citi. Words (at least in Romanian) don't generally have meanings that are completely different. Using citi as an example again, the definitions as they would be in English would all be to read, whether it be a map, a book, music or whatever. (Side note: In an entry instead of having separate definitions for each of these senses, one could just point them out in example sentences, which are much more useful anyway.) The problem with the definition of "to study/learn" is that they don't appear to have a word that means to study in the sense of re-reading something you've read before to remember it: they just have "to read". Their word for study is cognate to our word for study, but it means about the same thing as "to learn". If you asked someone "what are you studying" they would answer with what they're learning about in school. If you saw someone studying for a test, or something, and asked what they were doing they would say they were reading. Anyway I've gotten off the main point.
Back on the topic: All that said, as I said before, in non-lemmas, you could simply give the most basic or most widely used definitions and adding "see [whatever the main article is] for more possible definitions". This would make "maintenance", etc, a lot easier. I never meant to give non-lemmas FULL treatment, just enough so that one wouldn't have to run between 3 or 4 articles to know how to translate a certain word. Right now, with non-lemmas, I'm including pronunciation, POS, a definition (and sometimes a see also/synonyms, if there's an alternate form or something), whereas normally I'd include etymology, synonyms, antonyms, related words, etc. I think maybe people have been thinking I meant to go all out with everything included in these entries. — [ ric | opiaterein ] — 18:17, 3 October 2007 (UTC)
Well, regardless of the specific format for non-lemma entries, it seems that everyone agrees we should distinguish between lemma entries and non-lemma entries. The next logical step is to decide what constitutes a lemma entry and which details WT:ELE should explicitly exclude from non-lemma entries. Rod (A. Smith) 19:42, 7 October 2007 (UTC)

Please see Wiktionary:Votes/2007-10/Lemma entries and help me refine it to something we can all support. Rod (A. Smith) 23:32, 17 October 2007 (UTC)

Wiktionary:Votes/2007-10/Lemma entries is now open. Rod (A. Smith) 19:14, 30 October 2007 (UTC)

The vote is clearly headed toward rejection of a distinction between lemma entries and non-lemma entries. Consider the degree of completion we might expect to achieve if contributors work primarily on lemma entries for the next five years. If we instead dilute those contributions across the various inflected forms of words, we can expect to achieve that same degree of completion in fifty years instead. I hope this community can understand how counterproductive it is to our project for us to reject a normative focus on lemma entries. Rod (A. Smith) 22:09, 1 November 2007 (UTC)

I don't think anybody is saying that there should be no distinction. What kind of completion are you talking about? I don't think anybody focuses primarily on non-lemma entries. And supposing that people work primarily on lemma entries for 5 years, you still have to go back and add non-lemmas (even if you're only adding "form of" information). So what it comes down to really is the total amount of effort you're willing to put into the dictionary. Focusing on basic forms of words and for the most part ignoring their forms isn't going to help readers that aren't familiar with grammatical terms. Even then the clarification is nice.
Note: In the above I'm referring mostly to translations under the inflection line for languages other than English. I think providing translations, etymology, synonyms, antonyms, etc to all English entries is basically a waste of time and energy. However, having the absolute minimum amount of information in non-lemma entries, while easier to manage, is just lazy. — [ ric | opiaterein ] — 00:42, 2 November 2007 (UTC)
That's pretty sensible, so long as it's clear to readers that more information can be found in the lemma entry. As this project matures, we will serve readers much better by expanding our foreign entry lemma definitions from crude one-word translations to fuller explanations of the terms' subtle nuances than we will by expanding non-lemma entries and trying to keep them synchronized. Unfortunately, it doesn't seem that a change to the proposal to more strongly encourage brief gloss translations of non-lemma entries would sway anyone who currently has an opposing vote to one of support. Rod (A. Smith) 02:46, 2 November 2007 (UTC)
I've been doing it like I'm suggesting now. It never occurred to me that there would or should be another way of doing it. :-) See prietenului for one example of how I do non-lemmas. For those articles in which it's necessary, I definitely think a note should be made that there is more than one definition. The problem is how to work that in without having another subsection or something... — [ ric | opiaterein ] — 03:39, 2 November 2007 (UTC)
OK. It seems important to minimize confusion now caused by phrases like formal third person trial animate pluperfect negative perfective subjunctive. Since the format requested by you, Robert, et al. places full grammatical description in the headword line, we should have some {{xx-verb}} templates display the headword line with a full technical jargon phrase linked to a language appendix specific to the corresponding inflection in that language. Since grammatical properties cannot be only in definition lines under the requested system, we will also need to repeat a given POS header, e.g.:


===Verb===
{{es-verb|third person present singular indicative of|ganar}}

# [[win]]s.
#: ''Él '''gana'''.'' — “He wins.”

===Verb===
{{es-verb|second person singular imperative of|ganar}}

# [[win]].
#: ''¡'''Gana'''!'' — “Win!”



  1. wins.
    Él gana. — “He wins.”



  1. win.
    ¡Gana! — “Win!”

A concern I have about this is that some editors have tried to merge sequential "===Verb===" sections, so we'd need agreement not to do so. Does that format seem ideal? Rod (A. Smith) 07:28, 2 November 2007 (UTC)

That's how I've always done it. :-) {{ro-verbform}} is what I use for the headword/inflection line. I suck at template design, so it might not be the best template evar, but it seems to get the job done. mâncaţi (needs example sentences, I know)
I think a {{see lemma}} template should have parentheses and <small></small> tags or something so it doesn't distract too much from the definition itself? And to repeat it every time, I don't know. That's the main thing I was worried about with such a notification. We could always just put it under ====See also==== or ====Usage notes==== with "See the lemma article, x for more detailed information" or "This word (or these words) can be translated in more diverse ways. For other possible usages"... etc.— [ ric | opiaterein ] — 15:09, 2 November 2007 (UTC)
I dislike the above approach intensely. It would be counterproductive for Latin. See the page/section alba#Latin. Using the above format would require six separate Adjective sections under Latin for that word. That would be silly. Providing a translation for each separate "sense" would also be silly. I've been doing Latin inflected adjectives the way alba is done. --EncycloPetey 22:07, 3 November 2007 (UTC)
Hmm. Striking a balance here is challenging, but is it reasonable to show important grammatical properties in each headword/inflection line, gloss-type definitions in each definition line, and detailed translations within just “main” (lemma) entries? For example, in alba, we could just do this:


alba (nominative feminine singular, nominative neuter plural, accusative neuter plural, vocative feminine singular, vocative neuter plural of albus)

  1. white.

albā (ablative feminine singular of albus)

  1. white.

Does that seem reasonable? I don't think any fidelity is lost or any confusion introduced. Right? Rod (A. Smith) 23:04, 3 November 2007 (UTC)
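The layout proposed above (grammatical roles in the headword line, a one-word gloss as the definition) can be sketched as a small formatter. This is only an illustration of the proposed format, not an existing template or bot; the names `headword_line` and `secondary_entry` are hypothetical.

```python
def headword_line(form: str, roles: list[str], lemma: str) -> str:
    """Build a headword line listing every grammatical role of an inflected form."""
    return f"{form} ({', '.join(roles)} of {lemma})"


def secondary_entry(form: str, roles: list[str], lemma: str, gloss: str) -> str:
    """Headword line followed by a single brief gloss definition."""
    return f"{headword_line(form, roles, lemma)}\n\n  1. {gloss}."


# Roles taken from the alba example in this discussion
alba_roles = [
    "nominative feminine singular",
    "nominative neuter plural",
    "accusative neuter plural",
    "vocative feminine singular",
    "vocative neuter plural",
]

print(secondary_entry("alba", alba_roles, "albus", "white"))
print()
print(secondary_entry("albā", ["ablative feminine singular"], "albus", "white"))
```

Grouping all roles that share a spelling into one headword line is what avoids the six separate Adjective sections objected to earlier.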

More than reasonable, actually :-p I still really think the {{see main}} formatting should be <small> or something, though :-D Maybe it should even be included in the form-of templates. But right now, I'm too tired to have ideas. — [ ric | opiaterein ] — 03:32, 4 November 2007 (UTC)
We can create some customization options for {{see main}}, but first I want to know that the overall approach has some chance of success. There is resistance from both the "list each grammatical role as a definition" camp and the "allow translations in all entries" camp, so I'm not yet sure this approach can succeed. Rod (A. Smith) 17:46, 4 November 2007 (UTC)

With 3-5-1 opposing, the current vote is suspended. Wiktionary:Votes/2007-11/Lemma entries 2 is where I will work on the new version. Following are the changes so far:

  • Instead of "lemma" and "non-lemma", it uses "main" and "secondary" to describe the entries. Hopefully that makes it more clear that if any special senses exist for an inflected word, that inflection counts as a "main entry", regardless of whether it's considered "lemma" or "citation" form.
  • The "form of" information is now in the headword line, where it belongs.
  • {{form of}} is dropped. Instead, it recommends a brief semantic definition followed by "see full entry at... for details".
  • An example sentence and quotation are given in the example of the secondary entry.
  • Fewer sections are explicitly prohibited from secondary entries.

It's still not clear to me whether the opposition really believes we should leave translations in secondary entries like talks. I will try to describe clearly why such translations are harmful to this project. Rod (A. Smith) 02:01, 3 November 2007 (UTC)

I think translation tables in secondary entries are a bad idea in most cases. Plurals and stuff yeah, they're pretty harmless and easy to keep under control, but the information is just a repetition of what's in the primary entry which can be easily accessed to begin with. If you want to see the plural that bad, you can go through the primary article to the word you want. As we've seen, such tables for verbs would be a fucking nightmare.
Rod, do you think we should try to keep all our talk about this stuff in one place? So far we have it here at the beer parlor, on the old vote page, its talk page, and the new vote's talk page. Seems as messy as the translation tables lol — [ ric | opiaterein ] — 20:33, 3 November 2007 (UTC)
Agreed. Everything is now accessible through Wiktionary talk:Translations. Better? Rod (A. Smith) 17:46, 4 November 2007 (UTC)