User:Rodasmith/lexeme data

From Wiktionary, the free dictionary

Purpose and scope

This initiative attempts to maximize the effectiveness of contributions for lexicological details that apply to multiple entries. It does not seek to prohibit efforts to build synchronization systems across various inflections.

Lexicological details

Many of the lexicological details in any dictionary apply to a specific headword, i.e. a sequence of characters (e.g. words) in a specific language (e.g. English) with a specific part of speech (e.g. Noun) and a specific grammatical form (e.g. plural). For example, below are a pronunciation and a citation, two lexicological details that apply to the specific headword words, which is a plural English noun:

Pronunciation
  • [audio file]
Noun
words (plural of word)
  1. [definition]
    • Common English translation of a quote from Pindar,
      Words have a longer life than deeds.

In our dictionary (and in every good dictionary that describes English or any other inflected language), some lexicological details (e.g. a definition for a verb) apply more broadly to an entire lexeme (i.e. to all of the inflections of a given headword). To illustrate, consider the English word talk, which functions as several specific grammatical forms of the lexeme to talk/talk/talks/talking/talked. Following are a few of its specific grammatical functions:

  • talk is the bare infinitive:
    1. (To) communicate to someone else by means of voice.
      I did not talk to him.
  • talk is the first person singular present tense indicative mood:
    1. (I) communicate to someone else by means of voice.
      I talk to him every day.
  • talk is the imperative:
    1. Communicate to someone else by means of voice!
      Talk!

Although the word talk functions as several grammatical forms, no dictionary gives separate definitions for each of those grammatical forms. Rather, each definition explains only the meaning of the lexeme to talk/talk/talks/talking/talked, using a grammar-neutral verb phrase:

  1. To communicate to someone else by means of voice.
    ... but not also...
  2. (I) communicate to someone else by means of voice.
  3. Communicate to someone else by means of voice!

Senses 2 and 3 above are unnecessary because each dictionary definition describes only the meaning of the lexeme and its part of speech, without attempting to describe the grammatical details of any particular verb form. So, each dictionary definition applies to all forms of the lexeme (e.g. to talk, talk, talks, talking, and talked). Thus, readers who see “To communicate to someone else by means of voice” know that they can use the word in places where an infinitive is not grammatically correct. (I.e., it's OK to say I talk to him every day, even though it is grammatically incorrect to say *I to communicate... to him every day.)

So, we can classify each lexicological detail according to the type of entity to which it applies:

  1. Details that apply to a headword (a specific grammatical form of a specific word), e.g.:
    • citations for the English word talks
    • grammar categories of the Spanish word habla
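  2. Details that apply to an entire lexeme, e.g.:
    • definitions for the English lexeme talk/talks/talking/talked
    • translations of that lexeme into other languages
    • the conjugation of a French verb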
  3. Details that apply to a large set of words, e.g.:
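The classification above can be sketched as a small data model. This is only an illustration under stated assumptions: the names `Lexeme`, `Headword`, and `details_for` are hypothetical and do not correspond to any existing Wiktionary structure. The point it shows is that form-specific details (such as citations) belong to the headword, while definitions are stored once on the lexeme and reached from every form.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Lexeme:
    """Details shared by every inflected form (hypothetical model)."""
    lemma: str
    language: str
    part_of_speech: str
    definitions: list[str] = field(default_factory=list)


@dataclass
class Headword:
    """Details specific to one grammatical form of one word."""
    form: str
    grammatical_form: str
    lexeme: Lexeme
    citations: list[str] = field(default_factory=list)


def details_for(headword: Headword) -> dict[str, list[str]]:
    """Combine form-specific details with those inherited from the lexeme."""
    return {
        "citations": headword.citations,              # this form only
        "definitions": headword.lexeme.definitions,   # the whole lexeme
    }


talk = Lexeme("talk", "English", "Verb",
              ["To communicate to someone else by means of voice."])
talks = Headword("talks", "third-person singular present", talk)
talked = Headword("talked", "simple past", talk)

print(details_for(talked)["definitions"])
```

Because both `talks` and `talked` refer to the same `Lexeme` object, a definition added there is visible from either form without any further edit.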

This initiative seeks to determine where best to put lexicological details that apply to an entire lexeme. Three alternatives follow:

  • Repeat lexeme details in every entry for its various forms.
  • Consolidate lexeme details in its “main” entry.
  • Consolidate lexeme details in templates.

Repeat lexeme details in every entry for its various forms

In this approach, there would be no single place for details that apply to the whole lexeme. Instead, an editor who wants to correct a translation of the lexeme hablar/hablo/hablas/habla/hablamos/... must edit each of the 60+ pages for its various verb forms to ensure that subsequent readers find the corrected translation.
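The maintenance cost of repetition can be sketched with a toy model. Everything here is an assumption for illustration: the `entries` dictionary stands in for wiki pages, `OLD` is a placeholder for some uncorrected definition text, and the helper names are hypothetical. The sketch shows that a correction costs one edit per form and that any form the editor misses stays stale.

```python
# Hypothetical in-memory stand-in for wiki pages: every form's entry
# carries its own copy of the lexeme-level definition.
FORMS = ["talk", "talks", "talking", "talked"]
OLD = "To communicate by voice."  # placeholder for an uncorrected definition
NEW = "To communicate to someone else by means of voice."

entries = {form: {"definitions": [OLD]} for form in FORMS}


def correct_definition(entries, forms_edited, old, new):
    """Replace `old` with `new` in the listed entries; return the edit count."""
    edits = 0
    for form in forms_edited:
        senses = entries[form]["definitions"]
        if old in senses:
            senses[senses.index(old)] = new
            edits += 1
    return edits


def stale_forms(entries, new):
    """List the forms whose entries still lack the corrected definition."""
    return [form for form, entry in entries.items()
            if new not in entry["definitions"]]


# An editor who fixes only two of the four pages leaves the other two stale.
edits = correct_definition(entries, ["talk", "talked"], OLD, NEW)
print(edits, stale_forms(entries, NEW))
```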

Consolidate lexeme details in its “main” entry

In this traditional dictionary approach, a “secondary” entry like hablasteis (you all talked) would show details for that specific word along with an abbreviated version of the details that apply to the lexeme hablar/hablo/hablas/habla/hablamos/... and a notice that more details for the lexeme may be found in the “main” (lemma/citation form) entry hablar. The disadvantage of this approach is that readers must click the “main entry” link for more details.
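A minimal sketch of the main-entry approach follows, using the English lexeme talk rather than hablar. The dictionaries and function names are hypothetical, and the abbreviated details are reduced here to a bare cross-reference. It shows that the secondary entry holds only its own grammatical form plus a pointer, so reaching the full definitions takes a second lookup (the reader's extra click).

```python
# Hypothetical data: full lexeme details live only in the main (lemma) entry.
MAIN_ENTRIES = {
    "talk": {
        "part_of_speech": "Verb",
        "definitions": ["To communicate to someone else by means of voice."],
    },
}

# A secondary entry stores its own grammatical form plus a pointer to the lemma.
SECONDARY_ENTRIES = {
    "talked": {"lemma": "talk", "form": "simple past and past participle"},
    "talks": {"lemma": "talk", "form": "third-person singular present"},
}


def render_secondary(word):
    """Render a short entry that refers the reader to the main entry."""
    entry = SECONDARY_ENTRIES[word]
    lemma = entry["lemma"]
    section = MAIN_ENTRIES[lemma]["part_of_speech"]
    return f"{word}: {entry['form']} of {lemma} (see {lemma}#{section})"


def follow_main_entry_link(word):
    """Simulate the extra click: fetch full details from the main entry."""
    return MAIN_ENTRIES[SECONDARY_ENTRIES[word]["lemma"]]["definitions"]


print(render_secondary("talked"))
print(follow_main_entry_link("talked"))
```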

Consolidate lexeme details in templates

In this approach, we would transclude lexeme details into the entry for each of its forms. This gives readers “zero-click” access, from the entry for any form of a lexeme, to the details that apply to all of its forms. For example, a reader who looks up talked would immediately find the complete, up-to-date set of definitions and translations that contributors have supplied for the lexeme to talk/talk/talks/talking/talked, without having to click through to talk. One disadvantage of this approach is the apparent mismatch between the grammatical form of talks and a definition like “to communicate by...”. Another disadvantage is that several applications today depend on definitions appearing in a numbered list in the core article rather than being transcluded from a template. Also, given how MediaWiki handles section editing, this approach creates certain editorial hurdles.
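The transclusion idea can be imitated in a few lines. This is a rough sketch, not MediaWiki's implementation: the double-brace placeholder mimics wiki transclusion syntax, but the template name `lexeme/talk`, the `PAGES` and `TEMPLATES` dictionaries, and the `render` function are all hypothetical. It shows one shared block of lexeme details being expanded into each form's page at render time.

```python
import re

# Hypothetical shared template holding the lexeme-level details.
TEMPLATES = {
    "lexeme/talk": "1. To communicate to someone else by means of voice.",
}

# Each form's page keeps its own form-specific text and transcludes the rest.
PAGES = {
    "talk": "Verb\n{{lexeme/talk}}",
    "talked": "Verb (simple past of talk)\n{{lexeme/talk}}",
}

PLACEHOLDER = re.compile(r"\{\{([^{}]+)\}\}")


def render(title):
    """Expand every {{name}} placeholder with the current template text."""
    return PLACEHOLDER.sub(lambda m: TEMPLATES[m.group(1)], PAGES[title])


print(render("talked"))

# One edit to the shared template changes what every form's page shows.
TEMPLATES["lexeme/talk"] += "\n2. (placeholder for a newly added sense)"
print(render("talk"))
```

Note that the stored text of each page never contains the numbered list itself, which is the property that would affect applications expecting definitions in the core article.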

Benefits of consolidation

The latter two approaches would create a single consistent place (1) for contributors to furnish such details, (2) for readers to read about such details, and (3) for bots and other applications (e.g. Ninjawords) to consume such details. Either alternative would provide the following benefits:

  • For definitions of English lexemes: A contributor would be able to add and edit definitions in our project for the English lexeme talk/talks/talking/talked by editing the numbered list at talk#Verb without also having to edit the entries for talks, talking, and talked. A reader who wants to know whether “they talk” can refer to non-verbal communication would be able to check talk#Verb without having also to check definitions in our entries for talks, talking, and talked. External applications would be able to consume synonyms and antonyms for each lexeme from a single source.
  • For translations from English lexemes: A contributor would be able to add a new translation in a single edit that is seen by all future readers who want to translate to talk/talk/talks/talking/talked. A reader who wants to translate "he will talk" into Spanish would find all such translations by reading the Spanish entries linked from the translations table in talk#Verb without also having to check talks, talking, or talked.
  • For inflections of a lexeme: A contributor who wants to supply the conjugation of a French verb would be able to do so in a single edit for all future readers who seek that conjugation. A reader who wants to know how to conjugate that French verb would be able to look in one consistent place. A bot writer who wants to add entries for inflections of that lexeme would be able to read just that one resource.

Some policy document should define where to house lexicological details that apply to entire lexemes.