Wiktionary:Criteria for inclusion: difference between revisions

(Moved and expanded discussion of idiomaticity (I don't expect this to be the last word), re-titled "non-word entries" section)
(fmt, re-arranged text on idiomaticity, more on independence (with an eye toward uniform treatment of proper names), a bit more on proper names.)
Line 1: Line 1:
 
As an international dictionary, Wiktionary is intended to include "all words in all languages".
 
   
As a general guideline, a term should be included if any of the following apply:
+
==General rules==
  +
  +
As a general guideline, a term should be included if it is ''attested'' and ''idiomatic''.
  +
  +
===Attestation===
  +
  +
"Attested" means any of these applies:
   
 
* It is clearly in widespread use.
 
Line 8: Line 8:
 
* It has been used in running text in at least three independently recorded instances, whether in print, audio, video or on the internet.
 
   
== Issues to consider ==
+
====Running text====
 
 
In the above, ''in running text'' is meant to exclude references to words such as:
 
 
:''The word ''baeiouc'' has no known meaning, but does contain all five vowels in order.''
 
The criterion of independence is meant to exclude made-up or extremely specialized words that appear only in works by a given author or otherwise within a closed context.
+
The term should be used in ordinary sentences, for its meaning.
  +
  +
====Independence====
  +
  +
The criterion of independence is meant to exclude made-up or extremely specialized words that appear only in works by a given author or otherwise within a closed context. A use which defines the term is not independent. This applies particularly to proper names. For example, in an article mentioning "Lawrence city commissioner Boog Highberger", it's clear that "Boog Highberger" is a proper name denoting a city commissioner. On the other hand, if a New York Times article were to mention Boog without glossing who he is, that would definitely count.
  +
  +
Similarly, the Harry Potter books contain detailed accounts of [[Quidditch]], but these are not independent, since Rowling does not expect the reader to be familiar with Quidditch without having first read her explanation of the game. A usage such as "Quidditch (a fictional game featured in the Harry Potter books)," would not be independent. A usage such as "the shipping department had all the order and decorum of a Quidditch game played by mutant wombats." would count (assuming it wasn't written by Rowling).
  +
  +
====Protologisms====
   
 
There is a separate designation, [[protologism]], for terms defined in the hopes that they will be used, but which are not actually in wide use. These are listed on [[Wiktionary:List of protologisms]], but should not be given their own entries.
 
   
In general, entries should be [[idiomatic]] in the sense that they cannot be easily understood from their component parts. For example, ''this is a door'' should not get an entry, but [[shut up]] and [[red herring]] should. Many idiomatic phrases take several forms. If the forms vary only in the pronoun in use, use ''one'' or ''one's'', as in [[feel one's oats]]. Use the least inflected form that is actually used. In the worst case, there may need to be separate entries for variants, with links between them.
+
===Idiomaticity===
   
The saying ''It's raining cats and dogs'' is an interesting example. One can also say ''It was raining cats and dogs'', or ''I think it's going to rain cats and dogs any minute now'', or ''It's rained cats and dogs for the last week solid.'' The entry should be (and is) under [[rain cats and dogs]], with the other variants derived by the usual rules of grammar (including the use of ''it'' with weather terms and other [[impersonal verb]]s).
+
"Idiomatic" means that a term is used with the expectation of being understood without further explanation, but its full meaning cannot be easily derived from its parts.
   
  +
For example, ''this is a door'' should not get an entry, but [[shut up]] and [[red herring]] should. If a term is only used with an explanation of its meaning, there is no need to include it. This applies particularly to proper names.
 
There is no particular need to include completely regular inflections such as ''[[cameras]]'' or ''[[singing]]''. If they are present, they should redirect to the stem form. On the other hand, ''irregular'' forms such as ''[[geese]]'' and ''[[were]]'' should have their own entries. Inflected forms — whether regular or irregular — with idiomatic meanings, such as ''[[blues]]'' or ''[[smitten]]'', should have their own entries, with the predictable meanings briefly noted.
 
  +
  +
  +
== Issues to consider ==
  +
  +
===Phrases with multiple forms===
  +
Many phrases take several forms. If the forms vary only in the pronoun in use, use ''one'' or ''one's'', as in [[feel one's oats]]. Use the least inflected form that is actually used. In the worst case, there may need to be separate entries for variants, with links between them.
  +
  +
The saying ''It's raining cats and dogs'' is an interesting example. One can also say ''It was raining cats and dogs'', or ''I think it's going to rain cats and dogs any minute now'', or ''It's rained cats and dogs for the last week solid.'' The entry should be (and is) under [[rain cats and dogs]], with the other variants derived by the usual rules of grammar (including the use of ''it'' with weather terms and other [[impersonal verb]]s).
  +
  +
===Attestation vs. the slippery slope===
   
 
There is occasionally concern that adding an entry for a particular term will lead to entries for a large number of similar terms. This is not a problem, as each term is considered on its own based on its usage, not on the usage of terms similar in form. Some examples:
 
Line 29: Line 40:
 
* It may seem that trendy internet prefixes like ''e-'' and ''i-'' are used everywhere, but they aren't. If I decide to talk about ''e-thumb-twiddling'' but no one else does, then there's no need for an entry.
 
   
==Language considerations==
+
===Language considerations===
   
 
Uncommon languages are acceptable as long as they are (or were) used for everyday communication by some identifiable, natural population of humans. If the language lacks an [[w:ISO 639|ISO 639 language code]], it's almost surely not acceptable.
 
Line 50: Line 61:
 
For more information about formatting entries, see [[Wiktionary:Entry layout explained]].
 
   
==Terms included need not be "words" in any narrow sense==
+
===Terms included need not be "words" in any narrow sense===
   
 
So long as it meets the criteria above, a term need not be a single word in the usual sense. Any of these is also acceptable:
 
Line 57: Line 68:
 
* [[Idiom]]s such as ''[[go on]]'' and ''[[give up the ghost]]''.
 
 
* [[Abbreviation]]s, [[acronym]]s, and [[initialism]]s such as ''[[NBA]]''.
 
* [[Prefix]]es and [[suffix]]es such as ''[[-ist]]''.
+
* [[Prefix]]es and [[suffix]]es such as ''[[re-]]'' and ''[[-ist]]''.
 
* Characters used in [[ideograph]]ic or [[phonetic]] writing such as [[字]] or [[æ]].
 
 
* Terms which contain unusual characters or are otherwise unusual in form, such as [[G-d]], [[pH]], [[pr0n]], [[i18n]] or [[veg*n]]
 
   
   
==Proper nouns (names)==
+
===Proper nouns (names)===
   
A [[proper noun]] may be included if any of the following apply:
+
While [[proper noun|proper nouns]] are basically subject to the same guidelines as other terms, some special considerations apply. As a rule of thumb, a proper noun should be included if:
   
 
# It is used as a common noun (especially if it is commonly written without capitalization).
 
Line 71: Line 82:
 
# The name appears in different forms in different languages (e.g. John/Johann/Jan/Juan/Jean/Giovanni ...)
 
   
==Wiktionary is not an encyclopedia==
+
===Wiktionary is not an encyclopedia===
   
 
Care should be taken so that entries do not become [[encyclopedic]] in nature; if this happens, such content should be moved to [[Wikipedia]], but the entry itself should be kept.
 

Revision as of 04:56, 2 May 2005

As an international dictionary, Wiktionary is intended to include "all words in all languages".

General rules

As a general guideline, a term should be included if it is attested and idiomatic.

Attestation

"Attested" means any of these applies:

  • It is clearly in widespread use.
  • It is used in a well known work.
  • It appears in a refereed academic journal.
  • It has been used in running text in at least three independently recorded instances, whether in print, audio, video or on the internet.
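As a rough illustration only, the "any of these applies" test above could be sketched in Python as follows. The `Evidence` record and its field names are hypothetical and not part of any Wiktionary tooling; deciding whether a citation really is an independent use in running text remains an editorial judgment that the code simply takes as input.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical summary of what an editor has found for a term."""
    widespread_use: bool = False
    in_well_known_work: bool = False
    in_refereed_journal: bool = False
    # Count of citations that use the term in running text and are
    # independent of one another (mentions and definitions excluded).
    independent_running_text_uses: int = 0


def is_attested(evidence: Evidence) -> bool:
    """Return True if any one of the four attestation criteria is met."""
    return (
        evidence.widespread_use
        or evidence.in_well_known_work
        or evidence.in_refereed_journal
        or evidence.independent_running_text_uses >= 3
    )
```

Under this sketch, a term with two independent citations and nothing else would not count as attested, while a third citation would tip it over.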

Running text

In the above, in running text is meant to exclude mere mentions of a word, such as:

The word baeiouc has no known meaning, but does contain all five vowels in order.

The term should be used in ordinary sentences, for its meaning.

Independence

The criterion of independence is meant to exclude made-up or extremely specialized words that appear only in works by a given author or otherwise within a closed context. A use which defines the term is not independent. This applies particularly to proper names. For example, an article mentioning "Lawrence city commissioner Boog Highberger" tells the reader that "Boog Highberger" is a proper name denoting a city commissioner, so that use is not independent. On the other hand, if a New York Times article were to mention Boog without glossing who he is, that would count.

Similarly, the Harry Potter books contain detailed accounts of Quidditch, but these are not independent, since Rowling does not expect the reader to be familiar with Quidditch without having first read her explanation of the game. A usage such as "Quidditch (a fictional game featured in the Harry Potter books)" would not be independent. A usage such as "the shipping department had all the order and decorum of a Quidditch game played by mutant wombats" would count (assuming it wasn't written by Rowling).

Protologisms

There is a separate designation, protologism, for terms defined in the hopes that they will be used, but which are not actually in wide use. These are listed on Wiktionary:List of protologisms, but should not be given their own entries.

Idiomaticity

"Idiomatic" means that a term is used with the expectation of being understood without further explanation, but its full meaning cannot be easily derived from its parts.

For example, this is a door should not get an entry, but shut up and red herring should. If a term is only used with an explanation of its meaning, there is no need to include it. This applies particularly to proper names.

There is no particular need to include completely regular inflections such as cameras or singing. If they are present, they should redirect to the stem form. On the other hand, irregular forms such as geese and were should have their own entries. Inflected forms, whether regular or irregular, with idiomatic meanings, such as blues or smitten, should have their own entries, with the predictable meanings briefly noted.


Issues to consider

Phrases with multiple forms

Many phrases take several forms. If the forms vary only in the pronoun in use, use one or one's, as in feel one's oats. Use the least inflected form that is actually used. In the worst case, there may need to be separate entries for variants, with links between them.

The saying It's raining cats and dogs is an interesting example. One can also say It was raining cats and dogs, or I think it's going to rain cats and dogs any minute now, or It's rained cats and dogs for the last week solid. The entry should be (and is) under rain cats and dogs, with the other variants derived by the usual rules of grammar (including the use of it with weather terms and other impersonal verbs).
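The pronoun convention described above can be illustrated with a deliberately naive Python sketch. The function name `headword_form` and the pronoun list are assumptions made for this example; the sketch does not handle verb inflection, and it cannot tell possessive "her" from object "her", so an editor would still need to check the result.

```python
import re

# Possessive pronouns to generalize; "her" is ambiguous in real text.
POSSESSIVES = re.compile(r"\b(my|your|his|her|its|our|their)\b", re.IGNORECASE)


def headword_form(phrase: str) -> str:
    """Replace possessive pronouns with the placeholder "one's"."""
    return POSSESSIVES.sub("one's", phrase)
```

For instance, `headword_form("feel his oats")` gives `"feel one's oats"`, while a phrase with no possessive pronoun, such as `"rain cats and dogs"`, is returned unchanged.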

Attestation vs. the slippery slope

There is occasionally concern that adding an entry for a particular term will lead to entries for a large number of similar terms. This is not a problem, as each term is considered on its own based on its usage, not on the usage of terms similar in form. Some examples:

  • Any word in any language might be borrowed into English, but only a few actually are. Including spaghetti does not imply that ricordati is next.
  • Any word may be rendered in Pig-Latin, but only a few (e.g., amscray) have found their way into common use.
  • Any word may be rendered in leet style, but only a few (e.g., pr0n) see general use.
  • Grammatical affixes like meta- and -ance can be added in a great many more cases than they actually are. (Some basic suffixes like plural -s and past tense -ed really can be used almost anywhere.)
  • It may seem that trendy internet prefixes like e- and i- are used everywhere, but they aren't. If I decide to talk about e-thumb-twiddling but no one else does, then there's no need for an entry.

Language considerations

Uncommon languages are acceptable as long as they are (or were) used for everyday communication by some identifiable, natural population of humans. If the language lacks an ISO 639 language code, it's almost surely not acceptable.

Since this is the English Wiktionary, all definitions should be given in English. If a non-English word has the same spelling as an English one, place all of the definitions on the same page but arrange them under their respective language headings with the English entries first. For example:

==English==
===Noun===
'''boot'''
# A shoe that covers part of the leg.
===Verb===
'''to boot'''
# To kick.

==German==
===Noun===
'''Boot'''
# Boat.
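The ordering rule illustrated above (English section first) could be sketched in Python like this. The function name is hypothetical, and keeping the remaining languages in their given order is an assumption of the sketch; the text here only requires that English come first.

```python
def order_language_sections(languages):
    """Return language headings with "English" first.

    Other headings keep their original relative order, because
    sorted() is stable and the key is False only for "English".
    """
    return sorted(languages, key=lambda name: name != "English")
```

For the example page, `order_language_sections(["German", "English"])` puts the English section ahead of the German one.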

For more information about formatting entries, see Wiktionary:Entry layout explained.

Terms included need not be "words" in any narrow sense

So long as it meets the criteria above, a term need not be a single word in the usual sense. Any of these is also acceptable:


Proper nouns (names)

While proper nouns are basically subject to the same guidelines as other terms, some special considerations apply. As a rule of thumb, a proper noun should be included if:

  1. It is used as a common noun (especially if it is commonly written without capitalization).
  2. It is used in an attributive sense with the expectation that the meaning will be widely understood (a David Beckham hairstyle).
  3. Words or terms derived from the name are already in Wiktionary.
  4. The name appears in different forms in different languages (e.g. John/Johann/Jan/Juan/Jean/Giovanni ...)

Wiktionary is not an encyclopedia

Care should be taken so that entries do not become encyclopedic in nature; if this happens, such content should be moved to Wikipedia, but the entry itself should be kept.