Wiktionary:Criteria for inclusion: difference between revisions

Definition from Wiktionary, the free dictionary
(Names: Rewritten and expanded to attempt to clarify what we do)
(General rules: general revision)

Revision as of 09:51, 23 May 2005

As an international dictionary, Wiktionary is intended to include "all words in all languages".

General rule

As a general guideline, a term should be included if it is attested and idiomatic.

Attestation

"Attested" means verified through any of the following:

  1. Clearly widespread use,
  2. Use in a well-known work,
  3. Appearance in a refereed academic journal, or
  4. Use in running text in at least three independently recorded instances, whether in print, audio, video or on the internet, but preferably not from blogs, e-mails or chat rooms.
  • Note: the fourth entry above (that attempts to validate unverified internet sources) is disputed as being in direct conflict with the preceding version of this page.

Running text

In the above, "in running text" means use in properly formed, grammatical, ordinary sentences, in a context that exemplifies the term's meaning.

Independence

Independence is intended in a relative rather than an absolute sense. The criterion of independence is meant to exclude multiple references which draw on each other. Where Wikipedia has an article on a given subject, and that article is mirrored by an external site, the use of certain words on the mirror site would not be independent. It is quite common to find that material on one site is readily traced to another. A use which defines the term is not absolutely independent, but may be relatively independent.

Protologisms

The designation protologism is for terms defined in the hope that they will be used, but which are not actually in wide use. These are listed on Wiktionary:List of protologisms, and should not be given their own separate entries.

Idiomaticity

An expression is "idiomatic" if its full meaning cannot be easily derived from the meaning of its separate components.

For example, "this is a door" is not idiomatic, but "shut up" and "red herring" are.

Proper names

Proper names alone are generally discouraged as entries. Nevertheless, articles explaining the source of a name can be quite useful. The personal names of identifiable individuals more properly belong in Wikipedia.

Inflections

Although it is not forbidden, there is no particular need to include completely regular inflections such as cameras or singing. To the extent that they are present, they should indicate what inflection is intended and link to the stem form, and should not merely redirect.

Irregular forms such as geese and were should have their own entries, because people unfamiliar with the irregularity will look for them under the inflected form. Inflected forms — whether regular or irregular — with idiomatic meanings, such as blues or smitten, should have their own entries, with the predictable meanings distinguished from the idiomatic.

Issues to consider

Phrases with multiple forms

Many phrases take several forms. If the forms vary only in the pronoun in use, use "one" or "one's", as in "feel one's oats". Use the least inflected form that is actually used. In the worst case, there may need to be separate entries for variants, with links between them.

The saying "It's raining cats and dogs" is an interesting example. One can also say "It was raining cats and dogs", or "I think it's going to rain cats and dogs any minute now", or "It's rained cats and dogs for the last week solid". The entry should be (and is) under "rain cats and dogs", with the other variants derived by the usual rules of grammar (including the use of "it" with weather terms and other impersonal verbs).

Attestation vs. the slippery slope

There is occasionally concern that adding an entry for a particular term will lead to entries for a large number of similar terms. This is not a problem, as each term is considered on its own based on its usage, not on the usage of terms similar in form. Some examples:

  • Any word in any language might be borrowed into English, but only a few actually are. Including spaghetti does not imply that ricordati is next.
  • Any word may be rendered in Pig Latin, but only a few (e.g., amscray) have found their way into common use.
  • Any word may be rendered in leet style, but only a few (e.g., pr0n) see general use.
  • Grammatical affixes like meta- and -ance can be added in a great many more cases than they actually are. (Some basic suffixes like plural -s and past tense -ed really can be used almost anywhere.)
  • It may seem that trendy internet prefixes like e- and i- are used everywhere, but they aren't. If I decide to talk about e-thumb-twiddling but no one else does, then there's no need for an entry.

Language considerations

Uncommon languages are acceptable as long as they are (or were) used for everyday communication by some identifiable, natural population of humans. If the language lacks an ISO 639 language code, it's almost surely not acceptable.

Since this is the English Wiktionary, all definitions should be given in English. If a non-English word has the same spelling as an English one, place all of the definitions on the same page but arrange them under their respective language headings with the English entries first. For example:

==English==
===Noun===
'''boot'''
# A shoe that covers part of the leg.
===Verb===
'''to boot'''
# To kick.

==German==
===Noun===
'''Boot'''
# Boat.

For more information about formatting entries, see Wiktionary:Entry layout explained.

Terms included need not be "words" in any narrow sense

So long as it meets the criteria above, a term need not be a single word in the usual sense. Any of these is also acceptable:


Names

Names fall into two categories: individual given names and family names, which are single words, and the names of actual people, places, and things. Wiktionary classifies both as proper nouns, but applies caveats to each.

Given names and family names

Given names (such as David, Roger, and Peter) and family names (such as Baker, Bush, Rice, Smith, and Jones) are words, and subject to the same criteria for inclusion as any other words. Wiktionary has main articles giving etymologies, alternative spellings, meanings, and translations for given names and family names, and has two appendices for indexing those articles: Wiktionary Appendix:First names and Wiktionary Appendix:Surnames.

For most given names and family names, it is relatively easy to demonstrate that the word fulfils the criteria, since such names are in widespread use in both spoken communication and literature. However, being a name does not by itself qualify a word for inclusion. A new name that has not been attested is still a protologism. A name that occurs only in the works of fiction of a single author, or within a closed context such as the works of several authors writing about a single fictional universe, does not meet the criterion for independence.

Hypocoristics, diminutives, and abbreviations of names (such as Jock, Misha, Kenny, Ken, and Rog) are held to the same standards as names.

The status of patronymics has not been settled.

Names of actual people, places, and things

A name should be included if it is used attributively, with a widely understood meaning. For example, New York is included because "New York" is used attributively in phrases like "New York delicatessen", to describe a particular sort of delicatessen. A person or place name that is not used attributively (and that is not a word that otherwise should be included) should not be included. Lower Hampton, Empire State Building, and George Walker Bush thus should not be included. Similarly, whilst Jefferson (an attested family name word with an etymology that Wiktionary can discuss) and Jeffersonian (an adjective) should be included, Thomas Jefferson (which isn't used attributively) should not.

A name should be included if it has become a generic term. For example, Remington is used as a synonym for any sort of rifle, and Hoover as a synonym for any sort of vacuum cleaner. (Both are also attested family name words, and are included on that basis as well, of course.) Hamburger is used as a generic term for a type of sandwich. One good rule of thumb as to whether a name has become a generic word is whether the word can be used without capitalization (as indeed "sandwich" was in the previous sentence).

Being a trademark or a company name does not guarantee inclusion. (Of course, some company names are derived from family names, and are included on that basis.) Although some words are trademarks and company names, not all trademarks and company names are words. (Indeed, trademark holders will vigorously defend their trademarks against becoming words. According to Adobe Systems, there is no such word as Photoshopped, since Photoshop® is a trademark and not a common verb that can have a past participle; according to Xerox there is no such word as xerox, since Xerox® is a trademark and not a common verb; according to Sony there is no such word as Playstationize since there's no word Playstation at all and PlayStation® is a trademark and not a common verb.) Many trademarks and company names are deliberately protologisms. To be included, the use of a trademark or company name other than its use as a trademark (i.e., its use as a common word) has to be attested.

What Wiktionary is not with respect to names

Wiktionary is not a genealogy database. Wiktionary articles on family names, for example, are not intended to be about the people who share the family name. They are about the name as a word. For example, whilst the entry for Yoder will tell the reader that the word originated in Switzerland (as well as give its pronunciations and alternative spellings), it is not intended to include information about the ancestries of people who have the family name Yoder.

Wiktionary is not an encyclopaedia. That's Wikipedia's job. Wiktionary articles are about words, not about people or places. Many places, and some people, are known by single-word names that qualify for inclusion as given names or family names. The Wiktionary articles are about the words; articles about the specific places and people belong in Wikipedia. For example, Wiktionary will give the etymologies, pronunciations, alternative spellings, and so forth, of the names Darlington, Hastings, David, Houdini, and Britney, but articles on the specific towns (Darlington, Hastings), statue (David), escapologist (Houdini), and pop singer (Britney) are Wikipedia's job.

Wiktionary is not an encyclopedia

Care should be taken so that entries do not become encyclopedic in nature; if this happens, such content should be moved to Wikipedia, but the dictionary entry itself should be kept.