Wiktionary:Criteria for inclusion: difference between revisions

From Wiktionary, the free dictionary
→‎Formatting: per Wiktionary talk:Criteria for inclusion#Formatting of misspellings.: that's a definition (or a format of one), not an entry; remove explicit wikilink; add include lang=en
formatting/typography, minor fixes
Line 4: Line 4:


==General rule==
==General rule==
A term should be included if it's likely that someone would [[run across]] it and want to know what it means. This in turn leads to the somewhat more formal guideline of including a term if it is '''[[attested]]''' and '''[[idiomatic]]'''.
A term should be included if it’s likely that someone would [[run across]] it and want to know what it means. This in turn leads to the somewhat more formal guideline of including a term if it is '''[[attested]]''' and '''[[idiomatic]]'''.


===Terms===
===Terms===
Line 27: Line 27:
This filters out appearance in raw word lists, commentary on the form of a word, such as “The word ‘foo’ has three letters,” lone definitions, and made-up examples of how a word might be used. For example, an appearance in someone’s online dictionary is suggestive, but it does not show the word actually used to convey meaning. On the other hand, a sentence like “They raised the jib (a small sail forward of the mainsail) in order to get the most out of the light wind,” appearing in an account of a sailboat race, would be fine. It happens to contain a definition, but the word is also used for its meaning.
This filters out appearance in raw word lists, commentary on the form of a word, such as “The word ‘foo’ has three letters,” lone definitions, and made-up examples of how a word might be used. For example, an appearance in someone’s online dictionary is suggestive, but it does not show the word actually used to convey meaning. On the other hand, a sentence like “They raised the jib (a small sail forward of the mainsail) in order to get the most out of the light wind,” appearing in an account of a sailboat race, would be fine. It happens to contain a definition, but the word is also used for its meaning.


:''The vote "[[Wiktionary:Votes/2011-06/Redirecting combining characters|2011-06/Redirecting combining characters]]" is relevant to this section, without specifying text to be amended in this document, so please see it for details.''<ref>[[Wiktionary:Votes/2011-06/Redirecting combining characters]]</ref>
:''The vote [[Wiktionary:Votes/2011-06/Redirecting combining characters|2011-06/Redirecting combining characters]] is relevant to this section, without specifying text to be amended in this document, so please see it for details.''<ref>[[Wiktionary:Votes/2011-06/Redirecting combining characters]]</ref>


====Number of citations====
====Number of citations====
Line 37: Line 37:


====Independent====
====Independent====
This serves to prevent double-counting of usages that are not truly distinct. Roughly speaking, we generally consider two uses of a term to be "independent" if they are in different sentences by different people, and to be non-independent if:
This serves to prevent double-counting of usages that are not truly distinct. Roughly speaking, we generally consider two uses of a term to be “independent” if they are in different sentences by different people, and to be non-independent if:


* one is a verbatim or near-verbatim quotation of the other; or
* one is a verbatim or near-verbatim quotation of the other; or
Line 55: Line 55:
Compounds are generally idiomatic, even when the meaning can be clearly expressed in terms of the parts. The reason is that the parts often have several possible senses, but the compound is often restricted to only some combinations of them.
Compounds are generally idiomatic, even when the meaning can be clearly expressed in terms of the parts. The reason is that the parts often have several possible senses, but the compound is often restricted to only some combinations of them.


For example, [[mega-]] can denote either a million (or 2<sup>20</sup>) of something or simply a very large or prominent instance of something. Similarly [[star]] might mean a celestial object or a celebrity. But [[megastar]] means "a very prominent celebrity", not "a million celebrities" or "a million celestial objects", and only rarely "a very large celestial object" (capitalized, it is also a brand name in amateur astronomy).
For example, [[mega-]] can denote either a million (or 2<sup>20</sup>) of something or simply a very large or prominent instance of something. Similarly [[star]] might mean a celestial object or a celebrity. But [[megastar]] means “a very prominent celebrity”, not “a million celebrities” or “a million celestial objects”, and only rarely “a very large celestial object” (capitalized, it is also a brand name in amateur astronomy).


This rule must be applied carefully and is somewhat subjective. For example, [[bank]] has several senses and [[parking lot]] has an idiomatic sense of "large traffic jam". However ''bank parking lot'' can't possibly mean "to put a large traffic jam in a financial institution". With such clearly wrong interpretations weeded out, the remaining choices are "place to park cars for any of several kinds of business" or "place to park cars by, for or on a river bank or similar (as opposed to, say, the ''hill parking lot'')." The whole phrase could plausibly mean either, depending on context (though the first is likely far more common), and so the phrase is not idiomatic.
This rule must be applied carefully and is somewhat subjective. For example, [[bank]] has several senses and [[parking lot]] has an idiomatic sense of “large traffic jam”. However ''bank parking lot'' can’t possibly mean “to put a large traffic jam in a financial institution”. With such clearly wrong interpretations weeded out, the remaining choices are “place to park cars for any of several kinds of business” or “place to park cars by, for or on a river bank or similar” (as opposed to, say, the ''hill parking lot''). The whole phrase could plausibly mean either, depending on context (though the first is likely far more common), and so the phrase is not idiomatic.


This criterion is sometimes referred to as the ''fried egg test'', as a [[fried egg]] generally means an egg (and generally a chicken egg or similar) fried in a particular way. It generally doesn't denote a [[scrambled egg]], which is nonetheless cooked by frying.
This criterion is sometimes referred to as the ''fried egg test'', as a [[fried egg]] generally means an egg (and generally a chicken egg or similar) fried in a particular way. It generally doesn’t denote a [[scrambled egg]], which is nonetheless cooked by frying.


See [[Wiktionary:Idioms that survived RFD]] for other examples. However, many idioms are clearly idiomatic, for example ''[[red herring]]''. These tests are invoked only in discussion of unclear cases.
See [[Wiktionary:Idioms that survived RFD]] for other examples. However, many idioms are clearly idiomatic, for example ''[[red herring]]''. These tests are invoked only in discussion of unclear cases.


Phrasebook entries are very common expressions that are considered useful to non-native speakers. Although these are included as entries in the dictionary (in the main namespace), they are not usually considered in these terms. For instance, ''[[What's your name?]]'' is clearly a summation of its parts.
Phrasebook entries are very common expressions that are considered useful to non-native speakers. Although these are included as entries in the dictionary (in the main namespace), they are not usually considered in these terms. For instance, ''[[What's your name?|What’s your name?]]'' is clearly a summation of its parts.


:''The vote [[Wiktionary:Votes/pl-2009-12/Unidiomatic multi-word phrases to meet CFI when the more common spelling of a single word]] adds a criterion for inclusion without specifying text to be amended in this document, so please see it for the additional criterion.''<ref>([[WT:COALMINE]]) [[Wiktionary:Votes/pl-2009-12/Unidiomatic multi-word phrases to meet CFI when the more common spelling of a single word]]</ref>
:''The vote [[Wiktionary:Votes/pl-2009-12/Unidiomatic multi-word phrases to meet CFI when the more common spelling of a single word]] adds a criterion for inclusion without specifying text to be amended in this document, so please see it for the additional criterion.''<ref>([[WT:COALMINE]]) [[Wiktionary:Votes/pl-2009-12/Unidiomatic multi-word phrases to meet CFI when the more common spelling of a single word]]</ref>


===Spellings===
===Spellings===
Line 101: Line 101:


===Proverbs===
===Proverbs===
A proverb entry's title begins with a lowercase letter, whether it is a full sentence or not. The first word may still be capitalized on its own:
A proverb entry’s title begins with a lowercase letter, whether it is a full sentence or not. The first word may still be capitalized on its own:
* ''[[you can't judge a book by its cover]]''
* ''[[you can't judge a book by its cover|you can’t judge a book by its cover]]''
* ''[[Rome wasn't built in a day]]''
* ''[[Rome wasn't built in a day|Rome wasn’t built in a day]]''


==Languages to include==
==Languages to include==
Line 109: Line 109:
All natural languages are acceptable. However, it is important to note that the question of whether a proposed language is considered a living language, or a dialect of or alternate name for another language is inherently subjective in some cases, and either designation may have political overtones.
All natural languages are acceptable. However, it is important to note that the question of whether a proposed language is considered a living language, or a dialect of or alternate name for another language is inherently subjective in some cases, and either designation may have political overtones.


:''The vote "[[Wiktionary:Votes/2011-10/Unified Romanian|2011-10/Unified Romanian]]" established that the Moldavian and Romanian lects are treated as one language, Romanian.''<ref>[[Wiktionary:Votes/2011-10/Unified Romanian]]</ref>
:''The vote [[Wiktionary:Votes/2011-10/Unified Romanian|2011-10/Unified Romanian]] established that the Moldavian and Romanian lects are treated as one language, Romanian.''<ref>[[Wiktionary:Votes/2011-10/Unified Romanian]]</ref>
:''The vote "[[Wiktionary:Votes/2011-09/Unified Tagalog|2011-09/Unified Tagalog]]" established that the Filipino and Tagalog lects are treated as one language, Tagalog.''<ref>[[Wiktionary:Votes/2011-09/Unified Tagalog]]</ref>
:''The vote [[Wiktionary:Votes/2011-09/Unified Tagalog|2011-09/Unified Tagalog]] established that the Filipino and Tagalog lects are treated as one language, Tagalog.''<ref>[[Wiktionary:Votes/2011-09/Unified Tagalog]]</ref>


====Sign languages====
====Sign languages====
Line 120: Line 120:
Some individual terms from constructed languages may have been adopted into other languages. These should be treated as terms in the adoptive language, and the origin noted in the etymology, regardless of whether the language as a whole is considered to meet the criteria for inclusion.
Some individual terms from constructed languages may have been adopted into other languages. These should be treated as terms in the adoptive language, and the origin noted in the etymology, regardless of whether the language as a whole is considered to meet the criteria for inclusion.


*These languages have an apparent consensus (All have ISO 639-3 codes):''' [[Esperanto]], [[Ido]], [[Interlingua]], [[Interlingue]] ([[Occidental]]), [[Lojban]], [[Novial]], [[Volapük]]
*These languages have an apparent consensus (All have ISO 639-3 codes): '''[[Esperanto]], [[Ido]], [[Interlingua]], [[Interlingue]] ([[Occidental]]), [[Lojban]], [[Novial]], [[Volapük]]'''


*At present another 12 of the 7000 languages in the ISO 639-3 list are constructed languages. Words in 9 of those languages have not yet been approved for inclusion in the English Wiktionary. These are [[Afrihili]], [[Blissymbols]], [[Brithenig]], [[Dutton World Speedwords]], [[Glosa]] ([[Interglossa]]), [[Kotava]], [[Láadan]], [[Lingua Franca Nova]], and [[Romanova]]. The remaining three are prohibited (see below).<ref>[[Wiktionary:Votes/pl-2010-02/Correct figures in CFI]]</ref>
*At present another 12 of the 7000 languages in the ISO 639-3 list are constructed languages. Words in 9 of those languages have not yet been approved for inclusion in the English Wiktionary. These are [[Afrihili]], [[Blissymbols]], [[Brithenig]], [[Dutton World Speedwords]], [[Glosa]] ([[Interglossa]]), [[Kotava]], [[Láadan]], [[Lingua Franca Nova]], and [[Romanova]]. The remaining three are prohibited (see below).<ref>[[Wiktionary:Votes/pl-2010-02/Correct figures in CFI]]</ref>


*There is ''no'' apparent consensus for including these additional languages which do not have an ISO 639-3 code: [[Ceqli]], [[D'ni]], [[Delason]], [[Ekspreso]], [[Europanto]], [[Glos]], [[Jakelimotu]], [[Kyerepon]], [[Latejami]], [[Latino sine Flexione]], [[Linga]], [[Romanica]], [[Sasxsek]], [[Suoczil]], [[Tceqli]], [[Toki Pona]]
*There is ''no'' apparent consensus for including these additional languages which do not have an ISO 639-3 code: [[Ceqli]], [[D'ni|D’ni]], [[Delason]], [[Ekspreso]], [[Europanto]], [[Glos]], [[Jakelimotu]], [[Kyerepon]], [[Latejami]], [[Latino sine Flexione]], [[Linga]], [[Romanica]], [[Sasxsek]], [[Suoczil]], [[Tceqli]], [[Toki Pona]]


*There ''is'' consensus that languages whose origin and use are restricted to one or more related literary works and its fans do not merit inclusion as entries or translations in the main namespace. They may merit lexicons in the [[Appendix:Contents|Appendix]] namespace. These languages include [[Quenya]], [[Sindarin]], [[Klingon]], and [[Orcish]] (the first three do have ISO 639-3 codes).<ref>[[Wiktionary:Votes/pl-2007-04/Fictional languages]]</ref>
*There ''is'' consensus that languages whose origin and use are restricted to one or more related literary works and its fans do not merit inclusion as entries or translations in the main namespace. They may merit lexicons in the [[Appendix:Contents|Appendix]] namespace. These languages include [[Quenya]], [[Sindarin]], [[Klingon]], and [[Orcish]] (the first three do have ISO 639-3 codes).<ref>[[Wiktionary:Votes/pl-2007-04/Fictional languages]]</ref>


Even when rejected for treatment as a language for purposes of this Wiktionary, a single article about the name of that language ''may'' be acceptable.
Even when rejected for treatment as a language for the purposes of this Wiktionary, a single article about the name of that language ''may'' be acceptable.


:''The vote [[Wiktionary:Votes/pl-2010-12/Clarification of language inclusion|"pl-2010-12/Clarification of language inclusion"]] is relevant to this section and specifies an additional criterion for inclusion in some cases, but did not specify any emendation of the text of this section, so please see that vote for details.''<ref>[[Wiktionary:Votes/pl-2010-12/Clarification of language inclusion]]</ref>
:''The vote [[Wiktionary:Votes/pl-2010-12/Clarification of language inclusion|“pl-2010-12/Clarification of language inclusion”]] is relevant to this section and specifies an additional criterion for inclusion in some cases, but did not specify any emendation of the text of this section, so please see that vote for details.''<ref>[[Wiktionary:Votes/pl-2010-12/Clarification of language inclusion]]</ref>


===Reconstructed languages===
===Reconstructed languages===
Line 140: Line 140:
Terms originating in fictional universes which have three citations in separate works, but which do not have three citations which are ''independent of reference to that universe'' may be included only in appendices of words from that universe, and not in the main dictionary space.<ref>[[Wiktionary:Votes/pl-2008-01/Appendices for fictional terms]]</ref> With respect to names of persons or places from fictional universes, they shall not be included unless they are used out of context in an attributive sense. See [[Wiktionary:Criteria for inclusion/Fictional universes|examples]].
Terms originating in fictional universes which have three citations in separate works, but which do not have three citations which are ''independent of reference to that universe'' may be included only in appendices of words from that universe, and not in the main dictionary space.<ref>[[Wiktionary:Votes/pl-2008-01/Appendices for fictional terms]]</ref> With respect to names of persons or places from fictional universes, they shall not be included unless they are used out of context in an attributive sense. See [[Wiktionary:Criteria for inclusion/Fictional universes|examples]].


For purposes of defining a single work, a series of books, films, or television episodes by the same author, documenting the exploits of a common set of characters in a fictional universe (e.g. the ''Harry Potter'' books, Tolkien's Middle Earth books, the ''Star Wars'' films), shall be considered a single work in multiple parts.
For the purposes of defining a single work, a series of books, films, or television episodes by the same author, documenting the exploits of a common set of characters in a fictional universe (e.g. the ''Harry Potter'' books, Tolkien’s Middle Earth books, the ''Star Wars'' films), shall be considered a single work in multiple parts.


:''The vote "[[Wiktionary:Votes/pl-2010-10/Disallowing certain appendices|pl-2010-10/Disallowing certain appendices]]" is relevant to this section, without specifying text to be amended in this document, so please see it for details.''<ref>[[Wiktionary:Votes/pl-2010-10/Disallowing certain appendices]]</ref>
:''The vote [[Wiktionary:Votes/pl-2010-10/Disallowing certain appendices|pl-2010-10/Disallowing certain appendices]] is relevant to this section, without specifying text to be amended in this document, so please see it for details.''<ref>[[Wiktionary:Votes/pl-2010-10/Disallowing certain appendices]]</ref>


===Wiktionary is not an encyclopedia===
===Wiktionary is not an encyclopedia===
Line 149: Line 149:
Care should be taken so that entries do not become [[encyclopedic]] in nature; if this happens, such content should be moved to [[Wikipedia]], but the dictionary entry itself should be kept.
Care should be taken so that entries do not become [[encyclopedic]] in nature; if this happens, such content should be moved to [[Wikipedia]], but the dictionary entry itself should be kept.


Wiktionary articles are about words, not about people or places. Many places, and some people, are known by single word names that qualify for inclusion as given names or family names. The Wiktionary articles are about the words. Articles about the specific places and people belong in Wikipedia. For example: Wiktionary will give the etymologies, pronunciations, alternative spellings, and eponymous meanings, of the ''names'' [[Darlington]], [[Hastings]], [[David]], [[Houdini]], and [[Britney]]. But articles on the specific towns ([[w:Darlington|Darlington]], [[w:Hastings|Hastings]]), statue ([[w:David|David]]), escapologist ([[w:Houdini|Houdini]]), and pop singer ([[w:Britney|Britney]]) are Wikipedia's job.
Wiktionary articles are about words, not about people or places. Many places, and some people, are known by single word names that qualify for inclusion as given names or family names. The Wiktionary articles are about the words. Articles about the specific places and people belong in Wikipedia. For example: Wiktionary will give the etymologies, pronunciations, alternative spellings, and eponymous meanings, of the ''names'' [[Darlington]], [[Hastings]], [[David]], [[Houdini]], and [[Britney]]. But articles on the specific towns ([[w:Darlington|Darlington]], [[w:Hastings|Hastings]]), statue ([[w:David|David]]), escapologist ([[w:Houdini|Houdini]]), and pop singer ([[w:Britney|Britney]]) are Wikipedia’s job.


===Language-specific issues===
===Language-specific issues===
Individual languages may have additional restrictions on inclusion. These will be mentioned on that language's About page. For instance, [[Wiktionary:About English]] notes that the community has voted<ref>[[Wiktionary:Votes/pl-2007-07/Exclusion of possessive case]]</ref> to not allow most modern English possessives.<ref>[[Wiktionary:Votes/pl-2010-01/Removing Modern English possessive forms section from CFI]]</ref>
Individual languages may have additional restrictions on inclusion. These will be mentioned on that language’s About page. For instance, [[Wiktionary:About English]] notes that the community has voted<ref>[[Wiktionary:Votes/pl-2007-07/Exclusion of possessive case]]</ref> to not allow most modern English possessives.<ref>[[Wiktionary:Votes/pl-2010-01/Removing Modern English possessive forms section from CFI]]</ref>


==Names==
==Names==
Names fall into several categories, including company names, the names of products, given names, family names, and the full names of specific people, places{{,}} and things. Wiktionary classifies all as proper nouns, but applies caveats to each.
Names fall into several categories, including company names, the names of products, given names, family names, and the full names of specific people, places{{,}} and things. Wiktionary classifies all as proper nouns, but applies caveats to each.


Generic terms are common rather than proper nouns. For example: [[Remington]] is used as a synonym for any sort of rifle, and [[Hoover]] as a synonym for any sort of vacuum cleaner. (Both are also attested family name words, and are included on that basis as well, of course.<!-- SO this could use BETTER EXAMPLES -->) [[Hamburger]] is used as generic term for a type of sandwich. One good rule of thumb as to whether a name has become a generic word is whether the word can be used without capitalization (as indeed “[[sandwich]]” was in the previous sentence<!-- BUT that's the WRONG EXAMPLE -->).
Generic terms are common rather than proper nouns. For example: [[Remington]] is used as a synonym for any sort of rifle, and [[Hoover]] as a synonym for any sort of vacuum cleaner. (Both are also attested family name words, and are included on that basis as well, of course.<!-- SO this could use BETTER EXAMPLES -->) [[Hamburger]] is used as generic term for a type of sandwich. One good rule of thumb as to whether a name has become a generic word is whether the word can be used without capitalization (as indeed “[[sandwich]]” was in the previous sentence<!-- BUT that’s the WRONG EXAMPLE -->).


===Company names===
===Company names===
Line 186: Line 186:


===Names of specific entities===
===Names of specific entities===
This section regulates the inclusion and exclusion of names of specific entities, that is, names of individual people, names of geographic features, names of celestial objects, names of mythological creatures, names and titles of various works, etc.<ref><s>[[Wiktionary:Votes/pl-2010-05/Names of specific entities]]</s></ref><ref><s>[[Wiktionary:Votes/pl-2010-05/Placenames with linguistic information 2]]</s></ref><ref>[[Wiktionary:Votes/pl-2011-02/Remove "Place names" section of WT:CFI]]</ref><ref>[[Wiktionary:Votes/pl-2010-12/Names of individuals]]</ref> Some examples include the [[Internet]], the [[Magna Carta]], the [[Mona Lisa]], the [[Qur'an]], the [[Red Cross]], the [[Titanic]], and [[World War II]].
This section regulates the inclusion and exclusion of names of specific entities, that is, names of individual people, names of geographic features, names of celestial objects, names of mythological creatures, names and titles of various works, etc.<ref><s>[[Wiktionary:Votes/pl-2010-05/Names of specific entities]]</s></ref><ref><s>[[Wiktionary:Votes/pl-2010-05/Placenames with linguistic information 2]]</s></ref><ref>[[Wiktionary:Votes/pl-2011-02/Remove "Place names" section of WT:CFI]]</ref><ref>[[Wiktionary:Votes/pl-2010-12/Names of individuals]]</ref> Some examples include the [[Internet]], the [[Magna Carta]], the [[Mona Lisa]], the [[Qur'an|Qur’an]], the [[Red Cross]], the [[Titanic]], and [[World War II]].


A name of a specific entity must not be included if it does not meet the [[Wiktionary:Criteria for inclusion#Attestation|attestation]] requirement. Among those that do meet that requirement, many should be excluded while some should be included, but there is no agreement on precise, all-encompassing rules for deciding which are which. However, policies exist for names of certain kinds of entities. In particular:
A name of a specific entity must not be included if it does not meet the [[Wiktionary:Criteria for inclusion#Attestation|attestation]] requirement. Among those that do meet that requirement, many should be excluded while some should be included, but there is no agreement on precise, all-encompassing rules for deciding which are which. However, policies exist for names of certain kinds of entities. In particular:

Revision as of 10:20, 14 November 2012

This is a Wiktionary policy, guideline or common practices page.
It should not be modified without discussion and consensus. Any substantial or contested changes require a VOTE.

As an international dictionary, Wiktionary is intended to include “all words in all languages”.

General rule

A term should be included if it’s likely that someone would run across it and want to know what it means. This in turn leads to the somewhat more formal guideline of including a term if it is attested and idiomatic.

Terms

A term need not be limited to a single word in the usual sense. Any of these are also acceptable:

Attestation

“Attested” means verified through[1]

  1. clearly widespread use,
  2. use in a well-known work, or
  3. use in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year (different requirements apply for certain languages).[2]

Where possible, it is better to cite sources that are likely to remain easily accessible over time, so that someone referring to Wiktionary years from now is likely to be able to find the original source. As Wiktionary is an online dictionary, this naturally favors media such as Usenet groups, which are durably archived by Google. Print media such as books and magazines will also do, particularly if their contents are indexed online. Other recorded media such as audio and video are also acceptable, provided they are of verifiable origin and are durably archived. We do not quote other Wikimedia sites[3][4] (such as Wikipedia), but we may use quotations found on them (such as quotations from books available on Wikisource). When citing a quotation from a book, please include the ISBN.

Conveying meaning

See use-mention distinction.

This filters out appearance in raw word lists, commentary on the form of a word, such as “The word ‘foo’ has three letters,” lone definitions, and made-up examples of how a word might be used. For example, an appearance in someone’s online dictionary is suggestive, but it does not show the word actually used to convey meaning. On the other hand, a sentence like “They raised the jib (a small sail forward of the mainsail) in order to get the most out of the light wind,” appearing in an account of a sailboat race, would be fine. It happens to contain a definition, but the word is also used for its meaning.

The vote “2011-06/Redirecting combining characters” is relevant to this section, without specifying text to be amended in this document, so please see it for details.[5]

Number of citations

For languages well documented on the Internet, the minimum for inclusion in Wiktionary is three citations in which the term is used. For terms in extinct languages, one use in a contemporaneous source is the minimum, or one mention is adequate subject to the requirements below. For all other spoken languages that are living, a single use or mention is adequate, subject to the following requirements:

  • the community of editors for that language should maintain a list of materials deemed appropriate as the only sources for entries based on a single mention,
  • each entry should have its source(s) listed on the entry or citation page, and
  • a box explaining that a low number of citations were used should be included on the entry page (such as by using the {{LDL}} template).[6]
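
The thresholds above can be read as a small decision rule. The following is only an illustrative sketch: the category labels (`well_documented`, `extinct`, `other_living`), the `Evidence` type and the function name are assumptions made for this example, not Wiktionary terminology, and the three bulleted requirements for single-citation entries are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    uses: int      # citations in which the term is used to convey meaning
    mentions: int  # citations that only mention the term


def meets_citation_minimum(category: str, ev: Evidence) -> bool:
    """Return True if the evidence reaches the minimum number of citations.

    `category` is a hypothetical label for the three cases in the text.
    """
    if category == "well_documented":
        # Three citations in which the term is used.
        return ev.uses >= 3
    if category == "extinct":
        # One contemporaneous use, or one mention (extra requirements apply).
        return ev.uses >= 1 or ev.mentions >= 1
    if category == "other_living":
        # A single use or mention (extra requirements apply).
        return ev.uses + ev.mentions >= 1
    raise ValueError(f"unknown category: {category}")
```

Which languages count as well documented is decided by the community and is deliberately left out of the sketch.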

Independent

This serves to prevent double-counting of usages that are not truly distinct. Roughly speaking, we generally consider two uses of a term to be “independent” if they are in different sentences by different people, and to be non-independent if:

  • one is a verbatim or near-verbatim quotation of the other; or
  • both are verbatim or near-verbatim quotations or translations of a single original source; or
  • both are by the same author.

If two or more usages are not independent of each other, then only one of them can be used for purposes of attestation.[7]

Spanning at least a year

This is meant to filter out words that may appear and see brief use, but then never be used again. The one-year threshold is somewhat arbitrary, but appears to work well in practice.
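
Taken together, the independence and one-year conditions of the third attestation criterion can be sketched as a check over a list of citations. This is a rough illustration, not a tool used by Wiktionary: the `Citation` fields, the function names and the 365-day reading of "a year" are all assumptions, and the greedy filter is a simplification that may discard a citation a human editor would keep.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Citation:
    author: str
    source: str      # hypothetical id of the original text quoted or translated
    published: date


def independent_citations(citations):
    """Keep at most one citation per author and per original source."""
    seen_authors, seen_sources = set(), set()
    kept = []
    for c in sorted(citations, key=lambda c: c.published):
        if c.author in seen_authors or c.source in seen_sources:
            continue  # not independent of a citation already kept
        seen_authors.add(c.author)
        seen_sources.add(c.source)
        kept.append(c)
    return kept


def meets_criterion_three(citations, min_count=3, min_span_days=365):
    """Three independent citations spanning at least a year (assumed 365 days)."""
    kept = independent_citations(citations)
    if len(kept) < min_count:
        return False
    span = kept[-1].published - kept[0].published
    return span.days >= min_span_days
```

Whether a citation actually conveys meaning (the use-mention distinction) still has to be judged by a reader, so it is not part of the check.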

Idiomaticity

An expression is “idiomatic” if its full meaning cannot be easily derived from the meaning of its separate components.

For example, this is a door is not idiomatic, but shut up and red herring are.

Compounds are generally idiomatic, even when the meaning can be clearly expressed in terms of the parts. The reason is that the parts often have several possible senses, but the compound is often restricted to only some combinations of them.

For example, mega- can denote either a million (or 2^20) of something or simply a very large or prominent instance of something. Similarly star might mean a celestial object or a celebrity. But megastar means “a very prominent celebrity”, not “a million celebrities” or “a million celestial objects”, and only rarely “a very large celestial object” (capitalized, it is also a brand name in amateur astronomy).

This rule must be applied carefully and is somewhat subjective. For example, bank has several senses and parking lot has an idiomatic sense of “large traffic jam”. However bank parking lot can’t possibly mean “to put a large traffic jam in a financial institution”. With such clearly wrong interpretations weeded out, the remaining choices are “place to park cars for any of several kinds of business” or “place to park cars by, for or on a river bank or similar” (as opposed to, say, the hill parking lot). The whole phrase could plausibly mean either, depending on context (though the first is likely far more common), and so the phrase is not idiomatic.

This criterion is sometimes referred to as the fried egg test, as a fried egg generally means an egg (and generally a chicken egg or similar) fried in a particular way. It generally doesn’t denote a scrambled egg, which is nonetheless cooked by frying.

See Wiktionary:Idioms that survived RFD for other examples. However, many idioms are clearly idiomatic, for example red herring. These tests are invoked only in discussion of unclear cases.

Phrasebook entries are very common expressions that are considered useful to non-native speakers. Although these are included as entries in the dictionary (in the main namespace), they are not usually considered in these terms. For instance, What’s your name? is clearly a summation of its parts.

The vote “Wiktionary:Votes/pl-2009-12/Unidiomatic multi-word phrases to meet CFI when the more common spelling of a single word” adds a criterion for inclusion without specifying text to be amended in this document, so please see it for the additional criterion.[8]

Spellings

Misspellings, common misspellings and variant spellings:[9] There is no simple hard and fast rule, particularly in English, for determining whether a particular spelling is “correct”. A person defending a disputed spelling should be prepared to provide references for support.[10] Published grammars and style guides can be useful in that regard, as can statistics concerning the prevalence of various forms.

Most simple typos are much rarer than the most frequent spellings. Some words, however, are frequently misspelled. For example, occurred is often spelled with only one c or only one r, but only occurred is considered correct. The misspellings may well merit entries.

It is important to remember that most languages, including English, do not have an academy to establish rules of usage, and thus may be prone to uncertain spellings. This problem is less frequent, though not unknown, in languages such as Spanish where spelling may have legal support in some countries.

Regional or historical variations are not misspellings. For example, there are well-known differences between British and American spelling. A spelling considered incorrect in one region may not occur at all in another, and may even dominate in yet another.

Formatting

Once it is decided that a misspelling is of sufficient importance to merit its own page, the formatting of such a page should not be particularly problematic. The usual language and part of speech headings can be used, followed by a simple definition using the following format:

# {{misspelling of|occurred|lang=en}}

An additional section explaining why the term is a misspelling should be considered optional.
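
As a minimal sketch, the page for the misspelling occured could consist of nothing more than the following wikitext. The bold headword line and the choice of a Verb heading are assumptions made for illustration; only the definition line is prescribed above.

```wikitext
==English==

===Verb===
'''occured'''

# {{misspelling of|occurred|lang=en}}
```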

Inflections

The entries for such inflected forms as cameras, geese, asked, and were should indicate what form they are, and link to the main entry for the word (camera, goose, ask, or be, respectively, for the preceding examples). Except with multi-word idioms, they should not merely redirect.

At entries for inflected forms with idiomatic senses, such as blues and smitten, predictable meanings should be distinguished from idiomatic ones.[11]
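
By way of illustration, the entry at cameras might read as follows. This sketch assumes a {{plural of}} form-of template that takes a lang parameter, by analogy with {{misspelling of}} in the Formatting section; the bold headword line is likewise an assumption. Other inflected forms would use whichever form-of template matches the form in question.

```wikitext
==English==

===Noun===
'''cameras'''

# {{plural of|camera|lang=en}}
```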

Idiomatic phrases

Many phrases take several forms. It is not necessary to include every conceivable variant. When present, minor variants should simply redirect to the main entry. For the main entry, prefer the most generic form, based on the following principles:

Pronouns

Prefer the generic personal pronoun, one or one’s. Thus, feel one’s oats is preferable to feel his oats. Use of other personal pronouns, especially in the singular, should be avoided except where they are essential to the meaning.

Articles

Omit an initial article unless it makes a difference in the meaning. E.g., cat’s pajamas instead of the cat’s pajamas.

Verbs

Use the infinitive form of the verb (but without “to”) for the principal verb of a verbal phrase. Thus for the saying It’s raining cats and dogs, or It was raining cats and dogs, or I think it’s going to rain cats and dogs any minute now, or It’s rained cats and dogs for the last week solid, the entry should be (and is) under rain cats and dogs. The other variants are derived by the usual rules of grammar (including the use of it with weather terms and other impersonal verbs).
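
Where a minor variant of an idiomatic phrase is given a page at all, a standard MediaWiki redirect is all that page needs to contain. As a sketch, a page titled raining cats and dogs could hold just the line below; whether a particular variant deserves a redirect remains a matter of editorial judgment.

```wikitext
#REDIRECT [[rain cats and dogs]]
```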

Proverbs

A proverb entry’s title begins with a lowercase letter, whether it is a full sentence or not. The first word may still be capitalized on its own:

Languages to include

Natural languages

All natural languages are acceptable. However, it is important to note that the question of whether a proposed language is considered a living language, or a dialect of or alternate name for another language is inherently subjective in some cases, and either designation may have political overtones.

The vote “2011-10/Unified Romanian” established that the Moldavian and Romanian lects are treated as one language, Romanian.[12]
The vote “2011-09/Unified Tagalog” established that the Filipino and Tagalog lects are treated as one language, Tagalog.[13]

Sign languages

Terms in signed languages are acceptable as entries, and should be entered as described in the policy document Wiktionary:About sign languages.[14]

Constructed languages

Constructed languages have not developed naturally, but are the product of conscious effort in the fulfillment of some purpose. In general, terms in such languages, particularly languages associated with works of fiction, do not meet the basic requirement that one might run across them and want to know the meaning of their words, since they are only used in a narrow context in which further material on the language is readily available. There are specific exceptions to this general rule, listed below, based on consensus of the Wiktionary community. Esperanto, in particular, is a living language with a sizeable community of fluent speakers, and even some native speakers!

Some individual terms from constructed languages may have been adopted into other languages. These should be treated as terms in the adoptive language, and the origin noted in the etymology, regardless of whether the language as a whole is considered to meet the criteria for inclusion.

  • There is consensus that languages whose origin and use are restricted to one or more related literary works and their fans do not merit inclusion as entries or translations in the main namespace. They may merit lexicons in the Appendix namespace. These languages include Quenya, Sindarin, Klingon, and Orcish (the first three do have ISO 639-3 codes).[16]

Even when rejected for treatment as a language for the purposes of this Wiktionary, a single article about the name of that language may be acceptable.

The vote “pl-2010-12/Clarification of language inclusion” is relevant to this section and specifies an additional criterion for inclusion in some cases, but did not specify any emendation of the text of this section, so please see that vote for details.[17]

Reconstructed languages

Terms in reconstructed languages such as Proto-Indo-European do not meet the criteria for inclusion. They may be entered in appendices, and referred to from etymological sections.[18] See Wiktionary:Reconstructed terms.

Exclusions

Fictional universes

Terms originating in fictional universes which have three citations in separate works, but which do not have three citations which are independent of reference to that universe, may be included only in appendices of words from that universe, and not in the main dictionary space.[19] With respect to names of persons or places from fictional universes, they shall not be included unless they are used out of context in an attributive sense. See examples.

For the purposes of defining a single work, a series of books, films, or television episodes by the same author, documenting the exploits of a common set of characters in a fictional universe (e.g. the Harry Potter books, Tolkien’s Middle Earth books, the Star Wars films), shall be considered a single work in multiple parts.

The vote “pl-2010-10/Disallowing certain appendices” is relevant to this section, without specifying text to be amended in this document, so please see it for details.[20]

Wiktionary is not an encyclopedia

See also Wiktionary is not an encyclopaedia.

Care should be taken so that entries do not become encyclopedic in nature; if this happens, such content should be moved to Wikipedia, but the dictionary entry itself should be kept.

Wiktionary articles are about words, not about people or places. Many places, and some people, are known by single word names that qualify for inclusion as given names or family names. The Wiktionary articles are about the words. Articles about the specific places and people belong in Wikipedia. For example: Wiktionary will give the etymologies, pronunciations, alternative spellings, and eponymous meanings of the names Darlington, Hastings, David, Houdini, and Britney. But articles on the specific towns (Darlington, Hastings), statue (David), escapologist (Houdini), and pop singer (Britney) are Wikipedia’s job.

Language-specific issues

Individual languages may have additional restrictions on inclusion. These will be mentioned on that language’s About page. For instance, Wiktionary:About English notes that the community has voted[21] to not allow most modern English possessives.[22]

Names

Names fall into several categories, including company names, the names of products, given names, family names, and the full names of specific people, places, and things. Wiktionary classifies all as proper nouns, but applies caveats to each.

Generic terms are common rather than proper nouns. For example: Remington is used as a synonym for any sort of rifle, and Hoover as a synonym for any sort of vacuum cleaner. (Both are also attested family name words, and are included on that basis as well, of course.) Hamburger is used as a generic term for a type of sandwich. One good rule of thumb as to whether a name has become a generic word is whether the word can be used without capitalization (as indeed “sandwich” was in the previous sentence).

Company names

Being a company name does not guarantee inclusion. To be included, the use of the company name other than its use as a trademark (i.e., a use as a common word or family name) has to be attested.

Brand names

A brand name for a product or service should be included if it has entered the lexicon.[23][24] Apart from genericized trademarks, this is measured objectively by the brand name’s use in at least three independent durably archived citations spanning a period of at least three years. The sources of these citations:

  1. must be independent of any parties with economic interest in the brand, including the manufacturer, distributors, retailers, marketers, and advertisers, their parent companies, subsidiaries, and affiliates, at time of authorship; and
  2. must not identify any such parties.

If the term has legal protection as a trademark, the original source must not indicate such. The sources also must not be written:

  1. by any person or group associated with the type of product or service;
  2. about any person or group specifically associated with the product or service; or
  3. about the type of product or service in general.

The text preceding and surrounding the citation must not identify the product or service to which the brand name applies, whether by stating explicitly or implicitly some feature or use of the product or service from which its type and purpose may be surmised, or some inherent quality that is necessary for an understanding of the author’s intent. See examples.

Given and family names

Given names (such as David, Roger, and Peter) and family names (such as Baker, Bush, Rice, Smith, and Jones) are words, and subject to the same criteria for inclusion as any other words. Wiktionary has main articles giving etymologies, alternative spellings, meanings, and translations for given names and family names, and has two appendices for indexing those articles: Appendix:Names, Appendix:Surnames/A.[25]

For most given names and family names, it is relatively easy to demonstrate that the word fulfills the criteria, as for most given names and family names the name words are in widespread use in both spoken communication and literature. However, being a name per se does not automatically qualify a word for inclusion. A new name that has not been attested is still a protologism. A name that occurs only in the works of fiction of a single author, a television series or a video game, or within a closed context such as the works of several authors writing about a single fictional universe, is not used independently and should not be included.

Hypocoristics, diminutives, and abbreviations of names (such as Jock, Misha, Kenny, Ken, and Rog) are held to the same standards as names.

Genealogical content

Wiktionary is not a genealogy database. Wiktionary articles on family names, for example, are not intended to be about the people who share the family name. They are about the name as a word. For example: Whilst Yoder will tell the reader that the word originated in Switzerland (as well as give its pronunciations and alternative spellings), it is not intended to include information about the ancestries of people who have the family name Yoder.[26]

Names of specific entities

This section regulates the inclusion and exclusion of names of specific entities, that is, names of individual people, names of geographic features, names of celestial objects, names of mythological creatures, names and titles of various works, etc.[27][28][29][30] Some examples include the Internet, the Magna Carta, the Mona Lisa, the Qur’an, the Red Cross, the Titanic, and World War II.

A name of a specific entity must not be included if it does not meet the attestation requirement. Among those that do meet that requirement, many should be excluded while some should be included, but there is no agreement on precise, all-encompassing rules for deciding which are which. However, policies exist for names of certain kinds of entities. In particular:

  • No individual person should be listed as a sense in any entry whose page title includes both a given name or diminutive and a family name or patronymic. For instance, Walter Elias Disney, the film producer and voice of Mickey Mouse, is not allowed a definition line at Walt Disney.
  • Names of specific companies are subject to the “Company names” section of this page.
  • Names of fictional people and places are subject to the “Fictional universes” section of this page.

Such definitions as are included should be succinct rather than encyclopedic.

Issues to consider

Attestation vs. the slippery slope

There is occasionally concern that adding an entry for a particular term will lead to entries for a large number of similar terms. This is not a problem, as each term is considered on its own based on its usage, not on the usage of terms similar in form. Some examples:

  • Any word in any language might be borrowed into English, but only a few actually are. Including spaghetti does not imply that ricordati is next (though it is of course fine as an Italian entry).
  • Any word may be rendered in pig Latin, but only a few (e.g., amscray) have found their way into common use.
  • Any word may be rendered in leet style, but only a few (e.g., pr0n) see general use.
  • Grammatical affixes like meta- and -ance can be added in a great many more cases than they actually are. (Inflectional suffixes like -s for the plural of a noun and -ed for the past tense of a verb can actually be used for almost any noun or verb.)
  • It may seem that trendy internet prefixes like e- and i- are used everywhere, but they aren’t. If I decide to talk about e-thumb-twiddling but no one else does, then there’s no need for an entry.

References

  1. ^ Wiktionary:Votes/pl-2010-11/Attestation in academic journals
  2. ^ Wiktionary:Votes/2012-04/Languages with limited documentation
  3. ^ Wiktionary:Votes/pl-2008-04/WMF jargon
  4. ^ Wiktionary:Votes/pl-2010-06/WMF jargon accepted when it meets CFI
  5. ^ Wiktionary:Votes/2011-06/Redirecting combining characters
  6. ^ Wiktionary:Votes/2012-06/Well Documented Languages
  7. ^ Wiktionary:Votes/pl-2012-02/Independence
  8. ^ (WT:COALMINE) Wiktionary:Votes/pl-2009-12/Unidiomatic multi-word phrases to meet CFI when the more common spelling of a single word
  9. ^ Wiktionary:Votes/pl-2011-02/Renaming CFI section for spellings
  10. ^ Wiktionary:Beer parlour#Deleting "his" in CFI
  11. ^ Wiktionary:Votes/pl-2008-08/Inclusion of regular inflected forms
  12. ^ Wiktionary:Votes/2011-10/Unified Romanian
  13. ^ Wiktionary:Votes/2011-09/Unified Tagalog
  14. ^ Wiktionary:Votes/pl-2008-08/Wiktionary:About sign languages
  15. ^ Wiktionary:Votes/pl-2010-02/Correct figures in CFI
  16. ^ Wiktionary:Votes/pl-2007-04/Fictional languages
  17. ^ Wiktionary:Votes/pl-2010-12/Clarification of language inclusion
  18. ^ Wiktionary:Votes/pl-2006-12/Proto- languages in Appendicies
  19. ^ Wiktionary:Votes/pl-2008-01/Appendices for fictional terms
  20. ^ Wiktionary:Votes/pl-2010-10/Disallowing certain appendices
  21. ^ Wiktionary:Votes/pl-2007-07/Exclusion of possessive case
  22. ^ Wiktionary:Votes/pl-2010-01/Removing Modern English possessive forms section from CFI
  23. ^ Wiktionary:Votes/pl-2007-08/Brand names of products 2
  24. ^ Wiktionary:Votes/pl-2012-02/Brand names and physical product 2
  25. ^ Wiktionary:Votes/pl-2010-01/Renaming given name appendixes
  26. ^ Wiktionary:Votes/pl-2010-01/Renaming CFI section on genealogic names
  27. ^ Wiktionary:Votes/pl-2010-05/Names of specific entities
  28. ^ Wiktionary:Votes/pl-2010-05/Placenames with linguistic information 2
  29. ^ Wiktionary:Votes/pl-2011-02/Remove "Place names" section of WT:CFI
  30. ^ Wiktionary:Votes/pl-2010-12/Names of individuals