Wiktionary:Criteria for inclusion: difference between revisions


Revision as of 10:57, 26 February 2006

This is a Wiktionary policy, guideline or common practices page. It must not be modified without a VOTE.
An editable version of this page is available at Wiktionary:Criteria for inclusion/Editable.
Policies: CFI - ELE - BLOCK - REDIR - BOTS - QUOTE - DELETE - NPOV - AXX

As an international dictionary, Wiktionary is intended to include "all words in all languages".

General rule

A term should be included if it's likely that someone would run across it and want to know what it means. This in turn leads to the somewhat more formal guideline of including a term if it is attested and idiomatic.

"Terms" to be broadly interpreted

A term need not be limited to a single word in the usual sense. Any of these is also acceptable:

Attestation

"Attested" means verified through

  1. Clearly widespread use,
  2. Usage in a well-known work,
  3. Appearance in a refereed academic journal, or
  4. Usage in permanently-recorded media, conveying meaning, in at least three independent instances spanning at least a year.

Where possible, it is better to cite sources that are likely to remain easily accessible over time, so that someone referring to Wiktionary years from now is likely to be able to find the original source. As Wiktionary is an online dictionary, this naturally favors media such as blogs and Usenet groups, which are durably archived by Google. Print media such as books and magazines will also do, particularly if their contents are indexed online. Other recorded media such as audio and video are also acceptable, provided they are of verifiable origin and are durably archived. When citing a quotation from a book, please include the ISBN.

Conveying meaning

See use-mention distinction.

This filters out appearance in raw word lists, commentary on the form of a word, such as "The word "foo" has three letters," and lone definitions. For example, an appearance in someone's online dictionary is suggestive, but it does not show the word actually used to convey meaning. On the other hand, a sentence like "They raised the jib (a small sail forward of the mainsail) in order to get the most out of the light wind," appearing in an account of a sailboat race, would be fine. It happens to contain a definition, but the word is also used for its meaning.

Independence

This is meant to exclude multiple references which draw on each other. Where Wikipedia has an article on a given subject, and that article is mirrored by an external site, the use of certain words on the mirror site would not be independent. It is quite common to find that material on one site is readily traced to another. Similarly, the same quote will often occur verbatim in separate sources. While the sources may be independent of each other, the usages in question are clearly not.

The presumption is that if a term is only used in a narrow community, there is no need to refer to a general dictionary such as this one to find its meaning.

Spanning at least a year

This is meant to filter out words that may appear and see brief use, but then never be used again. The one-year threshold is somewhat arbitrary, but appears to work well in practice.
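The fourth attestation criterion, with its three qualifiers (conveying meaning, independence, and spanning at least a year), can be sketched as a small check. This is only an illustration, not an official tool: the `Citation` fields, the `origin` key used to group citations that draw on one another, and the reading of "a year" as 365 days are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Citation:
    source: str            # hypothetical label for where the quotation appeared
    origin: str            # hypothetical key: citations sharing an origin are not independent
    when: date             # date of the recorded usage
    conveys_meaning: bool  # True if the term is used, not merely mentioned or defined


def meets_usage_criterion(citations, min_independent=3, min_span_days=365):
    """Check 'three independent instances spanning at least a year'."""
    # Drop word lists, mentions and lone definitions.
    used = [c for c in citations if c.conveys_meaning]

    # Keep one citation (the earliest) per origin, so mirrors and
    # verbatim repeats of the same quotation count only once.
    earliest = {}
    for c in used:
        if c.origin not in earliest or c.when < earliest[c.origin].when:
            earliest[c.origin] = c
    independent = list(earliest.values())

    if len(independent) < min_independent:
        return False

    # Assumption: the span is measured over the independent citations only.
    dates = [c.when for c in independent]
    return (max(dates) - min(dates)).days >= min_span_days
```

In practice, deciding whether a quotation conveys meaning or draws on another source is an editorial judgment; the sketch only shows how the three qualifiers combine once those judgments have been made.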

Idiomaticity

An expression is "idiomatic" if its full meaning cannot be easily derived from the meaning of its separate components.

For example, this is a door is not idiomatic, but shut up and red herring are.

Misspellings, common misspellings and variant spellings

There is no simple hard and fast rule, particularly in English, for determining whether a particular spelling is "correct". A person defending a disputed spelling should be prepared to support his view with references. Published grammars and style guides can be useful in that regard, as can statistics concerning the prevalence of various forms.

Most simple typos are much rarer than the most frequent spellings. Some words, however, are frequently misspelled. For example, occurred is often spelled with only one c or only one r, but only occurred is considered correct. The misspellings may well merit entries.

It is important to remember that most languages, including English, do not have an academy to establish rules of usage, and thus may be prone to uncertain spellings. This problem is less frequent, though not unknown, in languages such as Spanish where spelling may have legal support in some countries.

Regional or historical variations are not misspellings. For example, there are well-known differences between British and American spelling. A spelling considered incorrect in one region may not occur at all in another, and may even dominate in yet another.

Formatting

Once it is decided that a misspelling is of sufficient importance to merit its own page, the formatting of such a page should not be particularly problematic. The usual language and part of speech headings can be used, followed by this simple entry:

# misspelling of [[...]]

An additional section explaining why the term is a misspelling should be considered optional.
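As an illustration of the format described above, the following sketch assembles the text of a minimal misspelling entry. The function name is hypothetical, and the heading levels (`==Language==` followed by `===Part of speech===`) are an assumption about the usual entry layout rather than something this page specifies.

```python
def misspelling_entry(correct: str,
                      language: str = "English",
                      part_of_speech: str = "Verb") -> str:
    """Build wikitext for a minimal misspelling entry (illustrative only)."""
    lines = [
        f"=={language}==",                # language heading (assumed level 2)
        f"==={part_of_speech}===",        # part of speech heading (assumed level 3)
        f"# misspelling of [[{correct}]]",  # the simple entry line given above
    ]
    return "\n".join(lines)


# Example: text for a page on a common misspelling of "occurred".
print(misspelling_entry("occurred"))
```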

Inflections

Although it is not forbidden, there is no particular need to include completely regular inflections such as cameras or singing. To the extent that they are present, they should indicate what inflection is intended and link to the stem form, and should not merely redirect.

Irregular forms such as geese and were should have their own entries, because people unfamiliar with the irregularity will look for them under the inflected form. Inflected forms — whether regular or irregular — with idiomatic meanings, such as blues or smitten, should have their own entries, with the predictable meanings distinguished from the idiomatic.

Idiomatic phrases

Many phrases take several forms. It is not necessary to include every conceivable variant. When present, minor variants should simply redirect to the main entry. For the main entry, prefer the most generic form, based on the following principles:

Pronouns

Prefer the impersonal pronoun, one or one's. Thus, feel one's oats is preferable to feel his oats. Use of personal pronouns, especially in the singular, should be avoided except where they are essential to the meaning.

Articles

Omit an initial article unless it makes a difference in the meaning. E.g., cat's pajamas instead of the cat's pajamas.
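The pronoun and article principles can be sketched as a rough normalisation step. This is a heuristic under stated assumptions, not a rule that can be applied mechanically: the function name and word lists are invented for the example, "her" is left out because it can be an object pronoun as well as a possessive, and the sketch cannot tell when a pronoun or article is essential to the meaning.

```python
ARTICLES = ("the", "a", "an")
# Assumed list of possessive pronouns to generalise; deliberately incomplete.
POSSESSIVES = {"my", "your", "his", "their", "our"}


def generic_form(phrase: str) -> str:
    """Suggest a generic main-entry form for an idiomatic phrase."""
    words = phrase.split()

    # Omit an initial article (only when something follows it).
    if len(words) > 1 and words[0].lower() in ARTICLES:
        words = words[1:]

    # Prefer the impersonal "one's" over personal possessive pronouns.
    words = ["one's" if w.lower() in POSSESSIVES else w for w in words]
    return " ".join(words)
```

With these assumptions, `generic_form("feel his oats")` gives `feel one's oats` and `generic_form("the cat's pajamas")` gives `cat's pajamas`, matching the two examples above. An editor would still need to review each suggestion by hand.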

Verbs

Use the infinitive form of the verb (but without "to") for the principal verb of a verbal phrase. Thus the sayings It's raining cats and dogs, It was raining cats and dogs, I think it's going to rain cats and dogs any minute now, and It's rained cats and dogs for the last week solid should all be entered (and are) under rain cats and dogs. The other variants are derived by the usual rules of grammar (including the use of it with weather terms and other impersonal verbs).

Proverbs

Proverbs that are whole sentences should begin with a capital letter.

Languages to include

Natural languages

Uncommon languages are acceptable as long as they are (or were) used for everyday communication by some identifiable, natural population of humans. If the language lacks an ISO 639 language code, it is almost surely not acceptable. It is important to note that the question of whether a proposed language is considered a living language, or a dialect of or alternate name for another language is inherently subjective in some cases, and either designation may have political overtones.

Constructed languages

Constructed languages have not developed naturally, but are the product of conscious effort in the fulfillment of some purpose. In general, terms in such languages, particularly languages associated with works of fiction, do not meet the basic requirement that one might run across them and want to know what they mean, since they are only used in a narrow context in which further material on the language is readily available. There are specific exceptions to this general rule, listed below, based on consensus of the Wiktionary community. Esperanto, in particular, is a living language with a sizeable community of native speakers.

Some individual terms from constructed languages may have been adopted into other languages. These should be treated as terms in the adoptive language, and the origin noted in the etymology, regardless of whether the language as a whole is considered to meet the criteria for inclusion.

Even when rejected for treatment as a language for purposes of this Wiktionary, a single article about the name of that language may be acceptable.

Exclusions

Vandalism

From time to time, various parties will insert material into Wiktionary which clearly has nothing to do with Wiktionary's purpose or practices. Such activity is considered vandalism and will be undone at the first opportunity. If the vandalism consists of an edit to an existing page, that edit will be reverted. If the vandalism consists of a new article, that article will be removed. This is done at the discretion of the administrators and does not require discussion, even if the vandalism consists of a new article for a term which would otherwise meet these criteria but has not yet been entered legitimately.

Protologisms

The designation, protologism, is for terms defined in the hopes that they will be used, but which are not actually in wide use. These are listed on Wiktionary:List of protologisms, and should not be given their own separate entries.

See the discussion on excluding the words in lists (Protologisms, WikiSaurus, concordances, etc.) from application of the CFI to each individual listed word.

Wiktionary is not an encyclopedia

See also Wiktionary is not an encyclopaedia.

Care should be taken so that entries do not become encyclopedic in nature; if this happens, such content should be moved to Wikipedia, but the dictionary entry itself should be kept.

Wiktionary articles are about words, not about people or places. Many places, and some people, are known by single word names that qualify for inclusion as given names or family names. The Wiktionary articles are about the words. Articles about the specific places and people belong in Wikipedia. For example: Wiktionary will give the etymologies, pronunciations, alternative spellings, and eponymous meanings, of the names Darlington, Hastings, David, Houdini, and Britney. But articles on the specific towns (Darlington, Hastings), statue (David), escapologist (Houdini), and pop singer (Britney) are Wikipedia's job.


Issues to consider

Attestation vs. the slippery slope

There is occasionally concern that adding an entry for a particular term will lead to entries for a large number of similar terms. This is not a problem, as each term is considered on its own based on its usage, not on the usage of terms similar in form. Some examples:

  • Any word in any language might be borrowed into English, but only a few actually are. Including spaghetti does not imply that ricordati is next (though it would of course be fine as an Italian entry).
  • Any word may be rendered in Pig-Latin, but only a few (e.g., amscray) have found their way into common use.
  • Any word may be rendered in leet style, but only a few (e.g., pr0n) see general use.
  • Grammatical affixes like meta- and -ance can be added in a great many more cases than they actually are. (Some basic suffixes like plural -s and past tense -ed really can be used almost anywhere.)
  • It may seem that trendy internet prefixes like e- and i- are used everywhere, but they aren't. If I decide to talk about e-thumb-twiddling but no one else does, then there's no need for an entry.

Typographic variants

The inclusion of terms which contain unusual characters or are otherwise unusual in form, such as G-d, pr0n, i18n or veg*n is somewhat controversial. A few view some of these as bizarre or illiterate, notwithstanding their appearance in a variety of media, and find them to have no place in a respectable dictionary. This raises the question of what constitutes a respectable dictionary, and in any case tends to exclude terms that people might well run across and want to know the meaning of.

Names

Names fall into two categories: individual given names and family names, which are single words, and the names of actual people, places, and things. Wiktionary classifies both as proper nouns, but applies caveats to each.

Given names and family names

Given names (such as David, Roger, and Peter) and family names (such as Baker, Bush, Rice, Smith, and Jones) are words, and subject to the same criteria for inclusion as any other words. Wiktionary has main articles giving etymologies, alternative spellings, meanings, and translations for given names and family names, and has two appendices for indexing those articles: Wiktionary Appendix:First names, Wiktionary Appendix:Surnames.

For most given names and family names, it is relatively easy to demonstrate that the word fulfils the criteria, as for most given names and family names the name words are in widespread use in both spoken communication and literature. However, being a name per se does not automatically qualify a word for inclusion. A new name, that has not been attested, is still a protologism. A name that occurs only in the works of fiction of a single author, a television series or a video game, or within a closed context such as the works of several authors writing about a single fictional universe is not used independently and should not be included.

Hypocoristics, diminutives, and abbreviations of names (such as Jock, Misha, Kenny, Ken, and Rog) are held to the same standards as names.

The status of patronymics has not been settled.

Names of actual people, places, and things

A name should be included if it is used attributively, with a widely-understood meaning. For example: New York is included because "New York" is used attributively in phrases like "New York delicatessen", to describe a particular sort of delicatessen. A person or place name that is not used attributively (and that is not a word that otherwise should be included) should not be included. Lower Hampton, Empire State Building, and George Walker Bush thus should not be included. Similarly, whilst Jefferson (an attested family name word with an etymology that Wiktionary can discuss) and Jeffersonian (an adjective) should be included, Thomas Jefferson (which isn't used attributively) should not. (See the discussion on whether names such as London, Beijing, etc. should be included.)

A name should be included if it has become a generic term. For example: Remington is used as a synonym for any sort of rifle, and Hoover as a synonym for any sort of vacuum cleaner. (Both are also attested family name words, and are included on that basis as well, of course.) Hamburger is used as a generic term for a type of sandwich. One good rule of thumb as to whether a name has become a generic word is whether the word can be used without capitalization (as indeed "sandwich" was in the previous sentence).

Being a trademark or a company name does not guarantee inclusion. (Of course, some company names are derived from family names, and are included on that basis.) Although some words are trademarks and company names, not all trademarks and company names are words. (Indeed, trademark holders will vigorously defend their trademarks against becoming words. According to Adobe Systems, there is no such word as Photoshopped, since Photoshop® is a trademark and not a common verb that can have a past participle; according to Xerox there is no such word as xerox, since Xerox® is a trademark and not a common verb; according to Sony there is no such word as Playstationize since there's no word Playstation at all and PlayStation® is a trademark and not a common verb.) Many trademarks and company names are deliberately protologisms. To be included, the use of a trademark or company name other than its use as a trademark (i.e. a use as a common word) has to be attested.

What Wiktionary is not with respect to names

Wiktionary is not a genealogy database. Wiktionary articles on family names, for example, are not intended to be about the people who share the family name. They are about the name as a word. For example: Whilst Yoder will tell the reader that the word originated in Switzerland (as well as give its pronunciations and alternative spellings), it is not intended to include information about the ancestries of people who have the family name Yoder.