Wiktionary:Criteria for inclusion/Editable

This is an editable draft of Wiktionary:Criteria for inclusion with no policy authority. It is intended to help the Wiktionary community develop new and perhaps better approaches. Please feel free to edit this page conscientiously, as you would any document on a wiki.

As an international dictionary, Wiktionary is intended to include all words in all languages.

General rule

A term should be included if it's likely that someone would run across it and want to know what it means. This in turn leads to the somewhat more formal guideline of including a term if it is attested and idiomatic.

“Terms” to be broadly interpreted

A term need not be limited to a single word in the usual sense. Any of these are also acceptable:

Attestation

“Attested” means verified as meeting any of the following criteria:

  1. Clearly widespread use,
  2. Usage in a well-known work (but see Fictional universes below for an exception to inclusion on the basis of such usage),
  3. Appearance in a refereed academic journal,
  4. Usage in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year.

Permanently recorded media

Where possible, it is better to cite sources that are likely to remain easily accessible over time, so that someone referring to Wiktionary years from now is likely to be able to find the original source. As Wiktionary is an online dictionary, this naturally favors media that can be accessed electronically.

Print media such as books and magazines are strongly preferred, particularly if their contents can be viewed online. If a source is viewable online, please include the URL so that other users can verify the quote. When citing a quotation from a recent book, please also include the ISBN so that it can be located in libraries and bookstores.

Acceptable sources are not limited to print. Other recorded media such as audio and video are also accepted, provided they are of verifiable origin and are durably archived. Thus, most commercially-produced CDs and DVDs would be acceptable, but most YouTube clips would not. In addition, online media that have a reliable, permanent digital archive may be used. This includes Usenet groups, which are durably and redundantly archived by Google and other services. However, it does not include most web archives. For example, the Wayback Machine maintained by Archive.org is not considered usable for attestation, because the archive of a site can be erased at the request of the site owner.

We do not accept quotes from other Wikimedia sites (such as Wikipedia) as proof of attestation. However, it is permitted to use quotations from eligible works that happen to be available on a Wikimedia site. This would include, for example, quotations from books available on Wikisource.

Conveying meaning

See use-mention distinction.

This filters out appearance in raw word lists, commentary on the form of a word, such as “The word ‘foo’ has three letters,” lone definitions, and made-up examples of how a word might be used. For example, an appearance in someone’s online dictionary is suggestive, but it does not show the word actually used to convey meaning. On the other hand, a sentence like “They raised the jib (a small sail forward of the mainsail) in order to get the most out of the light wind,” appearing in an account of a sailboat race, would be fine. It happens to contain a definition, but the word is also used for its meaning.

Independence

This is meant to exclude multiple references that draw on each other. Where Wikipedia has an article on a given subject, and that article is mirrored by an external site, the use of certain words on the mirror site would not be independent. It is quite common to find that material on one site is readily traced to another. Similarly, the same quote will often occur verbatim in separate sources. While the sources may be independent of each other, the usages in question are clearly not.

The presumption is that if a term is only used in a narrow community, there is no need to refer to a general dictionary such as this one to find its meaning.

Spanning at least a year

This is meant to filter out words that may appear and see brief use, but then never be used again. The one-year threshold is somewhat arbitrary, but appears to work well in practice.
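The fourth attestation criterion (use in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year) can be sketched as a small check. This is only an illustration, not a tool used by Wiktionary: the `Citation` record, its field names, and the `meets_citation_criterion` function are hypothetical, "a year" is approximated as 365 days, and whether two quotations are independent is assumed to have been decided already by an editor and recorded in the `source` field.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Citation:
    # Hypothetical record; none of these names come from the policy itself.
    source: str             # citations sharing a source are treated as not independent
    published: date         # publication date of the quotation
    conveys_meaning: bool   # True for a genuine use, False for a mere mention
    durably_archived: bool  # True if the medium counts as permanently recorded


def meets_citation_criterion(citations, min_sources=3, min_span_days=365):
    """Return True if the citations satisfy the three-independent-uses rule."""
    # Discard mentions and anything that is not permanently recorded.
    usable = [c for c in citations if c.conveys_meaning and c.durably_archived]

    # Record the earliest and latest usable date for each independent source.
    ranges = {}
    for c in usable:
        lo, hi = ranges.get(c.source, (c.published, c.published))
        ranges[c.source] = (min(lo, c.published), max(hi, c.published))

    if len(ranges) < min_sources:
        return False

    # The span must be measured between two different sources, so that two
    # quotations from the same source cannot provide the year on their own.
    best_span = max(
        (
            (hi - lo).days
            for a, (lo, _) in ranges.items()
            for b, (_, hi) in ranges.items()
            if a != b
        ),
        default=0,
    )
    return best_span >= min_span_days
```

Three quotations from a novel, a magazine, and a Usenet post dated over more than a year would pass this check; the same three quotations would fail if two of them were marked as sharing a source, or if one were only a mention of the word.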

Idiomaticity

An expression is “idiomatic” if its full meaning cannot be easily derived from the meaning of its separate components.

For example, this is a door is not idiomatic, but shut up and red herring are.

Compounds are generally idiomatic, even when the meaning can be clearly expressed in terms of the parts. The reason is that the parts often have several possible senses, but the compound is often restricted to only some combinations of them.

This rule must be applied carefully and is somewhat subjective. For example, bank has several senses, and parking lot has an idiomatic sense of "large traffic jam". However, bank parking lot can't possibly mean "to put a large traffic jam in a financial institution". With such clearly wrong interpretations weeded out, the remaining choices are "place to park cars for any of several kinds of business" or "place to park cars by, for or on a river bank or similar (as opposed to, say, the hill parking lot)." The whole phrase could plausibly mean either, depending on context (though the first is likely far more common), and so the phrase is not idiomatic.

This criterion is sometimes referred to as the fried egg test, as fried egg generally means an egg (and generally a chicken egg or similar) fried in a particular way. It generally doesn't denote a scrambled egg, which is nonetheless cooked by frying.

See Wiktionary:List of idioms that survived RFD for other examples. However, many idioms are clearly idiomatic, for example red herring. These tests are invoked only in discussion of unclear cases.

Because of the complex and somewhat subjective character of idiomaticity, care should be taken to avoid deleting idiomatic entries. Where there is a basis for reasonable doubt, a term should be considered idiomatic.

If a term has an idiomatic sense, non-idiomatic senses may be added as well so that the reader does not infer that the phrase has only a limited meaning (see pearl necklace).

Multi-word terms that are more common spellings of an already-included single-word term are included regardless of their own idiomaticity.

Phrasebook entries

Phrasebook entries are very common expressions that are considered useful to non-native speakers. Although these are included as entries in the dictionary (in the main namespace), they are not usually assessed for idiomaticity. For instance, What's your name? is clearly the sum of its parts.

Spellings

Misspellings, common misspellings and variant spellings:

There is no simple hard and fast rule, particularly in English, for determining whether a particular spelling is “correct”. A person defending a disputed spelling should be prepared to support his view with references. Published grammars and style guides can be useful in that regard, as can statistics concerning the prevalence of various forms.

Most simple typos are much rarer than the most frequent spellings. Some words, however, are frequently misspelled. For example, occurred is often spelled with only one c or only one r, but only occurred is considered correct. The misspellings may well merit entries.

It is important to remember that most languages, including English, do not have an academy to establish rules of usage, and thus may be prone to uncertain spellings. This problem is less frequent, though not unknown, in languages such as Spanish where spelling may have legal support in some countries.

Regional or historical variations are not misspellings. For example, there are well-known differences between British and American spelling. A spelling considered incorrect in one region may not occur at all in another, and may even dominate in yet another.

Included misspellings should be fairly common relative to the spellings considered correct or common in absolute numbers. For this purpose, a carefully executed Web search may provide good evidence of relative and absolute frequency. A spelling that accounts for more than 20% of current usage is difficult to dismiss as a misspelling. It may better be considered an alternative spelling, possibly requiring an explanation in a usage note. A spelling that accounts for less than 3% of current usage on the Web is difficult to credit as common, because of the use of such misspellings as a means of attracting searchers to websites with advertising.
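The two percentages above can be read as a rough decision aid. The following is a minimal sketch under stated assumptions, not an official procedure: the function name `classify_spelling_share` and the three labels are invented for illustration, the hit counts are assumed to come from a carefully executed search, and the band between 3% and 20% is left as a judgment call because the text gives no rule for it.

```python
def classify_spelling_share(variant_hits: int, total_hits: int) -> str:
    """Classify a disputed spelling by its share of current usage.

    variant_hits: occurrences of the disputed spelling
    total_hits:   occurrences of all spellings of the word, including the variant
    """
    if total_hits <= 0:
        raise ValueError("total_hits must be positive")
    if not 0 <= variant_hits <= total_hits:
        raise ValueError("variant_hits must be between 0 and total_hits")

    share = variant_hits / total_hits
    if share > 0.20:
        # Difficult to dismiss as a misspelling.
        return "likely alternative spelling"
    if share < 0.03:
        # Difficult to credit as common.
        return "too rare to credit as common"
    # Between the two thresholds the text leaves the decision to editors.
    return "possible common misspelling"
```

For instance, a spelling found in 25 of 100 sampled uses would be classified as a likely alternative spelling, while one found in 2 of 100 would be treated as too rare to credit as common.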

Formatting

Once it is decided that a misspelling is of sufficient importance to merit its own page, formatting that page should not be particularly difficult. The usual language and part of speech headings can be used, followed by this simple entry:

# {{misspelling of|[[...]]}}

An additional section explaining why the term is a misspelling should be considered optional. An alternative spelling or form that some consider incorrect may require an explanation.

Inflections

The entries for such inflected forms as cameras, geese, asked, and were should indicate what form they are, and link to the main entry for the word (camera, goose, ask, or be, respectively, for the preceding examples). Except with multi-word idioms, they should not merely redirect.

At entries for inflected forms with idiomatic senses, such as blues and smitten, predictable meanings should be distinguished from idiomatic ones.

Idiomatic phrases

Many phrases take several forms. It is not necessary to include every conceivable variant. When present, minor variants should simply redirect to the main entry. For the main entry, prefer the most generic and simplest form, with words in their lemma form if possible.

Main entry                             Variant
feel one’s oats                        feel his oats
cat’s pajamas                          the cat’s pajamas
rain cats and dogs                     It’s raining cats and dogs
you can't judge a book by its cover    You can't judge a book by its cover

Additional, language-specific concerns may be found at the language policy pages, such as WT:About English#Phrases.

Languages to include

Natural languages

All natural languages are acceptable. However, it is important to note that the question of whether a proposed language is considered a living language, or a dialect of or alternate name for another language is inherently subjective in some cases, and either designation may have political overtones.

Sign languages

Terms in signed languages are acceptable as entries, and should be entered as described in the policy document Wiktionary:About sign languages.

Constructed languages

Constructed languages have not developed naturally, but are the product of conscious effort in the fulfillment of some purpose. In general, terms in such languages, particularly languages associated with works of fiction, do not meet the basic requirement that one might run across them and want to know the meaning of their words, since they are only used in a narrow context in which further material on the language is readily available. There are specific exceptions to this general rule, listed below, based on consensus of the Wiktionary community. Esperanto, in particular, is a living language with a sizeable community of fluent speakers, and even some native speakers!

Some individual terms from constructed languages may have been adopted into other languages. These should be treated as terms in the adoptive language, and the origin noted in the etymology, regardless of whether the language as a whole is considered to meet the criteria for inclusion.

  • There is consensus that languages whose origin and use are restricted to one or more related literary works and their fans do not merit inclusion as entries or translations in the main namespace. They may merit lexicons in the Appendix namespace. These languages include Quenya, Sindarin, Klingon, and Orcish (the first three do have ISO 639-3 codes).[2]

Even when rejected for treatment as a language for purposes of this Wiktionary, a single article about the name of that language may be acceptable.

Reconstructed languages

Terms in reconstructed languages such as Proto-Indo-European do not meet the criteria for inclusion. They may be entered in appendices, and referred to from etymological sections. See Wiktionary:Reconstructed terms.

Exclusions

Vandalism

From time to time, various parties will insert material into Wiktionary which clearly has nothing to do with Wiktionary's purpose or practices. Such activity is considered vandalism and will be undone at the first opportunity. If the vandalism consists of an edit to an existing page, that edit will be reverted. If the vandalism consists of a new article, that article will be removed. This is done at the discretion of the administrators and does not require discussion, even if the vandalism consists of a new article for a term which would otherwise meet these criteria but has not yet been entered legitimately.

Protologisms

The designation protologism is for terms defined in the hopes that they will be used, but which are not actually in wide use. These are listed on Appendix:List of protologisms, and should not be given their own separate entries.

See the discussion on excluding the words in lists (protologisms, Wikisaurus, concordances, etc.) from application of the CFI to each individual listed word.

Fictional universes

Terms originating in fictional universes which have three citations in separate works, but which do not have three citations which are independent of reference to that universe, may be included only in appendices of words from that universe, and not in the main dictionary space. Names of persons or places from fictional universes shall not be included unless they are used out of context in an attributive sense. See examples.

For purposes of defining a single work, a series of books, films, or television episodes by the same author, documenting the exploits of a common set of characters in a fictional universe (e.g. the Harry Potter books, Tolkien's Middle Earth books, the Star Wars films), shall be considered a single work in multiple parts.

Encyclopedic content

See also Wiktionary is not an encyclopaedia.

Care should be taken so that entries do not become encyclopedic in nature; if this happens, such content should be moved to Wikipedia, but the dictionary entry itself should be kept.

Wiktionary articles are about words, not about people or places. Many places, and some people, are known by single word names that qualify for inclusion as given names or family names. The Wiktionary articles are about the words. Articles about the specific places and people belong in Wikipedia. For example: Wiktionary will give the etymologies, pronunciations, alternative spellings, and eponymous meanings, of the names Darlington, Hastings, David, Houdini, and Britney. But articles on the specific towns (Darlington, Hastings), statue (David), escapologist (Houdini), and pop singer (Britney) are Wikipedia's job.

Language-specific issues

Individual languages may have additional restrictions on inclusion. These will be mentioned on that language's About page. For instance, Wiktionary:About English notes that modern English possessives aren't usually allowed and Wiktionary:About French mentions that independent reflexive verbs are not allowed.


Names

Names fall into several categories, including company names, the names of products, given names, family names, and the full names of specific people, places, and things. Wiktionary classifies all as proper nouns, but applies caveats to each.

Generic terms are common rather than proper nouns. For example: Remington is used as a synonym for any sort of rifle, and Hoover as a synonym for any sort of vacuum cleaner. (Both are also attested family name words, and are included on that basis as well, of course.) Hamburger is used as a generic term for a type of sandwich. One good rule of thumb as to whether a name has become a generic word is whether the word can be used without capitalization (as indeed “sandwich” was in the previous sentence).

Company names

Being a company name does not guarantee inclusion. To be included, the use of the company name other than its use as a trademark (i.e., a use as a common word or family name) has to be attested.

Brand names

A brand name for a physical product should be included if it has entered the lexicon. Apart from genericized trademarks, this is measured objectively by the brand name’s use in at least three independent durably archived citations spanning a period of at least three years. The sources of these citations:

  1. must be independent of any parties with economic interest in the product, including the manufacturer, distributors, retailers, marketers, and advertisers, their parent companies, subsidiaries, and affiliates, at time of authorship; and
  2. must not identify any such parties.

If the term has legal protection as a trademark, the original source must not indicate such. The sources also must not be written:

  1. by any person or group associated with the type of product;
  2. about any person or group specifically associated with the product; or
  3. about the type of product in general.

The text preceding and surrounding the citation must not identify the product to which the brand name applies, whether by stating explicitly or implicitly some feature or use of the product from which its type and purpose may be surmised, or some inherent quality that is necessary for an understanding of the author’s intent. See examples.

Given and family names

Given names (such as David, Roger, and Peter) and family names (such as Baker, Bush, Rice, Smith, and Jones) are words, and subject to the same criteria for inclusion as any other words. Wiktionary has main articles giving etymologies, alternative spellings, meanings, and translations for given names and family names.

For most given names and family names, it is relatively easy to demonstrate that the word fulfills the criteria, as for most given names and family names the name words are in widespread use in both spoken communication and literature. However, being a name per se does not automatically qualify a word for inclusion. A new name, that has not been attested, is still a protologism. A name that occurs only in the works of fiction of a single author, a television series or a video game, or within a closed context such as the works of several authors writing about a single fictional universe is not used independently and should not be included.

Hypocoristics, diminutives, and abbreviations of names (such as Jock, Misha, Kenny, Ken, and Rog) are held to the same standards as names.

The status of patronymics has not been settled.

Genealogic content

Wiktionary is not a genealogy database. Wiktionary articles on family names, for example, are not intended to be about the people who share the family name. They are about the name as a word. For example: Whilst Yoder will tell the reader that the word originated in Switzerland (as well as give its pronunciations and alternative spellings), it is not intended to include information about the ancestries of people who have the family name Yoder.

Names of specific entities

A name should be included if it is used attributively, with a widely understood meaning other than its referent. For example: New York is included because “New York” is used as a byword for a particular type of city in phrases such as "the New York of the West". In contrast, a person or place name that refers only to its referent should not be included. "Lower Hampton", "Burj Dubai", and "George Herbert Walker Bush" thus should not be included. Similarly, although Jefferson and Jeffersonian can be included, "Thomas Jefferson" should be included only if it is shown to refer to something other than specific historical individuals known by that name.


Issues to consider

Attestation vs. the slippery slope

There is occasionally concern that adding an entry for a particular term will lead to entries for a large number of similar terms. This is seldom a problem, as each term is considered on its own based on its usage, not on the usage of terms similar in form. Some examples:

  • Any word in any language might be borrowed into English, but only a few actually are. Including spaghetti need not imply that ricordati is next (though it is of course fine as an Italian entry).
  • Any word may be rendered in pig Latin, but only a few (e.g., amscray) have found their way into common use.
  • Similarly, any word may be rendered in leet style, but only a few (e.g., pr0n) see general use. And only those leet and pig Latin terms that can be fully attested are eligible for inclusion.
  • Combining forms like meta- and -ance could be added to a great many more words than they actually are. (This differs from inflectional suffixes like -s for the plural of a noun and -ed for the past tense of a verb, which can be used for almost any noun or verb.) Again, only those combined forms that are attestable can be included in Wiktionary.
  • Trendy internet prefixes like e- and i- may seem to be used everywhere, but they aren’t. If I decide to talk about e-thumb-twiddling but no one else does, then there’s no need for an entry. The term becomes eligible for inclusion only if it has been taken up by at least three people in durably-archived speech or writing.

Typographic variants

The inclusion of terms which contain unusual characters or are otherwise unusual in form, such as G-d, pr0n, i18n or veg*n, is somewhat controversial. A few view some of these as bizarre or illiterate, notwithstanding their appearance in a variety of media, and find them to have no place in a respectable dictionary. This raises the question of what constitutes a respectable dictionary. In any case, that view tends to exclude terms that people might well run across and want to know the meaning of. In some cases, however, users may want to know the meaning of a term containing unfamiliar characters or symbols, but not know how to type those characters or symbols on their keyboards. This is particularly troublesome when the problematic characters occur among the first few entered in a search. Some thought should be given to ways of helping users find such entries, including appendices and redirects.

References

  1. ^ Wiktionary:Votes/pl-2010-02/Correct figures in CFI
  2. ^ Wiktionary:Votes/pl-2007-04/Fictional languages