Wiktionary talk:Criteria for inclusion/Editable


Discussion

Addition

Something else that could be added here is the info on "Gaps in entries" most recently discussed I think at Wiktionary:Beer parlour archive/2008/April#Gaps in entry titles. Not sure quite where, though. --Bequw¢τ 02:34, 14 October 2009 (UTC)

Appendix:Names as an index

I removed this sentence: and has two appendices for indexing those articles: Appendix:Names, Appendix:Surnames-A, because it is inaccurate. These appendices are a project by an editor who does not use other people's contributions as his source. Many names with actual entries are missing from them. The appendices could be used as an index if somebody would create a bot and run it, but even then, only for names in Roman script. It could be explained better in a "Wiktionary:About given names" than in the CFI. --Makaokalani 14:40, 26 October 2009 (UTC)

Why does a term have to be "idiomatic"?

The commonly understood meaning of idiomatic surely is Resembling or characteristic of an idiom.

  • (OK, I know there is another meaning probably used mostly by linguists - Pertaining or conforming to the mode of expression characteristic of a language., but this Wiktionary surely is not intended primarily for academic linguists?)

Therefore, to say a word or term must be "idiomatic", is, to most people, to say it must be an idiom. Which is wrong. So why include the need for it to be idiomatic at all? Unless your intention is to make this an academic work, not a popular work.

Just leave "idiom" as one of the alternative criteria. —This unsigned comment was added by Richardb (talkcontribs).

So would all "lexical words" (letters between white-space) be included? What about highly-agglutinative languages, where a long compound word might not merit independent inclusion? What about Thai, where there are no whitespace boundaries between words? Maybe "idiomatic" can be replaced with criteria similar to WT:IDIOM. --Bequw¢τ 16:12, 14 November 2009 (UTC)
But Richard, that's not our leading definition for the word. The fact that you weren't able to distinguish one definition of idiomatic from another does not argue for removing this time-honored, much-used, heavily-relied-upon principle from the CFI. Your confusion can also be solved by adding more explanation, instead of by removal of the part you didn't understand. --EncycloPetey 17:35, 14 November 2009 (UTC)
So, your argument is, in a nutshell, "we insiders know what it means". Personally I cannot see why it is needed at all (All words in all Languages), but if you want it in, please find a better way of expressing it unambiguously, that is understood by the general population, not just experienced linguistics insiders.--Richardb 02:30, 6 December 2009 (UTC)

The idiomaticity criterion is based on the assumption that a dictionary is used only for definitions, which is not the case. It is used not only to find a definition, but also to find which word, or which phrase, should be used for expressing a given idea. A set phrase criterion should be used instead, for this reason. Lmaltier 16:29, 15 November 2009 (UTC)

Three points here:
  • I think both "idiom" and "idiomatic" need to be used in CFI - their different functions (as noun and adjective) are useful in different contexts. Sometimes it is more idiomatic to use the noun form and say "such and such is an idiom." And both definitions of the adjectival form which Richardb quotes above apply. The first definition would usually tend to be descriptive of the term itself, while the second usually describes its usage (even for non-academics).
  • Yes, a "set phrase" criterion should be used, but as well as idiom/atic, not instead. From the Usage notes in Wiktionary's definition of set phrase, set phrase and idiom overlap, but neither is a sub-set of the other. Some idioms are set phrases, but not all, and some set phrases are idioms, but not all. The set phrase "set phrase" does not appear anywhere on the current CFI - I strongly suggest it should, but leave it to experts for placement after this discussion has matured.
  • In Usage notes, editors should include references to which category/ies each meaning of a term belongs, e.g. set phrase, idiom, compound word, multiple word, proverb, etc. -- Bricaniwi 19:18, 29 November 2009 (UTC)
Set phrases are useful and could be listed in the dictionary, but do they merit separate entries? Could we include the necessary information by simply listing the set phrases in a section on the entries for the base/principal word(s)? That would make the treatment of set phrases an ELE issue and not a CFI issue. That way idiomaticity is still the guiding principle for the CFI. --Bequw¢τ 22:32, 29 November 2009 (UTC)

Wikisaurus sub-entries

  • This could also apply to lists, concordances etc.

Wikisaurus undeniably does have, and very usefully, many words that some argue do not meet the strictest interpretation of CFI. Unless someone can mount a sensible argument why Wikisaurus should be limited, and therefore to some extent emasculated, by the strict interpretation of CFI, Wikisaurus should have a slightly looser CFI.


My suggestion is
===Wikisaurus sub-entries===
Terms added under Wikisaurus entries (not the headwords) do not need to strictly meet the same criteria for inclusion. Wikisaurus can easily accommodate many local, temporary, slang words.

Some management attention may be needed to sort out these entries using structured pages, and additional pages such as /more, /translation.


Allowing Wikisaurus to be used in this way somewhat takes the pressure off the arguments and wheel wars over neologisms, protologisms etc. And can be useful. If I run across a new slang word when listening to my kids' conversations, or in a youth publication, it would be nice to be able to find it in Wiktionary, through searching Wikisaurus if necessary, and, possibly, find out what it might mean.

Equally, if a new user is drawn to Wiktionary because they have "discovered" a new word, it is an encouragement to them to be a contributor to allow them to enter their new word somewhere, or to find that it already has been captured, even if only in Wikisaurus.

Conversely, if a user looks up a word they have seen but don't know, and don't find it in Wiktionary at all, then Wiktionary is held in a lesser regard by the user. —This unsigned comment was added by Richardb (talkcontribs).

It's true that other namespaces generally don't follow the CFI. This is usually because these other namespaces deal with specific works (whether it be a concordance of a book, or terms from a reference work of a constructed language). Wikisaurus (a thesaurus) on the other hand is a general reference work more similar to the main namespace (a dictionary). Therefore many people feel the CFI should apply to Wikisaurus independent of what the current CFI says. I agree with them. Wiktionary does not try to provide answers to all questions, and if one has to go to Urban Dictionary sometimes, that's fine. Your proposal could be discussed better if you could provide concrete alternative criteria for Wikisaurus (other than just saying "less strict"). Also, please remember to sign your comments. --Bequw¢τ 16:22, 14 November 2009 (UTC)
In my book, the only real criteria would be
  1. Does it add value?
  2. Is the value it adds outweighed by the "cost" - e.g. does it bring the dictionary into disrepute for inaccuracy?
My view is that most conscientious entries do add value. There is very little real argument for excluding them. If they are a stub, add value by expanding it. In my view, people who want to delete things are always taking value away, unless it is justified by the thing being removed actually being wrong in some fundamental way.
But that argument was lost long ago to the exclusionists. It is too easy for them to win, simply by deleting!
But I would like to see the simple "Does it Add Value" criterion being applied to the simple addition of a word to a Wikisaurus page. Does it add to the sum of knowledge? I will never understand the urge some people feel to delete, to remove, to censor.
You may not know my history. I come and go every few months, try to add value and do some tidying up. But usually quit in disgust after the exclusionists win again.--Richardb 06:22, 15 November 2009 (UTC)
Yes the CFI isn't completely inclusionist. It is, however, the community consensus. You haven't said why Wikisaurus should be bound to a different CFI than the main namespace. And yes, by making edits contrary to community consensus it is inevitable that you will get disgusted. --Bequw¢τ 16:14, 15 November 2009 (UTC)
Richard, I oppose your proposal. I have started a thread in Beer Parlour on the subject of "Wikisaurus - inclusion criteria". --Dan Polansky 16:58, 15 November 2009 (UTC)

I agree that there is no reason to adopt different criteria for Wikisaurus. But the text of CFI should be changed to clearly allow all words. No, there is no community consensus on the current CFI text, and Richard's reaction illustrates this absence of consensus. Lmaltier 17:15, 15 November 2009 (UTC)

Consensus as understood in discussions about wiki governance is not the same as unanimity, meaning that every single editor agrees. So Richard's reaction does not show the absence of consensus; it only shows at least one person disagrees: the consensus is not unanimous. --Dan Polansky 20:46, 15 November 2009 (UTC)

Purpose

This is an Editable CFI to promote discussion on what CFI could or should be. It is therefore inappropriate to control what is entered into the page on the grounds of what CFI currently is. —This unsigned comment was added by Richardb (talkcontribs).

I believe the other editors reverted your edits because of how they thought the CFI should be. As several other editors (including myself) disagreed with your edits, it would be best to discuss this controversial opinion here on the talk page. --Bequw¢τ 16:24, 14 November 2009 (UTC)
And for the same reason, I just reverted your edits. Once you know a certain change is controversial, find consensus on the talk page before making unilateral changes. --Bequw¢τ 16:01, 15 November 2009 (UTC)

Language-specific content

I think the CFI should be a little less English-centric in regards to sections such as Wiktionary:Editable CFI#Modern English possessives. One way to do this would be to have standard sections on language About pages dealing with language-specific inclusion criteria. The aforementioned section could then be moved to WT:AEN, and the CFI could be updated to link to the appropriate sections in the language About pages. --Bequw¢τ 17:01, 14 November 2009 (UTC)

Word criterion

About the question above (So would all "lexical words" (letters between white-space) be included?), the answer has to depend on the language. In German, these "lexical words", even very long ones, are considered as words, and should be included. For languages without whitespace boundaries, the criterion should be based on what is considered as a word in the language. Lmaltier 16:30, 15 November 2009 (UTC)

Contradiction in CFI

The first sentence is As an international dictionary, Wiktionary is intended to include all words in all languages.. The second sentence contradicts the first one by restricting it: A term should be included if it's likely that someone would run across it and want to know what it means.. This is a serious issue. The logical second sentence should be: A term should be included if it's a word used in the language (attested in the language), or a term used in the language like a word (i.e. an elementary brick of the language, such as a set phrase or a proverb). A dictionary is used not only when reading a text, but also when writing texts, this is what the current second sentence forgets. Lmaltier 16:39, 15 November 2009 (UTC)

I think the second sentence does need work, as "what someone would run across" is not really a guiding principle for inclusion and it does not cover the need for language production information. Your cleanup, however, introduces a new criterion which is vague (what does "used ... like a word" mean?) and controversial (moving away from just attestation+idiomaticity). If one is to add an additional criterion for inclusion, it should be precisely defined. I would be fine scrapping the second sentence altogether and rewording the third to say that our principles are idiomaticity and attestation. --Bequw¢τ 00:21, 30 November 2009 (UTC)
I explained what I mean: an elementary brick of the language, such as a set phrase or a proverb. Atlantic salmon is not idiomatic at all, but is a set phrase worth an entry nonetheless, as it's the usual name given to a species. All set phrases are used as words, i.e. people using them do not build them in their minds from several words, they use them as a whole, they use them as words (in the linguistic sense of word), even when not idiomatic. Another example is indoor football, which is not idiomatic, but is worth an entry nonetheless, because it's a sport, with its own rules. A third example is prime minister. What I explain is consistent with definitions provided by word and w:word, and this is what I understand when I read all words in all languages. Lmaltier 21:11, 1 December 2009 (UTC)
Actually, Atlantic salmon as a common name for a species is idiomatic. While originally confined to the Atlantic basin and the only salmon species found there, Salmo salar have escaped aquafarms and established wild populations in the Pacific basin. Hence, one can have Pacific Atlantic salmon or Atlantic Atlantic salmon. — Carolina wren discussió 00:33, 2 December 2009 (UTC)
First a note about the meaning of idiomatic that might reduce confusion. One common meaning, "cannot be understood by way of a literal translation" is too foreign-language specific to be unambiguous in these types of analyses so I prefer a non-SoP definition like "having a meaning different than that determined directly by its component parts". I believe Lmaltier may have been using the former definition. The French Saumon atlantique is both the literal and actual translation of Atlantic salmon, so one might think it was unidiomatic. In Basque, however, the actual translation is Izokin which is not the literal translation, implying that Atlantic salmon is idiomatic. The second definition improves clarity by showing that the species sense does not mean "any salmon in the Atlantic" (the SoP definition) and therefore the term is idiomatic. In the same way I would also label indoor football and prime minister as idiomatic and deserving of entries. --Bequw¢τ 05:08, 2 December 2009 (UTC)
I don't think there's any controversy about inclusion of non-SoP terms. What is controversial is the inclusion of unidiomatic (SoP) set phrases (e.g. any press is good press) by considering them words. [[word]] is too vague in my view to clarify whether SoP set phrases are "words" but of the five methods listed on w:Word only one criterion ("Indivisibility") would count these as "words". The other four would not count them as words, and I would therefore not call them "elementary bricks" of language. Additionally, as w:Set phrase notes, there is no clear dividing line between set phrases and mere common phrases, so it would be troublesome to make CFI policy around them. Better to stick to just "idiomatic". --Bequw¢τ 05:10, 2 December 2009 (UTC)
We include proverbs and expressions of equivalent force because we include them (using unspecified criteria), not by reference to CFI, at least not usefully so. I am not convinced that one set of principles covers all. I think we actually have real choices about classes of entries based on criteria other than metaphysical principles about the definition of "word" or "idiom". Is it searchable? Is it what people expect from a dictionary? Can we satisfy users by means other than having a headword? Is it likely to be maintained? Is it already well covered by another project (e.g. two-part species names)? This cuts both ways. We have many entries that are non-constituents or that contain placeholder terms that are not words, not grammatical units, and not likely to be typed as search terms by normal users.
  • I think we have progressed beyond slogans and statements of first principles to a point where we are making practical decisions about the value of classes of entries. DCDuring TALK 10:44, 2 December 2009 (UTC)
CFI needs to deal with two things properly, a) what is a language, b) what is a word. Then it is reasonable to say "all words in all languages". I know little about what makes a language, but our policy on what makes a word could (as said) do with some formal clarification. I would very strongly prefer for WT:CFI to be a small set of criteria, the rest of the waffle and clarification does not need to be policy, it could be at Help:CFI or some-such place. As I see it, a rough overview of what we currently think is:
  1. A set of symbols (possibly including spaces) that has a meaning when taken as a unit,
    something like that anyway...
    • except where such a meaning can be derived unambiguously from the context and constituents of the word,
      exclude things that are not idiomatic - but take context into account for ambiguity "she dumped the bucket" is not ambiguous even though dump is polysemous.
    • and except where that meaning is merely a name for specific entities, not for classes of entities,
      i.e. No places or people, but species and names are fine
    • and only when the word can be found to mean the same thing in three durably archived resources,
      citable is essential, though this may vary per language?
      • excluding the cases that the resources are published within the same year, or are not independent of each other,
        protologisms should also be excluded
      • and excluding the cases that the resources only mention a term.
        and definitions of terms are not to count either
Though this leaves massive gaps of undefined-ness and ambiguity, "constituents" are normally only divided on spaces (and apostrophes) for English terms (note that other dictionaries divide on common prefixes and suffixes too, listing words beginning with un- but never defining them, and not even listing words ending with -s). How do we define "independence" of resources? Etc. Conrad.Irwin 01:03, 2 December 2009 (UTC)
Sounds a sensible way to proceed.--Richardb 02:53, 6 December 2009 (UTC)
With regard to Lmaltier's original statement that there is a contradiction because of restriction: there isn't. Saying something should be included is not a restriction. Both statements are consistent. The second statement does not say anything at all about exclusions; it only makes a statement about inclusions. --EncycloPetey 03:29, 2 December 2009 (UTC)
Well, such a sentence in criteria for inclusion is very misleading. Lmaltier 21:47, 2 December 2009 (UTC)
Why? It's exactly what the page title says; it's a criterion for inclusion. --EncycloPetey 03:31, 4 December 2009 (UTC)
Actually, CFI are used for excluding (RfD), this is why it's misleading. Lmaltier 11:58, 5 December 2009 (UTC)
I take it from what EncycloPetey says above, that the second sentence is superfluous, and can be dropped, so that we end up with "Wiktionary is intended to include all words in all languages". Is that your intention, EncycloPetey? (I have trouble understanding EP's logic, so need to clarify this)--Richardb 02:53, 6 December 2009 (UTC)
It's not my logic; it's simply logic. I have a hard time following your prose. Your determiner is missing a referent, so I can't be sure which sentence is "the second one" in your opinion. Especially so in this case, where CFI has no sentence in the opening section following the lead one --EncycloPetey 02:08, 10 December 2009 (UTC)
Isn't it just people and places that people have a problem with? Has there been any hubbub about other proper noun senses such as "Hebrew" (language)? --Bequw¢τ 04:19, 7 December 2009 (UTC)
With regard to specific entities (people and places) I think the idea is that we want to include it if there is a meaning apart from the original referent. Currently, however, we are restricted only to cases where the term is used attributively. The OED is slightly more liberal, including them if they are used attributively ("New York deli"), possessively ("Foucault's pendulum"), figuratively, or allusively ("meet one's Waterloo"). Additionally, they have included Marengo since it is used in the term "poulet a la Marengo" (w:Chicken Marengo), though I'm not sure if this would be an additional exception or one of the four previous ones. As this sort of relaxation is in the same vein as attributive usage, would there be any problems in adding these types of exemptions for people and place names? --Bequw¢τ 21:04, 9 December 2009 (UTC)
If those ideas could be well articulated, with examples we can agree upon, then yes, I'd say that sounds like a good idea. --EncycloPetey 02:08, 10 December 2009 (UTC)

History

Does anyone know the history of the phrase "all words in all languages"? Was this a part of the original proposal or just something that caught on early? Just curious. --Bequw¢τ 14:44, 10 December 2009 (UTC)

Particular individuals

In this edit, I have replaced the term "specific entity" with the term "particular individual", and made the wording more specific. This is a proposal edit, of course. The term "entity" refers ambiguously both to individuals and classes; the term "specific" refers to the narrowness of the class, not to the fact that the referent in question is an individual, by my understanding anyway.

If you like the edit, you may indicate it here, so it can be brought to a vote.

If my understanding of "specific" and "entity" is wrong, I am eager to learn where I err. --Dan Polansky 15:06, 1 January 2010 (UTC)

Undone, as many thought it increased confusion. --Bequw τ 15:26, 7 April 2010 (UTC)

Clarifying Vote wording

WT:Votes/pl-2009-12/Unidiomatic multi-word phrases to meet CFI when the more common spelling of a single word finished without specific wording. It would seem like a priority to settle the wording (probably here) so that we can vote on the real text to be inserted rather than leave the vague vote link that's in there now. Anyone have thoughts on what I just added? --Bequw¢τ 02:24, 27 January 2010 (UTC)

Idioms

The WT:CFIE#Idiomatic phrases section would seem to be English-specific and therefore more appropriate at WT:About English. Would there be any reason against moving that section and instead leaving something short and language neutral with maybe an English example? --Bequw τ 02:33, 21 March 2010 (UTC)

First attempt. Moved many details to WT:AEN#Phrases. Thoughts? --Bequw τ 15:25, 7 April 2010 (UTC)