Wiktionary:Beer parlour/2011/January

This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.



Historical events

I created Category:Historical events (which is different from Category:History in many aspects) and filled it with fourteen members. Naturally there are many other names of historical events yet absent from this category and from Wiktionary.

Since proper nouns are currently a delicate subject, I ask: Are there any limits and thoughts about the inclusion of these events? For instance, on the subject of Christianity, I suggest the creation of Council of Chalcedon, Protestant Reformation and Great Awakening. --Daniel. 00:59, 1 January 2011 (UTC)

Err, how is World War I or the Middle Ages an "event"? ---> Tooironic 14:04, 1 January 2011 (UTC)

The entry event is defined as "An occurrence; something that happens." World War I and the Middle Ages happened. Historically. --Daniel. 17:12, 1 January 2011 (UTC)

The Middle Ages aren't really an "event" that "happened"; they're a time period (in a certain region). World War I, however, I would consider to be an event. —Ruakh_TALK 22:52, 1 January 2011 (UTC)

Very often, these events are known under somewhat arbitrary names, and this makes it possible to consider these phrases as idiomatic phrases, i.e. as words (they should be defined as proper nouns). This is the case for Great Awakening, World War I or Révolution française: the meaning does not derive from the sense of their components and cannot be guessed precisely with certainty. But including Council of Chalcedon is very debatable, for the same reason as Winston Churchill. I would not include it. Lmaltier 15:21, 1 January 2011 (UTC)

From your chosen examples, the proposed distinction between events that do or do not have arbitrary, idiomatic names is obscure. One can guess that "Council of Chalcedon" is a council held in a place named Chalcedon, or by a person named Chalcedon. Similarly, one can guess that "World War I" is the first of a sequence of wars that affect the world on a great scale, and that Révolution française is a revolution that occurred in France. --Daniel. 17:12, 1 January 2011 (UTC)

There have been several revolutions in France; only one of them is called this. The meaning of World War I is not obvious at all; you can only guess it was a world war. On the other hand, councils are normally named after the place where they were held, so you cannot add any linguistic data; what you can add would be encyclopedic. Lmaltier 21:26, 1 January 2011 (UTC)

There's only one Révolution française, but it's frequently referred to simply as la Révolution. —Ruakh_TALK 21:32, 1 January 2011 (UTC)

You are right, Révolution might be created (but certainly not la Révolution, the article does not belong to the word). fr:Révolution française includes many translations, and a well-known anagram. Lmaltier 21:54, 1 January 2011 (UTC)[reply]

As you can see, I did link to [[Révolution]]; but I definitely think la should appear on the headword line. —Ruakh_TALK 22:48, 1 January 2011 (UTC)

I disagree: no, the article does not belong to the word, even in the headword line: it's like France, and unlike La Haye. The use of la seems to be systematic in sentences, you are right, but here is an example of a use without the article: http://www.bertrand-malvaux.fr/p/486/plaque-de-giberne-de-la-garde-nationale-revolution-1789-1792.html Lmaltier 22:27, 2 January 2011 (UTC)[reply]

Is the Council of Chalcedon the only council ("committee that leads or governs", "discussion" or "deliberation") that happened in Chalcedon? --Daniel. 22:23, 1 January 2011 (UTC)

From "World War I", one can infer that it is not only a world war, but the first of them. With a little creativity, and a certain lack of knowledge of basic 20th-century history, one possible alternative explanation is that "World War I" is the ninth world war, after World War A, World War B, World War C, World War D, World War E, World War F, World War G and World War H. --Daniel. 22:23, 1 January 2011 (UTC)

What bothers me about all of these entries for singular entities or events is that they are so different from the fundamentally empirical character of a dictionary. A "good" encyclopedic entry is a fairly precise and comprehensive description of something, but does not correspond to actual usage by very many people. It seems to me that such entries end up being prescriptive. How do we know what most people actually mean when they use the term French Revolution? Do we have recourse to experts about what they should mean? Do we abridge encyclopedias and other references? How do we know which facts are the salient ones in common usage for inclusion in a dictionary-length definition (or short-attention-span encyclopedia entry)? Is it just a matter of individual opinion? That would politicize many matters. Are there any facts about usage to which we could have recourse to clarify such matters? — This unsigned comment was added by DCDuring (talk • contribs) at 00:35, 3 January 2011 (UTC).
How do we know what people mean by French Revolution? The same way we know what they mean by oat: we look at examples of how people are using it, and try to infer a definition. What facts are salient? That's a hard problem; I find the oat solution, which gives a definition for the word that is precise, accurate, and completely useless to anyone who actually needs the definition, to be frustrating, but it's probably the best thing to do, with Wikipedia to point people to. (A "short-attention-span encyclopedia entry" would probably be more useful for readers, though.) Could a definition be political? Sure, like "A grain, which in England is generally given to horses, but in Scotland supports the people." as a definition for oats. And if you want facts about usage, hit Google Books or Usenet and see how it's being used.--Prosfilaes 03:36, 3 January 2011 (UTC)
Well, the definition of a word should be what the word means. Yes, this is the meaning actually intended by people using the word (if we don't know with certainty what people intended, it's better either not to include any definition, or to add a warning). The definition must be sufficient to understand what the word means, and not include anything more. For a revolution, this might include the location and the date, everything else being covered by Wikipedia (a link is necessary). I see no difference between oat and French Revolution on all these points. Lmaltier 16:00, 3 January 2011 (UTC)

"singular of..."

The entry Assembly of God is defined as "The singular of Assemblies of God." Are there other English nouns whose plural is lemmatized and whose singular is defined as a "singular of"? We don't categorize on that basis, so it's hard to know. --Daniel. 03:49, 1 January 2011 (UTC)

Does shoop count? —Ruakh_TALK 03:52, 1 January 2011 (UTC)

I sometimes feel that {{singular of}} seems justified for entries that are rare in the singular but much more common in the plural. Mglovesfun (talk) 22:14, 2 January 2011 (UTC)

Renaming "Mandarin" headings to "Chinese"

The Wikipedia page w:Standard Mandarin has recently been moved to w:Standard Chinese after months of discussion, and I think it's about time for Wiktionary to follow suit. "Modern Standard Chinese" (MSC, a direct translation of its Chinese term 现代标准汉语) is the official language of the People's Republic of China and Taiwan, and is the de facto literary standard of all Chinese varieties. Although it is based on the phonology of the Beijing dialect and the grammar of northern Mandarin dialects in general, it is nonetheless inappropriate to simply call MSC "Mandarin". It functions as a high-prestige spoken and written standard for all Chinese varieties, in a way quite similar to Modern Standard Arabic (MSA), but is much more widely used than MSA. Saying that Standard Chinese is Standard Mandarin implies the existence of other standard Chinese languages, when in fact there are none; there is no such thing as Standard Gan, Standard Wu, Standard Hakka, etc. Even Cantonese does not have a de jure written standard, and its written form has to converge as much as possible to MSC. Just look at the major news websites from Hong Kong [1][2][3][4][5][6][7]: what is written on those pages is essentially what Wiktionary calls "Mandarin" (I'm not ruling out variety headings for dialectal words, however). Nevertheless, the current Wiktionary policy of eliminating "Chinese" headings, based on some editors' superficial impression that the lack of mutual intelligibility across spoken Chinese varieties makes them vastly different languages, ignores the fact that Wiktionary is written-language-orientated, and treats Chinese unfairly compared with other languages in similar situations.
For example, Serbo-Croatian, Arabic and Chinese are all macrolanguages under ISO 639-3. Serbo-Croatian is written in different scripts and has dissimilar literary standards in different countries; Arabic has a modern standard language (MSA), based on Qur'anic Arabic, which is less promulgated and used than MSC; and Chinese has a single (spoken and) written standard (ignoring the simplified/traditional complication) which is widely used across virtually all varieties. Yet Serbo-Croatian and Arabic headings are accepted perfectly fine, while Chinese headings have to be forcibly changed to "Mandarin". This hardly seems reasonable.

I know this may seem like a significant change, but personally I think it is something that has to be changed sooner or later, before Wiktionary is filled up with horrendous-looking duplicate sections for languages that share a written language (call them what you like), which are neither official in any state nor phonologically and grammatically standardised. With bots, the change should be fairly easy. But first, we have to start acknowledging that designating a written language as "Chinese" is perfectly justified (because written Chinese is Vernacular Chinese, which is in turn Standard Chinese) and hence tolerate those headings; then comes the work of expunging the misnamed "Mandarin" headings. Wjcd 04:40, 16 January 2011 (UTC)

This does not seem like a significant change; it's a label. It is, however, one I oppose, since Cantonese, Wu and other languages are just as Chinese as Mandarin. The rhetoric about intolerance towards those languages not official in any state I find rather disturbing. It's simply bizarre to say that a language is not "phonologically and grammatically standardised"; by the definition of language, there is sufficient agreement among speakers as to phonology and grammar to communicate. Probably more so among most unofficial languages than among the large official languages that have millions of second-language speakers and groups of speakers that have been geographically separated for some time.--Prosfilaes 06:24, 16 January 2011 (UTC)

It is true that these languages are just as Chinese as Mandarin, but the thing is that there is only one de facto literary standard in Chinese, which happens to be based on a dialect of Mandarin (w:Regional language#Relationship with official languages). So while the other varieties are sometimes significantly different in speech from the literary standard, in writing they have to converge on it. Because there is virtually nothing published that can be used as a written attestation of a term in Chinese varieties other than MSC, and it is similarly quite hard to find audio or video material which is "of verifiable origin" and "durably archived" to show the "clearly widespread use" of the term, it is probably best to avoid this kind of attestation. The high lexical similarity resulting from the logographic nature of Chinese characters also makes it hard to say that "this word is a strictly dialectal word", because locals often mix regional expressions into their MSC without realising their dialectal nature, and cross-dialectal borrowing is very common (the resulting mixture is termed "regional MSC"). Wjcd 09:33, 16 January 2011 (UTC)

I still see no reason to label non-Mandarin languages as dialects.--Prosfilaes 21:57, 16 January 2011 (UTC)

Like you said, "dialect" or "language" is only a label. Wjcd 00:49, 17 January 2011 (UTC)[reply]

Sure. But dialect is a way of putting down people who don't speak the standard languages and the languages they speak. "Respectable people don't speak dialect." And ISO 639-3 has labeled them languages, so that's the default position for us; we have to argue why we're going against that.--Prosfilaes 02:45, 17 January 2011 (UTC)

If we had the header ==Standard Mandarin==, your argument for writing ==Standard Chinese== would make sense to me: Modern Standard Chinese already implies a form of Mandarin. But we don't, so your argument seems like a stretch. You state that "Wiktionary is written-language-orientated", but I don't think that's true. Wiktionary itself is exclusively written, and it is largely dependent on written sources for verification; but these are limitations to be confronted, not strengths to be celebrated. —Ruakh_TALK 06:34, 16 January 2011 (UTC)

The current header is a misnomer; the language that the section underneath it describes is properly called "Modern Standard Chinese". Proponents of the name "Mandarin", while realising that the de facto written form of Chinese is based on a dialect of Mandarin and that "Chinese" is too inhomogeneous a macrolanguage to be packed under one heading, fail to realise that the group of Mandarin dialects is not homogeneous enough to be described in such a way either, and that using "Mandarin" to refer to MSC underrepresents MSC's actual use. Wiktionary's written-language-orientatedness is inherent: we essentially take words as they are normally written, and attestation via non-written means is negligible. This is especially true for languages whose written and spoken forms may differ greatly, such as Tibetan and Finnish (and Chinese). Wjcd 09:44, 16 January 2011 (UTC)

I see no reason to change the system, and we should not necessarily model ourselves after Wikipedia, since the aims of the projects are so different. However, I have an open mind, and would like to hear exactly how you propose to change it, e.g. how would you define words with the same characters but in different languages (Mandarin, Cantonese, Min Nan, etc.)? These are different (read: mutually unintelligible) languages, and one reading will inevitably have different meanings, pronunciations, etymologies, usage notes, etc., all of which is very relevant information for Wiktionary. ---> Tooironic 07:17, 16 January 2011 (UTC)

My envisaged layout would be:

1. For characters (一):

==Chinese==
===Etymology===
:Pictographic. Originally as [[弌]], symbolising an outstretched finger. <ref>清代陳昌治刻本『說文解字』［卷一］［一部］</ref>

/*Different scripts*/

===Pronunciation (and romanisations)===
*Modern Standard Chinese:
**Hanyu Pinyin: [[yī]]
**Zhuyin fuhao: [[ㄧ]]
**Wade-Giles: i1
**{{audio1|zh-yī.ogg}}
*Mandarin dialects:
**Beijing (Jilu Mandarin): i 11
**Tianjin (Jilu Mandarin): 1 45
**Jinan (Jilu Mandarin): i 11
**Xi'an (Zhongyuan Mandarin): i 11
**Taiyuan (Jin): iǝʔ 41 /*also possible as a separate branch*/
**Chengdu (Southwestern Mandarin): i 12
**Wuhan (Southwestern Mandarin): i 12
**Yangzhou (Jianghuai Mandarin): iǝʔ 4
*Wu dialects:
**Shanghainese: iîʔ 4
**Suzhou dialect: iôʔ 41
**Wenzhou dialect: iai 41
*Xiang dialects:
**Changsha dialect: i 4
**Shuangfeng dialect: i 12
*Gan dialects:
**Nanchang dialect: it 41
*Yue (Cantonese) dialects:
**Guangzhou dialect: {{IPA|/jɐt˥/}}
**Yale romanisation: yat1
**Romanisation by linguistic society of Hong Kong: jat1
*Hakka dialects:
**Hailufeng: rit7
**Lufeng: jit7
**Shatoujiao: jit7
**Hakka-English dictionary: jit7
**Meixian: jit7
**Dongguan: jit7
**Taiwanese: jit7
**Baoan: jit7
*Min dialects:
**POJ: 
**Teochew dialect: zêg8(chêk) ig4(ik)
**Jieyang dialect: êg4
**Fuzhou dialect: eiʔ 41
**Xiamen dialect: it 41 (''literary''); cit 41

====Historical====
*Middle Chinese:
**Fanqie: 於悉
**Initial: 影
**Final: 質
**Tone: 入
**She: 臻
**Openness: 开
**Deng: 重钮四等
**Reconstructions:
***Karlgren: ʔ˜(ĕt
***Zhou Fagao: ʔiIt
***Pu Liben: ʔjit
***Wang Li: ˜(ĕt
***Li Rong: ʔiĕt
***Zheng Zhang Shangfang: ʔiIt
***Dong Tonghe: ʔjet
***Shao Rongfen: ʔjet
***Pan Wuyun: ʔit
*Old Chinese: 
**Reconstructions:
***Karlgren: {{IPA|ʔi̯ĕt}}　Ⅹ/5部
***Wang Li: {{IPA|iet}}　質
***Zheng Zhang Shangfang: {{IPA|qlig}}　質2部
***Li Fanggui: {{IPA|ʔjit}}　質
***Bai Yiping: {{IPA|ʔjit}}　質部
***Pan Wuyun: {{IPA|qlig}}　質2部
*Proto-Sino-Tibetan: *ʔĭt (˜ɣ-) (''per ...'')

===Definitions===
/*quotes omitted*/
#(''cardinal number'') [[one]]
#:/*quote in Classical Chinese*/
#:/*quote in Modern Standard Chinese*/
#(''ordinal number'') [[first]]
#one of
#[[part]], [[portion]]
#[[once]]
#[[some]], [[someone]], [[something]]
#[[every]], [[each]]
#[[full]], [[complete]]
#[[same]], [[identical]]
#[[unite]], [[unity]]; [[alliance]]
#[[unification]], to [[unify]], to [[integrate]]
#[[single-minded]], [[undistracted]]
#[[start]], [[begin]]; [[beginning]], [[inception]]
#[[pure]], [[unmixed]]
#[[another]]
#[[alone]], [[single-handed]]
#[[or]]
#[[all]], [[altogether]]
#[[very]], [[quite]]
#[[always]], at all times
#[[unexpectedly]], to one's surprise
#[[once]], [[as soon as]]
#[[one by one]]
#(''conjunction, indicating the continuousness of actions'')
#(''particle, intensifier'')
#(''philosophy'') the original state of the universe
#(a stroke in calligraphy)
#(surname)

===Compounds===
...

===References===
...

===See also===
...

2. For cross-dialectal disyllabics (暧昧):

==Chinese==
{{zh-forms|[[暧]][[昧]]|[[曖昧]]}}
===Pronunciation===
*Beijing dialect (Jilu Mandarin): ài mèi
*Guangzhou dialect (Guangfu Cantonese): oi3 mui6
*Suzhou dialect (Taihu Wu): e me
*Shanghainese (Taihu Wu): ee2 mee3
*Taiwanese (Hokkien Minnan): ai3 mui7
*Teochew dialect (Minnan): ai2 mue6
*Hakka: oi4 mi4

===Definitions===
{{zh-polysyl|s|tra=曖昧|rs=日10}}
#[[ambiguous]], [[ambiguity]]; [[dubious]]; [[vague]], [[obscure]], [[vagueness]]
#:/*quote in Classical Chinese*/
#:/*quote in MSC*/
#(''modern'') (''of a relationship'') [[intimate]] but not [[explicit]]
#:/*quote in MSC*/

3. For mildly dialectal ones (鬼佬):

==Chinese==
{{zh-hanzi|[[鬼]][[佬]]}}
===Etymology===
[[鬼]][[佬]] means literally "[[ghost]] man" and arose as a comment on the pale complexion of white foreigners, which was seen as being [[ghost]]-like.

===Pronunciation===
*(''Guangzhou'')
**{{IPA|/kʷɐi˧˥ lou˧˥/|lang=yue}}
**Jyutping: gwái lóu
*(''Beijing'')
**Pinyin: guǐ lǎo
**IPA:

===Definitions===
{{zh-polysyl|ts|rs=鬼00}}
# {{slang|pejorative|lang=yue|skey=gwai2lou2}} [[gweilo]]

====Usage notes====
*A derogatory term commonly used by Cantonese speakers to refer to white people, mainly in speech. (A polite alternative for this term is [[西人]] "sai1 jan4" simply meaning "[[Westerner]]"). Prior to the 1980s the term was commonly prefixed with [[死]] (''sei'', [[w:jyutping|jyutping]]: sei2), meaning death or damnation, to make ''sei gweilo''  meaning “[[damned]] [[ghost]] [[man]]” or “[[damned]] [[gweilo]]”.
*It is chiefly used in Cantonese and other southern Chinese dialects, but has also permeated into Mandarin dialects, especially in those outside PRC and ROC.

====Related terms====
* [[鬼子]]
* [[鬼婆]]
* [[鬼仔]]
* [[鬼妹]]
* [[白鬼]]
* [[黑鬼]]

====See also====
* (''Min Nan'') [[阿斲仔]]

[[Category:zh:Ethnic slurs]]

4. For strongly dialectal ones (傾偈):

==Cantonese==
{{zh-forms|[[倾偈]]|[[傾]][[偈]]}}
===Pronunciation===
*Guangdong Romanisation:
*Hong Kong Government: 
*Yale romanisation:

===Definitions===
{{yue-polysyl|t|sim=|rs=}}
#[[chat]], [[talk]]

Wjcd 10:48, 16 January 2011 (UTC)

Initially oppose, if I've understood correctly that we're talking about merging the Chinese languages into one, not just changing a header. OK, sure, there are lots of similarities between the languages, but what about Spanish, Italian, Portuguese, Occitan, Catalan, etc.? I wouldn't want to merge them into "Romance". That said, I'm not too hot on Chinese, so I await further argument. Mglovesfun (talk) 12:44, 16 January 2011 (UTC)

I hope that the idea is not to merge Chinese languages.

I oppose renaming Mandarin: this is a language, with its own ISO code, etc. and Mandarin seems to be the most standard name for this language (Chinese is ambiguous, and standard Chinese less common). But I would not oppose Chinese headers to be allowed in addition, for readers looking for Chinese words, without knowing exactly what precise language they mean. Lmaltier 14:36, 16 January 2011 (UTC)[reply]

Arabic and Serbo-Croatian are both macrolanguages under ISO 639-3, and these headings are currently allowed in Wiktionary. Would it be reasonable to further split a single Arabic section into 20+ copies, because people may be more interested in language status than in the actual content? Or to further split Serbo-Croatian into its component languages, as Serbo-Croatian is pluricentric with multiple literary standards? Wjcd 14:51, 16 January 2011 (UTC)

Of course, this is reasonable. The basic principle is a section for each language. Sections would not be identical; they may differ in pronunciation, examples, probably senses of the word (at least in some cases), usage notes, homophones, etc. We have sections for Egyptian Arabic, etc., Croatian, etc., and this is perfectly normal. Lmaltier 15:42, 16 January 2011 (UTC)

No, it's not normal. "Croatian" is a language fabricated in the 1990s by Croatian nationalists once the former federal state seceded from Communist Yugoslavia. It's exactly the same language as Bosnian, Serbian and Montenegrin, having 99% identical grammar. The notion of "macrolanguage" is non-existent in linguistics: that's a term only used by Ethnologue, a Christian organization intent on translating the Bible into as many languages as possible. Croatian language sections are completely obsoleted by Serbo-Croatian. I don't see how the situation with it is applicable at all to the Chinese scenario: the issue here is merely terminological (Mandarin vs. Chinese), not how the content should be formatted or split. EDIT: Whoops: after reading more thoroughly, it appears that the issue is more than just renaming Mandarin to Standard Chinese. This discussion is very misleadingly named. --Ivan Štambuk 16:17, 16 January 2011 (UTC)

The discussion is not about Croatian. The Chinese case is more similar to Arabic. I don't speak Arabic but I know that words are different between different Arabic languages. An example: a river is pronounced as something like wadi in the Yemen, and something like wed in Algeria, they are different words, even if the writing and the origin is the same (it might be somewhat different, I'm not a specialist). Lmaltier 17:12, 16 January 2011 (UTC)[reply]

Just because a word has different pronunciations, does that mean they aren't the same language? With r's and vowel changes, there are very few words pronounced the same in Northern English dialects and Western American dialects of English. I'm not sure that comparing Chinese, Serbo-Croatian and Arabic is reasonable here; they're each sui generis. Chinese at least seems to have less resistance to the concept of multiple languages than the Arabs do, and yet a script that has more tolerance for phonological diversity, though I think in Arabic, if I'm not mistaken, wadi/wed would both be written without vowels and hence spelled the same.--Prosfilaes 18:48, 16 January 2011 (UTC)

wadi/wed is not a small variation in pronunciation... What I mean is that words from different languages are different words. Would you want to merge the English and the French section of interjection, because the spelling, the etymology and the meaning are the same? No, because languages are different. Lmaltier 19:16, 16 January 2011 (UTC)[reply]

Orthographically, interjection is one and the same word, written in English or French. --Ivan Štambuk 19:34, 16 January 2011 (UTC)

There are a lot of languages with major systematic changes. Losing phonemes is not rare in dialects of English or Spanish. Isn't kasa versus kaθa a large variation in pronunciation? There are two words in Spanish, either of which can take either pronunciation, depending on where you are in the Spanish-speaking world, and the variation between s and θ, even when it confuses words pronounced differently in standard European Spanish, is systemic. What about fag versus cigarette in English? So, no, without a serious corpus and linguistic study, I'm not really interested in debating what is or isn't a large variation in pronunciation. I wouldn't merge them because there are advantages to being systematically consistent. I think whether or not they are the same word is a definitional game; they are if you think that words with the same pronunciation, spelling, etymology and meaning are the same word.--Prosfilaes 21:48, 16 January 2011 (UTC)

The Spanish s-θ isn't a big variation. In Mandarin, for example, in the Tianjin dialect (100+ km from Beijing), there are significant phonological variations from the Beijing dialect. All the retroflexes are dropped, merged with their fricative or affricate counterparts or with a semivowel; the high level tone becomes a low falling one, and the dipping tone becomes purely low rising. These are, however, not important. For languages which have a literary standard, such as Arabic and Chinese, duplicating the sections just because the languages have their own ISO codes is not reasonable. This is even more so for Chinese, whose script is tolerant of phonological variation, so that the kind of dialectal variation seen in written Arabic (ثلج vs. تلج) is not possible. Wjcd 00:49, 17 January 2011 (UTC)

Support renaming Mandarin to Chinese. Mandarin is a fruit grown by Georgians, not a language. --Vahag 15:12, 16 January 2011 (UTC)

Chinese is a nationality of a state with 292 different languages that fall into seven different language families. Chinese is also the name of an ethnicity that speaks at least seven Sinitic languages. For linguistic confusion, I think that blows the fruit/language one out of the water.

(And I think you weren't being entirely serious with that second sentence, but it surely has me confused. w:Mandarin orange lists neither Georgia nor the US as a major grower of the fruit, so I don't know which one you were referring to. I suspect that the Georgian Caucasian language versus the US dialect is probably an actual confusion among some of our users, but one that a little (non-Wiktionary-specific) education will clear up, and I believe most of our users will be wise enough to look up Georgian if they are confused by a Georgian entry. Of course, on Commons, we had someone telling us that an image needed fixing, because Hurricane Dora was in the Atlantic, not the Pacific, and it was in 1969, not 1988, so apparently some people won't pause to figure out polysemy.)--Prosfilaes 18:48, 16 January 2011 (UTC)

Do note the existence of Dungan, however. It is not written in Chinese characters. -- Prince Kassad 19:53, 16 January 2011 (UTC)

Strongly oppose. I think the system that has been proposed above is utterly complicated and confusing, as if making Chinese-language entries on Wiktionary weren't difficult enough as it is! Furthermore, grouping all the languages under one language header is misleading: just because a word can have multiple readings does not mean that all the readings necessarily have the same meaning; they are different languages, after all. ---> Tooironic 22:55, 16 January 2011 (UTC)

How about this for 暧昧?

==Gan==
{{zh-forms|[[暧]][[昧]]|[[曖昧]]}}
===Pronunciation===
*

===Definitions===
{{gan-polysyl|s|tra=曖昧|rs=日10}}
#[[ambiguous]], [[ambiguity]]; [[dubious]]; [[vague]], [[obscure]], [[vagueness]]
#:/*quote in Classical Chinese*/
#:/*quote in MSC*/
#(''modern'') (''of a relationship'') [[intimate]] but not [[explicit]]
#:/*quote in MSC*/

==Hakka==
{{zh-forms|[[暧]][[昧]]|[[曖昧]]}}
===Pronunciation===
*

===Definitions===
{{hak-polysyl|s|tra=曖昧|rs=日10}}
#[[ambiguous]], [[ambiguity]]; [[dubious]]; [[vague]], [[obscure]], [[vagueness]]
#:/*quote in Classical Chinese*/
#:/*quote in MSC*/
#(''modern'') (''of a relationship'') [[intimate]] but not [[explicit]]
#:/*quote in MSC*/

==Huizhou==
{{zh-forms|[[暧]][[昧]]|[[曖昧]]}}
===Pronunciation===
*

===Definitions===
{{czh-polysyl|s|tra=曖昧|rs=日10}}
#[[ambiguous]], [[ambiguity]]; [[dubious]]; [[vague]], [[obscure]], [[vagueness]]
#:/*quote in Classical Chinese*/
#:/*quote in MSC*/
#(''modern'') (''of a relationship'') [[intimate]] but not [[explicit]]
#:/*quote in MSC*/

==Jinyu==
{{zh-forms|[[暧]][[昧]]|[[曖昧]]}}
===Pronunciation===
*

===Definitions===
{{cjy-polysyl|s|tra=曖昧|rs=日10}}
#[[ambiguous]], [[ambiguity]]; [[dubious]]; [[vague]], [[obscure]], [[vagueness]]
#:/*quote in Classical Chinese*/
#:/*quote in MSC*/
#(''modern'') (''of a relationship'') [[intimate]] but not [[explicit]]
#:/*quote in MSC*/

==Literary Chinese==
{{zh-forms|[[暧]][[昧]]|[[曖昧]]}}
===Pronunciation===
*

===Definitions===
{{lzh-polysyl|s|tra=曖昧|rs=日10}}
#[[ambiguous]], [[ambiguity]]; [[dubious]]; [[vague]], [[obscure]], [[vagueness]]
#:/*quote in Classical Chinese*/
#:/*quote in MSC*/
#(''modern'') (''of a relationship'') [[intimate]] but not [[explicit]]
#:/*quote in MSC*/

==Mandarin==
{{zh-forms|[[暧]][[昧]]|[[曖昧]]}}
===Pronunciation===
*

===Definitions===
{{cmn-polysyl|s|tra=曖昧|rs=日10}}
#[[ambiguous]], [[ambiguity]]; [[dubious]]; [[vague]], [[obscure]], [[vagueness]]
#:/*quote in Classical Chinese*/
#:/*quote in MSC*/
#(''modern'') (''of a relationship'') [[intimate]] but not [[explicit]]
#:/*quote in MSC*/

==Min Bei==
{{zh-forms|[[暧]][[昧]]|[[曖昧]]}}
===Pronunciation===
*

===Definitions===
{{mnp-polysyl|s|tra=曖昧|rs=日10}}
#[[ambiguous]], [[ambiguity]]; [[dubious]]; [[vague]], [[obscure]], [[vagueness]]
#:/*quote in Classical Chinese*/
#:/*quote in MSC*/
#(''modern'') (''of a relationship'') [[intimate]] but not [[explicit]]
#:/*quote in MSC*/

==Min Dong==
{{zh-forms|[[暧]][[昧]]|[[曖昧]]}}
===Pronunciation===
*

===Definitions===
{{cdo-polysyl|s|tra=曖昧|rs=日10}}
#[[ambiguous]], [[ambiguity]]; [[dubious]]; [[vague]], [[obscure]], [[vagueness]]
#:/*quote in Classical Chinese*/
#:/*quote in MSC*/
#(''modern'') (''of a relationship'') [[intimate]] but not [[explicit]]
#:/*quote in MSC*/

==Min Nan==
{{zh-forms|[[暧]][[昧]]|[[曖昧]]}}
===Pronunciation===
*

===Definitions===
{{nan-polysyl|s|tra=曖昧|rs=日10}}
#[[ambiguous]], [[ambiguity]]; [[dubious]]; [[vague]], [[obscure]], [[vagueness]]
#:/*quote in Classical Chinese*/
#:/*quote in MSC*/
#(''modern'') (''of a relationship'') [[intimate]] but not [[explicit]]
#:/*quote in MSC*/

==Min Zhong==
{{zh-forms|[[暧]][[昧]]|[[曖昧]]}}
===Pronunciation===
*

===Definitions===
{{czo-polysyl|s|tra=曖昧|rs=日10}}
#[[ambiguous]], [[ambiguity]]; [[dubious]]; [[vague]], [[obscure]], [[vagueness]]
#:/*quote in Classical Chinese*/
#:/*quote in MSC*/
#(''modern'') (''of a relationship'') [[intimate]] but not [[explicit]]
#:/*quote in MSC*/

==Pu-Xian==
{{zh-forms|[[暧]][[昧]]|[[曖昧]]}}
===Pronunciation===
*

===Definitions===
{{cpx-polysyl|s|tra=曖昧|rs=日10}}
#[[ambiguous]], [[ambiguity]]; [[dubious]]; [[vague]], [[obscure]], [[vagueness]]
#:/*quote in Classical Chinese*/
#:/*quote in MSC*/
#(''modern'') (''of a relationship'') [[intimate]] but not [[explicit]]
#:/*quote in MSC*/

==Wu==
{{zh-forms|[[暧]][[昧]]|[[曖昧]]}}
===Pronunciation===
*

===Definitions===
{{wuu-polysyl|s|tra=曖昧|rs=日10}}
#[[ambiguous]], [[ambiguity]]; [[dubious]]; [[vague]], [[obscure]], [[vagueness]]
#:/*quote in Classical Chinese*/
#:/*quote in MSC*/
#(''modern'') (''of a relationship'') [[intimate]] but not [[explicit]]
#:/*quote in MSC*/

==Xiang==
{{zh-forms|[[暧]][[昧]]|[[曖昧]]}}
===Pronunciation===
*

===Definitions===
{{hsn-polysyl|s|tra=曖昧|rs=日10}}
#[[ambiguous]], [[ambiguity]]; [[dubious]]; [[vague]], [[obscure]], [[vagueness]]
#:/*quote in Classical Chinese*/
#:/*quote in MSC*/
#(''modern'') (''of a relationship'') [[intimate]] but not [[explicit]]
#:/*quote in MSC*/

==Yue==
{{zh-forms|[[暧]][[昧]]|[[曖昧]]}}
===Pronunciation===
*

===Definitions===
{{yue-polysyl|s|tra=曖昧|rs=日10}}
#[[ambiguous]], [[ambiguity]]; [[dubious]]; [[vague]], [[obscure]], [[vagueness]]
#:/*quote in Classical Chinese*/
#:/*quote in MSC*/
#(''modern'') (''of a relationship'') [[intimate]] but not [[explicit]]
#:/*quote in MSC*/

1) These are all valid languages with proper ISO codes. We'd rather have fourteen or so languages which share a single written form listed separately, because of fear of potential differences in meaning. Please, are there any nameable cross-dialectal nuances in meaning in any of the English words "ambiguous", "vague", "obscure", "intimate", "dubious", "explicit", "relationship", "modern", "language"? Yet English is unregulated officially, and written Chinese is regulated.

response: I think Tooironic was thinking of a word like 垃圾. -- A-cai 02:43, 17 January 2011 (UTC)[reply]

2) There is basically nothing that can be regarded as an attestation of this term in Min Zhong, Min Bei etc., according to Wiktionary guidelines. What's spoken in dialects remains spoken, in written form it has to be Modern Standard Chinese. So if someone lists the above definitions for attestation, all but "Mandarin" will fail.

response: I would be very careful about making that type of assertion. The fact that a language is not generally written down does not mean that it can't be written down. This is why we have Wikipedias written in Min Nan, Min Dong, Hakka, Cantonese and Gan, among others. -- A-cai 02:43, 17 January 2011 (UTC)[reply]

3) The name "Mandarin" is erroneous. There is no such thing as "modern written Mandarin". The language that is described under this header is "Modern Standard Chinese", which is applicable as a literary standard to nearly all Chinese varieties. Mandarin is a group of dialects spoken across much of northern and southwestern China, not written. Wjcd 00:49, 17 January 2011 (UTC)[reply]

Wjcd, first of all, allow me to welcome you to Wiktionary. I see that you are a native Chinese speaker. We desperately need those around here, so I hope you stay and continue to contribute for many years to come. You may or may not be aware, but I was the person who originally advocated switching the label from Chinese to Mandarin several years ago. The debates are still in the Beer parlor archives, if you care to look them up. I don't want to rehash that stuff all over again, so I will confine myself to responding to some of the points that you made above.

You said, "Standard Mandarin is implying the presence of other standard Chinese languages when in fact there is not." This is not how I interpret "Standard Mandarin." "Standard Mandarin" means that there are other varieties of "Mandarin" that are not standard. These varieties of Mandarin can be quite different from the "Standard Mandarin" that you hear on CCTV or read in the People's Daily.
You said, "The current header is a misnomer; the language that the section underneath it describes is properly called "Modern Standard Chinese." That's not quite right. There is nothing modern or standard about the term 仆射, yet it is entirely appropriate to include under a Mandarin label. The reason it is appropriate, in my view, is that there is a Mandarin reading for it and it appears in Mandarin language texts. However, it is a historical term, not a modern one. It also features a non-standard reading for the second syllable, not a standard one. That means that we would not be able to include it under a label that implies modern and standard.

In conclusion, the "Mandarin" label simply means any word in Mandarin, whether ancient or modern, standard or non-standard (Mandarin). Of course, for the sake of simplicity, anything not otherwise labelled is considered to be the "Modern Standard" form of Mandarin, as opposed to an "archaic" or "non-standard" form of Mandarin. I hope this clarifies my position, and I look forward to your response. -- A-cai 02:16, 17 January 2011 (UTC)[reply]

First of all, Mandarin is a rather new classification of Chinese varieties. The term 仆射 had basically fallen into disuse by the Song Dynasty, as the position was abolished, which predates the widely accepted time of inception of Mandarin (taking the Zhongyuan Yinyun as the division between Middle Chinese and Mandarin). Thus saying that this term is Mandarin is not very appropriate. Sure, it can be used later on in Mandarin when talking about historical events, but there is similarly no restriction on this term being used in other Chinese varieties; only the pronunciation changes according to the dialectal readings of the characters. Therefore it is best called "Classical Chinese" or simply "Chinese".

Secondly, we need to agree that the current "Mandarin" header refers 99% of the time to "Standard Mandarin", because that is the way it is normally written. Other Mandarin dialects are rarely mentioned, not because of the absence of anyone knowledgeable, but because most of the time the definitions are identical; only the pronunciations differ. Your example of 垃圾 is also a manifestation of this. What are taken to be the pronunciations in this "Mandarin" language are the pronunciations in Modern Standard Chinese; the Taiwanese pronunciation is Min-influenced, but it is still Beijing-dialect-based Guoyu, an example of regional MSC, not another Mandarin dialect. (By the way, 垃圾 may also be used adjectivally in MSC, as well as figuratively. Similarly, from what I know, there are also nominal uses of this term in Min Nan (in Fujian), by analogy with MSC.)

That being said, since the Wikipedia page has been moved, in accordance with the actual use of MSC as a prestige literary (and spoken) standard, we should also reconsider the validity of the name "Mandarin" in headers. In fact, the Chinese either call the written form 现代标准汉语 (modern standard Chinese), 普通话 (common speech), or 國語 (national language), 華語 (Chinese language). There is no mention of "Mandarin" (官话, 北方话), which is not surprising since this written form is applicable to nearly all Chinese varieties. Surely local dialects could be written down using the phonetic aspects of Chinese characters, but this is not how people write their language (as evidenced in the above Hong Kong example). Wikipedia content certainly does not constitute an indication of attestability of term usage in non-MSC dialects; anyone could write there. Hence if amalgamating the Chinese definitions is not allowed, attestation in non-MSC dialects will be very difficult. Wjcd 03:40, 17 January 2011 (UTC)[reply]

Is the proposal to rename Mandarin headings to Chinese or to merge all the Chinese languages under a Chinese header? --Yair rand (talk) 04:10, 17 January 2011 (UTC)[reply]

It's about renaming, because the current "Mandarin" header refers essentially to "Modern Standard Chinese", the modern literary standard amongst Chinese varieties. Wjcd 05:14, 17 January 2011 (UTC)[reply]

Mandarin Chinese: It's complicated

I'm creating a new subheader because the thread is becoming unmanageable.

垃圾 may be used as an adjective in Mandarin, but it is not used as a noun in Min Nan. The colloquial Min Nan equivalent is 糞埽. That's the point I was trying to make. The usages don't always match up. Also, sometimes the meaning between dialects is the same, but a given word may be in a formal register in one dialect whereas it is in an informal register in another. See Appendix:Sino-Tibetan Swadesh lists for a good example of what I'm talking about.
Your point about 仆射 is well taken. It is in fact a Classical Chinese term, or to be more precise, an Old Chinese (ISO code och) and a Middle Chinese (ISO code ltc) term. It is also a Mandarin term, albeit an archaic one. The modern Mandarin reading is púyè. To be completely accurate, we would have to include an Old Chinese header, with our best guess as to how it might have been pronounced, along with a separate "Middle Chinese" header, along with our best guess for the pronunciation from this period. The definition section would probably also change, since the meaning of the term gradually evolved over time.

See 字 for an example of how we thought this might eventually be implemented here. I know it seems like a pain to separate words under multiple headers, but the purpose of Wiktionary is not to create yet another mediocre bilingual dictionary. If you want one of those, I can direct you to about a dozen online dictionaries that would fit the bill. Our goal is to document every aspect of the world's spoken and written languages. It will take many years, and may simply not be possible in some instances. Yet, some of us continue to toil away year after year :) -- A-cai 04:23, 17 January 2011 (UTC)[reply]

1) Yes, that is what I meant. When a paragraph based on MSC grammar or imitating MSC grammar is read out aloud by a Min Nan speaker using Min Nan readings, no MSC speakers would regard that as MSC. It may however be perceived as a more formal register of Min Nan.

2) I think our main focus should be on literary languages, not spoken ones. There is no point in differentiating between Old Chinese and Middle Chinese, since the style of writing is largely identical (based on the eloquent speeches of Confucius etc.), and it's hard to ascertain when a new sense developed. Phonological reconstructions up to Proto-Sino-Tibetan are only necessary for monosyllabic characters; for polysyllabic words it is primarily derivative work, based on monosyllabic characters, which I think is unnecessary, since mechanically combining the reconstructed pronunciations ignores the important tone sandhi (as we are not even certain of the exact tonal values). The only differentiation needed is between Classical Chinese and Modern Vernacular Chinese, and even that can be omitted if the senses deemed archaic or obsolete are tagged accordingly.

3) Actually, according to ISO standards, 字 is not complete. There are at least six Chinese languages unaccounted for there. That's why I don't think we necessarily have to rely on ISO codes when it comes to headers. The work of sectionalising them gets exponentially more laborious as one proceeds, and in the end we come to the realisation that these "languages", whether historical or modern, are basically the same in terms of written language, and they converge to the prestige literary standard when written; it's also very difficult to find attestations of these terms in languages other than Classical Chinese and MSC. For example, in 字, the senses "to marry, to wed" and "to nurture", regarded as already archaic in Old Chinese, still pop up as free morphemes in Qing Dynasty literature, and in a way still exist in Modern Chinese in the form of many fossilised expressions (e.g. 待字闺中). If you are interested, please have a read of my proposed layout for monosyllabics above. By separating out languages that have a fundamentally different core vocabulary (ko, ja, vi) and listing the definitions in Chinese in roughly chronological order, a general pattern of the dialectal differences in pronunciation and semantic development of characters should become clear. Wjcd 05:14, 17 January 2011 (UTC)[reply]

I understand what you are proposing. I'm just not convinced at this point that it is the best approach. The fact that we're attempting the impossible is not lost on me. However, I don't view this as a short-term project. I think of it as a multi-year, possibly multi-decade endeavor. As for your argument about written vs. spoken, you're entitled to your opinion, and I wouldn't want to discourage you from working on whatever aspect of Wiktionary you feel is important. We can certainly use all the help we can get. However, please recognize that there may be some who feel that spoken languages are even more important to document, especially since so many of them are rapidly facing extinction. My personal opinion is that Wiktionary needs both, and I don't necessarily think it matters which part gets done first. -- A-cai 05:35, 17 January 2011 (UTC)[reply]

Wjcd, I basically support and understand your point of view about standard Chinese, as Mandarin = Standard Chinese, most written Chinese is actually Mandarin, and even the majority of dialectal words are already borrowed into Mandarin. Only a few dialects have a standardised or semi-standardised written form, and Chinese people themselves choose to write in Mandarin when they do write. The marginal cases, notably Cantonese with its limited number of specific characters, only support the case for a unified treatment of the language. However, a lot of contributions are made for the dialects as well, and the headings and the way we translate Chinese entries have been discussed ad nauseam. We have many archived discussions and votes. The decision has been made and can be changed only if another vote passes. The status quo is a compromise among all contributors with different views on Chinese languages/dialects/topolects (whatever you prefer). If you are willing to stay, you are welcome, but please follow our guidelines. I have changed my attitude since I joined. If you want to make a difference, not make a point, please contribute in Mandarin. We need more skilled editors. Everybody knows that the terms "Mandarin" and "standard Chinese" are interchangeable. --Anatoli 06:13, 17 January 2011 (UTC)[reply]

It's complicated only if simple principles are not followed. Strictly following the rule "all languages" and the rule "one language = one section" has several advantages:

it's simple,
it's the only solution anyway, as the project is not allowed to take a position on sensitive, controversial issues (NPOV principle). This principle applies to Wikipedia, but also here.

multi-decade endeavor? Much more: the project cannot be completed: it's impossible, as many new words are created each year. But the objective of all words, all languages is what makes it successful. Lmaltier 06:54, 17 January 2011 (UTC)[reply]

It has been stated at least three times that all the dialects write in standard Chinese but in fact it is the case that often newspaper articles written in Chinese characters in some regional dialects can be 40% or more unintelligible to standard Chinese-only readers, due to different grammar, vocabulary, etc. Doesn't everyone know this? This fact came up several times in the original discussion for separating the Chinese languages/dialects into separate headers several years ago, which I am not certain the new editor has yet read, or expressed an interest in reading, or plans to read. Yes, some local newspapers in southern Chinese regions probably use standard Chinese but those that actually write the local dialect in the local vernacular cannot be understood well by Mandarin-only readers. I'm sure A-cai, as a speaker of Min Nan, can easily find some text like this. 71.66.97.228 08:24, 17 January 2011 (UTC)[reply]

Yes, that is one of the reasons I included the following example in the quotations section of 阿斲仔:

有一ê歐巴桑去美國chit-thô，欲去便所ê時，因為m7捌字，煞行入去查甫 e0 彼間，無外久，一ê阿督仔行入去，隨擱闖出來，一直喝講：「I am sorry ， I am sorry。」尾a0，彼ê阿婆仔行出來氣 tshuà tshuà 講：「夭壽哦！一ê阿督仔真無禮貌，行入來人ê便所，也擱怪人門「抑m7鎖咧！」

The above is from a collection of various writings, written in vernacular Min Nan, that I found on this website. Note that some words are written in Romanized script. This is quite common in vernacular Min Nan writing, since many Min Nan words lack a standard written form. One solution to this problem is to simply write those words in Romanized script. Another solution, as is evident in the word 阿斲仔, is to substitute the original obscure character with a character whose Mandarin pronunciation roughly apes the pronunciation in Min Nan. Hence, 阿斲仔 becomes 阿桌仔 or 阿督仔. -- A-cai 12:04, 17 January 2011 (UTC)[reply]

That looks like something from a blog; do you have any examples of newspaper text written where some southern dialect is spoken, which can unquestionably be said to not be written in Mandarin, but in one of those southern dialects? 71.66.97.228 22:41, 17 January 2011 (UTC)[reply]

response: I'm not sure if there is anything like that on the web for Min Nan. I don't believe I have run across anything that formal. Such publications have certainly existed in the past. One example is given in the Taiwanese Hokkien article.

However, there has never been an agreement as to how non-Mandarin languages should be written. Many of the informal blogs use a mixture of Chinese characters and letters. There are even entire novels in Min Nan written this way, an example of which is also provided in that same article.

Again, the larger point is that just because a language isn't generally written down doesn't mean that it can't be. Actually, do you want to know the most common example of written Taiwanese Min Nan in non-academic publications? Taiwanese karaoke song lyrics on KTV systems. The song lyrics are almost always written exclusively in Chinese characters, but heavily borrowing from Mandarin usage. For a more complete discussion of this phenomenon, see User_talk:A-cai/2009#甲伊. -- A-cai 01:22, 18 January 2011 (UTC)[reply]

With hardly any support, I realise that the proposal is unachievable and unimplementable. Please close this discussion. Wjcd 22:47, 17 January 2011 (UTC)[reply]

Rukhabot vote.

Quoth Wiktionary:Votes/bt-2011-01/User:Rukhabot for bot status:

I hereby request the Bot flag for Rukhabot (talk • contribs) for the purpose of adding interwiki links, as in these edits: [8][9][10][11][12][13][14][15]. Unlike other interwiki bots, Rukhabot uses custom code that only takes into account what mainspace page-names exist on other Wiktionaries; that is, it doesn't depend on existing interwiki links between other Wiktionaries.

—Ruakh_TALK 13:11, 19 January 2011 (UTC)[reply]

.az language name / Azeri / Azerbaijani

I have been asked to change the language names for .az on my Wikistats: "Azeri" to "Azerbaijani" and "Azərbaycan" to "Azərbaycanca" [16]. Quote: "Please change "Azeri" to "Azerbaijani" and "Azərbaycan" to "Azərbaycanca" in this page. It's true name of language (http://en.wikipedia.org/wiki/Azerbaijani_language). And "Azərbaycanca" too. (See interwikis in wikipedia). Thanks." Does that mean we should think about changing it here too? For example, all words in Category:Azeri nouns use "===Azeri==="; should they use "===Azerbaijani===" instead? Are the reasons given above valid? Mutante 17:12, 2 January 2011 (UTC)[reply]

Azeri refers to the people of Azeri descent, while Azerbaijani explicitly only refers to the people in Azerbaijan. This would therefore exclude the Azeris living in Iran. Unless we do split them up like this, the name should not be changed. -- Prince Kassad 17:20, 2 January 2011 (UTC)[reply]

Btw, WP says "Azerbaijani or Azeri or Torki", and we don't have Torki as a word yet. Mutante 17:24, 2 January 2011 (UTC)[reply]

Azerbaijani does not refer explicitly only to the people in the Republic of Azerbaijan. It also refers to the people of w:West Azerbaijan and w:East Azerbaijan, provinces of Iran. In fact, the Turkic-speaking people of the Republic of Azerbaijan were not referred to as Azerbaijanis before the 1920s. Before that they were either "Transcaucasian Tatars" or simply "Muslims".

Azeri and Azerbaijani are 100% synonyms and because we have already picked Azeri, we should stick to it, if only not to bother with renaming so much stuff. --Vahag 17:41, 2 January 2011 (UTC)[reply]

I think that the most common name should always be used, when it's obvious. In this case, it's not obvious, and I would keep Azeri. Lmaltier 22:18, 2 January 2011 (UTC)[reply]

Nicoletapedia

The user Nicoletapedia has been blocked, and in IRC they were asking what to do; they said they'd emailed, and had no reply.

We couldn't find a Wikt admin around, so I thought I'd try posting here, to see if someone could have a look - User_talk:Nicoletapedia#Block. The user also said that they were unable to edit their own talk page.

So...please can someone investigate. And...sorry if this is the wrong place to ask, but I couldn't find anywhere more appropriate. Ta. Chzz 09:59, 22 January 2011 (UTC)[reply]

Message from the user, via IRC;

User:Nicoletapedia was contributing to Wiktionary's Czech vocabulary by adding genitive & genitive plural forms of words already in Wiktionary when she suddenly found her account blocked by User:Mglovesfun with no message or reason given other than the ambiguous phrase "disruptive edits." No postings were made to her talkpage nor was any attempt made to communicate to her what, if anything, she was doing wrong. She was further blocked from contacting any Wiktionary admins regarding the matter, but was able to e-mail Mglovesfun via his profile on the Wikipedia project. Mglovesfun continued to be ambiguous and evasive and stopped responding to any correspondence since January 10. Nicoletapedia was subsequently able to contact User:Dominic on #Wiktionary IRC chat who assured her she'd done nothing wrong and that her account would soon be reactivated. Her account has still not been reactivated nor has she received any further correspondence on the matter.

Note: I am not involved with this case at all; I am just pasting this on behalf of that person, trying to help out. Thanks, Chzz 10:47, 22 January 2011 (UTC)[reply]

I'm pretty sure what I said at User_talk:Nicoletapedia#Block covers it. Mglovesfun (talk) 11:06, 22 January 2011 (UTC)[reply]

Unblocked, hopefully someone will follow her round and correct her entries. Or even better, she will read her own talk page. Mglovesfun (talk) 14:14, 22 January 2011 (UTC)[reply]

Don't be ridiculous, women don't edit Wiktionary, they can't read; User:Nicoletapedia is a man. BTW, Gloves, don't go down Semper's path, be nice to newbies. --Vahag 16:45, 22 January 2011 (UTC)[reply]

1st edit April 2010 - so not a newbie (and hence fair game) SemperBlotto 17:41, 22 January 2011 (UTC)[reply]

I know you're joking, but for the record, Nicoletapedia is a woman; or at least, {{gender:Nicoletapedia|he|she|he or she}} currently evaluates to she, so that is the appropriate set of pronouns. —Ruakh_TALK 17:03, 22 January 2011 (UTC)[reply]

Oh my god, she can read then. Do you know what that means? She's a witch! --Vahag 17:21, 22 January 2011 (UTC)[reply]

Also for the record, not funny. DAVilla 06:23, 25 January 2011 (UTC)[reply]

Sidebar changes

In the spirit of being bold as well as the spirit of letting people know what you are being bold about, you may notice that "(by language)" has disappeared (twice) from the sidebar. Neither of the tools supporting those features is currently working, one since October and one for longer than that, I am told. - [The]DaveRoss 21:58, 3 January 2011 (UTC)[reply]

Wiktionary:Per-browser preferences

Shouldn't this be merged into the new gadgets system? It has the disadvantage that it only works on one browser, so if you login from somewhere else all the settings are gone. -- Prince Kassad 22:25, 3 January 2011 (UTC)[reply]

I think it would be better if both per-browser preferences and per-user preferences were available. Perhaps a bit of javascript could allow the gadgets page to have the option of saving preferences in a cookie to make it browser-dependent? --Yair rand (talk) 19:44, 4 January 2011 (UTC)[reply]

IP block exemptions

We have a user group called "IP block exemption" (ipblock-exempt), which has the rights ipblock-exempt (yes, same name as the group, just to confuse people) and torunblocked. The ipblock-exempt right allows someone to edit even if his IP number is blocked, and is a right shared by admins. The torunblocked right allows someone to edit even if a Tor user, but admins don't have that right. Now, we have several members of the group "IP block exemption", which are all admins' accounts: I, for example, added my alternate accounts to that list to avoid IP blocks, which I thought uncontroversial. But now I realize that I've actually assigned myself a right never allowed by the community (and so have others), namely torunblocked, so I thought I'd bring it here for discussion.—msh210℠ (talk) 19:44, 7 January 2011 (UTC)[reply]

Well... it is only needed if you use Tor. The only reason one would have for using Tor is if he comes from the PRC. Otherwise, I don't really see the point in this right for admins. -- Prince Kassad 21:59, 7 January 2011 (UTC)[reply]

No, I wasn't proposing assigning the torunblocked right to admins (NTTAWWT) but merely broadcasting that my account (and other accounts) already have the right, despite no community consensus on anyone's having it, and allowing for discussion on who should have it and how that should be determined. The status quo is fine by me for now, but I didn't want to leave it that way without mentioning it here.—msh210℠ (talk) 18:14, 11 January 2011 (UTC)[reply]

Yes, and I explained why no admin should ever have this right. -- Prince Kassad 18:24, 11 January 2011 (UTC)[reply]

I assume you mean no admin account should have it? Some admins' non-admin accounts have it for the reason outlined above: that it comes along with the ipblock-exempt right. What if an admin does want to edit from China?—msh210℠ (talk) 19:29, 11 January 2011 (UTC)[reply]

All of these 'access rights' are to allow good faith editors exemptions from restrictions placed generally to block bad faith editors. Admins are all good faith editors, so they should be exempt from all restrictions placed generally, including Tor blocks, open proxy blocks, IP blocks. The people using the Tor/proxy/IP are people who we want to contribute, so the blocks are not for them. Why do we care if these exemptions are given to people we want editing anyway? - [The]DaveRoss 20:18, 11 January 2011 (UTC)[reply]

plurals versus noun forms

There's been a suggestion on WT:RFDO that we should move from plurals to noun forms. The reasons: plural isn't specific to any part of speech; in English, nouns, proper nouns and pronouns can all have plurals. It would also create standardization: noun forms for all languages, whereas currently many use noun forms, many use plurals, and some (like Catalan) use both. Lastly, some Wiktionaries already use this system: the French one only allows noun forms, adjective forms, adverb forms (etc.), never plurals. Thoughts? Mglovesfun (talk) 16:02, 9 January 2011 (UTC)[reply]

Dunno. De-facto consensus was to use plurals when a language has only plurals (and no other noun forms), and noun forms for languages which actually inflect their nouns. I think this should be unified in one way or another, as it's currently causing a big chaos. Just look at Category:Plurals by language, Category:Plurals and Category:Noun forms by language. -- Prince Kassad 16:51, 9 January 2011 (UTC)[reply]

Which class of users would be helped by a procrustean-bed treatment? How? We aren't being run for the benefit of the categorizing advocates and contributors, are we? DCDuring TALK 17:25, 9 January 2011 (UTC)[reply]

I see added complexity for the general case, at the expense of other cases. The fact that all nouns, including special nouns like proper nouns and pronouns can have plurals doesn't lead me to believe that we should move from plurals to noun forms.--Prosfilaes 22:43, 9 January 2011 (UTC)[reply]

German verbs ending in -eln, -ern

Should a category be created for Category:German verbs ending in -eln? Category:German verbs ending in -ern? Most verbs end in -en. Those, that end in -eln or -ern, share the same conjugation pattern. Would it be helpful to group them? Category:German nouns ending in "-ismus" and Category:German adjectives suffixed with -bar are already categories, for example. Also, what is the style? That is, should Category:German nouns ending in "-ismus" be Category:German nouns ending in -ismus or Category:German nouns suffixed with -ismus, or should Category:German adjectives suffixed with -bar be Category:German adjectives suffixed with "-bar" or Category:German adjectives ending in "-bar"? - -sche 20:27, 10 January 2011 (UTC)[reply]

Both should be deleted. They do not conform to the affix category system we use, which does not distinguish by part of speech. -- Prince Kassad 17:00, 11 January 2011 (UTC)[reply]

XML dumps, stats

Finally, after two months of standstill, XML dumps are being produced again and en.wiktionary was one of the first. Consequently, WT:STATS has been updated (by Conrad.Bot) and reveals that Swedish is now the 11th biggest language here, having more than 40K entries, up from position 19 (19K entries) in October and 24 (11K entries) in August. --LA2 14:08, 11 January 2011 (UTC)[reply]

New languages: Catawba, Dolgan, Meänkieli, Nogai, O'odham, Old Swedish, Sundanese, Uab Meto. Lingua Franca Nova is no longer in the list. -- Prince Kassad 15:12, 11 January 2011 (UTC)[reply]

Fictional and other characters

I believe I could place all entries defined as fictional characters into Category:Fictional characters and some of its subcategories accordingly.

I have not placed Zeus, Osiris, sphinx, mermaid, Santa Claus and Mephistopheles, among other entries, in that category. It would be more reasonable to place them into Category:Folkloric characters and/or Category:Mythological characters. --Daniel. 20:06, 11 January 2011 (UTC)[reply]

Do these categories have a use? Or is it just to satisfy editors' mania for categorisation? Ƿidsiþ 20:07, 11 January 2011 (UTC)[reply]
Yes; no. --Daniel. 20:15, 11 January 2011 (UTC)[reply]
If yes, Daniel, would you please specify?—msh210℠ (talk) 20:37, 11 January 2011 (UTC)[reply]
My personal view on topical categories can be generalized and explained briefly as follows: they serve as useful references. For example, I own multiple dictionaries, including one of chess, another of Greek mythology, another of literature... Since we have Category:Chess, Category:Greek mythology and Category:Literature, I virtually feel like we are competing with them, and I like it. I like to be able to find meaningful and strict relationships between terms, including lists of words that name animals, colors, etc. In theory, if I want to know all terms about chess, I should just navigate the specific category. In exchange, editors interested in the maintenance of topical categories would reasonably be expected to make them navigable and, if possible, complete. If a topical category is perceived as incomplete, it serves as a clue for creating the necessary entries and categorizing them.

Notably, the functions of Wikisaurus and topical categories often overlap, but I personally see them as clearly different projects, with different scopes, merits and presentations. I expect topical categories to be simpler and cleaner in comparison with Wikisaurus; both projects provide lists of "synonyms", "coordinate terms", "hyponyms", "meronyms" and "instances", but the categories effectively hide these labels by simply displaying a raw list of items. Wikisaurus displays useful linguistic information and more relationships.

In addition, I expect Wikisaurus pages to be eventually much more numerous, due to their elaborated and complex approach to each concept, that may naturally lead to the inclusion of more concepts, regularly. While we have both Category:Animals and WS:animal with pleasant results (aside from few issues such as which categories of a category tree should categorize each member, that is a can of worms to be perhaps discussed in another thread), I highly doubt we would have equivalent categories for WS:boring and WS:precognitive.

The choice of whether a topical category should exist is highly subjective and prone to editor discretion. Recently, on WT:RFDO, various topical categories have been nominated for deletion with only the obscure argument of considering them "overly specific". As a result, I parodied this phenomenon somewhere by mentioning that we would not want Category:Brown quadrupedal animals or Category:Musical instruments that touch the ground indeed. With the natural limits of topical categorization in mind, I believe it is safe to maintain Category:Fictional characters. It contains specific characters such as Dr Jekyll, Dracula and Batman; stock characters such as tsundere, shoulder angel and superhero; and also roles of characters, such as protagonist, title character and antagonist. These groups of entries fit well together, especially separately from the more abrangent Category:Fiction, that contains not only those characters, but multiple other terms.

The additional categories for floklore and mythology would simply follow suit, presenting a way to find all the characters defined on Wiktionary easily and also a separation from more abrangent categories such as Category:Mythology. It seems natural to me separating fiction from mythology. If necessary, it may be useful to create Category:Religious characters as well in the future, while we already have Category:Biblical characters.

Most or all of my words above may or not seem obvious due to the nature of the discussed projects, but to be on the safe side, I did explain them anyway. --Daniel. 11:17, 12 January 2011 (UTC)[reply]

No; yes. -- Prince Kassad 20:24, 11 January 2011 (UTC)

I can't explain why, but "folkloric characters" and "mythological characters" both sound wrong to me. The former I would express as "characters from folklore"; the latter, as "mythological figures". Is that just me? —Ruakh_TALK 20:35, 11 January 2011 (UTC)

Me, too.—msh210℠ (talk) 20:37, 11 January 2011 (UTC)

For what it's worth, there are at least several thousand pages containing either "mythological character" or "folkloric character" according to Google. Nonetheless, Ruakh's suggestions are good enough for me. I probably would not oppose the creation of Category:Mythological figures and/or Category:Characters from folklore. --Daniel. 11:17, 12 January 2011 (UTC)

Are characters from fairy tales, such as Big Bad Wolf, Little Red Riding Hood and Prince Charming, "fictional", "from folklore" or "mythological"? --Daniel. 11:17, 12 January 2011 (UTC)

At present, poor Little Red Riding Hood can be found in the categories Fiction | Fairy tale | Fairy tales | Artistic works | Fictional characters | Fictional people | Fairy tale characters. Isn't that a bit repetitious? Isn't a fairy tale folklore, and why must it be said three times? Please don't add any new categories. Shouldn't we have as few categories as possible? --Makaokalani 17:33, 14 January 2011 (UTC)

It depends on whether or not we want to repeat categorization at various levels.

For example, the word dog is a member of Category:Dogs and of Category:Canids. It is not a member of Category:Mammals or of Category:Animals, but arguably could be added to them. --Daniel. 00:39, 15 January 2011 (UTC)

Wouldn't it make more sense for clearly hierarchical categories like Category:Dogs > Category:Canids > Category:Mammals > Category:Animals to handle parent categorization automatically? I.e., such that simply marking a word as Category:Dogs alone would automatically place the word in all the parent categories as well. -- Eiríkr Útlendi | Tala við mig 00:53, 15 January 2011 (UTC)

I agree. --Daniel. 05:45, 15 January 2011 (UTC)
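Eiríkr's proposal amounts to computing every ancestor of a category by following its parent links. The Python sketch below only illustrates that idea; the PARENTS table and the all_categories function are hypothetical, and nothing here describes how MediaWiki or Wiktionary's templates actually assign categories.

```python
# Hypothetical parent links for one branch of the category tree.
PARENTS: dict[str, list[str]] = {
    "Dogs": ["Canids"],
    "Canids": ["Mammals"],
    "Mammals": ["Animals"],
    "Animals": [],
}


def all_categories(direct: list[str], parents: dict[str, list[str]]) -> list[str]:
    """Return the directly assigned categories plus every ancestor, each once."""
    result: list[str] = []
    seen: set[str] = set()
    # Reversed so that the first direct category is visited first.
    stack = list(reversed(direct))
    while stack:
        cat = stack.pop()
        if cat in seen:
            continue  # already reached through another parent
        seen.add(cat)
        result.append(cat)
        # Queue this category's parents so their ancestors are collected too.
        stack.extend(reversed(parents.get(cat, [])))
    return result


# Tagging "dog" with Category:Dogs alone yields the whole chain.
print(all_categories(["Dogs"], PARENTS))  # ['Dogs', 'Canids', 'Mammals', 'Animals']
```

Because visited categories are skipped, a category reachable through two parents is listed only once, which is the kind of repetition Makaokalani objects to above.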

Template:defn outside of language sections

In my opinion, editors shouldn't add {{defn}} outside of the matching language section, such as {{defn|lang=et}} in a Finnish or Spanish section. I'm sure the idea is to get an editor to add the word, but I've seen some weird things like {{defn|lang=ro}} between two Latin definitions. The 'correct' procedures would be to use Wiktionary:Requested entries, or to create a language section for the word using defn or {{rfdef}} (defn has been nominated for deletion). Mglovesfun (talk) 15:30, 12 January 2011 (UTC)

Coincidentally, I removed all the {{defn|lang=et}}s a few minutes after you posted this, listing them all at [[Wiktionary:Requested entries (Estonian)#A]]. Needless to say, I agree with you. —Ruakh_TALK 00:09, 15 January 2011 (UTC)

Organizing Japanese entries

Forgive me if this has already been gone over recently; I've been out of the loop for a few years.

Japanese presents an interesting organizational challenge for Wiktionary, in that we appear to have multiple different locations where a single Japanese word can be entered:

Romaji (possibly more than one page)
Katakana
Hiragana
Kanji (plus okurigana if appropriate; possibly more than one page)

The question becomes: where should etymologies and inflection information go? Putting all this information on all the pages creates additional work and increases the likelihood of the information falling out of sync. I'd like to propose that phonetic entries not include extensive information like inflections and etymologies, and instead primarily list the more specific entries, which would then include fuller information. Exceptions would be cases where the phonetic rendering is the form most commonly used, such as しかし (然し exists, but its usage is archaic, so the 然し entry should point the user towards しかし instead).

Subaru presents an excellent example here. The various possible renderings, in no specific order:

Subaru (the company, romaji for the star cluster)
subaru (romaji for the verb)
スバル (the company, possibly also a phonetic entry for the verb and noun forms)
すばる (basic phonetic entry for the company, the verbs, the noun)
昴#Japanese (the Pleiades star cluster)
窄ばる (one verb meaning with specific contextual overtones)
統ばる (another verb meaning with different contextual overtones)

This gives us seven different headings for subaru. Other Japanese dead-tree and electronic dictionaries take the general approach that phonetic (generally hiragana, sometimes also romaji) entries list (or redirect to) the specific katakana or kanji-based entries, which then give the etymologies, inflections, and other information. So for the subaru example, this would break down as follows:

Subaru: Full definition as the automaker division of Fuji Heavy Industries, link to Wikipedia page, etymologies, スバル as alternate
subaru: List other renderings briefly defined to help user pick the relevant one
スバル: Subaru as alternate; list other renderings briefly defined to help user pick the relevant one
すばる: List other renderings briefly defined to help user pick the relevant one
昴#Japanese: Full definition as the Pleiades star cluster, examples, etymologies, etc.
窄ばる: Full definitions, verb conjugation, examples, etymologies, etc.
統ばる: Full definitions, verb conjugation, examples, etymologies, etc.

Is this clear? What does everyone else think? I look forward to the discussion. -- Eiríkr Útlendi | Tala við mig 00:02, 14 January 2011 (UTC)

Sorry to hijack your topic, but this problem concerns English entries as well. Compare for example center and centre, or color and colour. -- Prince Kassad 00:11, 14 January 2011 (UTC)

No worries, that's a good point. Part of this issue is caused by the limitations of the Wiki software and how data is presented. A different database design could easily show all relevant data in one window, possibly reordered as appropriate. Which makes me wonder if there'd be any way to use transclusion -- i.e. keep all relevant info stored under one entry heading, and simply transclude into the other possible renderings? This wouldn't be a template per se, but using the same basic mechanism. Would that even be possible? -- Eiríkr Útlendi | Tala við mig 00:17, 14 January 2011 (UTC)

Partially answering my own question, I've learned about labeled section transclusion, which would seem to do exactly what's needed here. After choosing a headword under which to enter all entry information, alternates could then simply transclude all relevant portions. This would neatly avoid the problem of entering identical information in multiple places, with that information possibly diverging over time. I just looked at centre and center, for instance, and found some notable differences that should probably be resolved (center's etymology is much more complete, for instance). -- Eiríkr Útlendi | Tala við mig 22:12, 14 January 2011 (UTC)

I've just used labeled section transclusion to basically clone the 忍坂 entry on its alternate spelling page 忍阪. Might be worth a look-see; this could be an easy and elegant way of dealing with alternate spellings. -- Cheers, Eiríkr Útlendi | Tala við mig 23:31, 14 January 2011 (UTC)

Personally I think that the place of the main content should conform to etymology. If the word is native or its borrowing from Chinese cannot be ascertained, it should have its main content in its hiragana (sometimes katakana) entry, and other forms should only include a link, with no detailed explanations. The reason is that kun'yomi (as well as ateji and (to a lesser extent) jukujikun) often involves multiple correspondences of hanzi to create nuances in meaning, and it's hard to decide which one is the dominant form. If the word is clearly Sino-Japanese, then its main content should be at its Kanji entry. For words with unknown etymologies, the most common form should be used. (This is how the Japanese Wiktionary organises its entries.) Wjcd 00:33, 14 January 2011 (UTC)

So if I understand you correctly, for the subaru example above, the verb definitions and etymologies should then be listed under the hiragana heading すばる, since the etymologies are substantially the same, but with the definitions specifying which kanji is used for which sense. Is this what you mean? -- Eiríkr Útlendi | Tala við mig 22:12, 14 January 2011 (UTC)

Yes. Wjcd 04:40, 16 January 2011 (UTC)
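Wjcd's rule for where a Japanese word's main content should live can be written as a small decision function. This is a hypothetical Python sketch: the Word record, its origin labels and its usage counts are assumptions made for illustration, not data that Wiktionary stores, and taking the first kanji form of a Sino-Japanese word is an added assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    kana: str               # hiragana (occasionally katakana) spelling
    kanji_forms: list[str]  # kanji spellings, if any
    # Hypothetical labels: "native" (including words whose Chinese origin
    # cannot be ascertained), "sino-japanese" or "unknown".
    origin: str
    usage: dict[str, int] = field(default_factory=dict)  # hypothetical counts per spelling


def main_entry(word: Word) -> str:
    """Pick the page that carries the etymology, inflection and full definitions."""
    if word.origin == "sino-japanese" and word.kanji_forms:
        # Clearly Sino-Japanese: main content goes on the kanji entry
        # (taking the first listed kanji form is an assumption).
        return word.kanji_forms[0]
    if word.origin == "native":
        # Native: main content goes on the kana entry; kanji pages only link to it.
        return word.kana
    # Unknown etymology: use the most common spelling, else fall back to kana.
    return max(word.usage, key=word.usage.get) if word.usage else word.kana


# The subaru verb: one hiragana main entry covering both kanji spellings.
print(main_entry(Word("すばる", ["窄ばる", "統ばる"], "native")))  # すばる
```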

Desysopping

Believing this is the best venue to discuss this issue (rather than opening separate desysop votes for every single one of our inactive admins), I want to request confirmation of what this past discussion regarding such sysops means for the following administrators. Note that their desysopping is not a jab at their character or any of their normal functions; it is just a matter of their having made no edits within the past year:

Arne List
Cynewulf
Williamsayers79
Brion VIBBER [Feel free -- I don't need em regularly and can get em back if I need em! :) --Brion 07:52, 22 January 2011 (UTC)]
Psy guy
Tawker
Kappa
Kevin Rector
Medellia
Alhen
Jeffqyzt

TeleComNasSprVen 22:09, 14 January 2011 (UTC)

Boy, when you do edit, you really go for it! Mglovesfun (talk) 22:43, 14 January 2011 (UTC)

I'd like to decide them on an individual basis. At least for brion, however, he does not need it since he has the developer right, which grants him the right to give himself sysop status on any wiki. -- Prince Kassad 22:57, 14 January 2011 (UTC)

There is certainly precedent for a desysopping without prejudice after one year of inactivity. The benefits as I see them are a more accurate picture of the number of people working (which gives us incentive to promote more people to help when needed) and the fact that "the community" changes, and most likely many of these folks are not known by "the community" and do not have "the community"'s support. This is the best reason for term limits on wikirights: just because 5 years ago the group of people working on this project thought it was a good idea to make me a sysop does not mean that the folks working on the project now still think it is a good idea. I would be in favor of removing bits unless there is a good reason to keep them, though I would also be in favor of utilizing the "email this user" function (or in Alhen's case just messaging him on IRC; he is still active on es) to let them know what is going on in case they wish to unretire. - [The]DaveRoss 23:32, 14 January 2011 (UTC)

I agree completely. I should note, though, that there's no precedent for the notion that "just because 5 years ago the group of people working on this project thought it was a good idea to make me a sysop does not mean that the folks working on the project now still think it is a good idea"; on the contrary, the precedent in de-sysopping votes has generally been to make explicit that if the user ever returns, they can get the bit back without ceremony. (But I agree with your notion, dislike this aspect of the precedent, and would be happy to abandon it.) —Ruakh_TALK 00:07, 15 January 2011 (UTC)

Not actually true about brion etc. Devs get sysop on a wiki through the usual community means, unless they need to make a change in their official capacity (for example for legal reasons), which is a totally different matter. (Spoken as a dev :-P) -- ArielGlenn 20:10, 19 January 2011 (UTC)

I would have liked to see the users mentioned contacted by email before this was brought up in the Beer Parlour. But that might just be me. --Neskaya … gawonisgv? 00:16, 15 January 2011 (UTC)

Not only do I agree with all of the above, but I have suggested similar things in the past. See associated data at User:SemperBlotto/Sysop Activity. SemperBlotto 08:42, 15 January 2011 (UTC)

I don't see a problem with this - a message could be left on their talk page and they can be undesysopped on request if they resurrect themselves. —Saltmarsh^{απάντηση} 11:48, 15 January 2011 (UTC)

Has anyone yet contacted these folks to inform them of the ongoing discussion? I don't think any action should be taken until that happens. - [The]DaveRoss 13:54, 16 January 2011 (UTC)

I would prefer that someone use the EmailUser function to contact them and ask them to come back as well. This is a well-known proposal for desysopping that I've made on other wikis, too. If they receive an email, they might become active again. However, I use a fake address, so I can send emails but cannot receive them. That's why I chose to bring it up at the Beer Parlour: to see whether anyone had any other ideas about them, and whether or not EmailUser would be necessary. TeleComNasSprVen 19:13, 16 January 2011 (UTC)

Functionally, there's no real difference from my point of view. I don't see any harm in having the flags set as long as passwords are secure, but there's no real need for the tools. I haven't had much time in the past year to contribute; I think I made the odd edit but forgot to sign in. -- Tawker 01:41, 20 January 2011 (UTC)

Yeah, go ahead, I've been read-only for long enough anyway. Cynewulf 14:36, 31 January 2011 (UTC)

All except Alhen (now partially active) have been desysopped. I have sent them all a message, saying the sysop status can be reinstated if they need it. SemperBlotto 15:08, 19 February 2011 (UTC)

Wiktionary:Topical categories

My recent comments at WT:RFDO have stressed that there are no rules for topical categorization, so it's purely down to personal preference. Is anyone brave enough to write such a 'policy'? Does a 'consensus' exist? Mglovesfun (talk) 12:37, 16 January 2011 (UTC)

My reply to the first question is: sure, I guess. I have now started it with a little information. --Daniel. 14:17, 16 January 2011 (UTC)

There is a tentative guideline at Wiktionary:Categorization#Topic, linked to from Wiktionary:Topical category. It seems that Wiktionary:Topical categories could be turned into a redirect too. --Dan Polansky 14:45, 16 January 2011 (UTC)

I tend to agree with the redirect unless there is enough information about topical categories to justify a separate page, rather than the "main" WT page. Mglovesfun (talk) 23:15, 16 January 2011 (UTC)

I, too.—msh210℠ (talk) 16:58, 18 January 2011 (UTC)

I have turned Wiktionary:Topical categories into a redirect. --Dan Polansky 09:01, 22 January 2011 (UTC)

give a man a fish, feed him for a day; teach a man to fish, feed him for a lifetime

Proverbs consisting of two or more sentences seem rare or nonexistent in Wiktionary. Does that reflect a consensus against their inclusion? Does anyone object to the creation of give a man a fish, feed him for a day; teach a man to fish, feed him for a lifetime or Give a man a fish, feed him for a day. Teach a man to fish, feed him for a lifetime.? --Daniel. 15:12, 16 January 2011 (UTC)

We already have give a man a fish and you feed him for a day. Teach a man to fish and you feed him for a lifetime. "Wonderfool" beat you to it. --Downunder 19:18, 16 January 2011 (UTC)

It also survived RFD almost unanimously, so I'd say go ahead and make more. —Internoob (Disc•Cont) 01:17, 17 January 2011 (UTC)

Aren't they just rare in general? What other two-sentence proverbs are there? Equinox ◑ 10:54, 17 January 2011 (UTC)

I'd say these.

--Daniel. 11:18, 17 January 2011 (UTC)

Here's another.

The more you study, the more you know. The more you know, the more you forget. The more you forget, the less you know. The less you know, the more you study.

And, here is just an interesting variety of that initial proverb.

Sell a man a fish, he eats for a day. Teach a man how to fish, you ruin a wonderful business opportunity.

--Daniel. 11:52, 17 January 2011 (UTC)

Hm. IMO, some of these are platitudes, not proverbs, and one ("light a man on fire") is just a comedian's quip playing on "teach a man to fish". That would be a bit like including knock-knock jokes. But I'm sure there are some okay multi-sentence ones. Equinox ◑ 13:07, 17 January 2011 (UTC)

Always bold the head word?

WT:ELE says "We give a word's inflections without indentation in the line below the "Part of speech" header. There is no separate header for this. For uninflected words it is enough to repeat the entry word in boldface. Further forms can be given in parentheses.", which actually isn't mega-helpful. I wanted to add the changes made by Ruakh to Template:Grek to all the script templates. However, Prince Kassad undid my revision to {{Goth}}. I don't see where WT:ELE says to always list the head word in bold; does it say so anywhere? I would like it to. I think that, per Wikipedia and other Wiktionaries/Wikipedias, we should always bold the head word, no matter what the script. I've had no technical problems resulting from this on either Firefox or Internet Explorer. Mglovesfun (talk) 15:40, 17 January 2011 (UTC)

On rereading, WT:ELE#A very simple example says "the inflection word itself (using the correct Part of Speech template or the word in bold letters),". Some languages don't use 'letters', but I think the spirit of the rule is "all headwords in boldface". Mglovesfun (talk) 15:47, 17 January 2011 (UTC)

I think that the only time we don't bold (is that a verb?) a headword is when it is in "Chinese" or other strange characters. Other than that - always. SemperBlotto 15:48, 17 January 2011 (UTC)

Well, "strange language" is too subjective in this context - see Template talk:ja-noun. It's true that Kanji and Hiragana don't use "letters" so much as "characters", so they would bypass ELE under the letter of the law. Mglovesfun (talk) 15:50, 17 January 2011 (UTC)

One of our Hebrew editors (Ruakh? Shai? someone else?) decided that the he- inflection^Wheadword-line templates' making the words larger rather than boldfaced allows for greater readability. Methinks that's more important than consistency. (And as far as following the letter of the law, the passage Martin quotes from ELE says only that a template or boldfacing must be used.)—msh210℠ (talk) 16:08, 17 January 2011 (UTC)

Hmm, that's a good point, and I agree that consistency should come second to language-specific considerations. It should say something like "the head word should appear in bold, except for certain scripts where this worsens readability." Mglovesfun (talk) 16:22, 17 January 2011 (UTC)

Hebrew's biggitude is a blend of readability and consistency considerations: boldface makes the vowel diacritics impenetrable, and though we don't always include vowel diacritics, I think it would be weird to use embiggening when we include diacritics and emboldening when we don't. (That, and there's no good way to distinguish the two cases, since they're both he and Hebr.) —Ruakh_TALK 19:32, 17 January 2011 (UTC)

Actually, I reverted the addition because it caused the template, strangely, to not work at all. -- Prince Kassad 16:23, 17 January 2011 (UTC)

You might have to revert again, as I redid the modification and, checking with diff, it has ended up exactly the same. That said, I've had no problem with it; has anyone else? — This unsigned comment was added by Mglovesfun (talk • contribs) at January 18 2011.

I've set up a small test case here. Tell me which of the letters you can see. -- Prince Kassad 19:13, 17 January 2011 (UTC)

Running Firefox on Debian Linux unstable (that is, beta) showed me all of them, but they all looked exactly the same.--Prosfilaes 21:05, 17 January 2011 (UTC)

I see the same as Prosfilaes. Mglovesfun (talk) 12:35, 20 January 2011 (UTC)

Italics

My personal opinion: it's the way {{term}} is set up, but we should only use italics for the Latin (Latn) script. I boldly removed the face=ital option from {{Cyrl}}. Perhaps a bit too boldly, but ah well. Mglovesfun (talk) 19:06, 17 January 2011 (UTC)

Too boldly IMO unless you first checked that face=ital wasn't being used anywhere.—msh210℠ (talk) 19:13, 17 January 2011 (UTC)

You've slightly misunderstood (probably not actually). Cyrillic script is really hard to read in italics, to the extent that м and т look almost identical (that is, м and т when set in italics). The same goes for some other letters. I removed it precisely because it is used by some template, the same way that {{infl}} calls face=bold for {{Hebr}}, but so long as face=head doesn't exist, it won't bold the head words. Mglovesfun (talk) 19:17, 17 January 2011 (UTC)

I didn't follow your last sentence. What I meant, though, wasn't that it's unwise to remove face=ital if it's used (which is what all but the last sentence you just wrote seem to assume). I meant, rather, merely that it's unwise to remove it without discussion if it's used.—msh210℠ (talk) 19:37, 17 January 2011 (UTC)

Correct. Mglovesfun (talk) 23:52, 17 January 2011 (UTC)

Publishing a Wiktionary

Let's say that I think that the Hebrew Wiktionary has enough good translations of English words into Hebrew. I write a script that dumps them to a nicely formatted file, print it and sell it in bookstores as "The Free English-Hebrew Dictionary, by Wiktionary contributors". It's supposed to be legal in general, but in practice, would it be enough to write on the first page: "This dictionary is published under the terms of the CC-BY-SA license. The list of contributors can be found in the history listing of each headword at http://he.wiktionary.org."?

And has anyone already published a printed Wiktionary in any language? --Amir E. Aharoni 00:08, 18 January 2011 (UTC)

The Hebrew Wiktionary actually only includes Hebrew words, but to answer your question: yes, I think that would be enough. (I'm not a lawyer, though.)

I don't know about printed Wiktionaries, but a publisher called ICON Group International puts out lots of books that consist largely, or even primarily, of snippets from the English Wikipedia. I assume these books are print-on-demand, but still, you have to imagine that someone at some point has accidentally bought a copy of one, thereby bringing it into print. Their copyright page is here, if you're curious. I don't know if it would hold up in court, but obviously WMF hasn't sued them . . .

—Ruakh_TALK 00:19, 18 January 2011 (UTC)

I would suggest talking to an IP lawyer in the country where you wish to publish and sell your book. The purpose of the CC-BY-SA license was to make it easier for people to reuse content, and I know there is some reasonable method for content with many authors, but we are not lawyers (mostly) and the best place for legal advice is not here. - [The]DaveRoss 00:22, 18 January 2011 (UTC)

I'm assuming that Amire80 (talk • contribs) was asking hypothetically, given that he chose an impossible example; but yeah, [[w:Wikipedia:Legal disclaimer]] obtains, as always. —Ruakh_TALK 00:41, 18 January 2011 (UTC)

I believe Wiktionary is a trademark of the Foundation. And ICON Group International may get away with it, but I think it violates the license to handwave at someone else's website. Print the list of contributors to each page, just like PediaPress does, and you'll be clearly legit.--Prosfilaes 03:22, 18 January 2011 (UTC)

The problem with that is that it would be larger than the rest of the entire book. The attribution requirements, as shown by the MediaWiki edit screen, are that the authors "agree to be credited by re-users, at minimum, through a hyperlink or URL to the page you are contributing to". --Yair rand (talk) 03:54, 18 January 2011 (UTC)

You can fit a thousand names on a letter-size page in six-point font. If you were to print out all two million articles, the 450 pages you'd need for the names wouldn't be that much of your text. alphabet has 24 editors credited, and several of them are bots that would arguably only need to be listed once. If you were doing a lot of excerpting, it would be more problematic, yes. But the MediaWiki requirements don't at all fit the requirements of attribution in the legal text; there's no guarantee the page will still exist, or that it won't have been deleted and rewritten. And even then, they say "through a hyperlink or URL to the page you are contributing to"; i.e. not "http://he.wiktionary.org", but a URL to every page. Oxford found space to thank me for my meager contributions to the Oxford Science Fiction Dictionary; it'd be nice to see the same when my work on Wikimedia projects is used.--Prosfilaes 06:17, 18 January 2011 (UTC)

The Creative Commons Attribution-ShareAlike 3.0 license says "You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work)". In the case of content from Wikimedia projects, the manner specified is simply that one gives the URL or hyperlink. And in any compiled printed list of the translations of Wiktionary, the list of authors would be about the same length as the translations list itself. --Yair rand (talk) 06:32, 18 January 2011 (UTC)

BTW, the foundation:Terms of Use seem to say the same thing: re-users must include either "a) a hyperlink (where possible) or URL to the page or pages you are re-using, b) a hyperlink (where possible) or URL to an alternative, stable online copy which is freely accessible, which conforms with the license, and which provides credit to the authors in a manner equivalent to the credit given on this website, or c) a list of all authors." --Yair rand (talk) 06:45, 18 January 2011 (UTC)

Which means that if you have a list of words and their translations, you must provide a URL for each word. Again, about the same length as the translations list.--Prosfilaes 07:01, 18 January 2011 (UTC)

But the URL for each word would just be he.wiktionary.org/wiki/WORD. It seems a bit ridiculous to waste so much paper printing that string for each and every word. If CC-BY-SA requires it, it's a practical problem. --Amir E. Aharoni 08:49, 18 January 2011 (UTC)

The CC-BY-SA requires that you list the authors. Wikimedia is effectively amending the license by letting you use a list of articles instead. You could get away with a lot, but as I said, I'm a little miffed that my right to attribution is getting waived at all, so I'm more inclined to be a hard-liner on what I'm getting instead.--Prosfilaes 18:48, 18 January 2011 (UTC)

The question was indeed hypothetical, but the practical side is this: unlike in an encyclopedia, the articles in a bilingual dictionary are usually very short, and a complete list of contributors may make the printed dictionary very long. Also, if I just dump all the contributors to all articles, it will include translators of the word into other languages, who, with all due respect, are not related to this work, unless, of course, I write some clever code that filters them out.

But maybe someone has already written such filtering code? And maybe someone has already measured how much space a list of contributors would take? Or am I the first to raise this problem? --Amir E. Aharoni 08:49, 18 January 2011 (UTC)

I think so. It's not a problem if you're printing full articles, only if you take a small excerpt from many of them. Again, a letter-size page can fit a thousand names in six-point font. There are 450,000 unique contributors to Wiktionary; I believe that includes IPs, which I would argue waived their attribution rights by remaining anonymous. alphabet had 24 unique authors, some being bots or hard-working humans that will repeat from article to article. If someone is serious, they might want to write a program to figure out exactly how many contributors you'd have to list.--Prosfilaes 18:48, 18 January 2011 (UTC)
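The contributor count Prosfilaes suggests could be approximated with a short program. The Python sketch below assumes the per-entry edit histories have already been fetched into a hypothetical dict mapping entry titles to usernames; it merges them so each bot or repeat author is listed once, skips IP editors (following Prosfilaes's argument, which is a position rather than settled policy), and applies the thousand-names-per-page estimate. It does not attempt Amir's harder filter of dropping people who only edited translations into other languages, which would need the revision diffs.

```python
import ipaddress
import math


def is_ip(name: str) -> bool:
    """Anonymous edits are recorded under an IP address rather than a username."""
    try:
        ipaddress.ip_address(name)
    except ValueError:
        return False
    return True


def credit_list(histories: dict[str, list[str]]) -> list[str]:
    """Merge per-entry contributor lists into one sorted list of unique usernames.

    Bots and prolific editors appear once however many entries they touched;
    IP editors are skipped, a debatable choice raised in the discussion.
    """
    names = {
        user
        for users in histories.values()
        for user in users
        if not is_ip(user)
    }
    return sorted(names)


def pages_needed(n_names: int, names_per_page: int = 1000) -> int:
    """Pages of credits at roughly 1000 names per letter-size page in 6-point type."""
    return math.ceil(n_names / names_per_page)


# Hypothetical histories; real ones would come from each entry's edit history.
histories = {
    "alphabet": ["UserA", "ExampleBot", "192.0.2.1"],
    "dog": ["UserB", "ExampleBot", "UserA"],
}
print(credit_list(histories))  # ['ExampleBot', 'UserA', 'UserB']
print(pages_needed(450_000))   # 450
```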

Don't forget, in the passage quoted above, "or c) a list of all authors". I think that providing the URL of the site, an explanation of how to build the URL of each individual page, and a list of all authors at the beginning of the book (2nd page) would probably meet the conditions. Lmaltier 18:06, 18 January 2011 (UTC)

If you provide a list of authors anywhere, that should be sufficient on its own.--Prosfilaes 18:48, 18 January 2011 (UTC)
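For Lmaltier's approach of explaining how to build each page's URL, the rule is mechanical: the page title, with spaces written as underscores, is appended to the wiki's base URL and percent-encoded as UTF-8. A minimal Python sketch follows; the base URL and the entry_url helper are assumptions for illustration.

```python
from urllib.parse import quote

# Assumed base URL for the Hebrew Wiktionary; other wikis follow the same pattern.
BASE = "https://he.wiktionary.org/wiki/"


def entry_url(title: str, base: str = BASE) -> str:
    """Build an entry's URL from its page title.

    MediaWiki writes spaces in titles as underscores in URLs; characters
    outside the safe set are percent-encoded as UTF-8 bytes.
    """
    return base + quote(title.replace(" ", "_"), safe="/:")


print(entry_url("give up"))  # https://he.wiktionary.org/wiki/give_up
print(entry_url("שלום"))     # https://he.wiktionary.org/wiki/%D7%A9%D7%9C%D7%95%D7%9D
```

A printed dictionary would then only need to state this rule once instead of printing a URL beside every headword.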

Category:Definitionless terms

I've managed to get the contents of this category down from over 400 to under 70. Would anyone else like to try to get rid of the rest? There are entries in French, Hebrew and other strange things as well as English. Some of them might need deleting. SemperBlotto 11:14, 19 January 2011 (UTC)

I wouldn't worry too much about getting them down to zero. I should be able to do escumer; I found citations for it but couldn't be bothered analysing them, which is why I've had a few days off to de-wikify my brain. NB: if {{defn}} fails RFDO there will be way more than this, as {{defn}} doesn't categorize in this category, but {{rfdef}} does. Mglovesfun (talk) 12:37, 20 January 2011 (UTC)

Esperanto corpus

I took the 66 Esperanto texts on Project Gutenberg (PG) and made a corpus. I haven't completely uploaded it, but User:Prosfilaes/Esperanto corpus has links to lists of all words found in more than 56 texts. It lists each word and links to every text that includes it. Already I've found a word, terura, that's a genuine Esperanto word used in 60 out of 66 texts and doesn't have an entry here.--Prosfilaes 23:02, 19 January 2011 (UTC)
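A corpus listing like the one described is essentially an inverted index from each word to the texts that contain it, filtered by how many texts that is. The Python sketch below is a hypothetical reconstruction, not Prosfilaes's actual tool; it lower-cases the text and uses a deliberately simple \w+ tokenizer, so an elided form such as l' is reduced to l.

```python
import re
from collections import defaultdict

# \w matches Unicode letters in Python 3 str patterns, so ĉ, ĝ, ĥ, ĵ, ŝ and ŭ
# stay inside words. Digits and underscores also count as word characters.
WORD = re.compile(r"\w+")


def build_index(texts: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased word to the set of text IDs it appears in."""
    index: dict[str, set[str]] = defaultdict(set)
    for text_id, body in texts.items():
        for word in set(WORD.findall(body.lower())):
            index[word].add(text_id)
    return index


def common_words(index: dict[str, set[str]], min_texts: int) -> dict[str, list[str]]:
    """Words found in more than `min_texts` texts, with the IDs of those texts."""
    return {w: sorted(ids) for w, ids in index.items() if len(ids) > min_texts}


# Tiny hypothetical corpus; the real one had 66 texts and a threshold of 56.
texts = {
    "t1": "La hundo estas terura.",
    "t2": "Terura vetero!",
    "t3": "Ĉarma tago!",
}
print(common_words(build_index(texts), 1))  # {'terura': ['t1', 't2']}
```

Each word in the resulting list can then be checked against existing entries, which is how a gap like terura shows up.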

nine and 9

How should we relate Roman and Arabic "ciphered" numbers to their "lettered" counterparts? Sometimes they are listed as Synonyms (as IX is on nine), and sometimes as Alternative forms (as I is on one). I don't much care at the moment. --Bequw → τ 23:11, 19 January 2011 (UTC)[reply]

I'd vote for all these to be synonyms. Alternative forms just doesn't seem right. Thus a user of any language's lettered number entry would have the translingual symbols available in the entry, whereas a user of the symbol entry needs to traverse the link to the English lettered word to find non-English lettered numbers. DCDuring TALK 01:30, 20 January 2011 (UTC)[reply]

They are merely alternative spellings. Characters like 9 are logographs akin to Chinese characters in which one character represents the entire word. The underlying word, however, is the same word.--Brett 15:57, 22 January 2011 (UTC)[reply]

Yes, but we present "9" as Translingual. If [A] is an alternative spelling of [B], then [B] is an alternative spelling of [A]. This would seem to leave us with each language's lettered number appearing at the top of the entry for 9. Is this the presentation that you would recommend? Or is this a case where English "nine" has a privileged position over neuf and neun for example? I also doubt that we want a language section at 9 for each language. DCDuring TALK 17:39, 22 January 2011 (UTC)[reply]

Currently, the entry for 9 gives only the English nine, and I think that's fine given that this is the English Wiktionary. Only in cases where 9 happens to mean something else would you need another language section there. But in the entry for nine or neuf, I would simply list 9 and IX as alternative spellings. I suppose it's not perfectly elegant following your logic, but it should suffice. Perhaps it might be momentarily frustrating if you look up 9 hoping to find out how to say it in French, but there are links to other language wiktionaries.--Brett 18:40, 22 January 2011 (UTC)[reply]

Category:Cosmetics, Category:Makeup

Should they be merged? Ƿidsiþ 09:15, 20 January 2011 (UTC)[reply]

I don't know, but FWIW there is Wikisaurus:toiletry, broader than both mentioned categories. --Dan Polansky 15:49, 20 January 2011 (UTC)[reply]

Category:Makeup might only include some nouns, as Category:Perfumes does, or, more commonly, as the Category:Sports subcategories do. JackPotte 21:04, 20 January 2011 (UTC)[reply]

New gadget

As some people asked for it, I have added a new gadget which adds country flags next to the language headers on entries, similar to how it's done on other wiktionaries such as Lithuanian Wiktionary. It is mainly meant for those people who think the current headers are too boring and/or do not stand out enough in the page. I realize that there are still many flags missing, hopefully I'll manage to add all of them. -- Prince Kassad 22:48, 20 January 2011 (UTC)[reply]

Rather than fear of being boring in style, I sometimes think that people who look at Swedish sections might want a way to get in touch with the Swedish-speaking users of Wiktionary, i.e. a link to Wiktionary:About Swedish or something like that. But such anonymous newcomers have of course not installed any special gadgets. --LA2 22:50, 21 January 2011 (UTC)[reply]

"All of them" meaning that, for example, the "English" header will have, next to it, the flags for the US, the UK, South Africa, India, Australia, NZ, a whole bunch of Caribbean countries, several African ones, and a few Middle Eastern ones?—msh210℠ (talk) 08:17, 23 January 2011 (UTC)[reply]

That was an issue raised on IRC, namely that the flag chosen is completely arbitrary. Obviously, we can't have all flags (it would cause vertical scrolling which is bad), we must decide on one. -- Prince Kassad 15:09, 23 January 2011 (UTC)[reply]

Flag Law, as proposed by yours truly: 1. Flags will be displayed in the order (left to right) of the number of speakers of the language residing in the country. 2. Only the flags of countries which represent >= 10% of the global speakers of the language will be displayed; the maximum limit on the number of flags being the lesser of ten or the number which causes wrapping or scrolling on an 800px width resolution. 3. Flags will be displayed to the left and right (or only right) of the language name, a maximum of one flag to the left of the name and the remainder to the right. 4. You do not talk about Flag Law. - [The]DaveRoss 16:15, 23 January 2011 (UTC)[reply]
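Rules 1 to 3 of the proposal are mechanical enough to sketch. The following is a minimal illustration, not the gadget's actual code. The speaker counts, the 150px width reserved for the language name, and the 5px gap between flags are hypothetical; only the 800px page width, the 10% threshold, the cap of ten, and the 45px flag width come from this discussion.

```python
def flags_that_fit(page_width=800, name_width=150, flag_width=45, gap=5):
    """Rule 2 cap: how many flags fit on one line beside the name (hypothetical sizes)."""
    return max(0, (page_width - name_width) // (flag_width + gap))


def select_flags(speakers_by_country, max_flags=10, fit=None):
    """Rules 1 and 2: countries with >= 10% of the speakers, largest first, capped."""
    total = sum(speakers_by_country.values())
    by_size = sorted(speakers_by_country.items(), key=lambda cn: cn[1], reverse=True)
    # Integer comparison (n * 10 >= total) avoids floating-point edge cases at exactly 10%.
    eligible = [c for c, n in by_size if total > 0 and n * 10 >= total]
    limit = min(max_flags, flags_that_fit() if fit is None else fit)
    return eligible[:limit]


def place_flags(language, flags, only_right=False):
    """Rule 3: at most one flag left of the name, the remainder to its right."""
    if only_right:
        return [], language, flags
    return flags[:1], language, flags[1:]


# Hypothetical speaker counts (millions), for illustration only.
speakers = {"US": 50, "UK": 20, "IN": 15, "CA": 10, "AU": 5}
print(place_flags("English", select_flags(speakers)))
```

With these made-up numbers, AU falls below the 10% threshold. The four remaining countries are well under the cap of ten, so one flag goes to the left of "English" and three go to the right.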

Can we have a Roman eagle for Latin? SemperBlotto 16:20, 23 January 2011 (UTC)[reply]

I think it currently displays the flag of Vatican City. Is that offending or misleading? -- Prince Kassad 16:24, 23 January 2011 (UTC)[reply]

Change it to an eagle. Where do I edit flags? --Vahag 18:54, 23 January 2011 (UTC)[reply]

MediaWiki:Gadget-WiktCountryFlags.css. —Ruakh_TALK 19:18, 23 January 2011 (UTC)[reply]

I definitely like it and I'll be using it, but I do think the flags are rather big. I think they should be shrunk down to at most two thirds of the current size. There are also quite a few languages that are missing flags still, but I take it those are going to be added later? Perhaps for translingual sections a picture of Earth can be added. —CodeCat 16:34, 23 January 2011 (UTC)[reply]

Hmm, I used the same size that's used at Lithuanian Wiktionary (45 pixels width), and it seems to work fine for me, but maybe not for everyone. And yes, the missing flags will be added soon. -- Prince Kassad 17:16, 23 January 2011 (UTC)[reply]

More options:

multiple flags can be displayed as shown here, or all flags can line up on the right side. They could all line up on the left side but I think we would all agree that is silly. - [The]DaveRoss 19:59, 23 January 2011 (UTC)[reply]

Some sites make a single flag composed of the UK and US flags together. There is a diagonal divide from top right to bottom left, and the US flag is on the top left half and the UK flag bottom right. Maybe that would work for us too? —CodeCat 20:20, 23 January 2011 (UTC)[reply]

I have seen that before on Wikipedia. Now if I could remember where it is... -- Prince Kassad 10:05, 24 January 2011 (UTC) (addendum: gotcha!)[reply]

People are going to bitch for choosing the flag of communist Yugoslavia for Serbo-Croatian... --Vahag 23:37, 23 January 2011 (UTC)[reply]

What would be the 'best' choice for Serbo-Croatian? Also, it is currently an opt-in gadget, and personal CSS will override gadget CSS, so if someone wanted to replace a certain flag (or all of them) they can also do that. - [The]DaveRoss 23:49, 23 January 2011 (UTC)[reply]

I don't know. I like the communist flag with the red star. --Vahag 23:56, 23 January 2011 (UTC)[reply]

There was a heated discussion over the icon choice for Perapera-kun, my favourite Mozilla Firefox plug-in for Chinese (also Japanese) pop-up dictionary. Some American users refused to use the Communist red flag as the icon (one guy's argument was he worked for the government and for him it was an issue if someone sees a red flag on his desktop) and chose an icon with a Chinese character instead, which I find very strange. Perhaps we don't need any icons? --Anatoli 00:57, 24 January 2011 (UTC)[reply]

I'm decidedly not a fan of using a single country's flag to represent a language, as (a) it's insulting to those in other countries who speak the language and (b) may imply something about the dialect of the word in the entry. As the script is now, we should IMO never make this default, and I frankly don't like having it even as an option. If we use TDR's ideas, above, or others, for representing multiple countries, I wouldn't be (so) against it (though I wouldn't enable it, myself).—msh210℠ (talk) 07:33, 24 January 2011 (UTC)[reply]

I basically agree with Msh210. Unlike flags, languages are not country-specific; even currencies aren't. We will be giving the wrong impression. Equinox ◑ 09:49, 24 January 2011 (UTC)[reply]

I more or less agree. (See also <http://www.useit.com/alertbox/flagproblem.html>, BTW.) I'm O.K. with it as an opt-in thing, though, since there are apparently users who want it. (The blend of U.S. and U.K. flags seems disrespectful to both, though!) —Ruakh_TALK 12:49, 24 January 2011 (UTC)[reply]

Since the flags don't link to their file: pages, I assume they can't be GFDL/CC-BY licensed (which need to link to the authorship info AFAIK (IANAL)), so we'd need to use only PD files. Right?—msh210℠ (talk) 07:35, 24 January 2011 (UTC)[reply]

Can images of national flags be copyrighted in the first place? --Yair rand (talk) 07:44, 24 January 2011 (UTC)[reply]

The SVG certainly can. I suppose if the server is sending PNGs that wouldn't be a problem in many cases, but File:Flag of Sicily.svg and File:Flag of Sicily (revised).svg show copyrightable variation. However, I think virtually all flags are PD, just for this reason.--Prosfilaes 04:54, 25 January 2011 (UTC)[reply]

Templates don't link to the Template page, either (except for the edit window, which however is not immediately obvious). You could, for the same reason, remove all the templates on Wiktionary. -- Prince Kassad 08:45, 24 January 2011 (UTC)[reply]

Arguing that something isn't true because we don't like the consequences is not a valid argument. We are pushing the lines by not linking to the template pages, but templates were made for and by Wikimedians who understand how templates are used so it's less of a big deal. And you can reach templates through the edit page; I see no way besides figuring out what a filename must be from a long URL hidden in a CSS file. Images were sometimes made by non-Wikimedians for non-Wikimedians and deliberately released under the CC-BY license (instead of having it pushed on them, like template writers). They may not appreciate as free an interpretation of the license.--Prosfilaes 04:54, 25 January 2011 (UTC)[reply]

A better example might be our MediaWiki:Common.css and MediaWiki:Common.js, which are used for every single page. It is absolutely impossible to find these out if you don't know them. -- Prince Kassad 20:02, 28 January 2011 (UTC)[reply]

Targeted Translations

I think it would be useful if User:Yair rand/TargetedTranslations.js were enabled by default. The script (based on User:Atelaes/TargetedTranslations.js) brings translations of specified languages into the NavHead (the grey part at the top of translation tables). It adds a small "Select targeted languages" button to the inside of translations tables so the users can specify their preferred languages, and not have to repeatedly open tables and search for the translation for every page. The script is currently available as a gadget in Special:Preferences. --Yair rand (talk) 23:58, 20 January 2011 (UTC)[reply]

I don't know enough to say one way or the other about enabling it by default, but thanks to Yair rand for putting this together -- it sounds quite useful to me and I'm enabling it now. Cheers! Eiríkr Útlendi | Tala við mig 20:38, 21 January 2011 (UTC)[reply]

Well, it's been a while, and nobody's objected, so ... I've enabled targeted translations by default. --Yair rand (talk) 23:30, 31 January 2011 (UTC)[reply]

Just to make Yair happy I will voice approval here, so that it doesn't seem unilateral. - [The]DaveRoss 23:42, 31 January 2011 (UTC)[reply]

Adding translation glosses is really really boring. I had a stint during my Volants time (well, I did many boring things then, you know, to branch out a little, and this was even more boring than adding {{also}} disambigs...possibly on the same level of boringness as archiving RFC or Tea Room discussions) and it is thankless. --DStirke 00:08, 1 February 2011 (UTC)[reply]

Sign gloss namespace

We need to reevaluate the benefits of creating an entire namespace to include a single entry to support said namespace. Sign gloss is not getting anywhere, and if Rodasmith was here, maybe xhe'd help. TeleComNasSprVen 09:04, 21 January 2011 (UTC)[reply]

There is no harm in having the namespace so that when the editors who work with sign languages have the time, they can create the appropriate entries in that namespace. In fact, I might get around to making some for the sign language entries we have at the moment, so that every ASL entry has a sign gloss: linking to it. I think I'll start that this week. In the meantime it does not hurt anyone for the namespace to have been created; your comment here reads to me as presuming that its existence causes some imaginary harm. --Neskaya … gawonisgv? 09:49, 23 January 2011 (UTC)[reply]

My question is, how are the glosses in the least bit useful? What we really need are some people who would like to sit in front of a camera and sign a few thousand words each, then create either .gifs or .ogg videos which we can put into translation tables. That I would find useful. - [The]DaveRoss 20:38, 28 January 2011 (UTC)[reply]

The following comment was added after the discussion was archived

I've attempted to contribute to the ASL entries on Wiktionary, but there doesn't seem to be a driving organizational force. The last e-mails I sent to Rodasmith in early 2010 were answered with "I don't have time for this."

I can create original sign language videos, animated gifs, still frame sequences, or alpha-channel overlaid 'ghost' images of ASL words. The main problem that Rodasmith was trying to tackle is the lack of an actual orthography: no two people would describe the positions / movements of the sign for SUPPORT in the same manner. Thus we could end up with many supposed spellings of the same word. PositiveSigner 11:58, 12 April 2011 (UTC)[reply]

Wikisaurus:defecate

(Not sure whether this belongs to Beer parlour.)

There is a host of redlinks at Wikisaurus:defecate of which it is not quite straightforward to decide whether they are attestable. If someone could help create in the mainspace those redlinked terms that are attestable, that would be great. I am trying to do just that, but, as I am a non-native, this is more of a challenge to me.

I do not know of any other Wikisaurus page with redlinks. My goal is to ensure there are no or very few redlinked entries in Wikisaurus, so it is possible to spot new spurious additions to Wikisaurus as redlinks. Yesterday, I succeeded in adding some attestable terms for urination to the mainspace, without adding the attestation in order to reduce the cost of the addition. --Dan Polansky 09:29, 21 January 2011 (UTC)[reply]

This is a worthy goal. We could consider in the future instating a no redlinks policy on Wikisaurus (which would also require that there be a corresponding definition). It's easier to attest through RFV than to have two processes where inevitably ignoring one results in accumulation of cruft. DAVilla 06:31, 25 January 2011 (UTC)[reply]

To correct myself: there are many more Wikisaurus entries with redlinked terms. These include WS:thingy, WS:glans, WS:masturbate, WS:insane, WS:corpse, WS:nude, WS:watercraft, WS:grass, WS:antihistamine, and more. I would prioritize fixing redlinks in those entries that are likely to attract random contributors, such as the anatomy ones. --Dan Polansky 09:14, 28 January 2011 (UTC)[reply]

Topical category for places that people live in

Currently we have Category:Home, Category:Housing and Category:Place names, but we don't have a category for general places where several people live together. A category for terms like village, city, hamlet and so on. I imagine that 'place names' would be a subcategory of that, and maybe 'home' too. I would create the category myself, but I don't actually know a good term for such a category. Can anyone help? —CodeCat 19:10, 21 January 2011 (UTC)[reply]

The Category:Cities does have a textual description that includes villages and towns of all sizes. In German and Swedish we would prefer the abstract term Ort. The English Wikipedia has a long tradition of chaos in naming such categories. Right now it seems that w:Category:Populated places has the upper hand. --LA2 22:46, 21 January 2011 (UTC)[reply]

I don't think the name of that category is very well chosen then. I've never heard anyone refer to a small settlement as a city, although I did notice address forms in America tend to ask for 'city' rather than 'place of residence' or something like that. Do you think 'places of residence' would be a good name for the category, or is that still too vague? And also, the category I'm thinking of is not for specific places of residence, but for terms relating to them. So it would not contain London but it would contain town, metropolis and so on. —CodeCat 12:16, 22 January 2011 (UTC)[reply]

The experience from en.wp is that this discussion never ends and renaming the City category leads to just as much surprise as keeping any existing name. "Populated places" has now been in stable use on en.wp for almost an entire year, perhaps a new record. If we rename our "cities" to "places of residence", we will still differ from en.wp. --LA2 13:51, 22 January 2011 (UTC)[reply]

It may be a British/American thing, but w:Ingersoll, Oklahoma, population 9, is a city. While village does get thrown around in an informal context, the only place I've seen it in a formal context in the US is when the village is legally part of a city.--Prosfilaes 20:09, 22 January 2011 (UTC)[reply]

Maybe that depends on the state? In Michigan and Ohio there are plenty of municipalities that are formally and officially incorporated as "villages" (look through google:"the Village of * Ohio" and you'll see plenty of examples). That said, in ordinary colloquial usage I think they're much more likely to be referred to as "towns". —Ruakh_TALK 20:27, 22 January 2011 (UTC)[reply]

Ah, here: w:Village (United States). It does depend on the state. —Ruakh_TALK 20:31, 22 January 2011 (UTC)[reply]

I doubt that we want to depend on legal definitions, which vary in the US by state as Ruakh suggests. In NY we have, at least, villages, towns, and cities (of different classes!). The US Census has faced the US nomenclature problem and resolved it by coming up with its own set of names often distinct from the states' terms. In New York, for example, they dispense with "town" which can include villages within its borders and have CDP (Census-designated place), possibly for the portion of a town not also included in any village. CDP also serves as the type for places that are outside any municipal jurisdiction (aka, unincorporated places). They have another set of their own terms for large populated places that include multiple jurisdictions, possibly in multiple states. Coming up with standard names across countries seems quite hopeless. I can only hope our translations of the jurisdiction-type terms do justice to the realities. "Populated places" seems like a wonderful subset within toponyms. DCDuring TALK 21:14, 22 January 2011 (UTC)[reply]

As to CodeCat's original question, I think the problem is that the category that includes all of the natural, legal, and administrative types of place/jurisdiction names is not a natural category, being both abstract and heterogeneous. Moreover, such names are not limited to places that have human populations. Would something as awkward as "Area designation types" be acceptable? DCDuring TALK 21:14, 22 January 2011 (UTC)[reply]

One very natural thing is what you write in a postal address, next to the ZIP code. In U.S. mail order forms, that field is typically called "City:", which is a very short and convenient word, except that what's written in that field is not always a "major town with a cathedral". German and Swedish mail order forms say "Ort:" which is even shorter, and always correct since it is an abstract term (almost as generic as "place"). A similar problem occurs with "rivers", which is a short and convenient word, but one really doesn't want separate categories for creeks and streams. I'm perfectly happy with the "postal address" definition of "city", and think our categories can keep their current name. --LA2 02:49, 23 January 2011 (UTC)[reply]

Vestigial Quotation headers

Many Quotation headers are now just storing {{seeCites}} notices. This is quite cumbersome as the notice can be separated from the actual quotations (below sense lines) by several headers. I've added a pos=right option to {{seeCites}} so that it displays the notice in a thin right-floating box (similar to {{slim-wikipedia}}). This allows the notice to be displayed in the PoS area(s) and in my view improve the layout of entries (when there aren't too many RHS elements already). Are there any problems with making this kind of change (example) on a wider-scale? This would reduce the number of quotation headers and help in the broad task of moving content from the quotation header to either below definitions or to the citations page. --Bequw → τ 03:13, 26 January 2011 (UTC)[reply]

I do not like this. We should better avoid right-floating boxes, I think. I dislike both {{wikipedia}} and {{slim-wikipedia}}, and prefer {{pedia}} AKA {{pedialite}}. So for me, {{seeCites}} is basically okay as it is, while I admit that having a dedicated section that contains just a link seems a bit odd. However, I for one was never quite convinced that it was a good idea to list quotations directly below the definitions in the definitions section, as is now the common practice. Originally, the section for quotations was intended for them. --Dan Polansky 07:54, 26 January 2011 (UTC)[reply]

I'm not a big fan of right-side floating boxes, but IMO the diff provided looks good. I think that as long as there aren't other such boxes near it (pictures, interproject links, {{examples-right}}), this is fine. (What about {{seeMoreCites}}?)—msh210℠ (talk) 16:55, 26 January 2011 (UTC)[reply]

On large screen resolutions, this box is a bit too innocent and easily missed. I don't consider it a good replacement to the current solution. -- Prince Kassad 19:45, 26 January 2011 (UTC)[reply]

Renaming CFI section for spellings

I would like to see one section of CFI renamed: "Misspellings, common misspellings and variant spellings" to "Spellings".

The edit would be the following:

===Spellings~~Misspellings, common misspellings and variant spellings~~===

Misspellings, common misspellings and variant spellings: There is no simple hard and fast rule, particularly in English, for determining whether a particular spelling is “correct”. A person defending a disputed spelling should be prepared to support his view with references. Published grammars and style guides can be useful in that regard, as can statistics concerning the prevalence of various forms.

As a result, the section would have a succinct heading, while the fuller scope of the section would be stated immediately after the heading.

What do you think of this proposal? Are there other succinct versions of the heading that you would like to see? --Dan Polansky 07:50, 26 January 2011 (UTC)[reply]

I don't care. That is, I certainly don't mind it, and if there's a vote (though I sincerely hope this can be concluded in the BP) I'd not oppose, but I don't see the point, why the new version is better than the current.—msh210℠ (talk) 16:48, 26 January 2011 (UTC)[reply]

Right now, the section 1.4 ("Misspellings, common misspellings and variant spellings") is rather conspicuous in the table of contents of CFI, because of its length. Succinct headings are generally better, I think.

If people won't oppose the proposal, I will set up a short vote. CFI should not be modified without a vote, which is IMHO a good thing. --Dan Polansky 18:14, 26 January 2011 (UTC)[reply]

I have created a vote: Wiktionary:Votes/pl-2011-02/Renaming CFI section for spellings. I have packed one more proposal into the vote, a proposal that can be rejected separately. --Dan Polansky 15:27, 1 February 2011 (UTC)[reply]

I have removed the other proposal from the vote, so the vote is only about renaming. --Dan Polansky 15:43, 1 February 2011 (UTC)[reply]

Removing CFI section on leet spellings

I suspect the last section on the page [[WT:CFI]], "Typographic variants", no longer matches what the vast majority of editors think, so should be excised. Thoughts?—msh210℠ (talk) 16:48, 26 January 2011 (UTC)[reply]

I've been considering this whole section of CFI, "Issues to consider" and its two subsections. They're not really criteria - as other people have said (notably, in my mind, Conrad.Irwin), CFI tries to do too much; it doesn't contain "rules" or "criteria" so much as general discussion, which is interesting and well written, just not in the right place. I'd like CFI to be more black-and-white with less "narrative". In other words, those two paragraphs could be removed entirely, or replaced with a much more concise version. Mglovesfun (talk) 17:02, 26 January 2011 (UTC)[reply]

I agree; CFI should be a clear set of rules, not discussion.--Prosfilaes 02:04, 27 January 2011 (UTC)[reply]

In what sense does it not match what the editors think? It implies that the consensus is to include i18n, G-d and such forms.--Prosfilaes 02:04, 27 January 2011 (UTC)[reply]

It suggests to me that there's no consensus and that whoever wrote those lines is for their inclusion. Perhaps I'm reading it wrong, though.—msh210℠ (talk) 06:09, 27 January 2011 (UTC)[reply]

I tend to agree that section "Typographic variants" can be removed from CFI. It seems that even the whole section "Issues to consider" can be removed, as proposed by MG. I assume that the inclusion of the mentioned terms (G-d, pr0n, i18n or veg*n) is no longer controversial. If you set up a vote, it could consist of two subvotes, one subvote proposing the removal of the larger section, while the other subvote proposing the removal of the narrower section. --Dan Polansky 15:44, 27 January 2011 (UTC)[reply]

I agree. (Or we could have one subvote for each subsection, with the understanding that we'd remove the whole thing if both subvotes pass.) —Ruakh_TALK 17:02, 27 January 2011 (UTC)[reply]

I like the system of WT:BRAND, where the relevant passage in CFI is only a few words long, and there is a subpage several hundred words long explaining what that policy means. Mglovesfun (talk) 17:04, 27 January 2011 (UTC)[reply]

I've created ~~started~~ Wiktionary:Votes/pl-2011-01/Final sections of the CFI.—msh210℠ (talk) 18:28, 27 January 2011 (UTC) 17:59, 3 February 2011 (UTC)[reply]

And now it's started.—msh210℠ (talk) 17:59, 3 February 2011 (UTC)[reply]

Citations of letters

Is it really necessary to add citations that show letters in use, as done on a? -- Prince Kassad 19:54, 26 January 2011 (UTC)[reply]

Not on a. But it might be if you claimed that ṛ is an English letter.--Prosfilaes 02:09, 27 January 2011 (UTC)[reply]

I would say not. —Internoob (Disc•Cont) 04:31, 27 January 2011 (UTC)[reply]

That was for a WT:FUN, fwiw.—msh210℠ (talk) 06:07, 27 January 2011 (UTC)[reply]

It's not necessary to add citations to common letters such as a for the purpose of attesting them, because their existence is very unlikely to be challenged. That said, I fundamentally agree with Prosfilaes's statement: one would likely need to provide citations to attest ṛ as an English letter, because that's a controversial letter. Nonetheless, the existence of the verb be and the article the is also unlikely to be challenged, but we have Citations:be and Citations:the. Citations of entries with "clearly widespread use" are basically not required, but they can still be informative and useful. --Daniel. 16:37, 27 January 2011 (UTC)[reply]

Well of course, for be and the citations can be useful because they show exactly how to use these terms, which may not be obvious for non-native speakers of English (think Russian, which has neither articles nor does it generally use the verb to be). On the other hand, how to use the letter a should really be obvious to everyone. -- Prince Kassad 16:42, 27 January 2011 (UTC)[reply]

Citations are often needed to attest to specific PoSes, uses, forms, or senses of headwords that are in widespread use in other uses or senses. Whatever is specifically challenged needs to be cited. DCDuring TALK 16:51, 27 January 2011 (UTC)[reply]

I'd like to see the earliest citation of J/j - Middle Ages, from Spain I think. SemperBlotto 16:56, 27 January 2011 (UTC)[reply]

I'd like to see examples of various Latin letters used to romanize Japanese, Chinese, Russian, etc. by multiple systems and authors eventually. Wiktionary still is not a very good place to look for this. --Daniel. 19:25, 27 January 2011 (UTC)[reply]

Kassad's "how to use the letter a should really be obvious to everyone" is not accurate: People start to learn how to read all the time. Certainly lists of examples of usage of each letter would be useful for many people. --Daniel. 07:39, 30 January 2011 (UTC)[reply]

Let's talk Wonderfool.

I see that more incarnations of Wonderfool have been blocked, and I think it might be worth discussing what exactly we should be doing in this case. The facts are more or less as follows: the guy is one of our most prolific contributors, probably top 5 or 10 in number of contributions. The guy is also the subject of something like 80% of all checkuser queries on this project. Technically, he is always subject to being blocked because his original (I think?) account is indef blocked. Whether or not it should have been an indef block or something shorter is debatable. Lastly he likes to make some dubious edits, but he makes an awful lot of pretty good edits in between and frankly there are several admins who make edits at least as suspect as Wonderfool's.

So my question is, what should we do about him moving forward? I see three distinct options, there may be others.

Unblock User:Wonderfool or leave unblocked the next account he creates and let him edit on a single known account which is not subject to indef blocking any more than any other account is.
Continue to play the game of cat and mouse which has been ongoing for the past 4 years or so, blocking him whenever someone spots his edits.
Actually pursue his removal from the project. We can block the entire ISP he edits from, he is the only editor who uses it. We can also send an abuse report to the ISP but in this case they probably wouldn't care that much, unless Wikipedia also blocked the ISP.

Personally I am inclined to option 1, I would prefer not having to mess around with the other two, and it makes sense to me to have someone as dedicated to this project (in his own special way) as Wonderfool editing openly and under the same individual scrutiny that we all are. Perhaps Wonderfool would also like to comment, particularly if option 1 gets any support. Thoughts? - [The]DaveRoss 20:38, 27 January 2011 (UTC)[reply]

My current strategy is to let him carry on editing for several days / weeks until something goes click in his head and he starts misbehaving. I then block him, and block him again if he comes back too quickly. It's no big deal. SemperBlotto 20:42, 27 January 2011 (UTC)[reply]

- I guess this is my point, there is a huge downside to that method of dealing with him. Whenever anyone new begins to edit someone suspects them of being the next incarnation of Wonderfool. I get lots of requests for checking IPs, people fling accusations about, it isn't a good thing. We are sitting on the fence and it isn't as harmless as it seems. - [The]DaveRoss 21:00, 27 January 2011 (UTC)[reply]

The thing about #1 is that he frequently has multiple simultaneously editing sockpuppets, so it's a unilateral decision: we can say that we're "let[ting] him edit on a single known account", but that decision is not determinative of reality. This is not necessarily a fatal problem with #1 — after all, he'll presumably continue to use multiple sockpuppets no matter what we do — but it's worth keeping in mind. #1 will not stop us from accusing people of being Wonderfool; at most, it will be a community acknowledgment that being Wonderfool is not a capital offense, and hence that accusations of such are not, um, capital accusations. —Ruakh_TALK 21:08, 27 January 2011 (UTC)[reply]
- That has not been the case recently, most of the time it is a single account at a time. To be sure, in the past he has had several sleeper accounts going at once, or multiple active accounts at once, but unless he is playing quite a deep game he has been using one at a time for the past few months. I agree though, that there is an assumption that if he is given the right to edit back officially then we are also assuming that the sock puppetry will stop. I guess that were we to go with option #1 there would be a caveat included that were more socks to emerge we would change to option #3. - [The]DaveRoss 21:21, 27 January 2011 (UTC)[reply]

I like playing cat and mouse with Wonderfool, it makes my time on Wiktionary more fun. Oh, and Dave Ross is Wonderfool. --Vahag 21:24, 27 January 2011 (UTC)[reply]

The only person who enjoys playing cat and mouse with Wonderfool . . . is Wonderfool. —Ruakh_TALK 21:28, 27 January 2011 (UTC)[reply]

Is that a masturbation joke? :) Sorry, not a native speaker. --Vahag 21:33, 27 January 2011 (UTC)[reply]

I'm not a profiler, but this person is a French speaker, and on fr.wikt we've got exactly the same kind of guy called User:X (mainly present in the English, Esperanto and Greek sections), also known as User:ABC on el.wikt. He used to contribute accurately, BUT he copied pages from one Wiktionary to another without any regard for copyright, never read any formatting consensus, and was rude or unresponsive on his talk page. However, we've chosen to leave him unblocked (after seven blocks) because he continued to edit from other IPs every day, and a bot can easily correct his formatting while he collaborates only partially. We can't confuse him with the other daily contributors because he's incurably asocial. In conclusion, I propose that we stop talking to him ASAP, because we've already lost many hours for no result, and I would bet that his addiction would drive him to try several ISPs. JackPotte 23:48, 27 January 2011 (UTC)[reply]

No, this doesn't sound like our guy at all. Wonderfool is a very talkative and social creature. --Vahag 23:53, 27 January 2011 (UTC)[reply]

@JackPotte: If it helps: Wonderfool is Foumidable. —Ruakh_TALK 00:50, 28 January 2011 (UTC)[reply]

User:DStirke and User:Romanb are his latest incarnations. Does anyone want to talk to him? SemperBlotto 19:58, 28 January 2011 (UTC)[reply]

I see the advantage of unblocking him, but only if he sticks to one account. I get the impression he likes his multiple accounts, so it would achieve nothing. He's got loads on the French Wiktionary, where AFAIK he's never been blocked for a long period. Mglovesfun (talk) 11:36, 29 January 2011 (UTC)[reply]

So this guy's still around! He started off well, and even became admin. Then without warning he went off the deep end, but soon after that went back to normal. This all suggests someone who occasionally goes off his meds. Has anyone been in contact with the real person to check this out? Some version of #1 seems appropriate when he needs time to regulate his meds. Eclecticology 21:24, 15 February 2011 (UTC)[reply]

He has been very active lately and keeps going into his crazy phase and having to be blocked (when he starts creating a lot of tenuous "words" based on porn sites, etc.). There used to be at least a few weeks between these bouts of craziness, but now I doubt that more than 50% of his entries are worthwhile and legitimate. It leaves us with a bunch of weird, embarrassing, created-in-bad-faith crap that probably won't be challenged because there's one citation somewhere, whether or not it remotely meets his definition. Honestly, I think WF is fun and has a good sense of humour, but he is doing a rather good, careful job of damaging the project, and I wouldn't oppose some kind of perma-block measure if it could work at all. Equinox ◑ 21:50, 16 February 2011 (UTC)[reply]

Voting percentages

Author's note: this is mainly abstract and hypothetical; the thought occurred to me last night. Take the current vote Wiktionary:Votes/pl-2011-01/Final sections of the CFI. The vote is to remove the sections, which should require 70% approval or more. What if the vote were worded the opposite way, as a vote to keep the final sections of CFI? Wouldn't you then need 70% consensus to keep them?

Less hypothetical: it seems odd in a way to need 70% approval to remove something from CFI. It leaves open the possibility of one of our official criteria not having community support, but as long as 31% of the community support it, too bad for the other 69%. Mglovesfun (talk) 09:13, 28 January 2011 (UTC)[reply]

One of the many reasons why votes are bad and discussion leading to consensus is good. Then again we can't usually agree on much of anything; 31% is actually quite a good showing! - [The]DaveRoss 11:05, 28 January 2011 (UTC)[reply]

Re: "Wouldn't you then need 70% consensus to keep them?": No, because we don't have a bias in favor of "oppose", but rather a bias in favor of the status quo. Usually proposals are to change something, such that the status quo is reflected by the "oppose" option, but if you were to structure a vote such that the status quo were reflected by the "support" option, then the "support" option is what would get the boost. —Ruakh_TALK 12:16, 28 January 2011 (UTC)[reply]
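A minimal sketch of the rule Ruakh describes, assuming a 70% threshold and a hypothetical helper name: the supermajority requirement attaches to whichever side proposes a change, so rewording the ballot does not flip the burden.

```python
def vote_outcome(change_votes: int, status_quo_votes: int, threshold: float = 0.70) -> str:
    """Return the winning side under a status-quo bias (hypothetical helper).

    The side proposing a change must reach `threshold` of the votes cast;
    otherwise the status quo prevails, whether the status quo is labelled
    "support" or "oppose" on the ballot.
    """
    total = change_votes + status_quo_votes
    if total == 0:
        return "status quo"
    return "change" if change_votes / total >= threshold else "status quo"

# Mglovesfun's scenario: 69% want to remove a CFI section, 31% want to keep it.
print(vote_outcome(change_votes=69, status_quo_votes=31))  # status quo
```

With the 2/3 threshold Dan Polansky mentions, `vote_outcome(69, 31, threshold=2/3)` returns `"change"` instead.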

Votes are often good. The threshold of 2/3 can be used for this vote, as it is not a meta-vote, instead of the sometimes mentioned 70%. --Dan Polansky 13:45, 28 January 2011 (UTC)[reply]

Wiktionary talk:Todo#Chinese translations

Heads up. Mglovesfun (talk) 10:30, 28 January 2011 (UTC)[reply]

CFI: Removing usage in a well-known work

I would like to see this part or clause removed from CFI's attestation criteria:

"Usage in a well-known work, or"

One entry that would as a result fail attestation is bababadalgharaghtakamminarronnkonnbronntonnerronntuonnthunntrovarrhounawnskawntoohoohoordenenthurnuk. Another one is lukkedoerendunandurraskewdylooshoofermoyportertooryzooysphalnabortansporthaokansakroidverjkapakkapuk.

Thoughts? Opposition? --Dan Polansky 12:23, 28 January 2011 (UTC)[reply]

I'd oppose it (though I'm glad it's being discussed). We only require three durably archived citations anyway, so lessening this to one when it's a well-known work seems appropriate. These two entries are pretty ridiculous, yet along with them we could conceivably lose entries used in Shakespeare, Chaucer or Dickens (etc.). Another example is overwicked, which I created based on the NIV, which is a well-known work. I didn't check for any other independent citations. Mglovesfun (talk) 12:52, 28 January 2011 (UTC)[reply]

I suppose one possibility is to have appendices for very well-known authors (like Shakespeare) with their nonce words. Shakespeare of course has tons, as do Spenser, Milton, etc. whereas Melville, Nabokov and Pynchon might just have a few. Joyce is so much deliberate wordplay and stream-of-consciousness that the appendix doesn't seem worth doing. Equinox ◑ 13:00, 28 January 2011 (UTC)[reply]

Words that can only be cited based on usage (not mention) in a single work cannot be said to have an accepted meaning, except possibly by onomatopoeia or by composition of the meanings of their components (possibly allusive), something we rarely accept. I would strongly favor appendices, by author, of their unsuccessful coinages, preferably linked to by {{only in}}.

I am not committed to imposing such a prohibition in languages other than English. In English the corpus potentially provides ample evidence of usage for real words. DCDuring TALK 13:46, 28 January 2011 (UTC)[reply]

The rule comes in handy for less-attested languages like Old French and Anglo-Norman (in my case in particular), where getting three citations might be difficult. I definitely wouldn't support it being removed entirely, though I think there is some room for maneuver. If the "only one citation is needed for words/terms in dead languages" proposal from a few months ago had passed, it would circumvent part of the problem I just mentioned, though not for living languages in which little written material exists. Mglovesfun (talk) 17:32, 28 January 2011 (UTC)[reply]

I can see your point with less attested languages, but I do not see how the "well-known work" clause saves them. "Well-known" to whom, then, are the Old French works or Ancient Greek works supposed to be? Is there a single Old French term that is actually included because of the well-known-work clause? --Dan Polansky 17:41, 28 January 2011 (UTC)[reply]

Debatable, and that goes to the meaning of "well-known work" which oddly, nobody has brought up. Simple answer is that I've RFV'd some Old French words, only one of which from memory has failed (host). None of the ones that passed (just preindre isn't it?) have passed via this rule. So I think the fairly simplistic answer is "no". Mglovesfun (talk) 17:47, 28 January 2011 (UTC)[reply]

I don't know about Old French, but there are many well-known Ancient Greek works: the Iliad, the Odyssey, Plato, the Homeric Hymns, Sophocles, etc. If it were interpreted as being language-specific, I could find a list of well-known Esperanto works: Zamenhof's Hamleto, Fundamenta Krestomatio, Vikimoj, etc.--Prosfilaes 18:41, 28 January 2011 (UTC)[reply]

I don't see why lack of accepted meaning should be a fatal problem. If a word appears in the KJV and nowhere else, there is a lot that we can usefully say about it, even without knowing for sure what the translators thought the source-word meant. —Ruakh_TALK 18:31, 28 January 2011 (UTC)[reply]

KJV has an unusually limited vocabulary of words excepting proper names. To say something about hapax legomena we would need to simply rely on authorities, in the manner of WP, unless we accept each other's non-attestation-based opinions. DCDuring TALK 20:13, 28 January 2011 (UTC)[reply]

I think I'd oppose. The cases that have come up are for unquestionably well-known works, and I don't see why it's bad to have bababadalgharaghtakamminarronnkonnbronntonnerronntuonnthunntrovarrhounawnskawntoohoohoordenenthurnuk for our Joyce reading users.--Prosfilaes 18:41, 28 January 2011 (UTC)[reply]

Are you saying that a reader of Joyce is going to consult a dictionary to find out what the long thing means? --Dan Polansky 19:00, 28 January 2011 (UTC)[reply]

After typing the first 5 letters in the search box, only this "word" shows up, so we don't have a good test of the typing skills of readers of Joyce. Do we have anything meaningful to say about such an entry? We can't define it or translate it. Do we transliterate it? Accepting a single cite from a "well-known work" as proof of the existence of the "word" makes little contribution to attesting its meaning. That, say, Nabokov or Joyce or Burgess uses a word a certain way says nothing about whether the word has "entered the lexicon" of any community of users. DCDuring TALK 20:07, 28 January 2011 (UTC)[reply]

To make this BP discussion more complete, let me quote Visviva: "The rationale for the well-known work exemption, as I understand it, is that a complete version of Wiktionary should leave no word-sense questions unanswered for someone reading Shakespeare, Milton, etc. This seems reasonable enough to me, though the flip side of that is that we are currently missing thousands of words and word forms that appear even in respelled modern editions of Shakespeare. (I have some lists, if anyone is interested.) On the other hand, this particular need could arguably be better addressed in Concordance: or Appendix:-space, though that approach also has problems. That said, if we eliminate the exemption entirely, we need to replace it with a more nuanced approach to languages that are poorly-attested (Homeric Greek, Eteocypriot, Cia-Cia) or unstandardized (Middle English, Middle Korean, actually almost any Middle/Old language). "Well-known work" gives us a loophole for including forms that appear only in the Homeric hymns, or that are found in a particular spelling only in Chaucer. This is unsatisfactory, of course, since it still excludes less-known writings; but I don't think the well-known-work issue can be addressed before the poorly-attested-languages issue. -- Visviva 05:55, 17 October 2009 (UTC)", from Talk:bababadalgharaghtakamminarronnkonnbronntonnerronntuonnthunntrovarrhounawnskawntoohoohoordenenthurnuk. --Dan Polansky

Visviva's statement is sensible. IMHO, the use of {{only in}} to direct users to appendices or concordances or even a Citation space page would address the Shakespeare/Joyce-reader problem in that we could confirm that a given term is a hapax legomenon and provide opportunity for evidence and conjecture about meaning and derivation. I see no reason why all languages need to follow identical policies. English in particular can afford its own policies at English Wiktionary. Extinct languages, constructed languages, languages (not) distinguished by Google are all candidate classes for uniform treatment, should individual treatment by language be deemed inappropriate. DCDuring TALK 17:44, 30 January 2011 (UTC)[reply]

I'm not seeing the value in making the change, though. It would be silly to make a special rule for, e.g., phobias, but this rule stood with consensus for a while. And we aren't going to be able to cleanly reverse it; there's going to be entries added under this rule that won't be ferreted out for a while. I also don't always see the advantage of appendixes instead of just tagging them on their page; I'd be happy to move dictionary-only words into the mainspace, as long as they were so tagged. Lastly, if we do want to treat other languages separately, that's something that needs discussion.--Prosfilaes 18:48, 30 January 2011 (UTC)[reply]

Let's think about how writers coin new words. I would distinguish four kinds: 1, words made by recombining existing English elements (eg overwicked, elbowlessness); 2, words made from foreign elements which are fitted to existing English analogues (eg aqueity, anemocracy); 3, blends (eg spife, a spoon-cum-knife, which I just invented); 4, onomatopoeic creations (eg Joyce's hundred-letter word above). To me, group 1 is a set of clearly valid English words, and I would not have a problem including them with a single citation from a well-known work. Group 2 is a bit different, but I think we have a good way of including them without creating an actual page for each coinage. What they demonstrate is a function of English's ability to neologize with specific elements, and we can use them as citations of those elements rather than of the whole word. For example, anemocracy may not meet criteria on its own, but if we can also find a citation for anemophobe and anemophagic, then we have three citations for anemo- as a valid prefix in English. Similarly, on the -ity page, apart from a link to all the derived terms, we can include direct citations of on-the-fly uses like Ben Jonson's aqueity. Groups 3 and 4 I would exclude (until they meet regular CFI). Also note that some novels are clearly not written in modern standard English (eg some futuristic novels like A Clockwork Orange, or books like Finnegans Wake), and their vocabulary shouldn't count on a single citation. This is just me thinking out loud; I don't know exactly how I would want to codify it all into criteria. Ƿidsiþ 11:37, 2 February 2011 (UTC)[reply]

Non-CFI-attestable verb forms

If I create a verb, I check whether its inflections are CFI-attestable before adding them. (The -s form is often the rarest, especially with scientific terms; e.g. today I couldn't find palettises, even though the other inflections meet CFI.) However, User:CodeCat has told me that it's standard practice for bots to create these forms, even if they are not attestable (here talking about Finnish, which has far more obscure inflections); see User_talk:Hekaheka#Inflection_tables_that_don.27t_match_page_name. Is that true, and if we don't have other "theoretically extant but unused forms", like femtobyte, is it wise? Equinox ◑ 21:52, 28 January 2011 (UTC)[reply]

I think I can find some French forms I can more or less guarantee are unattested; past historic and imperfect subjunctive forms of modern verbs like coacher for example. — This unsigned comment was added by Mglovesfun (talk • contribs).

And do you think we should have entries for the theoretical-only forms? Why, or why not? Equinox ◑ 22:01, 28 January 2011 (UTC)[reply]

I don't favor keeping them, just spending a month trying to verify these will be a nightmare. Though I don't ideologically oppose it, my reservation is on practical grounds. Mglovesfun (talk) 22:03, 28 January 2011 (UTC)[reply]

I'm not proposing we RFV all of the inflections of every bot-edited verb. I just question the utility of having yet another bot to create dubious inflections, especially in Finnish where some verbs have dozens of unused forms. Equinox ◑ 22:13, 28 January 2011 (UTC)[reply]

If these entries should not exist in the first place, we shouldn't really list them in inflection tables either. We already have bots that ignore certain forms, such as the plural, if they are not present in the inflection table. —CodeCat 22:27, 28 January 2011 (UTC)[reply]

I still don't oppose it on ideological grounds, but it would be akin to getting three citations for each verb form before creating the inflections. It would basically kill off inflection bots like SemperBlottoBot and MewBot. Mglovesfun (talk) 22:41, 28 January 2011 (UTC)[reply]

I think that we should include all forms of a word, unless there's some reason to doubt a certain form. Note that if we were to apply the CFI separately to each individual form, we'd run into problems with rare words in highly inflected languages. Suppose that a language inflects its verbs to indicate tense (past/present/future), aspect (progressive/perfect/frequentative/neutral), mood (indicative/jussive/imperative/optative/contrafactual/negative/interrogative), evidentiality (observed/inferred), subject number (singular/dual/plural), object number (ditto), subject person (first/second/third-proximate/third-obviative), object person (ditto, plus third-reflexive), subject gender (masculine/feminine/neuter), and object gender (ditto). Yes, languages really do these sorts of things. In such a language, a word could appear dozens of times in durably archived works without any one specific form appearing three times. "I see him" and "have you seen them?" and "she sees someone-else-who's-not-important" and "they must have seen him" and "I hope someone sees it" and so on would all count toward different attestational "buckets", even if "see" is a perfectly regular verb. It makes no sense. —Ruakh_TALK 23:26, 28 January 2011 (UTC)[reply]
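To put a number on Ruakh's hypothetical language, multiplying the number of values listed for each dimension gives how many distinct forms one verb could take. This assumes every combination is grammatical (real languages usually rule some out, so treat it as an upper bound). By the pigeonhole principle, a corpus would need 2 × forms + 1 citations of the verb before any single form is guaranteed three of them.

```python
from math import prod

# Value counts per dimension, taken from the hypothetical language above.
dimensions = {
    "tense": 3,           # past/present/future
    "aspect": 4,          # progressive/perfect/frequentative/neutral
    "mood": 7,            # indicative/jussive/imperative/optative/contrafactual/negative/interrogative
    "evidentiality": 2,   # observed/inferred
    "subject number": 3,  # singular/dual/plural
    "object number": 3,   # ditto
    "subject person": 4,  # first/second/third-proximate/third-obviative
    "object person": 5,   # ditto, plus third-reflexive
    "subject gender": 3,  # masculine/feminine/neuter
    "object gender": 3,   # ditto
}

forms = prod(dimensions.values())
# With `forms` buckets, 2 citations per bucket can be spread out without any
# bucket reaching 3; one more citation forces some form to three.
min_citations_forcing_three = 2 * forms + 1

print(forms)                        # 272160
print(min_citations_forcing_three)  # 544321
```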

Oh, unless you're saying that a cite for abdicates (say) would count toward both abdicate and abdicates, whereas a cite for abdicate would count only toward itself? I might be O.K. with that sort of thing. (And I'm not sure how much we really disagree. I note that in the discussion that led to this one, the issue came up because of forms that there is reason to doubt: a verb describing a political process that can't logically be used in certain subjects, voices, and/or moods.) —Ruakh_TALK 23:33, 28 January 2011 (UTC)[reply]

Coming from the seat of an editor in Esperanto, it would seem weird and pointless to have one form of a verb and not another. If someone understands pluvas, they're going to use pluvis and pluvus without thinking about it, whether or not we can attest them. Even in English, if you can attest backspace as a verb, you know that backspaced is a word. (As for Ruakh, I don't know the instance you're talking about, but I still remember my German teacher refusing to conjugate kosten for ich, even though we all knew it was ich koste, and most of us could find a place to use it.)--Prosfilaes 01:42, 29 January 2011 (UTC)[reply]

Yeah, but I gave the example of femtobyte. Clearly if someone wants to talk about a quadrillionth of a byte (for some reason; it doesn't make sense in current computer science), they will say femtobyte. But until they say it, it's not a word we should list. Ditto weird inflections, IMO. Equinox ◑ 01:46, 29 January 2011 (UTC)[reply]

But it's not just weird inflections. Neither Esperanto nor English really has weird inflections; they have a small set of normal ones. With Esperanto, our corpus is small enough that we may be able to attest a word in the accusative (-n) but not quite in the nominative root form. I don't know how to even enter that; you put the main definition under the nominative and just "accusative form of ----" under the accusative. In English, what if you have a word that appears five times, three in the plural (a simple -s) and two in the singular? Do you put the definition under the plural and act like the singular doesn't exist? Either case could seriously confuse someone looking for the obvious root. I've been attesting Esperanto words with different inflections of the root, since whether it ends in -, -j, -jn, or -n, it's the same underlying word and every speaker will recognize that.
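A minimal sketch of the pooled attestation described above, restricted to regular Esperanto nouns in -o (adjectives in -a and verbs are deliberately ignored). The helper names are hypothetical; the point is only that citations of -o, -oj, -on and -ojn all count toward one lemma.

```python
from collections import Counter

def esperanto_noun_lemma(form: str) -> str:
    """Map a regular Esperanto noun form to its nominative singular in -o.

    Handles only the endings -o, -oj, -on and -ojn; any other form is
    returned unchanged. Hypothetical helper for illustration.
    """
    for suffix in ("jn", "j", "n"):  # check the longest ending first
        if form.endswith("o" + suffix):
            return form[: -len(suffix)]
    return form

def pooled_citation_counts(cited_forms):
    """Count citations per lemma, pooling all case/number forms together."""
    return Counter(esperanto_noun_lemma(f) for f in cited_forms)

counts = pooled_citation_counts(["domojn", "domo", "domon"])
print(counts["domo"])  # 3, although no single form is cited three times
```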

Femtobyte doesn't scare me. It's a different issue; we don't have every combination of prefix + root for any case. Also, we could take every combination of the standard prefixes with the normal words they attach to, add them to Wiktionary automatically, and the net effect would be minimal. I wish we had better control over the search; it should be possible to say: Did you mean femto- + byte?--Prosfilaes 03:08, 29 January 2011 (UTC)[reply]

It would make sense to delete unattested verb forms if the search box understood the inflection templates, and could for example redirect connectitudes to connectitude automatically. But it doesn't. You'll only get a "did you mean", which might not even point to the right entry. For this reason, I think separate entries for verb forms are useful. Rspeer 03:01, 9 March 2011 (UTC)[reply]

How to choose topical categories

Let me describe and compare the coverage of some topical categories.

There are Category:Foxes and Category:Dogs. Yet there are a few entries for dogs and foxes scattered around Category:Canids and Category:Mammals, none in Category:Vertebrates and none in Category:Animals.

There are Category:Greek mythology, Category:Roman mythology, Category:Norse mythology and of various other mythologies. And we have Category:Mythology also containing many terms of these mythologies. By contrast, Category:Culture does not have any entry defined as something from mythologies.

None of the members of Category:Sex positions is also a member of Category:Sex.

Few but not all members of Category:Algebra are also of Category:Mathematics.

By rationalizing them, I could form the following guideline from what I see as an apparent consensus. If it is correct, it is not followed perfectly in practice:

All entries should only be members of the narrowest topical categories available. German Shepherd should be a member of Category:Herding dogs, but not of Category:Dogs, Category:Canids, Category:Vertebrates or Category:Animals.

Am I right? Does anyone agree with it?

As I may have mentioned earlier, I can see some minor problems with this possible practice (mostly related to category names, multiple languages, etc.); but they aren't worth mentioning now, since I decided to open one can of worms at a time. Let's focus on that possibly consistent rule, if possible. --Daniel. 09:56, 30 January 2011 (UTC)[reply]

Personally I agree, but if we remove Category:Skiing from Category:Sports to keep it only in Category:Winter sports, we should explain to the readers how to list all of the sports (eg: http://toolserver.org/~daniel/WikiSense/CatScan.php). JackPotte 12:00, 30 January 2011 (UTC)[reply]

We should use that opportunity to delete some topical categories which are too narrow. Category:Water is a prime example. -- Prince Kassad 14:52, 30 January 2011 (UTC)[reply]

Even taking Water as liquid form only, that's a lot of words; lake, sea, ocean, pond, river, stream, rain, rainfall, raindrop, just off the top of my head.--Prosfilaes 18:07, 30 January 2011 (UTC)[reply]

I'm going to list some terms related to water, excluding life forms, transportation and proper nouns: ablution, absolute humidity, aquatic, baptism, bath, bathtub, bathroom, blackwater, canal, cloud, cryokinesis, cryokinetic, cumulonimbus, cumulus, dam, deep water, deep-water, dehydration, dew, dew point, dihydrogen monoxide, distilled water, drinking water, drown, drowning, ecohydrology, falls, faucet, fountain, freezing, freezing rain, freshwater, frost, frost point, glacier, glaciology, glaze, graupel, graywater, gray water, groundwater, hail, hailstone, heavy water, hoar frost, hoarfrost, hoar-frost, humidity, hydr-, hydrate, hydrated, hydration, hydro-, hydroelectric dam, hydroelectric generator, hydrogen, hydrogen oxide, hydrogeology, hydrokinesis, hydrokinetic, hydrology, hydrometeor, hydrophilic, hydrophobic, hydrosphere, hydrous, ice, ice cap, icecap, ice pellets, irrigation, lake, limnology, maelstrom, meltwater, mikva, mikvah, mikveh, mist, mizzle, Mpemba effect, oasis, ocean, oceanography, oxygen, pond, pool, precipitation, puddle, rain, rainbow, rain cats and dogs, raincoat, rain dogs and cats, raindrop, rainfall, rain gauge, rain hat, rain off, rain pitchforks, rainsoaked, rain-soaked, rainwater, relative humidity, river, sea, seawater, serein, serene, shower, sink, sleet, slush, snow, specific humidity, spring, squirt gun, steam, steam bath, stream, streamflow, surface water, tide, underwater, vapor, vapour, virga, wash, washing, wastewater, water, water bed, waterbed, water cycle, water down, watered-down, waterfall, water gun, waterlog, waterlogged, water pistol, water ski, water turbines, well, whirlpool, wudu. --Daniel. 00:03, 31 January 2011 (UTC)[reply]

I agree with the substance of the proposal: that, if we have topical categories, then entries should be in the narrowest one only. I do not agree with how it's stated, though, as it assumes we should have a category "Herding dogs".—msh210℠ (talk) 21:28, 30 January 2011 (UTC)[reply]

If, for any reason, people eventually agree to delete Category:Herding dogs, then the wording of the proposal would naturally change as a result, probably resulting in:

All entries should only be members of the narrowest topical categories available. German Shepherd should be a member of Category:Dogs, but not of Category:Canids, Category:Vertebrates or Category:Animals.

It should be noted, though, that despite our Category:Herding dogs being virtually empty, there are at least 70 herding dogs according to Wikipedia, so there is potential for expansion. --Daniel. 00:03, 31 January 2011 (UTC)[reply]

IMO we can get too specific, and then we'll fail to see the wood for the trees and make a load of useless, hyper-specific categories. In theory I feel that a word should be in its narrowest category and all of the containing categories, but I doubt Wikimedia software supports this. Equinox ◑ 21:31, 30 January 2011 (UTC)[reply]

Do you mean placing German Shepherd into Category:Herding dogs, Category:Dogs, Category:Canids, Category:Vertebrates and Category:Animals simultaneously? Wikimedia surely would accept that. --Daniel. 00:03, 31 January 2011 (UTC)[reply]

Yes, but I mean IMO membership of a topical category should imply membership of all parent categories, without the need for manual/bot insertion. Equinox ◑ 10:12, 1 February 2011 (UTC)[reply]

We could give the appearance of "implying" membership through templates. For example, the code {{category|Herding dogs}} is almost identical to [[Category:Herding dogs]] and can easily be programmed to categorize any entry into Category:Herding dogs, Category:Dogs, Category:Canids, Category:Vertebrates and Category:Animals simultaneously. --Daniel. 12:18, 1 February 2011 (UTC)[reply]
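A rough model of the lookup such a template would perform, written in Python only to show the logic (an actual template would be wikitext, and the parent map here is a hypothetical one following Daniel's chain; the real tree may differ, e.g. Canids may sit under Mammals). The entry is tagged with one narrow category, and every ancestor is added automatically.

```python
from collections import deque

# Hypothetical parent map following the chain given in this thread.
PARENTS = {
    "Herding dogs": ["Dogs"],
    "Dogs": ["Canids"],
    "Canids": ["Vertebrates"],
    "Vertebrates": ["Animals"],
}

def categories_for(topic, parents=PARENTS):
    """Return the topic and all of its ancestors, nearest first, without duplicates."""
    seen, order, queue = set(), [], deque([topic])
    while queue:
        cat = queue.popleft()
        if cat in seen:
            continue
        seen.add(cat)
        order.append(cat)
        queue.extend(parents.get(cat, []))
    return order

def category_links(topic):
    """Render the category links a {{category|...}}-style template could emit."""
    return "\n".join(f"[[Category:{c}]]" for c in categories_for(topic))

print(category_links("Herding dogs"))
```

The `seen` set keeps a category from being emitted twice when it is reachable through more than one parent.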

What about categories? Should they also be in only the narrowest available categories? For example, Category:Herding dogs is now in Pastoral dogs and in Dogs, and Pastoral dogs is in Dogs. (Not that I think we should have either of the two narrower ones anyway. They're just an example.) IMO yes: if we are to have topical categories, then they should be in the narrowest categories only.—msh210℠ (talk) 21:42, 30 January 2011 (UTC)[reply]

I personally consider the categorization of topical categories a separate "can of worms", with many circumstances to be discussed, perhaps later. That said, yes, I believe that the basic idea of only using the narrowest categories would work for them too. In my opinion, Category:Dogs should be a member of Category:Canids, but not of Category:Animals. --Daniel. 00:03, 31 January 2011 (UTC)[reply]

If I understand correctly, Sasquatch could be a member of two categories, say, Category: legendary creatures and Category:hominids/primates? I.e., it can be in multiple categories as long as it is in the narrowest category available in each category tree/hierarchy. Geof Bard 02:45, 16 February 2011 (UTC)[reply]

Geof Bard, the categories that you mentioned don't exist. However, sasquatch is already a member of Category:Cryptozoology and of Category:Primates, and it is not a member of Category:Zoology or of Category:Forteana. --Daniel. 03:13, 16 February 2011 (UTC)[reply]

Apparent loose cannon IP user, 90.209.77.78 (talk)

Apologies beforehand if this isn't the right venue.

I've noticed a number of apparently well-meaning edits by IP user 90.209.77.78 (talk), where the content is at least partially useful, but is very often in the wrong place. I just went through their contribs (after noting a change on my watchlist), and found that everything of theirs that I've looked at so far needs help. I don't have time right now to go through their whole contribs list, so I thought I'd post here as a heads-up, and to request help in notifying this clearly enthusiastic user (which is good) of WT formats, etc. -- TIA, Eiríkr Útlendi | Tala við mig 19:14, 30 January 2011 (UTC)[reply]

Looks like this user is now at IP 90.209.77.67 (talk). 90.205.76.31 (talk) is older, but follows the same pattern, and could well be the same user.

The crux of the issue is that this user, or group of users, appears to have an intimate knowledge of manga and animé, but little real knowledge of Japanese. In their zeal to add this knowledge to Wiktionary, they are getting various things wrong, and often adding things in the wrong place (such as Japanese-specific content in the "Translingual" section of Chinese character entries). I'm seeing cases where they've gone back to undo fixes I've implemented just in the past couple days (mostly at 結界). I'm cleaning things up as best as I can, time allowing, but if anyone has any advice on how best to get through to enthusiastic but misinformed IP-based users, I'm all ears^W um, eyes. :) -- Eiríkr Útlendi | Tala við mig 01:05, 31 January 2011 (UTC)[reply]

Language links to Wikipedia

I was having a browse today and ended up following an etymology to the page موم#Persian when I thought: wouldn't it be good if there was a link back to the wikipedia article on "Persian language". I notice some of the languages are linked, but has there ever been any discussion on doing this in a more widespread way? AndrewRT 22:05, 30 January 2011 (UTC)[reply]

Adding a user preference would make the most sense IMO. Nadando 22:10, 30 January 2011 (UTC)[reply]

I both love and hate cross project linking, it is great that we have an encyclopedia so closely tied to us, and it is awful internet design practice to shunt people from one site to another. It would be great if we had very solid "About:LangX" pages here which could then link to the full Wikipedia entry. We would not need to be too comprehensive, perhaps an abbreviated version of the pedia page describing the who (spoken by), what (orthography, grammar), when (duh) and where (geography) of the language in question. I know that we are not an encyclopedia but in certain, restricted ways it may make sense to keep folks on Wiktionary while providing certain encyclopedic information (I know, heresy, I am ready to be stoned). - [The]DaveRoss 23:46, 31 January 2011 (UTC)[reply]