Wiktionary:Beer parlour: difference between revisions

* '''Change'''. As has been said in [[Talk:novum|RFV]], we only partially, unclearly define "independent". [[User:-sche|- -sche]] [[User talk:-sche|(discuss)]] 19:08, 14 October 2011 (UTC)
 
* '''Completely rewrite.''' I started a beer-parlour discussion about it back in February — see [[Wiktionary:Beer parlour archive/2011/February#Independence.]] — and feedback was mostly positive (in that people mostly agreed with me about what it should say), but I let it drop without ever proposing a specific wording. —[[User: Ruakh |Ruakh]]<sub ><small ><i >[[User talk: Ruakh |TALK]]</i ></small ></sub > 02:03, 15 October 2011 (UTC)
   
 
{{fake====|[[Wiktionary:Criteria for inclusion#Spanning at least a year|Spanning at least a year]]}}
 

Revision as of 00:31, 16 October 2011



July 2011

How to treat participles on Wiktionary

I'd like to continue the discussion started above at Inflected German participles, but this time cross-linguistically rather than only for German, since it turned out to be a problem that concerns many languages. OK, so the basic question is how participles are best treated on Wiktionary. Examples of participles in English are playing, the present participle of play, and played, its past participle. Let me sum up the previous discussion. Traditionally, participles are treated as verb forms, so they normally appear in the inflection tables of verbs (see here for German spielen), including on Wiktionary (German, Dutch, French...). The tricky point is that participles are often used as adjectives in sentences (and can then be declined like normal adjectives). This goes so far that, for example, the German present participle cannot be used as a verb at all, only as an adjective (or adverb). This might apply to other languages as well, and it really calls into question whether such participles should be put under "Verb" headers (as is currently done in German and probably most other languages), and even whether they should appear in verb inflection tables.

All in all it seems that the current "German" way of treating participles is rather bad. I know of two other possible solutions. One was proposed by Dan Polansky above. When participles only appear as adjectives (such as German present participles), they get an Adjective header instead of a Verb header. When participles are used both as verbs and as adjectives (such as most German past participles), they get both headers. Personally, I think this solution makes sense except for two problems. First, for almost any verb we'd have both a verb section and an adjective section for its past participle. To me this seems redundant, but I also understand the contrary view that it's more clear-cut. A more serious problem is that there appear to be cases where participles are used so ambiguously that one cannot tell for sure whether they are verbs or adjectives -- e.g. German das Haus ist gebaut, Dutch het huis is gebouwd (thanks to CodeCat), French Il a sucré son café, puis a bu le café sucré (thanks to Lmaltier). If it's true that participles are something in between verbs and adjectives here, another solution might be appropriate, and that solution is already being used for Latin. For this language, there's a separate Participle header which subsumes the different Latin participles. See āctus for an example. What's the downside of such an approach? As I said, participles can be inflected, and such inflected participle forms (such as āctī) are also under a Participle header. This misses the fact that those forms are used completely unambiguously as adjectives (although ambiguous cases can still be inflected, as in Spanish la casa está construida, thanks CodeCat), and "participle" is probably not a proper part of speech either.

That's quite complicated, and if anything's unclear or if I put something wrongly, I'm looking forward to your comments. So, how do participles behave in other languages? How are they treated on Wiktionary, and do you think it makes sense? What do you think about the Latin way? Is there possibly a uniform way to represent participles on Wiktionary independent of language, or should we continue to have language-dependent ways of treating them? But, as I said, all the current ways I know of have flaws. At the end of the discussion, of course I'd like to have a good solution for German, but if other languages benefit, so much the better. Longtrend 10:37, 9 July 2011 (UTC)

The inflected forms are not always unambiguously adjectives either. In French, for example, when a past participle is used to form the perfect tense, it still inflects based on the gender and number of its direct object. So they could arguably be considered 'declined verb forms'. —CodeCat 11:40, 9 July 2011 (UTC)
Thanks for the notice, I missed that. In languages that inflect for case, it would be more appropriate to say "non-nominative past participle forms are used unambiguously as adjectives". Longtrend 12:11, 9 July 2011 (UTC)
That may not be accurate either. In early Old Norse, the agreement in the perfect tense was actually the accusative, which later became specifically neuter accusative, but still agreed in gender in earlier texts. This example is found in Völuspá (the agreeing participles are blandit and gefna): hverir hafði lopt alt lævi blandit, eða ætt iotuns Óðs mey gefna, with the first agreement being neuter nominative/accusative and the second feminine accusative. This is because the combination of participle and object was still considered an object of 'to have' in that language, and was therefore placed in the accusative case. That is, a sentence like 'I have painted a door' was not distinguished grammatically from 'I have a painted door' or 'I have a door painted'. —CodeCat 12:33, 9 July 2011 (UTC)
That's interesting. I think I'd better not try another generalization :) But your example seems to be a strong argument for the thesis that participles are (or can be) something in between verbs and adjectives -- that is, if we are going to treat participles uniformly across languages; otherwise it's at best an argument for Old Norse. Longtrend 12:57, 9 July 2011 (UTC)
Hungarian has present, past, future, and adverbial participles. The Etymology section contains the information that this entry is the participle of a verb. There can be adjective and noun sections to illustrate the appropriate usage and declension. See for example nevelő, the present participle of nevel (to educate). --Panda10 13:11, 9 July 2011 (UTC)
Latin participles also come in past, present, and future forms, and have voice (active or passive) as well. Some Latin participles were used as adjectives, but since Classical Latin did not always clearly distinguish between adjectives and nouns (they had the same inflectional endings), some participles were also used as substantive nouns. In fact, the future passive participle eventually came to replace gerunds and infinitives in functioning as a noun. However, it still had a verbal function in the passive periphrastic conjugation, and was never used in the nominative (you had to use a verbal infinitive for that). In other words, the question of what part of speech these things were was rather complicated. For Latin, we've chosen simply to recognize "Participle" as a separate part of speech because it simplifies everything. Other languages are free to make a similar choice in how they handle their parts of speech, but I don't think there's a single approach that will work across all languages. --EncycloPetey 14:12, 9 July 2011 (UTC)
We should not invent anything: words should be treated according to the traditions of each language. In French, it's clear that participles are verb forms, not adjectives, and that adjectives are neither participles nor verb forms. I provided an example of a sentence with an ambiguous meaning. That sentence shows that this is not always an easy distinction, which is a good reason to make it as clear as possible here, not a reason to blur the difference. Lmaltier 15:22, 9 July 2011 (UTC)
"words should be addressed according to traditions of each language" -- so in your opinion, we should treat German present participles as verb (form)s, even though they are never used as such, just because they are traditionally regarded as verb forms? "In French, it's clear that participles are verb forms, not adjectives" -- how come past participles inflect for gender in predicative use, then, a behavior otherwise found only in adjectives? Longtrend 12:03, 10 July 2011 (UTC)
French past participles are inflected in some cases, yes, but this does not make them adjectives. Actually, I think that the distinction between participles and adjectives is exactly the same in English and in French. I also think that all German verbs have compound tenses, and, therefore, that all German participles are actually used as verb forms. Am I wrong? Lmaltier 13:56, 10 July 2011 (UTC)
Something that's still not clear to me is just when something is a verb and when it's an adjective. I can understand that finite verb forms are verb forms... but what about non-finite forms? Why are they verb forms? Etymologically they are often not verb forms at all (like in the Old Norse example; Romance participles have a similar history), so why do we call them verb forms now? —CodeCat 14:11, 10 July 2011 (UTC)
Because (1) we speak English, (2) English has become less inflected and so its grammar has changed, (3) the original categories for parts of speech were set up by the Romans and Greeks, and (4) we have a better understanding of grammar in the 21st century. --EncycloPetey 14:29, 10 July 2011 (UTC)
That still doesn't answer my question though. Why are they verb forms now when they were not originally? What about them makes us consider them verb forms? —CodeCat 14:35, 10 July 2011 (UTC)
In English, our classification of -ing forms and -ed forms and specific senses thereof depends on such things as whether there is a corresponding base form, and whether the forms behave like adjectives or nouns. The verb form is assumed to exist because such forms can almost always be found modified by an adverb. If derived from transitive verbs, they usually take complements just like other forms of the verb.
The conversion process of denominal verbs seems to sometimes begin with -ing and -ed forms. For example, one can be coffeed out or coffeed up, but instances like "He coffees himself up every morning" are rarer.
The answer to the question seems to be simple: when you think of the verb when using the word (when you think of the action expressed by the verb), it's a participle, a verb form (even when an ellipsis blurs this fact); when you don't think of any action, only of a characteristic of the thing (not of how the thing got this characteristic), then it's an adjective. See adjective and verb for definitions. Lmaltier 16:09, 10 July 2011 (UTC)
@Lmaltier: I still don't quite understand your analysis of French participles. You probably know what I was talking about, but to be sure here's an example: Le café est sucré_ vs. La sauce est sucrée (excuse me if those sentences are wrong -- I just have some very basic knowledge of French, but you get what I mean). Correct me if I'm wrong, but sucré(e) behaves just like an adjective and nothing like a verb here -- you can perfectly well replace it with a "proper" adjective but not with a "proper" verb. So what makes you think it's a verb other than 1) tradition and 2) the fact that it's obviously derived from a verb (which is not sufficient, as Dan Polansky convincingly demonstrated above -- the fact that almost every English verb can be "agentivized" (for lack of a better word) with -er doesn't make the new forms verbs)? And as for German: Yes, you are wrong in your assumption that all participles are used in compound tenses. Present participles are never used in such constructions. In English, I fully agree with the analysis that present participles are (or can be) verbs, since there are cases such as I am playing -- however, there is no equivalent form *Ich bin spielend in German, nor any other complex verbal construction with a present participle. Longtrend 17:00, 10 July 2011 (UTC)
You misunderstand me. In your examples, used alone, they are not verbs: very clearly, both sucré(e) are adjectives. They refer to a characteristic of the thing. In Il a sucré le café or (passive form) La sauce a été sucrée avant d'être servie, it's also very clear that they are not adjectives, they are verb forms. The same applies to present participles (this is an easy case, as present participles are never inflected in French: when they can be inflected, then the words are not present participles, they are adjectives). For German, I was thinking of past participles. But, for German too, I think that the criterion should be: do you think of the action expressed by the verb or not? The difference between an adjective and a verb is not related to a suffix or anything of the kind, it's related to how the word is used and what is meant by people using it; do people want to use the verb (to refer to an action), or do they want to use an adjective (to refer to a characteristic)? Lmaltier 18:43, 10 July 2011 (UTC)
Your criterion is semantics, which is not valid. Expressing an action is neither a necessary nor a sufficient condition for being a verb. Verbs can also express characteristics ("shine") and nouns can express actions (just take the word "action") -- whether in some language there are adjectives that express actions I can't tell, but probably there are. We define parts of speech not semantically, but syntactically. Back to Le café est sucré, couldn't that also be a passive sentence (perhaps continued by "par...")? In this case the participle could be analysed as a verb, couldn't it? Longtrend 19:14, 10 July 2011 (UTC)
But the definition of verbs and adjectives includes important semantic considerations! If you forget them, you won't be able to make the distinction in difficult cases. Of course, some verbs are not action verbs, but they probably don't cause problems. You are right: in Le café est sucré, sucré is an adjective, but in Le café est sucré par mes soins., it's a verb. It's exactly like sugared in English. Lmaltier 19:30, 10 July 2011 (UTC)
Many east Asian languages have verbs that express states or properties rather than actions, as does Esperanto ("mi estas blua" and "mi bluas" both mean 'I am blue', "mi estas bluinta" means 'I have been blue'). Several old Indo-European languages also have stative verbs, which are semantically very much like a copula and a participle in English. —CodeCat 19:40, 10 July 2011 (UTC)
English and French too have verbs that express states. But are there examples of an unclear status (verb(participle) or adjective?) for these verbs? Lmaltier 19:53, 10 July 2011 (UTC)
There is when dealing with Latin. There is a whole set of Latin deponent verbs whose meaning can only be conveyed in English using adjectives. A Latin scholar would identify the Latin translation as a verb, but only because it has verb endings and not because of any functional or semantic distinction. Latin participles are likewise not always verbs, primarily because they take the endings of an adjective, inflecting for gender, which Latin verbs don't do. And yet the "participial form" is listed as a verb form in most texts and conjugation tables, and forms part of certain compound conjugations. So, in Latin the "verbness" of a participle comes from its tense and context, but its "adjectiveness" comes from its gender and inflectional endings. --EncycloPetey 20:21, 10 July 2011 (UTC)
· [de-indenting] I agree with Lmaltier about sucré. Let me give a similar example, but in English. Take the sentence “At 3:00 PM, the window was closed”: it can mean either “At 3:00 PM, someone closed the window”, or else “At 3:00 PM, the window was not open”. When it has the former sense, it's a use of the participle: “was closed” is just “closed” cast into the passive voice. When it has the latter sense, it's a use of the adjective: “was closed” means “was a closed window”. The important point is that this ambiguity is specific to the word closed. English has a lot of participial adjectives, but it also has a lot of participles that do not double as adjectives. “At 3:00 PM, the window was opened” has only one meaning. (The analogous alternative meaning would be expressed as “At 3:00 PM, the window was open.”) So it's hard to imagine a solution that uses just a single POS header for words like closed: even though participles are often called "verbal adjectives", we still must distinguish between those that double as real adjectives and those that do not. The former clearly need an ===Adjective=== POS header in addition to whatever POS header the latter have; and I think it's clearly a bad idea to use ===Adjective=== for words like "opened".
· I agree also with Lmaltier that we should generally follow language-specific traditions. That doesn't necessarily mean following two-hundred-year-old theories of grammar; there are current active linguistic traditions for all of these languages. If all of the linguists working on German describe the present participle as a verb form, then we should at least figure out why that is, before just deciding that we know better!
RuakhTALK 20:46, 10 July 2011 (UTC)
Thanks for your input. Actually, I agree with you on almost all points. sucré was probably a very bad example to argue for my position, since it has developed a new adjective meaning and usage independent of the participle. Just like closed, it falls under the category of what I dubbed "lexicalized participles" in the initial discussion above (which, surprisingly for me, seemed to be rather unintuitive to many). I absolutely agree with you that such lexicalized participles indeed need two sections -- one Adjective section for the lexicalized usage, and one for the participle, and I think we only need to discuss the latter, since (from my point of view) in many cases it's really unclear whether we are dealing with verbs or with adjectives here (or perhaps even with ===Participle===s?). As an example, imagine English had gender, and in the sentence At 3:00 PM, the window was opened the word opened agreed in gender with the subject window. Would we still be so sure that opened was a verb? It would be declined for gender, after all. This is more than a hypothetical situation; it is exactly what we find in Spanish and French and probably many other languages: La respuesta está obviada "The reply is avoided" -- obviada has feminine gender here, which comes from the feminine respuesta, and as far as I know "obviado" has not developed an adjectival meaning and usage. So what about cases like that?
As for current German linguistic tradition, it's certainly not the case that all linguists describe the present participle as a verb form. It's what you learn at school, and in many cases present participles are listed in verb conjugation tables. For example, the Institut für Deutsche Sprache describes past participles as inflected forms of elements of the word class verb and present participles as adjectives formed from verbs by word formation. canoo.net, on the other hand, lists present participles in its grammar as non-finite verb forms, but then says that "all present participles have the form and the function of adjectives" and also lists them as adjectives in its dictionary (e.g. spielend). I'll see if I can consult some printed grammars. Longtrend 09:16, 11 July 2011 (UTC)
@Longtrend, re: verb forms and gender: A word form's having a gender that matches the subject of the sentence does not speak against the form's being a verb form. Czech simple past tenses of verbs show the gender of the subject of the sentence, as in the verb dělat (to do) with its masculine simple past tense dělal, its feminine simple past tense dělala, and its neuter simple past tense dělalo. The same thing is seen in Russian, in its делать, де́лал, де́лала, and де́лало. Unlike these languages, German simple past tense machte does not show gender. --Dan Polansky 10:06, 11 July 2011 (UTC)
Sorry, I was unclear here. Of course claiming that verbs cannot inflect for gender would be wrong. My point is that in the languages under consideration, inflection for gender does not happen (I hope this is correct), except for the dubious cases of participles, so we'd have to assume that for some reason verbs inflect for gender in that kind of construction and only there. But maybe that's not too good an argument, since in Czech gender agreement on verbs only seems to happen in simple past forms, too. Still: even if there is no strong evidence that we are dealing with adjectives here, is there any evidence that they are verbs? Or is it possibly adequate to say that participles in such positions are something "in between"? Longtrend 10:24, 11 July 2011 (UTC)
Czech forms that have a similar function as English past participles (called Czech "passive participles" per W:Czech conjugation, whyever) show gender equally well as Czech simple past tense forms: dělán m, dělána f, děláno n, of dělat. They resemble their corresponding adjectival forms: dělaný m, dělaná f, dělané n. For example, "je dělán" corresponds to German "wird gemacht" and English "is made" or "is being made". --Dan Polansky 11:33, 11 July 2011 (UTC)
@Longtrend, re gender: I see no contradiction whatsoever in saying that a participle (a "verbal adjective", as they're often called) is a non-finite verb form that (often) has various adjective-like properties, including (often) agreeing in gender/number/case/definiteness/c. with a modified noun. And there's no need to imagine a hypothetical English-With-Gender; in actual English, verbs do not agree with their subject at all ("I/we/you/he/she/it/they went") — except for present-tense verbs, which display a bit of agreement, and be, which displays a bit more agreement. Do we therefore say that be is a different part of speech — say, ===Copula=== rather than ===Verb=== — and that present-tense verbs are a weird in-between form that has properties both of a ===Verb=== and of a ===Copula===? —RuakhTALK 12:20, 11 July 2011 (UTC)
As you already said, participles are sometimes called "verbal adjectives", and some experts don't even give a POS for them but say simply that they are "lexical items" that have "characteristics and functions of both verbs and adjectives" (see here). So discussing a ===Participle=== header is not as absurd as your analogy with English present tense verbs suggests (nobody doubts their verbal status). Of course inflection is only one criterion; there are other criteria that solidly confirm that English present tense verbs are verbs, such as position in the sentence. But I still don't see any such criteria for past participles, let alone for present participles. I could very well agree with the approach of treating participles as verbs if they are used to form complex tense or voice constructions. This is the case in English with both present and past participles, so personally I would not change anything about the "English way" (unless we are going to find one solution for all languages). But this doesn't help for German present participles, since they are neither used as stand-alone verbs nor to form complex constructions. So what are they? Longtrend 16:57, 11 July 2011 (UTC)
If, when you use them, you think of the verb, of the meaning of the verb, if you feel you are using the verb, then they are verb forms. In French too, the phrase adjectif verbal is used by some authors, but it's misleading, because these words are not verbs at all; their only relationship with verbs is etymological. And these authors don't use this phrase for participles... Lmaltier 18:15, 11 July 2011 (UTC)
That doesn't always work either. When I think of verwarring in Dutch, I definitely think of verwarren. The form with -ing is very predictable like this in Dutch. But it's not a present participle like in English, it's a verbal noun. I've never heard of this form being considered a verb form any time, but I still think of the verb when the word is mentioned. —CodeCat 18:23, 11 July 2011 (UTC)
Well, in some languages (Bulgarian...), such forms, even nouns, are traditionally mentioned in conjugations. This is why traditions of the language are important. Your reference is right when stating that participles share characteristics of verbs and adjectives. Actually, they are verb forms with some characteristics of adjectives. But it's wrong when stating "In English, participles may be used as adjectives" (cf. opened, see above). Lmaltier 18:29, 11 July 2011 (UTC)
Are there participles that cannot be used as adjectives? Or can all participles behave as adjectives in all languages that have them? It seems more economical to me to say 'participles are adjectives that may sometimes be used as verb forms' than 'participles are verb forms that can always be used as adjectives'. —CodeCat 18:52, 11 July 2011 (UTC)
I just answered: opened is a participle, and is not an adjective. And, in French too, corresponding adjectives don't exist for all participles: they're rather common, but not systematic at all, for past participles, and much less common for present participles (note that, for present participles, derived adjectives often have the same pronunciation as the participle, but not the same spelling, e.g. intriguant is a participle, intrigant is the adjective derived from the participle). Lmaltier 19:07, 11 July 2011 (UTC)
I'm sorry, that's not what I meant. By 'adjective' I meant 'showing adjective-like behaviour', not necessarily having 'adjective' as its part of speech. Opened can be used like an adjective: the opened door. So my question is, are all participles able to be used as adjectives? Are they all able to be used in non-adjectival ways (which apparently implies 'as a verb form')? —CodeCat 19:10, 11 July 2011 (UTC)
By that approach, we might as well list all words as ===Adjective===, since all words show adjective-like behavior. —RuakhTALK 19:20, 11 July 2011 (UTC)
@Lmaltier: You honestly think that English participles cannot be used as adjectives? So all these are wrong? Do you have any reason for asserting that apart from your "emotional" analysis? If not, what is that analysis you're proposing based on? Even if we accept a semantic analysis, it's really fuzzy. When I say watcher as a nominalization of watch, I certainly think of the action expressed by the verb. So, is watcher a verb in your opinion? All the syntactic evidence suggests it's a noun, and we treat it as a noun. Longtrend 19:24, 11 July 2011 (UTC)
In my opinion, in the opened door, opened is used as a participle, not as an adjective. I think that it can be considered as an ellipsis for the door which has been opened. But you probably know better than me.
@ CodeCat: I already answered your first question just above. I would add that, in French, past participles of 100 % intransitive verbs are never inflected, so it would be quite absurd to consider that they behave as adjectives. Second question: yes, in French, participles can always be used as verb forms (as they are verb forms). In English too. The most typical uses in French (not the only ones) are in compound tenses for past participles, and in the "en + participle" form for present participles. These forms are clearly verb forms.
About watcher: of course, you don't feel that you use a verb when you use watcher, you feel that you use a noun derived from the verb. Of course, it's not a verb form. Lmaltier 19:34, 11 July 2011 (UTC)
I just fixed intrigant: I removed the verb form section for French (it was a Tbot mistake). As you can see, considering that participles = verbal adjectives leads to serious mistakes. Lmaltier 19:34, 11 July 2011 (UTC)
Would you care to explain why "of course" watcher is not a verb form but participles undoubtedly are? I'm sorry, but your criterion just seems to be circular and fuzzy. Why does a word belong to a certain POS? Because you feel it. Why do you feel that it belongs to the POS? Because it does. Longtrend 19:59, 11 July 2011 (UTC)
I never claimed that all words directly derived from verbs are verb forms. I even explained that adjectives derived from participles are not verb forms, and that verb forms are not adjectives, even if they share some characteristics. Lmaltier 21:19, 11 July 2011 (UTC)
@ Longtrend: I don't speak German, so it's impossible for me to judge; but there are other things you can look for. For example, in English, a transitive verb's present participle can take a direct object even outside of explicit progressive/continuous constructions: “while heating the milk, continue checking the temperature and consistency”. (There are a few adjectives that take directly construed complements, as in “it was worth every penny”, but that's very unusual among adjectives, whereas it's absolutely universal among transitive verbs' present participles.) —RuakhTALK 19:20, 11 July 2011 (UTC)
Yes, this is possible in German as well. The Institut für Deutsche Sprache already quoted above states, in my translation: "The present participle -- unlike the past participle -- is never used as a part of analytical verb forms but only in contexts where adjectives occur otherwise. However, present participles show a verbal 'heritage' through their valency". So on the one hand, valency is an argument for verb status of present participles, but on the other hand, both inflection and distribution are arguments for adjective status. (Besides, I'm not sure why you accept valency as an argument for verb status of present participles [there are only a few other adjectives taking direct objects] but at the same time reject gender agreement as an argument for adjective status of past participles [there are no other verbs inflecting for gender in French or Spanish].) Longtrend 19:46, 11 July 2011 (UTC)
There are also many languages in which past participles can inflect as adjectives even if they are from an intransitive verb. I think Latin is an example, and so is modern Icelandic: hann er kominn (he has come) but hún er komin (she has come), the endings of 'come' differ based on the gender of the subject. This is apparently unlike French (it would literally translate as il est venu and elle est venue), but it just shows how much variation there is in each language. —CodeCat 20:00, 11 July 2011 (UTC)
Sorry if I'm misunderstanding you, but « il est venu » and « elle est venue » are exactly how you say it in French. I guess you're thinking that in French it would be *« il/elle a venu »? Most French verbs form the perfect by using avoir (to have) and an uninflected past participle, and that's the case we were talking about above, but a bunch of common ones, including venir, form it using être (to be) and an inflected one. (Lmaltier erred when he wrote that "past participles of 100 % intransitive verbs are never inflected", unless he was rounding to the nearest percent. :-) ) Some verbs, by the way, can go either way, depending on syntax or semantics or speaker preference. And some use être and an uninflected past participle, for reasons that make sense if you know French but aren't worth going into if you don't. —RuakhTALK 20:46, 11 July 2011 (UTC)
If that's the case, then it seems to me that such a sentence is just a subject, copula and an adjective, much like 'elle est verte'. Venu is simply an adjective that means 'in a state of having come' (also etymologically), parallel to 'in a state of being green'. —CodeCat 20:49, 11 July 2011 (UTC)
Yes, I was wrong: venir is an intransitive verb with an inflected past participle (but I meant always-intransitive verbs, not 100% of intransitive verbs). What I had in mind was only verbs using avoir, the common case. And, no, in this sentence, venue is not an adjective; no Francophone would consider it an adjective. It's part of the "passé composé" of the verb. Lmaltier 21:10, 11 July 2011 (UTC)
[after e/c] @CodeCat: No, sorry. I see why you would say that, and that may well be the origin of the construction; but in everyday Modern French « elle est venue » can simply mean "she came", without any implication about present circumstances. (And even in literary French, which retains a separate preterite construction « elle vint » for that sense, one can write something like « elle est venue trois fois », meaning "she has come three times", where I think it's a bit farfetched to posit a state of "having come three times". Certainly in English you can't say "the window is open three times".) —RuakhTALK 21:20, 11 July 2011 (UTC)
@Longtrend: I'm not rejecting gender agreement as an argument for adjective status, I just don't see it as conclusive. In French and Spanish, it is not only adjectives and sometimes past participles that show gender agreement, but also determiners (la femme, la mujer) and many pronouns (elle, ella; la tienne, la tuya); and many animate nouns come in masculine–feminine pairs that resemble gender agreement (adjective japonais(e) vs. noun un(e) Japonais(e); adjective japonés/esa vs. noun un(a) japonés/esa). And of course, many Slavic, Afro-Asiatic, and other languages have gender agreement even in finite verb forms, so it's not like it's unheard-of. —RuakhTALK 21:57, 11 July 2011 (UTC)
How to treat participles on Wiktionary — AEL 2

I'm not sure about languages other than English, but in English, there are some simple syntactic clues to tell whether a participle form has split off and become a full adjective. If it can be modified by very, it certainly exists as an adjective (and continues to exist as a participle). You can't say, for example, that the sandwich was *very eaten, that the letter was *very typed, or that the world was *very created. I suspect a similar test would work in French. Would très créé, très dactylographié, or très mangé be acceptable? Of course, this doesn't work all the time because not all adjectives are gradable. Another test is to see whether it can be the complement of certain linking verbs other than be (particularly become), for example he became closed, the movie became interesting, and the muscles became bruised, but not *the letter became typed, *the sandwich became eaten, or *the world became created.--Brett 01:45, 12 July 2011 (UTC)

Yes, the sense of adjective is exactly the same in English and in French. Lmaltier
The test with 'became' only works for English, because in Dutch de boterham werd gegeten (the sandwich became/was eaten) is not just valid, it's very common. The test with 'very' doesn't always work either, because there are certain verbs that indicate a progressive action. These are especially common in Dutch, where they begin with ver- (although not all verbs in ver- have this progressive aspect). In these verbs, very would simply indicate that the progress had continued to an exceptional degree. decomposed is a good example: it was very decomposed. This does not necessarily indicate an adjective, since you could easily imagine that the decomposition process had progressed to a significant degree. There are probably a lot of other verbs like this. I'm not arguing that this means decomposed is a verb form in such cases, I'm just saying that the test is ambiguous. —CodeCat 10:26, 12 July 2011 (UTC)
As I said, I was making the specific point for English, but it seems likely that, in Dutch or other languages, there would be certain modifiers that will modify verbs and not adjectives or adjectives and not verbs. It might not be the equivalent of very, but there may be something. Similarly, while the Dutch word for become may take both verbs and adjectives as complements, there is likely some verb that will take only adjectives (or AdjPs) as complements.--Brett 11:09, 12 July 2011 (UTC)
I know nothing about Dutch, but would de boterham schijnt/lijkt gegeten be grammatical?--Brett 12:33, 12 July 2011 (UTC)
It would be grammatical even though it sounds a little strange, mostly because people would not say it that way. Dutch has a separate verb opeten which is used when something is eaten completely. It's also more usual to add te zijn after 'schijnen' or 'lijken' and an adjective: de boterham schijnt/lijkt opgegeten te zijn (the sandwich seems to be eaten up), just like de boterham schijnt/lijkt rood te zijn (the sandwich seems to be red). But de boterham schijnt/lijkt gegeten is not really wrong, because people will understand 'gegeten' as an adjective. —CodeCat 12:52, 12 July 2011 (UTC)
That's true in English as well; participles can productively be turned into adjectives. (Just as you can reply to "Are you inside yet?" with "Very inside", even though "inside" is a preposition rather than an adjective and "very inside" doesn't have a single specific meaning, you can reply to "Is it eaten yet?" with "Very eaten", even though "eaten" is a participle rather than an adjective and "very eaten" doesn't have a single specific meaning. For example, it could mean that even the crumbs got eaten; or it could just mean that it was eaten a long time ago: "Am I too late? Is the cake eaten yet?" "Very eaten. You're about a week too late." That doesn't mean that eaten is normally an adjective, only that participles can be stretched into use as adjectives.) —RuakhTALK 13:37, 12 July 2011 (UTC)
The discussion is going a bit in circles right now. If they can be used as adjectives in all cases (not including cases that some 'known' adjectives lack, such as comparison), then why are they not adjectives after all? It doesn't really matter if they have extra properties that most other adjectives don't. Do they meet all the minimum requirements to qualify as adjectives? —CodeCat 14:45, 12 July 2011 (UTC)
All words can be used as adjectives. The point of parts of speech is not "is it remotely possible to use this word in this way?", but rather, "is this how this word is normally used?" It is possible to press participles into service as adjectives, and this is a fairly productive process: plenty of normal adjectives (tired, interesting, closed) began life as participles. But most participles are not normally used this way. —RuakhTALK 14:58, 12 July 2011 (UTC)
Historically it's actually the opposite. The oldest participles in English actually began life as adjectives and only later became used as verb forms. Proto-Indo-European had no periphrastic tenses (or even tenses at all!), and even in Proto-Germanic participles were still mostly adjectival (compare the Old Norse and Icelandic examples above, which closely reflect the PG situation). I realise this doesn't really change the situation for English as it is currently spoken, but it does point out that the question of 'which was first' is definitely 'adjective'. The productive process eventually came to be reversed, but it was not always so. I think if you go back far enough in history, you'll find that many old English participles were originally adjectives, then became participles, and (maybe?) had adjectives formed from them again. —CodeCat 15:05, 12 July 2011 (UTC)
You'll forgive me for not just taking your word for that, given that you also think that participles today are definitely adjectives. Just because they're not used in any periphrastic verb constructions, doesn't mean they're not verb forms. (I'm certainly not saying you're wrong. I'm just not confident that, if I knew more about those languages, I would agree with you.) —RuakhTALK 15:55, 12 July 2011 (UTC)
In PIE, the distinguishing feature between verb forms and verbal adjectives is that the former are based on aspect stems (stative, perfective and imperfective) while the latter are based directly on roots. Strictly, only aspect stems form verbs in PIE, since they are conjugated while roots are not (unless it's an athematic root verb such as Template:termx, but those are rare). The English weak past participle and the Latin perfect passive participle both derive from a verbal adjective in *-tos which was attached directly to the root and had no aspect-forming infix originally. Irregular weak participles like brought are still remnants of that. —CodeCat 16:11, 12 July 2011 (UTC)
@Ruakh: Isn't there a third group of "original" participles between those that you just mentioned (participles that cannot be used as adjectives or just in such a way that all words can, and lexicalized participles -- tired etc. -- that are now true adjectives independent of the original verb): participles that are regularly used as adjectives and are not in any way peculiar in such constructions. I'm thinking of such cases as the opened window (I'm not even sure whether this is grammatical -- please correct me if it's not!). It is not lexicalized as an adjective here (compare open), but it's not just a weird way to use an adjective either (compare *the cried child). Longtrend 15:11, 12 July 2011 (UTC)
I wonder why 'cried child' is strange but 'fallen child' is fine, especially since both cry and fall are intransitive. There must be something inherent in the meanings of these participles that makes them different somehow. Maybe some participles like fall are active by nature while cried is passive? —CodeCat 15:14, 12 July 2011 (UTC)
Is fallen child really acceptable in the sense "child that fell"? Or is it rather only acceptable under a lexicalized interpretation of fallen? Longtrend 15:24, 12 July 2011 (UTC)
@Longtrend: I believe "the opened window" is a reduced passive; you can also say "the just-opened window", for example, meaning "the window that had just been opened", or "the next-opened window", meaning "the window that had been opened next". It's not really an adjective; you can't say *"the very opened window", even though semantically that would make sense. —RuakhTALK 15:55, 12 July 2011 (UTC)
Okay, I think this makes sense for English. In German there is the exact same kind of construction (das geöffnete Fenster) and you can also say das gerade (just) geöffnete Fenster but not *das sehr (very) geöffnete Fenster. Here, however, the participle inflects just like an adjective. That is, unlike in the discussion we had above, it doesn't just take one category typical of adjectives (gender), but inflects according to a whole adjective paradigm. Would you still say the participle is a verb there, given that info? Longtrend 16:35, 12 July 2011 (UTC)
Yes, that's what I'd say: lexically speaking, it's a non-finite verb form, and grammatically speaking, it differs in consistent ways from true lexical adjectives, so it's best thought of as a ===Verb=== rather than as an ===Adjective===. But I'd say it very cautiously, doing my best to make very clear that (1) this is my tentative opinion based on almost no knowledge of the language at all and (2) I mean, I'm not a linguist or anything. I'm just doing my best to understand what linguists have figured out. —RuakhTALK 17:28, 12 July 2011 (UTC)
Okay, I appreciate your assessment anyway. What I don't like about that solution is that we'd weirdly have an adjective declension table under a Verb header. I wouldn't even know how to handle this. Longtrend 14:04, 14 July 2011 (UTC)
When the word is not an adjective, it's not an adjective declension table, it's a verb form declension table... This may be included in the conjugation table. Lmaltier 19:46, 15 July 2011 (UTC)

In Greek, μετοχή (metochí, participle) is one of the ten parts of speech, at least according to school grammars. Its special character of being something that shares (μετέχει (metéchei, participates in)) qualities of both verb and adjective makes it worth distinguishing it from other POS. On el.wiktionary we follow this distinction and use μετοχή as an L3 header for Greek words. I see that there is in use a Participle L3 header for "some Russian, Lithuanian, and many Latin entries" (Wiktionary:Entry_layout_explained/POS_headers). So I think that we could also discuss the possibility of a more extended use of this header. --flyax 15:29, 12 July 2011 (UTC)

That's what I originally considered the best possibility (or rather after Prince Kassad's comment in the initial discussion) since there appear to be cross-linguistic problems of assigning participle forms to parts of speech, but at the moment I tend to a language-specific approach (I'll give my arguments later). That doesn't mean, though, that it's impossible that more languages use a Participle header, let alone that the header is wrong for the languages that already use it. Longtrend 15:39, 12 July 2011 (UTC)

Since this discussion is currently inactive (thank you all for your contributions!), I'll try to sum it up and draw my personal conclusions from it. If there is one thing that we all agree on, I think it's the fact that the matter is very complicated and not easy to handle. Put more concretely, it is not desirable to simply have a linguistically universal Participle header for everything that is traditionally called a participle. Even if there seem to be cross-linguistic problems of assigning participles to a POS, each language should be considered separately and carefully.
For German, after this discussion and checking out some grammars, my personal impression is that the introduction of a Participle POS header should be taken into consideration. I'll give my arguments for that impression, which might also be relevant for other languages.

  • First of all, it should be questioned whether different kinds of participle in one language even form a more or less homogeneous class, or if they should be treated separately: e.g. for German, should present (pr.p.) and past participles (pa.p.) be treated the same or differently? Opinions differ slightly here. Peter Eisenberg's grammar Grundriss der deutschen Grammatik only treats pa.p. as non-finite verb forms, but pr.p. as adjectives. But most grammars agree in putting pr.p. as well as pa.p. into the same class (mostly non-finite verb forms). There is an interesting article by Heinrich Weber (unfortunately in German) discussing the classification of German participles on the basis of twelve criteria that help distinguish verbs from adjectives (such as including a verbal lexeme, governing accusative and/or oblique cases, usability as an adverbial, gradability). He comes to the conclusion that of those, pr.p. and pa.p. have eight characteristics in common, pr.p. and the infinitive six characteristics, pa.p. and infinitive also six, but only five common characteristics for pr.p. + adjectives / pr.p. + finite verbs and four common characteristics for pa.p. + adj. / pa.p. + finite v. So present and past participles have more characteristics in common both with each other and with the infinitive than with either finite verbs or adjectives. This is an argument in favour of treating German pr.p. and pa.p. basically the same, whatever that solution may look like.
  • So what header should we use for German participles: Verb, Adjective or what? All grammars I checked agree that pa.p. are to be treated as a verb form, but that most can also be used as an adjective. For pr.p., there is less of a consensus: For Peter Eisenberg and the Institut für deutsche Sprache (IDS), pr.p. are not (non-finite) verb forms but adjectives that are merely formed from verbs. All other grammars I know of classify them roughly as verb forms, but some then weirdly say that they are used only as adjectives (such as canoo.net or the Duden-Grammatik, which states that pr.p. aren't conjugational forms of verbs). Since pa.p. are also used to form complex tenses, I think we can agree that putting both pr.p. and pa.p. solely under an Adjective header makes no sense.
    What solid arguments are there against using a Participle header for German (for both pr.p. and pa.p.)? Traditionally, "participle" was considered a separate part of speech. This has changed, now they are often regarded either as verbs or as adjectives, so this might be an argument against the Participle header. But I believe this to be simply due to basic differences between grammars and such dictionaries as the Wiktionary. We here at Wiktionary are forced to assign each word form to a POS. This is not the case for grammars. If we can't decide for a POS after considering all relevant aspects, why not recognize that what we need may be a separate POS? It might seem that pr.p. in German can be perfectly treated as adjectives, according to syntactic distribution and morphological inflection. But then they govern arguments like verbs, are generally not prefixable by un- or gradable, etc. They simply don't fit either category. And the same is true for pa.p., which might seem to be clearly verbs. But then they can be used attributively, decline like adjectives, are sometimes governed by other verbs (unlike finite verbs, but like adjectives), etc. Let's assume we use a Verb header for German participles despite the adjectival characteristics. How would we solve the dilemma of needing to have an adjective declension table under a Verb header?

For those reasons, it seems to me that introducing a Participle header would be the best option for German. We could put declension tables there without a contradiction (as there would occur for declined "verbs") and at the same time link to the verbal origin. Just for clarification, lexicalized participles such as wütend or verrückt that are now true adjectives would of course be unaffected. All those who disagree with me: in which points exactly do you think I'm wrong or I drew wrong conclusions? I'd be very glad to hear your comments, especially since I really want to reach a consensus. I'm well aware that introducing a new POS to a language needs more justification than keeping the status quo -- but the status quo in this case is not an option, since currently we have no way at all to treat declined participles (AFAIK there is not a single such entry on Wiktionary yet). Longtrend 18:54, 15 July 2011 (UTC)

I imagine Dutch will be treated the same, because its participles are more or less identical to German ones. Is the situation for the Romance languages much like German as well (apart from the fact that they show gender agreement in predicates, which German doesn't)? —CodeCat 19:04, 15 July 2011 (UTC)
French and Spanish (the only Romance languages I speak) differ from German in important ways: (1) French distinguishes blatantly and obviously between present participles, which are very restricted in their uses and which do not inflect for gender or number, and adjectives derived therefrom, which are normal adjectives and often spelled differently from their participles; (2) Spanish has two different constructions that could be called "present participles", of which one (the gerundio; here we call it the "adverbial present participle") is considered to be a verbal adverb and does not inflect for gender or number, and the other (the participio presente; dunno if we have a name for it here) is no longer productive, but rather survives only as various nouns and adjectives; (3) neither French nor Spanish requires the declension tables that have Longtrend so bothered, since their adjectives and past participles inflect only for gender (masc/fem) and number (sing/pl), not for definiteness or position or case. The closest thing to that is Spanish forms like dándo-, which we're currently not worrying about SFAICT, and which anyway are further evidence for ===Verb===ness. Personally I still suspect that ===Verb=== is the way to go for German as well, but many of Longtrend's reasons for using ===Participle=== for German don't apply to French and Spanish anyway. —RuakhTALK 20:17, 15 July 2011 (UTC)
Thanks for the analysis and information you provide. It clarifies things much for German. My conclusion is that a Participle POS is no more justified in German than in English or in French. Why? Because specialists call them either verb forms or adjectives. It's possible to treat the declension of adjectives in an adjective declension table, and the declension of verb forms in the conjugation table. Lmaltier 19:57, 15 July 2011 (UTC)
But that would mean that some verb forms can have an adjective declension section. Do we want that? —CodeCat 20:05, 15 July 2011 (UTC)
@Lmaltier: Since when do you listen to specialists' analyses rather than the speakers' emotions? Or is it just because it's a convenient way to prove my point wrong? I already said why I think it is that participles are often treated either as verb forms or as adjectives. Until you respond to my arguments, I see no reason to adopt your point of view instead. While responding, keep in mind that experts by no means agree on whether participles are verbs or adjectives. Longtrend 20:12, 15 July 2011 (UTC)
I always said that we should not invent anything (and this is one of the basic principles of the Foundation), and that we should follow specialists and the traditions of the language. And verb forms cannot have an adjective declension section, as they are not adjectives. The best place for the declension of these forms is the conjugation table. Also note that I don't propose anything on how to deal with the question in German (this is not easy if opinions differ among specialists, and it's true that a decision should be taken). I only think that, in German, we can do with the verb and adjective POS, according to what you explain. Lmaltier 06:30, 16 July 2011 (UTC)
You still haven't responded to my argument about the difference between grammars and Wiktionary. What linguists do agree on is that German participles have characteristics of both verbs and adjectives, so I have a bad feeling about just squeezing them into one of these groups. (And putting them in both groups would suggest that in one usage they are clearly verbs, while in the other they are clearly adjectives, which does not seem to be the case either.) I don't really see a problem with a Participle header, which on the contrary would solve those problems. This is my impression specifically for German; my proposal would not affect any other languages, since I know too little about them -- we should refer to linguists' analyses there as well. If you worry that Participle is not a proper POS, well, "proper noun", "prefix" and "symbol" aren't either, as even the ELE acknowledges. Do you think the Participle POS is inappropriate in Latin, too?
You probably know what I meant by "verbs would have adjective declension sections": this was short for "verbs would have a declension section that would include exactly the same forms as adjective declension templates". And this would be a problem in my opinion. You propose to include those forms in conjugation tables. Just to make sure I understand you correctly: you want to change the verb conjugation template so it includes all the declined forms? But that's declension, not conjugation as the header would suggest. The contradiction remains. Participles inflect for completely different categories than normal verbs. Longtrend 09:56, 16 July 2011 (UTC)
For Latin, I don't know: when I was learning Latin, participles were considered verb forms, but this tradition may be different in different countries, and may change with time. The tradition to be adopted is the one currently used for Latin in English-speaking countries. In French, nobody considers it a problem to treat aimée, aimées, aimés as conjugated forms of aimer. I don't see why it is a problem to decline a verb form. Lmaltier 11:19, 16 July 2011 (UTC).
Discussing this topic would be a lot easier if you replied to my arguments and all my questions... Longtrend 12:21, 16 July 2011 (UTC)
Maybe someone else wants to answer my questions and concerns then. To be honest, it's not that important for me to have a Participle header for German. I don't think a Verb header would be really wrong or anything. I just want a solution that works for German (and of course I want that solution to be as good as possible, so I still like the Participle solution best), and I can't imagine how a Verb header could work for declined participles. Any concrete suggestions? Even if so, why bother when "Participle" could do the job effortlessly and is obviously not really wrong (to say the least)? Longtrend 17:12, 19 July 2011 (UTC)
It seems strange to have certain verb forms listed under ===Participle=== rather than ===Verb===, given that we don't generally use different POSes for different inflected forms. I mean, assuming you're still planning on definitions like “Past participle of spielen.”? And I really don't see why the verb's ====Conjugation==== section, at the lemma (infinitive) entry, can't provide all forms of the participle. It just doesn't seem like ===Participle=== buys us anything. —RuakhTALK 17:35, 19 July 2011 (UTC)
Thanks for your reply. Well, IMO the advantage of ===Participle=== over ===Verb=== is that with the latter, we would say "this word is a verb" despite all its adjectival characteristics, while with the former we would admit that the issue is more complicated than that. AFAICT, it simply reflects the linguistic facts better.
My problem with listing all participle forms under the infinitive entry is the following: a form like spielendem (dative, masculine or neuter, singular) is clearly declined, not conjugated, it can clearly be traced back to a base form spielend. I'm not aware of any other case where there is a word form which on the one hand is an inflected form of some lemma, but simultaneously serves as the base form for a group of differently-inflected items. The latter in this case is declension, the former (allegedly) conjugation. Or is it just a terminological issue?
I also don't think it's so clear we're talking about "verb forms" here, as you seem to assume tacitly. Saying that they are not verb forms but, well, participles, seems to work just fine for Latin; see amāns: the "definition" is a translation, while the "present participle" part is in the Etymology section. The only problem I see is the following: German past participles, unlike present participles, all appear as part of complex tenses (which arguably makes them verbs, at least in this usage). And some intransitive ones cannot even appear in non-verbal positions or be declined, i.e. they show no adjectival characteristics. Of course, we might also use ===Participle=== in such cases and just omit the declension part, but this might fail to capture the fact that what we find here are quite unambiguously verbs.
Oh well. I certainly learned a lot about participles during this discussion, but regarding my initial question "How to list declined participles on Wiktionary?" I'm as perplexed as before. Longtrend 20:17, 19 July 2011 (UTC)
"Tacitly", my foot; I explicitly said I was assuming it, and added a question mark for good measure! Regardless, from everything you've said, it seems clear that German past participles, at least, are certainly verb forms, even if they are also adjective-like.   Re: "I'm not aware of any other case where there is a word form which on the one hand is an inflected form of some lemma, but simultaneously serves as the base form for a group of differently-inflected items": It happens. For example, in Hebrew, especially Classical Hebrew, if a verb-form has a personal pronoun as a direct or indirect object, then that pronoun can be incorporated into the verb-form as an additional nominal inflection; tishkakhénu, for example (in Lamentations 5:20; KJV "dost thou forget us"), is tishkákh ("thou dost forget", verb) + -énu ("us", object pronoun), where tishkákh is the second-person masculine singular imperfect/future/prefix-conjugation of shakhákh ("forget", verb). —RuakhTALK 04:07, 20 July 2011 (UTC)
Okay, but if Wiktionary's basic policy indeed is "Don't invent anything" (as Lmaltier claims, and as you might assume tacitly? SCNR...), then that's no option for us. Saying that spielendem is an inflected form of spielen (verbal infinitive) rather than spielend (present participle) is something I've never heard before. Compared to that, putting ===Participle=== as the POS is, if at all, just a ridiculously tiny "invention", and definitely not wrong (since they're definitely participles; we just don't agree on whether it's a proper POS). Longtrend 17:46, 20 July 2011 (UTC)
How to treat participles on Wiktionary — AEL 3

Sorry for not going through all this tl;dr discussion - has any kind of agreement been reached by now, or are people still arguing about tiny details? -- Liliana 15:29, 29 July 2011 (UTC)

There's no agreement, but at the moment there's no discussion either. I also don't think we ever argued about "tiny details". There is no established way to treat German inflected participles on Wiktionary, and no imaginable way seems perfect. If you have any input, please feel free to contribute. Longtrend 22:37, 30 July 2011 (UTC)

Just throwing in that on German Wiktionary, they do use POS headers called "Partizip I" (which is German for "present participle") and "Partizip II" (which is "past participle"). However, I couldn't find any discussion that led to the introduction of those headers (didn't search too long, though), and they don't seem to have any entry for an inflected participle, either. Note that inflected participle forms don't appear in verb conjugation tables. Longtrend 16:16, 1 August 2011 (UTC)

I'm not a linguist, but I added some 10,000 Swedish entries to the English Wiktionary, including many of the most commonly used words. In order to get things done in a limited time, I systematically treated past participles as adjectives, giving their role as a verb form in the Etymology section. See for example arresterad, bekräftad, debatterad. This works fine with the existing templates used for Swedish adjectives. So far, this has not been controversial at all. Some future linguist may perhaps argue that these are not actually adjectives, but if they have the time to change my edits, they will find the job easy to automate by the fact that I followed a single pattern. --LA2 10:26, 11 August 2011 (UTC)
Thanks for your input. Sounds like a workable solution, but the problem is that in many languages, including German, (past) participles are often clearly used as verbs, i.e. in compound tenses. Maybe that's not true for Swedish. Longtrend 22:31, 11 August 2011 (UTC)
For Swedish, the form that is used in compound tenses is the supine, which was originally the neuter form of the past participle. But they are not always identical anymore, since verbs with participles in -en have a neuter form -et but the supine has -it (this distinction is not original, though). —CodeCat 22:47, 11 August 2011 (UTC)
I've just seen this discussion so thought I'd chip in with a note about Luxembourgish. It's pretty much the same as German; the Luxembourgish participle (only one, rather than two in German) can be used as an adjective (either attributive or predicative), but it is also used in many compound tenses. Most verbs in Luxembourgish only have conjugations for the present tense, so for those every other tense (past, future, conditional, etc.) is formed using the participle. Therefore just having the entry as either an adjective or a verb form would be inaccurate. BigDom (tc) 08:42, 5 September 2011 (UTC)

August 2011

earliest-attestation categories

In cases where we can say with some certainty, perhaps through research or by appeal to the OED and other authorities, that the earliest a word can be attested is 1922, or circa 1922, do we want to categorize it as such? Say, category:English words first attested 1900-40, with corresponding categories for "...1940-60", "...1960-80", "...1980-2000", "...2000-20", and, working down, "...circa 1900", "...1860-1900", "...circa 1850", "...1800-1850", "...circa 1800", "...1750-1800", and so on (with, for earlier centuries, perhaps fewer than the four categories per century I've envisioned for the 19th and 18th centuries. Anyway, my choice of specific categories here was off the top of my head. I'm asking about the general idea. Too, specific categories will vary by language, with Esperanto, say, having pre-1887, 1887-1904, and other categories, perhaps).—msh210℠ on a public computer 06:07, 4 August 2011 (UTC)
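The proposal leaves open which category a boundary year belongs to, since "1900-40" and "1940-60" share an endpoint. A minimal Python sketch, assuming half-open ranges (start inclusive, end exclusive) and using only the exact-year ranges listed above; the function name is hypothetical, and the "circa" categories are left out, so a year such as 1855 that falls between listed ranges gets no category.

```python
from typing import Optional

# (start, end, label): each range is half-open, start <= year < end.
# Labels are the ones floated in the proposal; the boundaries are illustrative.
RANGES = [
    (1750, 1800, "1750-1800"),
    (1800, 1850, "1800-1850"),
    (1860, 1900, "1860-1900"),
    (1900, 1940, "1900-40"),
    (1940, 1960, "1940-60"),
    (1960, 1980, "1960-80"),
    (1980, 2000, "1980-2000"),
    (2000, 2020, "2000-20"),
]


def attestation_category(year: int, language: str = "English") -> Optional[str]:
    """Return a category name for an exactly known first-attestation year,
    or None if the year lies outside (or between) the listed ranges."""
    for start, end, label in RANGES:
        if start <= year < end:
            return f"Category:{language} words first attested {label}"
    return None


print(attestation_category(1922))  # Category:English words first attested 1900-40
print(attestation_category(1940))  # Category:English words first attested 1940-60
print(attestation_category(1855))  # None (between 1850 and 1860)
```

Half-open ranges guarantee that each year lands in at most one category; the 1850s gap would presumably be handled by "circa 1850", which needs a separate flag for approximate dates.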

I like the general idea. --Daniel 23:41, 7 August 2011 (UTC)
It would be a lot of work, and we would find that many authorities disagreed on the earliest attestation of a word (I discovered that making this list; for quartz, for example, there's a range of more than a century). On the other hand, it could be quite useful for some things and to some people. I don't oppose the idea. - -sche (discuss) 23:46, 7 August 2011 (UTC)

Special:NewMessages

What's the deal with New messages, which appears in the upper right-hand corner of every page between "My watchlist" and "My contributions"? If I had new messages, wouldn't they be on my talk page? —Angr 13:38, 4 August 2011 (UTC)

We have installed LiquidThreads here, for use by those users who want it on their talk pages. See, e.g., [[user talk:Yair rand]]. If you post something using that system and get a reply, it will show up on your watchlist as "You have new messages" or something like that, with a link to [[special:newmessages]], which is also linked from the top of each page, as you've seen, and lists all the replies you've gotten using LiquidThreads. If for any reason you want to hide the "New messages" link atop each page, add #pt-newmessages{display:none!important} to your CSS ([[special:mypage/vector.css]] if you use Vector).—msh210 (talk) 15:51, 4 August 2011 (UTC) 16:03, 4 August 2011 (UTC)
OK, thanks. It does seem like it would be better to call it something other than "New messages", since that's exactly what new messages on one's user talk page are called. —Angr 15:57, 4 August 2011 (UTC)
Yes, the whole thing is somewhat poorly executed.​—msh210 (talk) 16:03, 4 August 2011 (UTC)

Updating anagram format

I've only avoided raising this issue because I consider it so minor in relation to other areas where we could make progress. Wiktionary:Votes/pl-2009-12/Modify anagram section of ELE is now out of date, as {{alphagram}} displays nothing when its first parameter is a valid page name. So this example:

* {{alphagram|opst}}: [[opts]], [[pots]], [[spot]], [[stop]], [[tops]]

in fact displays as:

  • : opts, pots, spot, stop, tops

That is, an isolated colon with a space on either side of it, immediately preceded by a bullet point. I'd simply like to amend this vote to exclude {{alphagram}} and delete it (or RFDO it and let the community make that decision separately). Not necessarily delete it never to come back again, but it shouldn't be allowed to be used while it's blank; Conrad.Bot, which added it in the first place, is inactive, and Conrad.Irwin hasn't said whether he intends to use it ({{alphagram}}) again. --Mglovesfun (talk) 14:07, 4 August 2011 (UTC)
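For reference, an alphagram is just a word's letters in alphabetical order, so every anagram of stop shares the key opst. The Python sketch below shows how such a key groups anagrams and how a bot could write the anagrams line with plain links and no {{alphagram}} call. It assumes lowercase words and plain code-point sorting; the function names are hypothetical and this is not Conrad.Bot's actual code.

```python
from collections import defaultdict


def alphagram(word):
    """Return the word's letters in alphabetical order, e.g. 'stop' -> 'opst'."""
    return "".join(sorted(word.lower()))


def anagram_groups(words):
    """Group words sharing an alphagram; keep only groups of two or more."""
    groups = defaultdict(list)
    for w in words:
        groups[alphagram(w)].append(w)
    return {key: sorted(ws) for key, ws in groups.items() if len(ws) > 1}


def anagram_line(headword, words):
    """Wikitext bullet listing the headword's anagrams, without {{alphagram}}."""
    key = alphagram(headword)
    others = sorted(w for w in words if w != headword and alphagram(w) == key)
    return "* " + ", ".join(f"[[{w}]]" for w in others)


words = ["stop", "pots", "spot", "tops", "opts", "tone"]
print(anagram_groups(words))  # {'opst': ['opts', 'pots', 'spot', 'stop', 'tops']}
print(anagram_line("stop", words))  # * [[opts]], [[pots]], [[spot]], [[tops]]
```

Since the key is derivable from the headword itself, dropping the template loses no information a bot could not recompute.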

Sounds good to me.​—msh210 (talk) 15:48, 5 August 2011 (UTC)

Romanizations of languages in ancient scripts

This point has been brought up before but it has never really been properly solved. Many old languages on Wiktionary were written in scripts that are no longer common and the texts in which they appear are more commonly published in romanized form than in the original script. The situation would be as if ancient Chinese texts were now almost exclusively published in pinyin. So although the original script was the only script used in contemporary attestations, modern readers will almost exclusively read texts in that language in Latin script. Grammars and dictionaries are written in Latin script as well, and this is the script that people will most likely want to look up words in. So I think using Latin script as the main script of these languages would have far more practical value for users than the original script ever will. I'm not saying that the words should not be present in the original script, but I would prefer it if we turned the tables: that the entries in original script link to the modern Latin-script versions of the terms. —CodeCat 14:15, 4 August 2011 (UTC)

Sounds like a good idea in principle, but there may be a spectrum with no clear boundary here. For example, Sanskrit is usually written in Devanagari in India, but usually in romanization in the West, so it's not clear which should predominate. (Of course, Devanagari is a script that's still widely used for modern languages too, so that may tip the scales in its favor.) Some languages' scripts aren't even encoded in Unicode yet, like Tocharian, so everything in Category:Tocharian A language and Category:Tocharian B language is already necessarily in romanization. But I think you have a good point for, say, Gothic and Primitive Irish. No one really goes around reading Gothic script or Ogam nowadays; instead, romanization is practically universal. Definitely worth thinking about. —Angr 14:42, 4 August 2011 (UTC)

To be honest, I'm a bit tired of you bringing up this topic once again. Last time, it was shown that the community doesn't want this, and repeating the whole discussion will not change a thing. You can try starting a vote if you really want to push this through, but whether or not it passes is entirely up to the people here. -- Liliana 12:39, 5 August 2011 (UTC)

It bears consideration. Earlier this year somebody deleted our once considerable collection of romanized Sumerian, Akkadian, etc. I think these ancient languages that used inadequate or little-known scripts deserve at least the treatment that we permit for Chinese and Japanese. If an ancient language is usually studied in the English-speaking countries using the Latin script, then we should have the romanized spelling just like we do with Pinyin. Whenever entries can be created in the original ancient script (whether cuneiform, Devanagari, hieroglyphics, Mayan logographic script, Linear B, or whatever), then the romanized spelling could be made to redirect to the ancient script. Deleting ancient words that are added in the Roman alphabet was a dreadful loss, and maintaining all the entries in these ancient languages strictly in their lesser used and rather inaccessible scripts makes them not very useful to the people who want to study those languages. —Stephen (Talk) 13:24, 5 August 2011 (UTC)
I'm not at all sure about making the Latin alphabet the main script for these languages, as CodeCat suggests, i.e. having the Latin-alphabet entry be the primary one, while the original-alphabet entry merely says "<Original alphabet> spelling of <Latin alphabet>" or the like. But we should definitely have listings for the romanizations of such entries. For example, qino should be an entry saying something like "Romanization of 𐌵𐌹𐌽𐍉 (qinō)" rather than a red link. —Angr 14:13, 5 August 2011 (UTC)
Last time it came up, I saw no definitive resolution, certainly not that "the community doesn't want this". I find it silly to have entries only in scripts that they literally have never been published in.--Prosfilaes 21:32, 5 August 2011 (UTC)
Romanizations are generally mentions, not uses. However, if you romanize a whole text, or a whole series of texts in Gothic, this would be Gothic in Latin script, right? So the words in those texts are uses and not mentions. Am I missing something here? If not, you don't need an exception to CFI. Mglovesfun (talk) 11:22, 6 August 2011 (UTC)
Yes, you’re missing something here. Ancient dead languages are not like modern living languages. Books and magazines are not being published in Sumerian or Akkadian. The vast majority of the known texts are in the form of images. Words of Spanish are used, but words in Sumerian are only mentioned. A couple of ancient languages are in the process of being revived, which is why we have an Old English Wikipedia; and a couple (Old Coptic, Ge'ez, Old Church Slavonic) are still in limited use liturgically, but most of these languages are simply studied, compared, and referenced. —Stephen (Talk) 12:36, 6 August 2011 (UTC)
I agree with Angr: ... qino should be an entry saying something like "Romanization of 𐌵𐌹𐌽𐍉 (qinō)" rather than a red link. I support the idea of romanisation of some languages, Gothic and others, for practical reasons. I've been thinking about Gothic script for a while (see [1] and here), but I guess that most users won't be able to type any Gothic characters, so there should be some kind of romanisation. Most dictionaries and grammars use romanised Gothic, so we shouldn't be "more catholic than the pope". --MaEr 14:17, 6 August 2011 (UTC)
I agree with User:Angr and MaEr, "qino should be an entry saying something like "Romanization of 𐌵𐌹𐌽𐍉 (qinō)" rather than a red link". - -sche (discuss) 23:31, 6 August 2011 (UTC)
I agree too, it makes no sense to have entries that nobody can search for. Romanised entries would be very helpful. BigDom 09:48, 7 August 2011 (UTC)
I don't know if you're missing anything, but it doesn't seem to have been our practice to record Gothic in the Latin script, even though it is so published.--Prosfilaes 18:37, 6 August 2011 (UTC)
Maybe I'm missing something; discussion is not my strong point. Indeed, it isn't our practice to record Gothic words in Latin script in this Wiktionary, but in my opinion there should be some romanised entries that link to the Gothic script entries (as Angr suggested). Otherwise users have almost no chance of looking up a Gothic word.
Imagine you find a word like aþþan in an etymological dictionary: how would you look up this word in Gothic script? --MaEr 09:22, 7 August 2011 (UTC)
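Converting a romanization such as aþþan back to Gothic script is largely mechanical, which is why a lookup helper is at least conceivable. A minimal Python sketch, assuming a simple one-letter-to-one-letter scheme (þ for thiuth, ƕ for hwair, x for iggws, with macrons and diaereses ignored) and skipping the two numeral-only letters; the function names are hypothetical, and digraph spellings such as "hw" for ƕ are not handled.

```python
import unicodedata

# Letter-for-letter mapping into the Unicode Gothic block (U+10330 onward),
# in alphabetical order; the numeral-only letters U+10341 and U+1034A are omitted.
LATIN_TO_GOTHIC = {
    "a": "\U00010330", "b": "\U00010331", "g": "\U00010332", "d": "\U00010333",
    "e": "\U00010334", "q": "\U00010335", "z": "\U00010336", "h": "\U00010337",
    "þ": "\U00010338", "i": "\U00010339", "k": "\U0001033A", "l": "\U0001033B",
    "m": "\U0001033C", "n": "\U0001033D", "j": "\U0001033E", "u": "\U0001033F",
    "p": "\U00010340", "r": "\U00010342", "s": "\U00010343", "t": "\U00010344",
    "w": "\U00010345", "f": "\U00010346", "x": "\U00010347", "ƕ": "\U00010348",
    "o": "\U00010349",
}
GOTHIC_TO_LATIN = {gothic: latin for latin, gothic in LATIN_TO_GOTHIC.items()}


def strip_diacritics(text):
    """Drop combining marks such as the macron in 'qinō' (via NFD decomposition)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


def to_gothic(romanized):
    """Convert a romanized Gothic word to Gothic script; raise on unknown letters."""
    out = []
    for ch in strip_diacritics(romanized.lower()):
        if ch not in LATIN_TO_GOTHIC:
            raise ValueError(f"no Gothic letter for {ch!r}")
        out.append(LATIN_TO_GOTHIC[ch])
    return "".join(out)


def to_latin(gothic):
    """Letter-for-letter romanization (no macrons on long e and o)."""
    return "".join(GOTHIC_TO_LATIN.get(ch, ch) for ch in gothic)


print(to_gothic("aþþan"))  # 𐌰𐌸𐌸𐌰𐌽
print(to_latin(to_gothic("qinō")))  # qino
```

Going the other way, as a "Romanization of" entry would, is the easy direction; the hard part for readers is exactly the one MaEr describes.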
Appendix:Gothic script -- Liliana 11:39, 7 August 2011 (UTC)
Do you really expect people to copy and paste each individual letter for every word they look up in Gothic?? —CodeCat 12:05, 7 August 2011 (UTC)
No. This is why the Gothic script is featured in the edittools, so you can just click on all the letters you need to create your words. -- Liliana 12:22, 7 August 2011 (UTC)
If I had to do that just to look up one word, I'd probably find a better dictionary instead... —CodeCat 12:29, 7 August 2011 (UTC)
I would think you could integrate the edittools toolbar onto the Main Page somehow, given good enough JavaScript skills. The setting on WT:PREFS doesn't seem to work for me, but in my opinion, this solution would be much easier to implement than a policy change. It wouldn't be a hassle for readers at all if it were implemented this way, as very many non-Latin dictionaries feature such a system. -- Liliana 12:34, 7 August 2011 (UTC)
This may work for some scripts like Gothic that are still superficially similar to Latin. But it would not work for cuneiform which is very different. Should we expect students of Hittite to learn cuneiform? —CodeCat 12:49, 7 August 2011 (UTC)
This would not work even for Gothic. If somebody wanted to look up the Gothic words eyz or noicz, how would he be able to transliterate into the Gothic script unless he knew the alphabet? (And, of course, he would need a Gothic font installed or he wouldn’t see anything but boxes.) Many Gothic transliterations are even more cryptic than these two. The edittools are there for our editors, and not really for someone trying to look up a word. People who study ancient dead languages have completely different goals than those who study modern languages, and most of them have little need to learn the ancient scripts, particularly if it is a difficult one like cuneiform or hieroglyphics, and more often than not will carry out most of their studies on words in the Roman alphabet. —Stephen (Talk) 13:13, 7 August 2011 (UTC)
It's not a hassle for readers at all to force them to transliterate words into an archaic script that isn't used any more? I think you have a different definition of the word hassle than I do, because I think that would be a PITA.--Prosfilaes 20:04, 7 August 2011 (UTC)
I count six people in this discussion who (seem to) want or tolerate Romanisations, and one person who doesn't want them; I presume more supporters and opponents have commented in other discussions. (I hope one of the other discussions explained why pinyin and romaji are allowed.) So, let's set up a page for a vote (but not start the vote yet), so that we can begin working out how the vote should be set up, and ultimately decide this issue. As to how to set up the vote: I suggest having (on the same page) different votes for different languages, so users can (for example) vote to allow Romanisations of Gothic but oppose allowing Romanisations of Hittite, if they want. - -sche (discuss) 20:35, 7 August 2011 (UTC)
I've created a vote: Wiktionary:Votes/pl-2011-08/Romanization of languages in ancient scripts. —CodeCat 21:34, 7 August 2011 (UTC)

AWB

Hi, I wondered if I could be put on the approval list to use AutoWikiBrowser. I used it over at Wikipedia even before becoming an admin there so I know how to use it. At the moment, I need it to fix a small error I found in the conjugation table {{lb-conj-regular}} which has affected a few pages and using AWB would be far quicker than going through the individual entries. Cheers, BigDom 16:51, 5 August 2011 (UTC)

I don't see a problem with it. Granted. -- Liliana 16:54, 5 August 2011 (UTC)
Thanks, appreciated. BigDom 16:58, 5 August 2011 (UTC)

Glosses in old languages

Some words in old languages, such as Old High German, are attested only as glosses with translations in foreign-language (usually Latin) texts, instead of in running texts. In theory this is a mention and not a use, so as far as I know those words would fail CFI. However, because of the special situation of old languages, especially those that are sparsely attested, should there be an exception for these cases? —CodeCat 21:46, 5 August 2011 (UTC)

This is also a problem for rare and recently extinct languages and dialects, such as the Vegliot dialect of Dalmatian, for which nearly all information comes from a German translation of an Italian text written by the scholar Matteo Giulio Bartoli and based on an interview with the sole surviving speaker of the language (and he was old, polylingual, partly deaf, and hadn't spoken the language in 20 years at the time of the interview).
For Classical languages and languages known only from scholarly publications, the attestation criteria are normally relaxed. --EncycloPetey 04:51, 8 August 2011 (UTC)
Liliana (Prince Kassad) is a fan of allowing mentions for otherwise unattested languages. I'm not really a fan myself, but I doubt I would actually object to it if it were voted on. But I also doubt I'd support it. --Mglovesfun (talk) 11:00, 9 August 2011 (UTC)
For clarity's sake, I'd object to mentions being allowed for all dead languages, particularly as some dead languages are quite well attested, better attested in writing than some living languages! But for otherwise unattested languages such as Dalmatian, I would neither oppose nor support it (as I said above). --Mglovesfun (talk) 11:01, 9 August 2011 (UTC)

Regional distribution of colloquial terms

For colloquial terms it is often difficult to find information about regional distribution. Evidence for pan-UK and pan-US usage is not too hard to find, but print evidence for other usage seems more difficult. How can we accumulate evidence on other colloquial use? Is there a tag-and-category arrangement that would help? Can we accumulate votes or opinions somehow, perhaps using an entry's talk page?

AFAICT, we have never had a systematic effort to address this. We have had individuals who advocated particular dialects (Ireland, Canada, Australia, and Singapore come to mind). No catch-all category addresses this problem. Do we need a project page for each region to provide focus for potential contributors who may have some familiarity with a particular region or dialect? What would be good regions or dialects for experimenting? Scotland? Ireland? Australia? AAVE? Canada? Southern US? India? DCDuring TALK 20:21, 6 August 2011 (UTC)

Two entries that illustrate issues are toey (See also WT:RFV#toey.) and stupid-head (which bears an invisible remark in {{attention}}). DCDuring TALK 20:43, 6 August 2011 (UTC)

Reviewing entry Talk pages that contain "black" yields some candidates for an AAVE page (or one otherwise named) that might also improve some definitions. DCDuring TALK 20:43, 6 August 2011 (UTC)

Compound tenses in conjugation templates

I have recently been creating conjugation templates for Luxembourgish verbs; example {{lb-conj-regular}}. I was wondering if there is really any need to have the compound tenses in there, as the patterns don't change between verbs so just showing the relevant auxiliary verb should be enough. Currently, the number of parameters required to use the templates is getting out of hand. This is mainly due to the Eifel Rule, which means that -n or -nn endings are removed if the following word begins with certain consonants. Would anyone object if I removed these tenses and just include the forms that are actual conjugations of the verb, as in the Dutch templates (e.g. {{nl-conj-wk}})? The only problem would be that I would have no idea how to convert the existing template calls into the new form. BigDom 20:34, 6 August 2011 (UTC)

P.S. the new template would look like this, which has parameters to add a line for preterite indicative and simple conditional if needed (only a few verbs have these conjugations). BigDom 23:09, 6 August 2011 (UTC)
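Much of the parameter count comes from needing both a full and a reduced spelling of forms ending in -n or -nn. In the regular case the reduced spelling is derivable from the full one, so a template could in principle compute it instead of taking it as a separate parameter. A minimal Python sketch, assuming the commonly cited statement of the rule (the final -n/-nn is kept before a vowel or before d, h, n, t, z, and at the end of a phrase) and ignoring lexical exceptions; the function names are hypothetical, and this is not how {{lb-conj-regular}} works.

```python
# Initial letters before which a final -n/-nn is kept (assumed: vowels
# plus d, h, n, t, z). Anything else triggers the deletion.
KEEP_BEFORE = set("aeiouäëéöüdhntz")


def reduced_form(word):
    """Spelling with a final -nn or -n removed, e.g. 'hunn' -> 'hu'."""
    if word.endswith("nn"):
        return word[:-2]
    if word.endswith("n"):
        return word[:-1]
    return word


def apply_eifel_rule(word, next_word=""):
    """Choose the full or reduced form of word, depending on the next word.

    An empty next_word stands for the end of a phrase, where the n is kept.
    """
    if not next_word or next_word[0].lower() in KEEP_BEFORE:
        return word
    return reduced_form(word)


print(apply_eifel_rule("hunn", "keng"))  # hu
print(apply_eifel_rule("hunn", "en"))    # hunn
print(apply_eifel_rule("hunn"))          # hunn
```

Under this assumption a conjugation template would need only the full forms, with the reduced column generated from them.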

You could take a look at the tables on the Galician verb cantar and Latin verb amō, to see how this has been handled for some compound tenses in Romance languages. I'm not familiar enough with Luxembourgish to offer an opinion to you. --EncycloPetey 04:43, 8 August 2011 (UTC)
That's fair enough, not many people are too familiar with Luxembourgish! I had a look at those, and came up with User:BigDom/Template:lb-conj, which is based on the Latin and French templates. BigDom 18:13, 8 August 2011 (UTC)

Including compound tenses in tables is a help to readers, and there is no good reason to exclude them (in a paper dictionary, the good reason would be the amount of paper used). Lmaltier 19:31, 18 August 2011 (UTC)

d, di, de

It is not "bullshit", d and di are the alternatives of de (的) and de (地). Please see Basic Rules of Hanyu Pinyin Orthography Chapter 7.4. And see here and here. Engirst 11:28, 7 August 2011 (UTC)

That site (pinyin.info) does support your claim. How reliable is it? Do any printed reference works make the same claim? - -sche (discuss) 20:42, 7 August 2011 (UTC)
Please see the original Chinese edition Basic Rules of Hanyu Pinyin Orthography (The national standard of the People's Republic of China). Engirst 22:47, 7 August 2011 (UTC)
uhha... What exactly is your point? It obviously says de is the accepted form and the others are secondary (i.e. COULD be used) in 4.7.4, which comes back to my point, why are you putting in translations with secondary forms instead of the accepted primary form? Or maybe you just want to be difficult? JamesjiaoTC 22:52, 7 August 2011 (UTC)
Also please see printed reference works here and here. Engirst 04:36, 8 August 2011 (UTC)
And Chinese Romanization: Pronunciation and Orthography. Engirst 05:01, 8 August 2011 (UTC)
Right, "d" and "di" are clearly secondary forms. Do you think we should use them rather than the primary forms? Why? - -sche (discuss) 05:08, 8 August 2011 (UTC)
"But it may be desirable in certain situations to differentiate the three. In this case, they may be assigned different written forms: 的, the most commonly used, as "d"; 地 as "di"; and the third, 得, as "de"." (Please see here)
Anyway, the entry "di" shouldn't be deleted (Please see here). Engirst 05:56, 8 August 2011 (UTC)
Have you lived in China at all? Try using pinyin with people, yes that's not going to turn out good for ya. So no, pinyin will never replace characters and will always STAY a pronunciation scheme. Anyway the pdf you linked, it states at the very end of the section that Note: when necessary for technical purposes, the characters (referring to the 3 discussed here) may be spelled as d, di, and de respectively.. What technical purposes? What was your purpose to prefer di over de in your translations? It makes no sense whatsoever to do that. JamesjiaoTC 23:15, 8 August 2011 (UTC)
The point of this topic is only to let everyone know that it is not "bullshit". Engirst 01:36, 9 August 2011 (UTC)
de is the dominant form and will always be. You can list di and d as alternative forms, but should never use them in translations. It's misleading. By the way, stop creating pinyin entries until an agreement has been reached on how we will go about creating them in the future. You will be blocked again if you persist in your single-minded approach. JamesjiaoTC 20:52, 7 August 2011 (UTC)
We should all follow Wiktionary's current rules, you too. Some entries in the new format were created for experimental purposes, just following your edit (please see here). Engirst 23:08, 7 August 2011 (UTC)
Alright, my understanding of the above-cited references seems to mirror Jamesjiao's: "d" and "di" are secondary forms which exist, and which should definitely be mentioned in the main entries ([[]] and [[]], I presume), but which should be disused elsewhere in favour of the primary forms. - -sche (discuss) 04:48, 8 August 2011 (UTC)
The pronunciation "di" (with no tone) exists for two of the three particles that have the normal reading "de": Template:Hant and Template:Hant. This pronunciation is still common in songs and poems as an alternative to "de". It's seldom used in dictionaries and, in my observation, it's discouraged in China like everything non-standard. I wouldn't include "d" at all. It must be incorrect pinyin: standard Hanyu Pinyin NEVER uses consonants on their own (without a vowel), apart from "r" (as a final only).
It is encouraged in China and is a National Standard. Please see the National Standard of the People's Republic of China for your reference. Engirst 06:45, 8 August 2011 (UTC)
YES, it is a standard. It is a standard for PRONUNCIATION for Mandarin speakers just like IPA is an international standard for pronunciation. A pronunciation standard is such that it doesn't contain any ambiguity (that's why English words themselves cannot be used as a pronunciation guide because their pronunciations are ambiguous!). It does not replace Chinese characters and never will. Jesus, how far would you go to twist and turn words like that to fuel your vain attempt at degrading this dictionary into a pinyin dictionary? I am not sure what I have to do to drill this into your brain. Why don't you just make an IPA dictionary as well for all the languages on this website? Go ahead. JamesjiaoTC 23:15, 8 August 2011 (UTC)
Where do you see "d" on its own? Hanyu pinyin is a national standard for romanisation and as the learning tool, not a replacement for the proper script - hanzi. --Anatoli 20:54, 8 August 2011 (UTC)
Pinyin entries are convenient for users learning Chinese. You are the only one who said that pinyin is meant to replace hanzi. Engirst 00:44, 9 August 2011 (UTC)
BTW, pinyin.info is an interesting site using a lot of pinyin, but their objective is to replace hanzi with pinyin as the standard Chinese Mandarin script, same as our ill-famed abc123 aka Engirst, etc. --Anatoli 05:33, 8 August 2011 (UTC)
You have also chosen articles that favour your arguments. See these: w:zh:汉字改革, w:zh:汉语拼音, especially the section on 汉语拼音化 (pinyinisation). Oh yeah, another interesting thing to note is that everything on the zh wp is written in characters, not pinyin!! Does that tell you something? JamesjiaoTC 23:29, 8 August 2011 (UTC)
Don't stray from the topic. The point of this topic is only to let everyone know that it is not "bullshit". Engirst 01:16, 9 August 2011 (UTC)
  • Alright, let's put usage notes at [[]] and [[]] explaining that "d" and "di" exist as (nonstandard? uncommon?) secondary romanisations of the characters, noting (if desired) which authorities/references give them as secondary romanisations. (Unless someone has a specific argument against providing this information, e.g. that the information is invalid. Even then — even if it is invalid — if it is in printed reference works, it would seem helpful to users to have a usage note like "XyzReference lists "d" as a secondary romanisation of this character, but this is wrong...") Consensus, however, is not to use those romanisations anywhere else. There furthermore appears to be an argument about whether Chinese is written in characters (such as ) or in pinyin, which is spilling over into this thread from elsewhere; consensus on that issue is clearly that Chinese is written in Chinese script characters. - -sche (discuss) 23:55, 8 August 2011 (UTC)

Vote: Attestation of extinct languages 2

FYI, I have opened the vote: Wiktionary:Votes/pl-2011-05/Attestation of extinct languages 2. --Dan Polansky 09:57, 8 August 2011 (UTC)

The problems of Mandarin entries

What is your suggestion for solving these problems? Engirst 10:11, 9 August 2011 (UTC)

Untoned pinyin is not allowed. We should follow rules. :) —CodeCat 11:20, 9 August 2011 (UTC)
We are talking about the search and redundancy problems. Please read these problems clearly first. Engirst 12:29, 9 August 2011 (UTC)
Entries are already searchable by pinyin on Wiktionary. Type in "yinyue" into the Wiktionary search bar and you'll see pinyin and characters are all searchable. ---> Tooironic 13:38, 9 August 2011 (UTC)
Please read these problems clearly first. Engirst 17:26, 9 August 2011 (UTC)
Engirst --
It is clear that Tooironic has already "read these problems clearly". As previously noted, entries are already searchable by pinyin on Wiktionary. Try it. Seriously. Enter toneless pinyin into the Wiktionary search bar, and the results you get are quite close to the "good solution" you link to on Jamesjiao's Talk page. WT effectively already implements what you are suggesting, obviating any need for toneless pinyin entries. -- Eiríkr Útlendi | Tala við mig 18:10, 9 August 2011 (UTC)
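The search behaviour described here amounts to comparing a query with pinyin readings after both have had their tones normalized away. The Python sketch below illustrates that idea only; it is not how MediaWiki's search is actually implemented, and the function names and the small reading table are hypothetical.

```python
import re
import unicodedata


def toneless(pinyin):
    """Reduce toned pinyin ('yīnyuè' or 'yin1yue4') to a toneless search key.

    Tone diacritics are removed via NFD decomposition, tone numbers 1-5 are
    dropped, spaces are removed, and ü is folded to u (some systems use v).
    """
    decomposed = unicodedata.normalize("NFD", pinyin.lower())
    no_marks = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    return re.sub(r"[1-5]", "", no_marks).replace(" ", "")


def search(query, readings):
    """Return headwords whose toneless reading matches the toneless query."""
    key = toneless(query)
    return [head for head, reading in readings.items() if toneless(reading) == key]


readings = {"音乐": "yīnyuè", "苹果": "píngguǒ"}
print(toneless("ping2guo3"))  # pingguo
print(search("yinyue", readings) == ["音乐"])  # True
```

Because the toneless key is computed at search time, no separate toneless pinyin page is needed for the match to work.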
Thanks for your response. But the problems are: one, the problem about yapo mentioned by Contributions/71.66.97.228; the other, which I am discussing with Jamesjiao, is the duplication of traditional and simplified character entries. Engirst 20:33, 9 August 2011 (UTC)
Perhaps you could restate the exact issues, then? Reading User_talk:Jamesjiao#yapo, the primary issue appears to be about searching, which is already addressed, and about page overlap between toneless pinyin entries and other languages, which is moot since toneless pinyin pages are not needed and should be (are in the process of being?) removed.
I see your mention of duplication issues, but you do not give enough detail there for me to understand what you mean. Is your concern about duplication that the same entry content is duplicated across multiple heading words, such as and ? This is an issue for multiple languages, even English (cf. color vs. colour -- the content should be mostly identical, as these are essentially the same word, only spelled differently -- just as, for example, 呪い and 詛い in Japanese).
Please explain. As it is, the main concern of yours that I can understand has already been dealt with. -- Cheers, Eiríkr Útlendi | Tala við mig 20:50, 9 August 2011 (UTC)
Yes, we are talking about the duplication such as and (Please see here as Jamesjiao mentioned).
This should be a good solution. The dictionary in this "good solution" has no duplicated entries. Please see the search results for "蘋果", "苹果", "ping2guo3" and "pingguo": there is indeed no duplication. Engirst 03:57, 10 August 2011 (UTC)
  • Again, just ignore him, he's trolling again. It is true that the trad/simp entries could be synchronised in a way to make it easier to contribute, but so far no one has come up with any kind of solution. ---> Tooironic 00:46, 10 August 2011 (UTC)
For all that, Engirst has apparently hit upon a real issue that has been a conceptual niggling thorn in my side as well. However, the crux of the issue -- the need to have multiple index fields having the same descriptor content -- touches on one of the core limitations of the wiki structure: you can transclude, but you can't have more than one index field (i.e., headword) per page. Dictionaries like the one that Engirst points to as potential solutions use very different back-end database structures, something that is just not possible on the current generation of wiki software (and probably won't be possible for the foreseeable future). This structure works fine for an encyclopedia, but it has real shortcomings when people try to apply it to a dictionary.
Several months back, I recall participating in a similar discussion about how to unify English-language entries such as color and colour. There just doesn't seem to be an elegant way to do it; labeled section transclusion presents itself as one option, as does fancy selective transclusion using {{#ifeq:}} calls, but then the trouble is still that the content must reside under just one headword and then be referenced by the alternate spellings. Another option might be redirects, but then the destination of the redirects must include some way of explaining the alternate spellings and the reason for the redirection. The Semantic MediaWiki extension seems the most promising, and some folks have built interesting tools using this that might do the kind of many-headwords-to-one-entry structure that Engirst seems to desire, but I don't think this extension is enabled for WT, and it would require a gargantuan amount of work to support here.
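The many-headwords-to-one-entry structure discussed here can be pictured as several index keys pointing at one shared record, so that editing the record once updates every spelling. A minimal in-memory Python sketch, purely illustrative; the class names are hypothetical and say nothing about how Semantic MediaWiki or any real dictionary back end stores its data.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One shared body of definitions (hypothetical record layout)."""
    gloss: str
    headwords: list = field(default_factory=list)


class Dictionary:
    """Maps many index keys (spellings, readings) to a single shared Entry."""

    def __init__(self):
        self._index = {}

    def add(self, entry, *keys):
        for key in keys:
            self._index[key] = entry
            entry.headwords.append(key)

    def lookup(self, key):
        return self._index.get(key)


d = Dictionary()
apple = Entry("apple")
d.add(apple, "蘋果", "苹果", "ping2guo3", "pingguo")
colour = Entry("(shared definitions for colour/color)")
d.add(colour, "colour", "color")

# Editing the shared record once is visible under every key.
apple.gloss = "apple (fruit)"
print(d.lookup("pingguo").gloss)  # apple (fruit)
print(d.lookup("蘋果") is d.lookup("苹果"))  # True
```

A wiki page, by contrast, has exactly one title, which is why each spelling currently needs its own page or a transclusion scheme.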
So Engirst, if you're reading this, I do feel your pain -- but there's nothing for it, unfortunately, as the reason that Wiktionary needs separate pages for 蘋果 and 苹果, or and , or colour and color, comes down to the core fundamentals of how the wiki software is designed -- and that's not going to change any time soon. -- Cheers, Eiríkr Útlendi | Tala við mig 06:31, 10 August 2011 (UTC)
Don't feel too sympathetic, if you don't know the full story. The technical limitations were always there but the work on Mandarin and Serbo-Croatian was continued nevertheless, despite the necessity to maintain duplicate entries. People like Engirst slow down the work by not following the accepted rules and creating further redundancy, completely out of synch with existing simplified/traditional Mandarin entries, causing a lot of extra work for others. All the requests and blocks were ignored and he continued to do what he wanted using multiple anonymous accounts. --Anatoli 06:44, 10 August 2011 (UTC)
The problem doesn't really come down to "the core fundamentals of how the wiki software is designed", exactly, only to how the Wiktionary editing system works. If we were to switch to using javascript tools as the primary way to edit entries, synchronizing data would be pretty simple. Side question: Is there any specific reason that the pages with toneless pinyin titles don't get {{also}} added to them, pointing to the actual entries, or is it just that nobody bothered? --Yair rand 06:54, 10 August 2011 (UTC)

hypocorism vs diminutive

All the sources I've found that list hypocorisms agree on some entries that we currently label here as diminutives. For example, our definition of this last term is:

  • A word form expressing smallness or youth

And the article Johnny says:

  • A diminutive of the male given name John

And I doubt that all the Johnnies were named from an older or bigger John...

That's why I suggest making these etymologies uniform by replacing the various labels below with Template:hypocorism:

  • Alec: diminutive
  • Alex: shortened form
  • Lex: pet form
  • Kat: short form
  • Joe: common nickname
  • Deb: abbreviated form

JackPotte 18:01, 9 August 2011 (UTC)

A diminutive also means a hypocorism. Isn't it simpler to add that definition to diminutive? Is Wikipedia your only source? The diminutive definition is built into Template:given name. All diminutives/hypocorisms used to be defined as "given names", hence the confusion of terms above. Pet forms of given names are used in a different way in every language so strict standardization might not be a good idea. --Makaokalani 15:08, 10 August 2011 (UTC)
Wikipedia is reliable, and the boundary is clear there, as it is in all the dictionaries I've read, including French ones, where the translations (hypocoristique and diminutif) are fully transparent.
  1. ‘Nick’ is a diminutive of ‘Nicholas’, informally.
  2. diminutive: qualities such as youth, familiarity, affection, or contempt, as -let in booklet, -kin in lambkin, or -et in nymphet.
  3. hypocorism: the creation or use of pet names, as Dick for Richard.
"hypocoristic diminutive" isn't a pleonasm.
The origins of many surnames are obscured by one characteristic of the hypochoristic forms of many personal names, that is, the pet forms, diminutives, or 'short' forms of names.
Jacko is a diminutive (informal) whereas Jacky is a hypocorism...
I'll add this research to our two articles once everyone has made up their minds. JackPotte 20:56, 12 August 2011 (UTC)

Common nouns and proper nouns

I seem to be unclear on the difference between common nouns and proper nouns. Why, for example, is German a common noun when it means "a person from Germany" but a proper noun when it means "the German language"? It's capitalized in both meanings. —Angr 13:49, 10 August 2011 (UTC)

Capitalization does not make a noun proper, nor does lower case make it common. The difference between common and proper nouns is intrinsic and lexical, with the decision of whether to capitalize or not being secondary. Capitalization in most languages is more by convention than by type. Spanish does not capitalize the names of languages, even though they are proper nouns. English capitalizes the days of the week, but does not really use them as proper nouns. German capitalizes all nouns. Further, capitalization even in English has varied through time, so that abstract nouns like socialism and liberty were once regularly capitalized even though they are not capitalized today. This reflects a change in style of writing, and not a change in grammar.
For more than you want to know about the difference between the two categories of noun, see the draft I started at User:EncycloPetey/English proper nouns. --EncycloPetey 14:36, 10 August 2011 (UTC)
Thanks for the link to your draft. It's the first time I've ever seen a definition of proper noun that wasn't circular. Usually when I try to pin someone down on why something is a proper noun, they say "Because it's capitalized." And when I ask why it's capitalized, they say "Because it's a proper noun". What will be easiest for me to remember is that proper nouns are always definite and don't get pluralized (although some proper nouns are pluralia tantum, like "the Netherlands" and "the United States"). As for weekdays, I think they can be both. If I say "I'll do it on Friday", it's a proper noun as it's referring to a single unique day, but if I say "There are five Fridays in this month", it's a common noun because it's referring to members of a class. Noticing that your draft is called "English proper nouns", I wonder if it's possible to come up with a cross-linguistic definition of proper noun. Other parts of speech like "noun", "verb", "adjective", and "preposition" can be defined without reference to the language they occur in (though of course not all languages have all parts of speech). —Angr 14:52, 10 August 2011 (UTC)
It is possible, but pushes into the realm of abstract linguistic philosophy, which will be understandable by few people. My choice was to work on a page treating English as exhaustively as possible, with enough examples and discussion to allow people familiar with other languages to make the extrapolation of the principles themselves. Even the quality of "definite" doesn't work across all European languages because there are shades of difference in what that means. Some languages have a "definite" and "indefinite" form for all their nouns. --EncycloPetey 14:55, 10 August 2011 (UTC)
  • CGEL makes a distinction between proper nouns and proper names which we never have in English PoS headings, AFAICT, though former versions of CFI did.
    "The central cases of proper names are expressions which have been conventionally adopted as the name of a particular entity - or, in the case of plurals like the Hebrides, a collection of entities."
    "Proper nouns, by contrast, are nouns which are specialized to the function of heading proper names."
    As I would apply these definitions "German" is both a common and proper noun. It is a proper name when referring to the language. This usage seems to make it a proper noun. When referring to the people "(the) Germans" would seem to be the proper name. When referring to an individual "German", it is a common noun (capitalized). It also seems to function as a full adjective, being gradable, comparable, and able to serve as a predicate without any article or determiner. DCDuring TALK 15:09, 10 August 2011 (UTC)
The CGEL distinction between "proper noun" and "proper name" hinges on the fact that they define a word as a cohesive unit lacking internal spacing. Since Wiktionary works with terms as its units, and permits internal spacing in these terms, the distinction between a "proper name" and "proper noun" becomes moot. But, your summary of what CGEL says is spot on. --EncycloPetey 15:24, 10 August 2011 (UTC)
It is also true that typically we do not have Proper noun PoS sections at entries like [[Germans]] for "the Germans". Shouldn't we? DCDuring TALK 19:11, 10 August 2011 (UTC)
Also, aren't informal demonyms also proper names and, therefore, proper nouns for Wiktionary purposes, eg, "(the) Brits"? Even derogatory ones would be. DCDuring TALK 19:18, 10 August 2011 (UTC)
Informal demonyms have some properties of a proper noun, but that's true of most substantive biological nouns, and not just demonyms. Compare: "We planted a conifer." to "The conifers grow in boreal climates." In the former sentence, you are speaking of a member of a group, but in the latter, you are referring to the category as a whole. We generally do not create a separate entry for these collective senses. --EncycloPetey 19:23, 10 August 2011 (UTC)

Nei Mongol - Why is it locked?

"Mongol" of Nei Mongol shouldn't be an "abbreviation for Mongolia" (please see here for reference). Anyhow, the entry shouldn't be locked. Engirst 02:33, 11 August 2011 (UTC)

Why is it locked?

The etymology of Nei Mongol seems to have a problem. Engirst 00:57, 1 October 2011 (UTC)

Library of Congress vocabularies

Over at http://id.loc.gov/ you will find search and download entries to the Library of Congress' subject headings, name authorities and other vocabularies. For example, in the Geographic Areas file, you will find that "Sweden" has a broader term "Europe" and narrower terms such as "Lapland". This means library books tagged as Lapland may contain information about Sweden and Europe. It might be a hint that the Wiktionary entry Sweden should contain a pointer to the Wiktionary entries Europe and Lapland. I don't know if this is a useful source of ideas, but you can download it and play with it. Wiktionary has 81 links to loc.gov but none yet to id.loc.gov. --LA2 10:52, 11 August 2011 (UTC)

It might be great as an authoritative substitute for our current unreferenced, whimsical topical category structure. It would allow the reversal of the hijacking of the usage context labels. DCDuring TALK 12:17, 11 August 2011 (UTC)

Admin-only definition editing options trial

It was suggested in the earlier discussion on enabling the definitions editing tool for a trial period that it would be better to first have opt-out trials for only administrators. So what do people think about turning it on for two weeks for admins? --Yair rand 18:54, 11 August 2011 (UTC)

Support.—RuakhTALK 18:04, 12 August 2011 (UTC)
Support. DCDuring TALK 18:36, 12 August 2011 (UTC)
Okay.—msh210 (talk) 17:53, 14 August 2011 (UTC)
Support, sounds good to me. --Neskayagawonisgv? 07:04, 15 August 2011 (UTC)
Okay, trial started. I'll put a disabling button in this section. Anywhere else that should have a disable button? Maybe in WT:News for editors, or is that not really the kind of thing that goes there? --Yair rand 21:06, 17 August 2011 (UTC)
The trial is now over. --Yair rand 20:58, 31 August 2011 (UTC)

What counts as a "derived term"?

A string of dubious and excessive edits to Japanese entries leads me to wonder, what counts as a "derived term"? Do simple compounds warrant listing as "derivations"?

By way of example, have a look at the 魔法#Japanese page. The list of "derived terms" includes things like 魔法カード "magic card" and 魔法能力 "magic ability", among others. Both of these are just plain old compounds -- one word plus another -- and I could just as validly say 魔法茄子 "magic eggplant" or 魔法鉛筆 "magic pencil". Note that these terms are not customary set phrases, like magic carpet, but just plain old compounds.

Do compounds like this, of the exceptionally prosaic and unremarkable sort, merit inclusion in lists of "compounds" or "derived terms" on entry pages? -- Cheers, Eiríkr Útlendi | Tala við mig 06:14, 12 August 2011 (UTC)

To clarify, now that my brain has picked up some speed, how do we decide if a combination of words is just a sum of parts, or if it counts as something more? -- Eiríkr Útlendi | Tala við mig 06:16, 12 August 2011 (UTC)
Have you looked at WT:AJA? There is an associated talk page. Note the existence of an "Idioms" header. DCDuring TALK 18:40, 12 August 2011 (UTC)
Thank you DCDuring, yes, I have looked at that page. What I'm wondering about here is not quite about idioms, but rather what counts as a "derived term" or "compound". The WT:AJA subsection Derived terms doesn't quite answer the question. (But thank you for prompting me to read through that page again, as it clarifies that only kanji headwords should have a "Compounds" section.) -- Cheers, Eiríkr Útlendi | Tala við mig 19:01, 12 August 2011 (UTC)
I was hoping yours was a Japanese-specific issue.
I don't think this is a settled question at the margins - and the margins are ample. What can be under Derived terms would include all morphological or historically derived terms that meet WT:CFI. (I personally would prefer to put terms that are historically derived from other languages despite there being a morphological process to which the etymology could be ascribed in Related terms.) But there has also been inconclusive discussion about the desirability of inserting common collocations under Derived terms. I personally prefer having certain collocations illustrated in usage examples rather than in Derived terms, but, especially for large entries, citations appear on the Citations page where they are not searchable by default. DCDuring TALK 19:46, 12 August 2011 (UTC)

unified Serbo-Croatian... by bot

Would it be acceptable to convert the subpages of Category:Croatian parts of speech to Serbo-Croatian by bot, rather than only by hand? It would be risk-free; the only thing a bot can't do is add the Cyrillic spellings. Specifically:

  1. Convert ==Croatian== to ==Serbo-Croatian==
  2. Convert |hr}} to |sh}} (etyl templates)
  3. Convert {{hr-decl-noun| to {{sh-decl-noun|
  4. Convert |lang=hr to |lang=sh
  5. Convert [[Category:Croatian to [[Category:Serbo-Croatian
  6. Convert [[Category:hr: to [[Category:sh:
  7. Convert {{infl|hr| to {{infl|sh|

This would leave {{hr-adj}}, {{hr-noun}} and {{hr-noun-coll}}. {{hr-noun-coll}} has few enough transclusions that it can be done by hand in a few minutes; not so for {{hr-noun}} and {{hr-adj}} though; these as a temporary measure could categorize in [[Category:Serbo-Croatian <adjectives|nouns>]] while waiting for them to be removed; or, depending on your taste, AWB can skip any pages featuring these two templates. I'm pretty sure you can set up AWB (AutoWikiBrowser) to skip if it finds a certain sequence of characters on a page, such as {{hr-noun|. Mglovesfun (talk) 12:59, 14 August 2011 (UTC)
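The seven substitutions listed above, together with the suggested AWB-style skip rule, amount to a plain text-replacement pass over each page. The following is a hypothetical Python sketch of that pass, not AWB's actual configuration: the `convert_page` function and the exact skip markers are illustrative assumptions, and a real bot run would still have to fetch and save pages through the wiki.

```python
from typing import Optional

# Hypothetical sketch: each pair mirrors one of the seven listed substitutions.
# Plain substring replacement is assumed; no template parsing is done.
REPLACEMENTS = [
    ("==Croatian==", "==Serbo-Croatian=="),
    ("|hr}}", "|sh}}"),  # etyl templates, e.g. {{etyl|la|hr}}
    ("{{hr-decl-noun|", "{{sh-decl-noun|"),
    ("|lang=hr", "|lang=sh"),
    ("[[Category:Croatian", "[[Category:Serbo-Croatian"),
    ("[[Category:hr:", "[[Category:sh:"),
    ("{{infl|hr|", "{{infl|sh|"),
]

# Pages containing these markers are skipped and left for manual handling,
# like an AWB skip rule. The prefix "{{hr-noun" also matches {{hr-noun-coll}}.
SKIP_MARKERS = ("{{hr-noun", "{{hr-adj")


def convert_page(wikitext: str) -> Optional[str]:
    """Return the converted wikitext, or None if the page should be skipped."""
    if any(marker in wikitext for marker in SKIP_MARKERS):
        return None
    for old, new in REPLACEMENTS:
        wikitext = wikitext.replace(old, new)
    return wikitext
```

For example, `convert_page("==Croatian==\n{{etyl|la|hr}}")` yields `"==Serbo-Croatian==\n{{etyl|la|sh}}"`, while any page containing `{{hr-noun|` returns `None`. Because none of the replacement targets reappear in the output, running the pass twice on the same page changes nothing further.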

The work of editors that have chosen Croatian headers, etc. must be respected. Lmaltier 19:14, 18 August 2011 (UTC)
Even if I bought into this argument (which I don't) Ivan Stambuk created the majority of the Croatian entries, and he supports converting them to Serbo-Croatian. Mglovesfun (talk) 08:58, 19 August 2011 (UTC)
I have no objections. —Internoob (DiscCont) 19:02, 21 August 2011 (UTC)
It looks interesting and I support it, but I'm not sure what to do with existing Serbian, Bosnian and Croatian translations. They may coincide with existing Serbo-Croatian translations (same words) or differ from them, and many of them are not formatted with {{t}}, only with square brackets. Serbian may have nested Cyrillic and Roman (occasionally Latin) translations, and sometimes no nesting. Where they coincide with existing Serbo-Croatian translations, they should not be duplicated. --Anatoli 00:26, 22 August 2011 (UTC)
User:Mglovesfun/vector.js converts Serbian to Serbo-Croatian in translation tables, but does not convert Bosnian and Croatian to avoid possible duplication. Note: translation templates don't appear in the above proposal. The reason the vector converts Serbian is that they're the closest alphabetically and that only Serbian uses both scripts. Also, some Serbian translations only use the Cyrillic script with a transliteration into the Latin script, despite the fact that the Latin script is official in Serbian. I read a book on the matter while I had no Internet, and the book wasn't even a recent print! Mglovesfun (talk) 07:58, 22 August 2011 (UTC)
Serbian uses both Cyrillic and Roman. --Anatoli 09:50, 22 August 2011 (UTC)
Yes... Latin and Cyrillic. Mglovesfun (talk) 10:12, 22 August 2011 (UTC)

Current votes

These are the current votes:

--Daniel 04:04, 15 August 2011 (UTC)

Thanks! --Neskayagawonisgv? 07:03, 15 August 2011 (UTC)

native-languages.org

Hello, do you think we could use information from this website? Their FAQ says:

Q: May I reprint information from your website on my own website or blog?
A: Yes, as long as you link back to our website from the page where you have used our information.

Yet, there is also

Q: I am a teacher. May I use information from your website in my classroom?
A: Yes. All of the materials on our website may be freely used for noncommercial educational purposes.

The problem is that the first answer seems to mean we can use this information as long as we link back to them, but the second suggests that only noncommercial use is allowed, which is not compatible with Wiktionary's licence. Maybe we could write to them to ask whether they would make an exception and let us import this data into Wiktionary? What do you think? Pamputt 07:41, 17 August 2011 (UTC)

To be honest I don't really trust their information, so it's probably not worth asking. -- Liliana 09:31, 17 August 2011 (UTC)

including context tags in inflected forms (of sh entries)

This vote (which links to kolovoza) raises for me a point worth discussing: for Serbo-Croatian entries, should we allow dialect/sublanguage context tags not only in main entries (kolovoza), but also in form-of entries? That would have the benefit of clarifying that the series of letters kolovoza is only used in Croatian; it might have the disadvantage of making readers think (until they clicked through to the main entry) that kolovoza was a Croatian genitive of a pan-Serbo-Croatian word and Serbian used a different genitive, like *kolovozu. Note that I say allow, not necessarily require (uniformity is good, but the work could be left to the editors who wanted to do it). - -sche (discuss) 07:55, 18 August 2011 (UTC)

Dunno, as an analogy I dislike something like:
==English==

===Noun===
'''favors'''

# {{US}} {{plural of|favor}}
As my initial reaction reading this is 'what is the non-US plural of favor'? -Mglovesfun (talk) 10:17, 18 August 2011 (UTC)
Inflected form entries are just glorified redirects, not mirrors of lemma entries. We should limit such additional content to cases for which the inflection has a context different from that of the lemma. The Serbo-Croatian aspect of this is a result of the vote on that matter. I'm surprised we haven't gotten more pushback on that vote. DCDuring TALK 12:26, 18 August 2011 (UTC)
I agree with Mglovesfun that this part of your comment: "it might [… make] readers think […] that kolovoza was a Croatian genitive of a pan-Serbo-Croatian word and Serbian used a different genitive, like *kolovozu" hits it exactly on the nose. Form-ofs shouldn't duplicate lemmata's context-tags. That said, what if kolovoza really were a Croatian-specific genitive of a pan–Serbo-Croatian word? What about rare/archaic/dialectal plurals of ordinary English nouns? I think context-tags are potentially useful in those cases. [After e/c: this also seems to be what DCDuring is saying.] —RuakhTALK 12:40, 18 August 2011 (UTC)
I agree with this; here is one specific example:
====Verb====
'''spelt'''

# {{chiefly|British}} {{past of|[[spell#Verb|spell]]}}
This to me seems to be correct. Mglovesfun (talk) 19:22, 18 August 2011 (UTC)
In addition to that type, with an explicit context tag, we have entries like [[boyz]], which uses {{form of}} with the in-template equivalent. Arguably it should have a better register indication than "informal". DCDuring TALK 20:05, 18 August 2011 (UTC)

Pinball category?

What up homies. I've been adding some pinball terms. I don't know whether it merits a category, and don't know or care enough about categories to create one, but if anyone is so inclined then the following terms might possibly qualify: autoplunger, backbox, backglass, flipper, flipperless, gobble hole, kickback, knocker, multiball, rollunder, rollover, rolldown, outhole, outlane, pinballer, playboard, plunger, silver ball, sinkhole. Equinox 21:46, 18 August 2011 (UTC)

I don't see why not. Mglovesfun (talk) 08:13, 19 August 2011 (UTC)
Did you miss digit counter? (It's in "Pinball Wizard".) SemperBlotto 08:19, 19 August 2011 (UTC)

{{suffix|verb|t}}

Could I please continue adding etymology to verb forms ending in -t‽ --Pilcrow 22:40, 18 August 2011 (UTC)

I think you'd better avoid adding "===Etymology=== {{suffix|verb|t}}" to the likes of "dreamt", and avoid adding "===Etymology=== {{temp|suffix|verb|ed}}" to the likes of "dreamed". In general, I think verb forms should preferably have no etymology section, except where the etymology is unusual and of special interest.
For the record, there exist the following categories:
--Dan Polansky 10:14, 20 August 2011 (UTC)
But the -t suffix is irregular. If etymology sections should not be included, could I at least categorize those forms‽ --Pilcrow 16:40, 20 August 2011 (UTC)

hand and

Note that the hand picture links for these Dutch, Swedish and Mandarin entries point to #English - how do I fix this? ---> Tooironic 01:15, 21 August 2011 (UTC) Also note that Category:Visual_dictionary contains both English and LOTE entries - is it supposed to be mixed up like that? ---> Tooironic 01:20, 21 August 2011 (UTC)

{{picdiclabel}} has a language parameter (lang). Mglovesfun (talk) 10:09, 21 August 2011 (UTC)
How do you use it? I'm a newbie at this kind of thing. ---> Tooironic 12:46, 22 August 2011 (UTC)
Like this, as you can see it took me two goes to get it right. Mglovesfun (talk) 12:48, 22 August 2011 (UTC)
Awesome. Thanks. How about the Dutch and Swedish at hand? ---> Tooironic 21:34, 23 August 2011 (UTC)
Was just about to do this for you, but Mglovesfun has beaten me to it! It's fixed now anyway. BigDom 22:32, 23 August 2011 (UTC)

Preferred forms for Japanese lemmata

Haplology and I wound up conversing a bit on the subject of lemma forms for Japanese, as relating to the keiyōdōshi part of speech (also known as "quasi-adjectives", and better known among Japanese learners as "な (na) adjectives"). So far, every Japanese dictionary that I've ever seen uses the uninflected base form of a な adjective as the headword -- except Wiktionary. For reasons lost in the mists of history, Wiktionary alone uses an inflected form of な adjectives as the headword, by including the な on the end. This causes some odd inconsistencies: the base uninflected forms are mostly just stub entries, sometimes missing, sometimes redirects to the inflected forms with な, and sometimes classified as nouns (which is never correct AFAICT).

I was under the general impression that, while Wiktionary happily includes inflected forms of a word, the main entry should be under the uninflected form, with the inflected forms mostly just pointing to the main entry. (I cannot find an explicit description of this policy, neither at WT:ELE nor at WT:AJA; perhaps this could be added?) This holds true at least for English, German, Spanish, Latin, Korean, and Navajo terms, and for Japanese verbs and い (i) adjectives, to the best of my knowledge. If this understanding is correct, would anyone object to Japanese editors keeping the main entries for な adjectives under the uninflected, な-less base forms? -- Eiríkr Útlendi | Tala við mig 16:14, 23 August 2011 (UTC)

I second this. It's easier as well, otherwise you would need to keep both entries - with な and without it. The same is true for の (-no) adjectives. You can add an adjective section to existing noun entries, e.g. Template:Jpan. --Anatoli 20:28, 23 August 2011 (UTC)

@Anatoli: Thanks for replying.

Some additional questions / considerations:

There are lots of entries with [Japanese word] + な, or [Japanese word] + に. By everything I've read (not just on WT), these な and に are essentially particles, which makes these entries just sum-of-parts and thus not meeting WT:Criteria for inclusion. I tried to explain some of how this area of Japanese grammar works, and why this means such entries are SOP and thus not valid, over at WT:RFD#親切に. The way Japanese keiyōdōshi work makes this even more important, in that keiyōdōshi are *both* adjectives and adverbs at the same time; in kanji compounds, there is no distinction between adjectival or adverbial senses, and in spoken or running text, the distinction is made by using either the な (or の for those rarer の-type keiyōdōshi) or the に particles -- i.e., by adding a separate word.

I propose that keiyōdōshi entry content be kept under the main keiyōdōshi headword, without any particles. Adjective/adverb senses can be shown by using Template:ja-na or similar, as is currently the case over at 特別. I further propose that the keiyōdōshi + particle entries be deleted, as these are sum-of-parts and are thus no more worthy of inclusion than English phrases like an apple or to the store.

However -- since keiyōdōshi are equally adjectives and adverbs, the WT:AJA#Quasi-adjectives (形容動詞) recommendation to use level-three or -four Adjective headings seems inadequate, as this ignores adverbial senses. So:

  • Should we include both Adjective and Adverb headings for keiyōdōshi?
  • Should we instead use some other heading, such as Keiyōdōshi, Quasi-adjective, or something else?
  • Should we use just the Adjective heading, and 1) add something to Appendix:Japanese_glossary about this? 2) create Appendix:Japanese_grammar? 3) refer users to w:Japanese grammar?

TIA for your input, -- Eiríkr Útlendi | Tala við mig 18:34, 25 August 2011 (UTC)

I would go with including both Adjective and Adverb headings for keiyōdōshi. While clear to people familiar with Japanese grammar, other headings would be confusing to most readers. If both Adjective and Adverb are included next to each other, it should be clear from the juxtaposition that the word is both parts of speech at the same time. Such a system would also be easier for new contributors to pick up. In addition, it would be the most aesthetically pleasing, in my opinion.
In any case I agree that pages with -na or -ni should be deleted and the content moved to the headword without -na or -ni. Haplology 15:36, 27 August 2011 (UTC)

quantitative easing

How to remove "Lithuanian nouns lacking gender"? ---> Tooironic 21:40, 23 August 2011 (UTC) Or is that supposed to be there...? :S ---> Tooironic 21:41, 23 August 2011 (UTC)

Someone has to add the gender info into the entries, then the number will be reduced. --Anatoli 22:20, 23 August 2011 (UTC)
Does {{g}} belong in translations? DCDuring TALK 22:22, 23 August 2011 (UTC)
Yes, the gender was lacking from the translation. I've added a pos parameter to {{g}}, so in theory at least you could do {{g|lt|pos=translations}} and move it to Category:Lithuanian translations lacking gender. Mglovesfun (talk) 22:29, 23 August 2011 (UTC)
I thought gender was supposed to be a named parameter of {{t}}. DCDuring TALK 22:44, 23 August 2011 (UTC)
Unnamed actually, the third parameter (like {{t|fr|foo|f}}). But when there is no gender, users can choose to add {{g|fr}} or {{g|French}}). This doesn't cause any problems per se, but could apply to probably every English entry with a translation table. I tend to add genders when they're missing, but as long as the gender is in the target (the translation) I say there's no need to worry about it. Mglovesfun (talk) 10:28, 24 August 2011 (UTC)

Adidas

I've been thinking about this since seeing this specific entry, and I've decided to use it as an example. WT:CFI line one says "As an international dictionary, Wiktionary is intended to include “all words in all languages”." Is this a word in a language? WT:CFI makes no attempt to define word or language in this specific context. Normally I'd be happy to consider this a word, but for our purposes it might be better to consider commercial coinages like this to be nonwords. Furthermore, I wouldn't consider this English, but rather Translingual. For example, on a bottle of Palmolive shower gel I had, the translations into Russian and Greek (as well as all the other languages) used the word Palmolive in the Latin script. So one possible solution to the issue of brand names is to not consider them words in any language for CFI purposes. Mglovesfun (talk) 22:19, 23 August 2011 (UTC)

Bear in mind that a lot of brand names are actually translated, though. I've seen some ingenious cases: if you get an Arabic or Georgian bottle of Coca-Cola (yeah, my paper shop gets the bottles that "fell off a lorry"), it has almost the same logo, but reworked slightly so as to write the name in the appropriate script. I suppose that makes it a different word, even if it's only a transliteration. Equinox 22:21, 23 August 2011 (UTC)
One of the best things we could do, IMHO, is remove the section for brand names from CFI, and include brand names whenever they are single-word and attestable. Brand names do not create any problems; they are just disliked as uncustomary for a dictionary, in spite of the fact that useful lexicographical information can be recorded on them, including pronunciation and etymology. You seem to be proposing the very opposite: to exclude all brand names. We can argue whether brand names are words, but fact is they have many properties typical of words: they get pronounced, they get printed, they take positions in sentences, they serve as a basis of derivation (there is the Czech word "adidasky" derived from "Adidas"), they have an etymology, etc. --Dan Polansky 09:03, 24 August 2011 (UTC)
That's another approach, used by the French WT:CFI. It can get a bit silly doing it that way; names of films and books and TV series and whatnot. Mglovesfun (talk) 10:28, 24 August 2011 (UTC)
You don't need to include all multi-word names of works ("Much Ado About Nothing") in order to include all attestable single-word brand names. By contrast, "Lysistrata", a play by Aristophanes, should IMHO be included, if only for its pronunciation--Wikipedia even has different UK and US pronunciations. --Dan Polansky 12:36, 24 August 2011 (UTC)
FWIW, regarding specifically Talk:Adidas, was the RFD ever closed? Because it looks like a fail with two people wanting to delete it, and none wanting to keep it. Mglovesfun (talk) 18:28, 25 August 2011 (UTC)
From Talk:Adidas and the archived discussion, it follows that a RFD started on 16 September 2007. There, two people wanted to delete the page--Connel MacKenzie (who claimed that it was "spam"[2], having tagged the entry in this revision, which had the non-promotional definitions "The German sports apparel manufacturer adidas AG, formally founded in 1949" and "A clothing product of this brand, especially a pair of shoes") and Williamsayers79, while two people were sympathetic with the entry even if stating no boldfaced "keep": DAVilla, and bd2412. Connel was utterly anti-brand, as follows from the vote linked to below: "There is no reason to include any brand name, product name or trademark in a dictionary [...] --Connel MacKenzie 17:41, 31 August 2007 (UTC)". In case of doubt, you may send "Adidas" to a new RFD (for which I vote keep) or to RFV via WT:BRAND, but this is the sort of entry that is likely to meet the current strict requirements of WT:BRAND. The 2007 RFD on "Adidas" was running in parallel with the second vote on brand names, which was running from 5 September 2007 to 5 October 2007 (Wiktionary:Votes/pl-2007-08/Brand_names_of_products_2), a vote that had bearing on whether "Adidas" met CFI. --Dan Polansky 08:23, 26 August 2011 (UTC)

"Category:en:Planets" with proper nouns only, etc.

Since Wiktionary:Votes/2011-07/Categories of names failed and our categories for names of things are named with language codes, I suggest letting these categories be populated only with proper nouns.

This means:

This is already the common practice concerning most of these categories. Compare:

Thoughts? --Daniel 00:25, 25 August 2011 (UTC)

I think I intuitively approve of this. Disregarding my dislike for Category:Fictional characters, I would rather it only contained actual characters and not things like protagonist and soubrette. Equinox 00:36, 25 August 2011 (UTC)
I think it could be useful to group words relating to (for example) rivers, though, like river, tributary, etc. I would not mind using appendices for that, though, or very long ===See also=== sections, or (perhaps the best option) linking to appendices from ===See also=== sections. I have no very strong feelings/interest in the matter. - -sche (discuss) 00:48, 25 August 2011 (UTC)
Category:en:Rivers would be a terrible place to look for tributary, because that word would be effectively hidden among a long list of names of rivers. --Daniel 01:19, 25 August 2011 (UTC)
I agree with Daniel. The terms "river", "tributary" and the like are also found in Wikisaurus:watercourse; while this is done outside of the category system, it is at least a workaround for those who prefer categories. --Dan Polansky 10:30, 25 August 2011 (UTC)

OK. I created this. --Daniel 23:42, 26 August 2011 (UTC)

Fancy button in rhyme pages

When editing rhyme pages such as Rhymes:English:-eɪm, I see a row that says "Add new rhyme:" followed by an input field. I find this pretty annoying and would like to see this disabled at least for me. How can I disable it?

What this button does is add an item to a wikilist. A person who cannot add an item to a wikilist should not edit a wiki, IMHO. --Dan Polansky 10:23, 25 August 2011 (UTC)

I don't think it's for people who can't edit a list; the tool just makes it quicker and easier. No idea how to remove it, although I'm sure someone will be able to help you there. BigDom 10:32, 25 August 2011 (UTC)
In the gadgets section of Special:Preferences, there's an option to "Disable the rhymes editor". --Yair rand 16:55, 25 August 2011 (UTC)
(BTW, it doesn't only add it to the wikilist, it also adds the {{rhymes}} template to the rhyme's entry, and a pronunciation section if it doesn't already have one.) --Yair rand 17:05, 25 August 2011 (UTC)
Thank you. Pretty straightforward; I should have looked in Special:Preferences myself. --Dan Polansky 07:58, 26 August 2011 (UTC)
Unfortunately, it adds the "Rhymes" to the top of the Pronunciation section instead of to the bottom, and this is never correct. It also means that random vandalism or erroneous additions to Rhymes pages require additional cleanup, since users no longer have to open the page code and see the warning about stress on the correct syllable. --EncycloPetey 17:34, 28 August 2011 (UTC)
I've changed the script so that it adds rhymes to the bottom of the pronunciation section. --Yair rand 21:52, 31 August 2011 (UTC)

Removing words from Wiktionary:Wanted entries

I recently removed yhe from Wiktionary:Wanted entries as I believe that all Google book hits are either OCR errors or just typographical variants (for the). Someone else reinstated it because "it might be a word in some language". Have we got a policy for such actions? SemperBlotto 14:02, 26 August 2011 (UTC)

Why do we even have the page? Is it an inheritance from WP? It is really not much help to have a term "wanted" without a language specified. We have the whole family of requested entries by language. DCDuring TALK 15:24, 26 August 2011 (UTC)
Would it be unreasonable to treat it as a cleanup page until emptied and discourage or "forbid" additions to the page? For example, we could restrict changes to admins and have blue links deleted daily or weekly. DCDuring TALK 15:36, 26 August 2011 (UTC)
The only stumbling block I can think of is the way the top of the WT:Wanted entries list shows up at the top of a user's Watchlist. If there could be some way of allowing users to specify a language's "wanted" list (or maybe for multiple languages?) to display on the Watchlist, then I'd be all for DCDuring's proposal here. -- Eiríkr Útlendi | Tala við mig 15:45, 26 August 2011 (UTC)
It actually says to check Special:WhatLinksHere before removing terms from the list. In some cases, all the incoming links are from outside the main namespace, often user pages and user talk pages. Such ones can be removed; any link in the main namespace should be checked for validity, such as typos. --Mglovesfun (talk) 15:59, 26 August 2011 (UTC)
Couldn't a bot take care of the Special:WhatLinksHere check, then? The list is large enough that this would seem to make more sense than going through it manually. -- Eiríkr Útlendi | Tala við mig 17:23, 26 August 2011 (UTC)
Not really a Wiktionary bot, no, since no edits would be involved. A script written in JavaScript might be able to do it. But I'm not the person who can tell you about that. Mglovesfun (talk) 23:21, 27 August 2011 (UTC)
I agree with DCDuring; the language-specific pages should be used (including Wiktionary:Requested entries:Unknown language). - -sche (discuss) 19:05, 26 August 2011 (UTC)
I agree with removal since your work should be reflected somehow. Leaving all the terms there in perpetuity is not going to progress us toward our goal. If the language is unknown then it's even more useless. DAVilla 22:16, 27 August 2011 (UTC)

WOTD

I was keeping this updated fairly well for the last few months, but my RL work situation has made it impossible for the moment. In a couple of weeks, when I'm back from Libya, I am happy to crack on; otherwise, if anyone else wants to update them, then feel free. Ƿidsiþ 08:16, 27 August 2011 (UTC)

Stay safe! Mglovesfun (talk) 10:04, 27 August 2011 (UTC)
Yes, stay safe! I have set words and changed the templates for the 28th of August through the 3rd of September; that should give other editors (or me) time to set the rest of September. I have also set the 1st, 2nd, 4th, 5th, and 7th of October. (I think we should pick a word derived from German, or having to do with unity, for the 3rd of October, Germany's Day of Unity; it would be topical.) - -sche (discuss) 08:56, 28 August 2011 (UTC)
Thanks! I'm not familiar with all the details of word-of-the-day, but I do have a few tips from what I've observed:
  • The WOTDers try not to re-use words of the day. Therefore:
    • Before setting something as word of the day, they check the upper-right-hand corner of the entry to make sure it wasn't used before. According to [[pareidolia]], that word was already word of the day (17 February 2011).
    • Conversely, when setting something as word of the day, they add {{was wotd}} to the entry (à la [[pareidolia]]) so that future editors can see that it has already been used (or is about to be used).
  • When setting words of the day, they list them at (e.g.) [[Wiktionary:Word of the day/Archive/2011/August]], so other editors can see what they are. (There are a few editors who keep watch for upcoming WOTDs and do last-minute cleanup.)
RuakhTALK 13:57, 28 August 2011 (UTC)
That pareidolia appeared twice is evidence of the conspiracy! (Wait, that's not an example of pareidolia, that's an example of paranoia.)
Ok, I've checked all of the other August, September, and October words I set, they all look new (no WOTD links at the tops of the pages or in Whatlinkshere); thanks for pointing that out! I've also added {{was wotd}} to the words. Thanks for adding August to the archive; I've started a September archive. - -sche (discuss) 18:24, 28 August 2011 (UTC)
Will someone be selecting words for September? I notice that "-sche" has selected some for October, but most of September has not been set as far as I know. I would be willing to do November, to give Widsith a break (if needed; I needed the same from time to time when I was selecting them), but I don't think I'd have time right now. I even went most of this last week without logging on much because of duties in the physical world. --EncycloPetey 17:24, 28 August 2011 (UTC)

Template:cmn-pinyin

Am hoping this will solve a problem or two; see my comment at Wiktionary talk:Votes/2011-07/Pinyin entries#Romanization. Mglovesfun (talk) 10:01, 27 August 2011 (UTC)

Languages written in more than one script, attestation

Is it a good policy or not to require that, for a language using more than one script, each script form of a word (term, idiom, etc.) be attested for it to be included? For example, do we expect верс to be attested separately, with three citations distinct from those for vers, or will three citations covering both forms, Cyrillic and Latin, do? Mglovesfun (talk) 10:24, 27 August 2011 (UTC)

I think it varies based on language. For something like Serbian, where the scripts have a one-to-one correspondence, I think demanding that each script be separately attested is pointless and bureaucratic. For other languages, that don't have a simple transliteration between scripts, it's probably necessary to attest each separately.--Prosfilaes 10:41, 27 August 2011 (UTC)
Or languages that have a dominant script and a rare one. I seem to think Tatar can be attested in the Latin and Arabic scripts, but is predominantly written in Cyrillic. Mglovesfun (talk) 11:06, 27 August 2011 (UTC)
And for some, like Japanese, a word might usually be spelled in kanji, for instance, but hiragana and romaji (i.e. Latin-alphabet) entries are added as well to aid learners.
Attestation concerns aside, given the kerfuffle about pinyin entries for Chinese and the discovery from that discussion that searching for pinyin finds hanzi entries just fine, I find myself asking -- do we really need Japanese headwords in kana and romaji? If we do, then wouldn't we also need pinyin headwords? What's the distinction? -- Eiríkr Útlendi | Tala við mig 18:33, 27 August 2011 (UTC)
We allow pinyin as headword and have done for some time, though don't ask me when the first one was created (and not deleted). Mglovesfun (talk) 21:32, 27 August 2011 (UTC)
I'm a bit invested in the romaji and kana pages so I'm biased, but at least they do serve to replace a Homophones section. Having one canonical kana page eliminates duplication or missed entries in Homophones sections scattered across many pages, and a single, comprehensive list of homophones might be of some benefit to learners of Japanese. Putting romaji pages in topical categories increases the duplication of entries but makes it easy for learners to read a group of related terms at a glance, which is helpful since learning groups of related terms at once is the best way to learn a foreign language. Having romaji terms in topical categories would allow people with no knowledge of Japanese to learn a handful of terms in a few seconds. There are arguments for and against, but I favor having them. Haplogy 04:16, 1 September 2011 (UTC)
The disambiguation role of romaji and pinyin is just fine. The trouble is when all the content that should be in kanji/hanzi entries goes into pages that only serve to help learners cope with the complex script and find the proper word. It also seems that some people have an agenda of promoting non-standard writing. The proper native script should not be replaced with the romanisation.
On Serbo-Croatian (or Serbian alone): one-to-one conversion is only 99% OK. Care should be taken with borrowed words, as the Roman script often uses the original spelling, and some letter combinations have variants. (I'm leaving aside the dialectal differences, Ekavian/Ijekavian.)
On Tatar and Belarusian: some nationalistically minded users created a bunch of entries in Roman script, especially in Tatar. Tatar (but not Crimean Tatar, which is a different language) is officially written in Cyrillic, as is the majority of online and printed material in Tatar. You may think I am biased, but conversely, since Azeri, Turkmen and Uzbek are now officially written in Roman script, even though they were formerly written in Cyrillic, their entries and translations should primarily be in Roman script. Failure to do so confuses readers.
We are on the way to create pinyin policy. Perhaps we should address some other languages, written in multiple scripts. There is no definiteness for some (e.g. Konkani in India) but we could have a guide for patrollers to use. --Anatoli 04:35, 1 September 2011 (UTC)
The original question on верс: yes, it is a correct Serbo-Croatian word and a correct conversion from Roman. --Anatoli 04:41, 1 September 2011 (UTC)
In response partly to Haplology and partly to Anatoli, my view on Latin alphabet (and kana) entries for languages that generally use another script is to view them a bit like disambig pages. For Japanese in particular, romaji and kana pages may include multiple possible kanji, making romaji or kana pages very much indeed like disambig pages. As such, entries in non-standard scripts should probably do no more than provide a bulleted list of the main terms, with brief glosses to help users make the correct selection. -- Eiríkr Útlendi | Tala við mig 02:23, 2 September 2011 (UTC)
This is in line with the current pinyin vote. --Anatoli 02:33, 2 September 2011 (UTC)
So to sum up, trying to stay global rather than comment on specific cases, it depends on the language. Different languages should get different treatments. How am I doing? Mglovesfun (talk) 07:01, 2 September 2011 (UTC)

Template:ante and Template:post

Why do {{ante}} and {{post}} abbreviate their output to a. and p.? It saves two whole characters on each of them, and makes them quite a bit more opaque in meaning.--Prosfilaes 22:07, 27 August 2011 (UTC) IFYPFY.​—msh210 (talk) 04:59, 28 August 2011 (UTC)

I would support changing them so that they do not abbreviate. - -sche (discuss) 23:01, 29 August 2011 (UTC)
I think the usual meanings are (and our glossary says the meanings are) "not after" and "not before" rather than "before" and "after". Thus, a quotation dated a. 1924 might be from 1924. So whatever this conversation decides, it should not be to change the displays to "before" and "after" (unless someone goes through every single time the templates are used and edits the dates!). "Ante" and "post" have a similar problem (as people know what they mean); so do "a." and "p.", but not as badly. But maybe "ante" and "post" don't have it badly enough to worry about: I don't know.​—msh210 (talk) 01:49, 30 August 2011 (UTC)
Surely we shouldn't justify having an abbreviation because we have a weird definition and that makes it more opaque.--Prosfilaes 02:09, 30 August 2011 (UTC)
Msh210: I understood this as a request to change "a." to "ante" and "p." to "post". But are you saying "a." means something different than "ante"? (Are you saying "a." means "no later than" but "ante" means "before"?) If so, I echo Prosfilaes' comment. If not, the expansion will not cause a semantic problem. PS, note that while Wiktionary:Glossary does not define "a." or "ante", it defines "p." as "post or after, often used in quotations", which disagrees with Appendix:Glossary... one or the other should be corrected. - -sche (discuss) 21:12, 31 August 2011 (UTC)
To address your last point first, Wiktionary:Glossary shouldn't define either, as they're not used in discussions here (and, anyway, are in Appendix:Glossary). To your first point: Ante and a. both mean (translate as) "before", and post and p. "after". But we don't use them that way, so in our citations they don't mean that. That's a bad thing (unless it's the same as what other dictionaries do, in which case it's okay, I suppose). But if that's the way it is, then yes, essentially, a. means "not after", since (as it's an abbreviation) we can make it mean whatever we want: people might not know it comes from ante; otoh, ante, a Latin word, clearly means "before". So a. is better. Best of all, though, would be to change our system (again, provided it doesn't match other dictionaries'), which should be doable by a complicated bot. (It would need to look for post and ante, written in by hand or by template, and change the year by one unless the "year" is a century or the like (in which case leave it) and unless the citation had been added or edited since the decision was made to switch over (in which case tag it for human attention). Or something like that.)​—msh210 (talk) 15:37, 1 September 2011 (UTC)
I dunno about dictionaries specifically, but isn't the use of "ante [year]" to mean "in or before [year]" pretty normal? I mean, do you take “They were reviewed in this journal when they originally appeared (ante 1973), III, 103–4 and (1976) IV, 125–6” and “The projected growth rates of labour supply under ‘normal’ (that is, ante 1973) demand conditions in both countries are about the same as those prevailing since the mid-1960s” (both c/o b.g.c.) to mean strictly before 1973? —RuakhTALK 17:58, 1 September 2011 (UTC)
I take it to mean "strictly before", yes. If I'm wrong as to the general intention of writers, or in the minority among readers, ignore my 2c, above.​—msh210 (talk) 18:42, 1 September 2011 (UTC)
I don't know. I understand it differently from how you do, but it really just might be me. (I'm pretty sure I'm the one who gave the current glosses at Appendix:Glossary.) Very relatedly — I'm reading a book called Semantic Antics, about various English words that have changed meanings in bizarre ways, and it frequently says that a certain word has a certain meaning (say) "before 1483". I've been taking that to mean "by 1483", since it seems strange to emphasize a year after which the word is already known to have a certain sense, but again, maybe that's just me? —RuakhTALK 20:17, 1 September 2011 (UTC)

Position of Template:was wotd

The current vertical position of {{was wotd}} is about level with the L1 page title, above all the language (L2) entries (no matter where it is included). This implies that the whole page was featured when in fact it was just the English entry. This has two problems. (1) About 500 pages where this template is used have entries below the English section for which this implication is false, and (2) eventually we might want to feature an English entry where there is a preceding Translingual entry, which would break the even looser implication that the template applies to the following/top entry. I therefore propose that this template float on the right-hand side (I think this is the only RHS one that doesn't) like others do. We can then move the template uses into the English entries, at least where there's possibility for confusion. Sound OK?--Bequw τ 16:08, 28 August 2011 (UTC)

Well, we've only ever chosen English words, so the initial decision was to place the template as far up the page and out of the way as possible, so that it would not overlap any page content. In practice, this varies a bit by browser.
Placing it in the English section will not be any less misleading. We only ever feature one part of speech, and English words often have more than one part of speech, so I don't see that moving the template would solve any actual problem. --EncycloPetey 17:24, 28 August 2011 (UTC)
I know historically there were some "overlapping" layout problems, but our current floating RHS content doesn't suffer from any that I know of. And actually, the current position overlaps with the section-0 (page header) [edit] link that can be added with JS (originally from an en.wiki gadget). I find it quite useful and at least one other person uses this. As for the change being less misleading, the template could be placed in the actual part of speech that was featured. It'd be hard to see how narrowing the indication from the whole page down to a language's part of speech isn't more accurate. The move also helps with the logic/consistency that language content should be in its language section. I can't think of any other language content template that isn't in its own section. Does the current position help the WOTD maintainers? If so, we can have the default layout be "float"ing and then have some WT:PREFS code to move it to its current, absolute position. --Bequw τ 14:58, 29 August 2011 (UTC)
Well, it helped me during the years I ran it. It was easiest to spot at the top of the page, rather than having to look around in (possibly) several places, where it might be hidden by images or wikipedia-link boxes. --EncycloPetey 05:11, 4 September 2011 (UTC)

The move you have made to the new position has resulted in a serious problem. The text that the template is supposed to display is no longer visible in many of the entries. Please correct the problem so that the text is visible, or please revert the change in position. --EncycloPetey 18:11, 5 September 2011 (UTC)

That's odd. In none of my browsers (Chrome, FF, IE 9 on Vista) has the display changed on entries where I've moved the template position into the English entry (eg putrescible). The CSS positioning is "absolute" so it shouldn't change (and there shouldn't be any difficult cache issues since I didn't change anything else). What's your setup? Does it happen when you're logged out? What if you view it using the Monobook skin (I assume you use Vector)? Does anyone else have this problem? --Bequw τ 01:22, 8 September 2011 (UTC)
It's fine on my Mac at home using Safari, but not in IE (Windows) at work. I'm not sure which old version we're using, but it's school software and can't be changed. If the template is going to display at the top of the page, then I don't understand how it will help anyone to position the coding for the template inside a language section. That will just confuse future editors. --EncycloPetey 02:57, 10 September 2011 (UTC)
Checking pages against IE 5.5, 6, 7, and 8 I couldn't find any oddities with WOTD. Maybe your school has a mixed-up configuration that isn't popular. I think it is best to make this template float by default and create WT:PREFS to return it to the original, top position for those that prefer it. This is partly why I moved into the English entries those WOTD invocations on pages with multiple entries. See the start of a broader cleanup at Wiktionary:Todo/Anomalous section0 content. --Bequw τ 03:09, 11 September 2011 (UTC)
I've made the template simply floatright like other RHS templates. It might look weird as caches catch up. If you prefer the old style, you can get the raw CSS from the documentation shown at {{was wotd}}, or, if you have the default (Vector) skin, you can use WT:PREFS (look for "was-WOTD" at the bottom of the display section). --Bequw τ 13:21, 18 September 2011 (UTC)

Klategory?

There are loads of these Ku Klux Klan terms. The following might be eligible for a category (some used only within the organisation and others more widely): Kladd, Klankraft, Klaliff, Klokan, Klarogo, Klexter, Klokard, kludd, klavalier, Kloran, klonvocation, klecktoken, kligrapp, Kleagle, Klabee, Klansman, Klanswoman, kloncilium, klonklave, klansman, klan, Klan, Ku Klux Klan, antiklan, klavern, KKK, Klannish. Equinox 10:57, 31 August 2011 (UTC)

September 2011

Idiomatic translations

I've been wondering for a while how to add translations that are not strictly idiomatic in English or in the target language, but for which the translation itself is idiomatic and not obvious. An example I came across was 'I have a nosebleed', which is translated more or less word for word into Dutch as 'Ik heb een bloedneus', but in Catalan it is translated as 'Em sagna el nas' - literally 'The (my) nose bleeds to me'. Any translations given for nosebleed are only useful for Dutch, but they would not cover the Catalan case at all. The literal translation of 'nosebleed' in Catalan is 'hemorragia nasal', which is not helpful in this case, and is not even idiomatic itself so it can't be included. Cases like this are quite common between languages, and it seems like a rather big gap in Wiktionary to leave it out... —CodeCat 13:04, 1 September 2011 (UTC)

People keep talking about a phrasebook, maybe this would be a good use for it. Fugyoo 14:02, 1 September 2011 (UTC)
But there should be something directly under [[nosebleed]] too. In a paper English-Catalan dictionary you would expect to see something like "nosebleed - hemorragia nasal. I have a ~ : Em sagna el nas". So why not do something like that here: when an English term is best translated with a phrase in the target language, we give the phrase in addition to the straightforward noun=noun translation. —Angr 15:40, 1 September 2011 (UTC)
I would agree with this way but there is some overlap with entries that are idiomatic, which have their own entries. We might end up with a situation where the entry give contains translations for 'give up', while give up has its own translations as well. We would need to be careful that translations are not duplicated like this. —CodeCat 15:52, 1 September 2011 (UTC)
Not addressing your question, which is general, but, rather, only the specific example: Do other symptoms not translate into Catalan similarly? Is "I have a headache" in Catalan not literally "The head hurts to me"? If so (and, not knowing any Catalan, I have no idea whether it's so), then I don't think we should include such translations in any entry at all: they belong in a grammar, perhaps, but are not relevant to any one word of the language.​—msh210 (talk) 15:47, 1 September 2011 (UTC)
We already have some grammar in Wiktionary's entries, and I don't think it's much of a problem if we include things like this. They are very useful to someone who wants to say 'I have a nosebleed' in Catalan and looks at the translation table, and then notices immediately that what he wants to say is said differently. It's very user friendly that way. —CodeCat 15:52, 1 September 2011 (UTC)
On a tangential note, google:"I have a nosebleed" seems much less common than google:"my nose bleeds" and google:"my nose is bleeding". "my nose bleeds" seems like a fairly good candidate for a phrasebook entry, one that can be linked to from "nosebleed". --Dan Polansky 16:05, 1 September 2011 (UTC)
'My nose bleeds' seems very awkward to me. It sounds like you are saying it bleeds habitually rather than that it is bleeding right now. —CodeCat 16:06, 1 September 2011 (UTC)
Where I live in Northern England, people would say "my nose is bleeding" or possibly "I have a nosebleed" but never "my nose bleeds". I imagine most of the ghits for "my nose bleeds" would be part of phrases such as "my nose bleeds when..." or such like. BigDom 16:10, 1 September 2011 (UTC)
google:"My head hurts" and google:"my stomach hurts" also seem very common in spite of not using the present continuous tense. Also check the two phrases in Google Books to see how very common they are there too. --Dan Polansky 16:12, 1 September 2011 (UTC)
This American agrees with Codecat and BigDom: my nose bleeds sounds like it does so habitually, not now. My X hurts OTOH means now. Go figure.​—msh210 (talk) 16:28, 1 September 2011 (UTC)
I agree. But to me it seems similar to the Americanism "Do you have" instead of "Have you got" (I was once asked "Do you have children?" I replied "Not very often." and she was very confused!) SemperBlotto 06:59, 2 September 2011 (UTC)
@ CodeCat, we often just split the link, [[hemorragia]] [[nasal]]. Mglovesfun (talk) 07:08, 2 September 2011 (UTC)

More generally, the translation table should include help when needed. Lmaltier 17:03, 2 September 2011 (UTC)

Question

I was directed here from a discussion section. So what is the "acceptability" of signatures? An editor since 8.28.2011. 06:36, 3 September 2011 (UTC)

Any signature is probably going to be acceptable unless someone takes exception to it. If somebody has a problem with it, they will explain and then you will know how to improve its acceptability. Or you can ignore the complaint and advice and choose instead to burn your bridges with that editor. If you burn too many bridges, you may find it difficult or impossible to function effectively here. —Stephen (Talk) 06:44, 3 September 2011 (UTC)
I have no clue how that is related to my comment? (Never mind.) Okay... what exactly are you trying to say? Oh, I see! I'm stupid when I'm tired. Thanks! An editor since 8.28.2011. 06:48, 3 September 2011 (UTC)
FWIW I find colorful signatures annoying... but I'd rather be annoyed than limit others' freedom to edit their own signature, unless the signature is really really silly. Mglovesfun (talk) 09:55, 3 September 2011 (UTC)
Thank you! An editor since 8.28.2011. 17:08, 3 September 2011 (UTC)
However: my signature is uni-colored. An editor since 8.28.2011. 17:11, 3 September 2011 (UTC)
Colorful doesn't necessarily imply more than one color. --Mglovesfun (talk) 14:18, 7 September 2011 (UTC)
Differently-coloured signatures might cause problems for people using different skins (colour schemes), perhaps because of poor eyesight. Fugyoo 14:26, 7 September 2011 (UTC)

User:Yair rand/uncategorized language sections/English

Just want to ask for a few volunteers to fix these entries, using templates such as {{en-noun}}, {{en-verb}} or just {{infl}}. No obligation of course, but even fixing one entry at this late stage is a help. Thank you, Mglovesfun (talk) 09:54, 3 September 2011 (UTC)

community's opinion on bot format

I received this message on my talk page: Hi there. It is a bit late now, but I have been meaning to ask you for some time if the form P.officer was a mistake. Also, in your subpages, we like to use {{conjugation of}} these days rather than {{form of}} e.g. {{conjugation of|pellettizzare||2|s|past historic|lang=it}} (Italian example).

The thing that I'm asking about is when it says "we like to use {{conjugation of}}...rather than {{form of}}". Is it important which template to use, if both produce identical results? To me, it looks unnecessary to change to {{conjugation of}}, if not a waste of time, but I'm eager to hear the voices of other users. --Pofficer 17:53, 4 September 2011 (UTC)

If they produce the same thing, I don't see the point in switching.​—msh210 (talk) 20:22, 4 September 2011 (UTC)
Conjugation of is more uniform: there are many minor variations on how to write "first-person singular present indicative" using form of, while conjugation of only allows one of these. Mglovesfun (talk) 21:34, 4 September 2011 (UTC)
I agree. Hence my "If..." clause.​—msh210 (talk) 21:37, 4 September 2011 (UTC)
OK, I shall continue the bot. If there are any problems, don't hesitate to leave me a message and I'll put a clamp on the bot. --Pofficer 09:56, 5 September 2011 (UTC)

I got a message just now about a bot flag, the "small formality of requesting permission to run as a bot, and then getting a sysop to set the bot-flag on your user id". Can I request permission to run P.officer (talkcontribs) as a bot? Instead, perhaps, I could change the name of the bot to Officebot (talkcontribs) as it could avoid confusion. --Pofficer 10:20, 5 September 2011 (UTC)

  • We do have a couple of bots without -bot or -Bot in the name but, if you don't really mind, I'll change it to PofficerBot before setting the bot flag (It seems to be functioning OK). SemperBlotto 10:31, 5 September 2011 (UTC) p.s. You would need to edit your user-config.py file to reflect the name change.
    • I certainly will change user-config.py. "Pofficerbot" is fine as a name. Thanks --Pofficer 10:40, 5 September 2011 (UTC)
  • OK. Changed to "Pofficerbot" and bot-flag now set. SemperBlotto 10:44, 5 September 2011 (UTC)
    • Still needs a vote technically, no? Not that I object. Does anyone actually object to this bot? Seems a bit of a waste of time to have a vote if nobody would actually oppose it anyway. Mglovesfun (talk) 10:46, 5 September 2011 (UTC)
      • It takes a second to remove the flag (I'm going to keep an eye on it for a while). SemperBlotto 10:48, 5 September 2011 (UTC)
        • Thanks again SemperBlotto. As if by magic, the edits have been removed from RecentChanges. --Pofficer 10:59, 5 September 2011 (UTC)

WT:About_Japanese

Calling all 日本語能力のある方...

Following comments in various other threads, it appears that the WT:AJA page needs some work. The issues I'm immediately aware of:

  • Quasi-adjectives (な adjectives): WT:AJA insists on including the な in the headword, which does not appear to be the current consensus.
  • の adjectives: WT:AJA does not include any clear guidelines for these. (Relatedly, {{ja-adj}} doesn't include any way of handling these either.)
  • Suru compound verbs: WT:AJA calls for using the {{ja-suru}} template. However, する is a standalone verb, so including the する conjugation on each and every compound verb page seems excessive.
  • {{ja-kanjitab}}: WT:AJA describes including this under an === Etymology === section if there is one, but including it under the main == Japanese == section produces largely identical results, unless there are multiple etymology sections, in which case repeating the kanjitab seems excessive.
  • The Transliteration subpage could also use some work, particularly with regard to spacing and what constitutes a single word in Japanese (i.e., particles should be separate, suru should be separate, etc. etc.).
  • 連体詞: WT:AJA states that this should be given a POS of "prefix", but that is really not what these words are -- a prefix is part of a word, whereas 連体詞 are clearly standalone words. They are less prefixes and more like true adjectives, in that they must precede a noun.
  • Single-kanji entries: WT:AJA has no clear instructions on how to specify okurigana in kun'yomi listings, nor any clear instructions on how to format these to link to verb forms. For instance, shows one way of clarifying okurigana and linking to kanji+okurigana entries, but is a bit visually messy; ja:食#日本語 looks a bit cleaner with the use of hyphens to show the break between the kanji and the okurigana, and this roughly matches the format I've most often seen in dead-tree dictionaries, but the entry doesn't link to any kanji+okurigana entries, just to the hiragana entries; and doesn't show okurigana or link to any kanji+okurigana entries.

This post is really just meant to get the ball rolling. Many of these changes listed above are a departure from what WT:AJA currently says, so I'm hoping to spark a bit of discussion before making any edits. -- TIA, Eiríkr Útlendi | Tala við mig 17:41, 6 September 2011 (UTC)

Please keep discussion in the fora here in English where possible. For the record, 日本語能力のある方 seems to mean "those skilled in Japanese" or similar (based only on Google Translate, not that I know any Japanese, myself).​—msh210 (talk) 18:25, 6 September 2011 (UTC)
Also, you might want to continue this discussion at Wiktionary talk:About Japanese, since it may wind up taking up a lot of screen space and is specific to Japanese (and indeed the AJA page!).​—msh210 (talk) 18:28, 6 September 2011 (UTC)
Fair enough. I've tried posting there a few times and got the overwhelming impression of crickets chirping, which led me to try posting on a more-trafficked page. I'll copy this thread over to there shortly. -- Eiríkr Útlendi | Tala við mig 19:16, 6 September 2011 (UTC)
I think it is a good idea to have this post here, directing everyone to the Wiktionary talk:About Japanese page (where the discussion can take place). If no-one adds to the discussion, contact other active editors of Japanese directly on their talk pages. If there are none, or you have done that and they have not replied, then you (as the only active editor of the language) should make whatever changes you deem necessary. - -sche (discuss) 20:06, 6 September 2011 (UTC)
I agree: keep this here, but continue discussion there. (That's what I meant in the first place: sorry I wasn't clear.)​—msh210 (talk) 23:41, 6 September 2011 (UTC)
Sure, no worries.  :) I copied my initial post over to Wiktionary_talk:About_Japanese#Work_Needed. I hope to get into the nitty gritty over there. -- Cheers, Eiríkr Útlendi | Tala við mig 23:45, 6 September 2011 (UTC)

I've created a list of the 1000 most common species epithets

Hi Latin lovers and barflies,

User:Pengo/Latin/Top_1000

Based on the Encyclopedia of Life database, I've compiled a list of the most common species epithets. I'm hoping this will help those who want to create new Latin/Translingual entries.

There are more details on the page. --Pengo 14:14, 7 September 2011 (UTC)

Here are the top 5 words that are missing Latin/Translingual entries:

--Pengo 03:22, 8 September 2011 (UTC)

In many cases it is just the inflected form that is missing, e.g. nana, nanus (a dwarf), but in some cases lemmata are missing, even classical ones, e.g. variegatus, variego. DCDuring TALK 14:21, 8 September 2011 (UTC)
Looks like it would help if I grouped words with the same stem. I'm going to attempt to make another list that does that (at least crudely).
I, myself, don't know an inflection from a declension, so until I learn some Latin grammar and work out all the templates and formatting here, this list is really for you and other editors. So let me know if there's anything else that would be useful. --Pengo 02:53, 9 September 2011 (UTC)
Grouping by stems is less helpful IMO for speeding entry creation than grouping by inflectional ending and suffix. For example, the forms ending in "ata" have a very similar Latin section structure. That structure will have links to the participle lemma ending in "atus", which will have links to the lemma verb. Some of those links may be red. The entries for the red-linked lemmas should probably be added by an editor familiar with Latin who has access to multiple Latin references, including some for Medieval Latin. Purely New Latin terms are much less interesting to most Latinists, however important they may be to taxonomists and to Wiktionary. DCDuring TALK 12:25, 9 September 2011 (UTC)
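DCDuring's suggestion of grouping by inflectional ending rather than by stem can be sketched in a few lines of JavaScript. This is a minimal illustration only: `groupByEnding`, `participleLemma` and the `ENDINGS` list are hypothetical and not part of Pengo's tool, and mapping "-ata"/"-atum" to "-atus" assumes a regular first/second-declension form, which will not hold for every epithet.

```javascript
// Hypothetical ending list; longer endings must win over shorter ones.
const ENDINGS = ["atus", "ata", "atum", "us", "a", "um", "is", "e"];

// Group epithets by the longest listed ending they carry.
function groupByEnding(words, endings = ENDINGS) {
  const sorted = [...endings].sort((x, y) => y.length - x.length);
  const groups = new Map();
  for (const word of words) {
    const ending = sorted.find((e) => word.endsWith(e)) ?? "(other)";
    if (!groups.has(ending)) groups.set(ending, []);
    groups.get(ending).push(word);
  }
  return groups;
}

// Assumed mapping: feminine "-ata" / neuter "-atum" -> masculine "-atus" lemma.
function participleLemma(word) {
  return word.replace(/at(a|um)$/, "atus");
}

console.log(groupByEnding(["variegata", "variegatum", "nana", "vulgaris"]));
console.log(participleLemma("variegata")); // "variegatus"
```

Checking longer endings first is what keeps "variegata" in the "ata" group rather than in the catch-all "a" group.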
Thanks for the feedback. Working on it. Will add some extra features too. --Pengo 04:56, 10 September 2011 (UTC)

abuse filter

As of recently, we have an abuse filter. It allows us to create rules against which edits (and moves and other things) are filtered; if an edit matches such a rule, it can — at our option for each rule — tag the edit with a little note in special:recentchanges, not allow the edit to go through until the editor first sees a warning that the edit might not be wise (which warning can be customized for each filter rule), block the edit altogether, or remove the editor's "autoconfirmed" flag. (Or combinations of those.) It can also do these things only after the editor in question makes too many rule-matching edits in a short period of time (which rate, too, is customizable per filter rule). For more on the abuse filter, see the MediaWiki extension page and/or the Wikipedia abuse filter page (except that they call it the "edit filter").

I've set up some rules that I thought would be helpful.

One of them actually blocks an edit from going through: this filter checks that the user is not an autopatroller, admin, or bot; that the edit is in the main (entry) namespace; that the entry had a level-three header before the edit; that the entry has no level-three header after the edit; and that the entry (after the edit) doesn't have a speedy-deletion template or {{only in}}. If all of these hold, it blocks the edit. That filter has (in its current incarnation) caught scores of edits, with no false positives (i.e., it has not blocked any edit that we wouldn't have manually rolled back had it gone through).
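The conditions listed above can be restated as a plain JavaScript predicate, which may make them easier to review. This is only a sketch: the real filter is written in the AbuseFilter extension's own rule language, and the group names (`autopatrolled`, `sysop`, `bot`), namespace 0 for entries, and the speedy-deletion template name `{{delete}}` are assumptions for illustration.

```javascript
// Hypothetical restatement of the header-removal filter; not the real rule text.
const EXEMPT_GROUPS = ["autopatrolled", "sysop", "bot"]; // assumed group names
// A level-three header line such as "===Noun===" (not "====Usage notes====").
const LEVEL3_HEADER = /^===[^=\n].*===[ \t]*$/m;
// Assumed speedy-deletion template, plus {{only in}} from the description.
const DELETION_OR_ONLY_IN = /\{\{\s*(?:delete|only in)\b/i;

function blocksHeaderRemoval({ userGroups, namespace, oldText, newText }) {
  if (userGroups.some((g) => EXEMPT_GROUPS.includes(g))) return false;
  if (namespace !== 0) return false;              // main (entry) namespace only
  if (!LEVEL3_HEADER.test(oldText)) return false; // had an L3 header before
  if (LEVEL3_HEADER.test(newText)) return false;  // still has one after
  return !DELETION_OR_ONLY_IN.test(newText);      // no deletion or {{only in}}
}
```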

No other rule currently does more than tag an entry on special:recentchanges. I propose, though, that three do.

One of them is a copy of a filter at enWP. These filters look for an edit that adds a single bad word and nothing else. (Approximately. The actual workings of the filter are hidden on enWP, so I've hidden our copy also. Admins and "edit filter managers" over there can see their copy, and our admins can see ours.) On enWP, it prevents the edit from going through, and has done so for months. (I don't, however, know how fastidious they are in looking for false positives.) Here, it does nothing; so far we've had only a handful of matches, with no false positives. I propose it prevent edits from going through here also. I also ask admins to adapt it to enwikt purposes (testing it well, of course, especially if it disallows the edit from going through).

Update: Now we've had a false positive.​—msh210 (talk) 15:27, 8 September 2011 (UTC)

Another rule I think should do more than tag is one that checks whether a new (main namespace) entry is created by a non-autopatroller (non-admin, non-bot), lacks a level-three header, and either {has both a capital letter and a space in its title} or {has a right-parenthesis ) at the end of its title}. It's only had a handful of hits, with no false positives. Again, please improve it; and I think perhaps it should also block edits from going through.
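The second rule can be sketched the same way. The group names and namespace number are again assumptions, and "has a capital letter" is read here as any Unicode uppercase letter, which may be broader than the real rule.

```javascript
// Hypothetical restatement of the "suspicious new entry" filter.
const EXEMPT = ["autopatrolled", "sysop", "bot"]; // assumed group names
const LEVEL3 = /^===[^=\n].*===[ \t]*$/m;         // a level-three header line

function flagsSuspiciousNewEntry({ userGroups, namespace, isNew, title, text }) {
  if (!isNew || namespace !== 0) return false;    // new main-namespace pages only
  if (userGroups.some((g) => EXEMPT.includes(g))) return false;
  if (LEVEL3.test(text)) return false;            // has a level-three header
  const capitalAndSpace = /\p{Lu}/u.test(title) && title.includes(" ");
  const endsWithParen = title.endsWith(")");
  return capitalAndSpace || endsWithParen;
}
```

Titles like "John Smith" or "foo (band)" match, while an ordinary lowercase phrase such as "red herring" does not.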

The third rule I think should prevent edits from going through currently also just tags. It checks whether an entry is not new, is being edited by a non-autopatroller (non-bot, non-admin), and has its after-this-edit text the same (but for capitalization and other normalizations) as its pagetitle.
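The third rule compares the page text with the page title after normalization. Beyond capitalization, which normalizations the real filter applies is not stated, so the ones below (case, whitespace, and `[[...]]` link brackets) are assumptions, as are the group names. Since the rule was later extended to new pages as well, this sketch does not check whether the page is new.

```javascript
// Assumed normalizations: link brackets, whitespace runs, and letter case.
function normalize(s) {
  return s
    .replace(/\[\[|\]\]/g, "") // drop wiki-link brackets
    .replace(/\s+/g, " ")       // collapse runs of whitespace
    .trim()
    .toLowerCase();
}

// Hypothetical predicate: the edited text is nothing but the page title.
function contentIsJustTitle({ userGroups, namespace, title, newText }) {
  const exempt = ["autopatrolled", "sysop", "bot"]; // assumed group names
  if (namespace !== 0 || userGroups.some((g) => exempt.includes(g))) return false;
  return normalize(newText) === normalize(title);
}
```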

Thoughts?

(Of course, edits to improve the other filters are sought, too. And new ones.)​—msh210 (talk) 20:05, 7 September 2011 (UTC)

This is a really neat tool and I applaud your initiative in creating a few filters to start out. - TheDaveRoss 20:08, 7 September 2011 (UTC)
Yeah, it's an excellent thing to have. I think I saw a rule to block edits that create a page whose content is identical to its title, which (for some reason) is a very common useless edit. Equinox 22:58, 7 September 2011 (UTC)
We have it for existing pages: it checks whether the page content was reduced to its pagetitle. We could easily have it for new pages also (even by editing the existing filter rule).​—msh210 (talk) 15:19, 8 September 2011 (UTC)
I've updated that rule. We can watch and see if it picks up false positives.​—msh210 (talk) 15:30, 8 September 2011 (UTC)
I think we could disallow creating pages in the main namespace if the first character is a letter. All existing pages begin with either a header or with a template like {{also}} or {{wikipedia}}. —CodeCat 23:26, 7 September 2011 (UTC)
(You mean if the first char is alphanumeric?) Most people don't come here knowing the formatting rules, so if we did do that, we would need extra-prominent links to those and to places they might want, like WT:REE. Equinox 23:30, 7 September 2011 (UTC)
I think we should tag those but not disallow 'em. There might be some usable content. Mglovesfun (talk) 11:45, 8 September 2011 (UTC)
Yeah most people don't know ELE, so we shouldn't disallow them, but I think it might be wise to give the editors a notice before allowing them to save, which notice can outline the format, or something. (And tag the edit.)​—msh210 (talk) 15:19, 8 September 2011 (UTC)
I've created a filter along these lines. It checks whether the first character is anything but { or =. It does nothing for now (so we can check for false positives), but can warn the user.​—msh210 (talk) 18:07, 18 September 2011 (UTC)
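The check described here amounts to looking at the first character of a new entry. A sketch, assuming the filter applies only to newly created main-namespace pages, skips leading whitespace, and ignores empty text (none of these details is stated above):

```javascript
// Hypothetical predicate: a new entry whose text starts with neither a
// template ("{") nor a header ("=") probably ignores the entry layout.
function startsUnformatted({ namespace, isNew, text }) {
  if (namespace !== 0 || !isNew) return false;
  const first = text.trimStart().charAt(0); // assumed: skip leading whitespace
  return first !== "" && first !== "{" && first !== "=";
}
```

A page starting with {{also}} or {{wikipedia}} passes, as CodeCat noted, while free prose does not.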
Awesome! —RuakhTALK 00:29, 8 September 2011 (UTC)
Could we write a filter that shows editors a warning before allowing them to put their edit through, if their edit introduces <ref> (and does not introduce <references/>) to an entry that does not contain <references/>? I sometimes forget, on both en. and de.Wikt, to add <references/> when adding <ref>s. The warning would remind the editors to add the <references/> tag. - -sche (discuss) 18:50, 8 September 2011 (UTC)
I've created it but have not yet tested it (or checked how expensive it is).​—msh210 (talk) 21:34, 8 September 2011 (UTC)
It works well, as far as I can tell, and has caught a couple of users. I think we need to update the location of the message, though (either move MediaWiki:Abusefilter-warning/ref-no-references back to MediaWiki:Abusefilter-warning/ref-no-reference or change the link, whichever is easier; at the moment it displays a default message rather than the nicer and more informative custom one). - -sche (discuss) 05:53, 10 September 2011 (UTC)
I've fixed it, I think (not just now).​—msh210 (talk) 17:30, 11 September 2011 (UTC)
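The rule -sche asked for can be approximated with two regular expressions: one counting `<ref>` tags (while not matching `<references/>`) and one detecting `<references/>`. This is a sketch under the assumption that counting opening tags is a good enough proxy for "introduces `<ref>`"; the filter actually deployed may work differently.

```javascript
// "<ref>" or "<ref name=...>", but not "<references/>": the character after
// "ref" must be whitespace, ">" or "/".
const REF_TAG = /<ref[\s>\/]/gi;
const REFERENCES_TAG = /<references\s*\/?>/i;

function countRefs(text) {
  return (text.match(REF_TAG) || []).length;
}

// True when the edit adds <ref> tags and neither the old nor the new text
// contains <references/>.
function needsReferencesWarning(oldText, newText) {
  const addsRef = countRefs(newText) > countRefs(oldText);
  return addsRef && !REFERENCES_TAG.test(oldText) && !REFERENCES_TAG.test(newText);
}
```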

So (to repeat myself) we have a filter rule that catches edits that result in a page whose content matches its title (in the main namespace, and except for whitelisted folks, admins, and bots). Any objection to having that rule block the edit from going through? As of now we've had only about ten hits, but no false positives, and I can't think how there would be any.​—msh210 (talk) 17:30, 11 September 2011 (UTC)

Done.​—msh210 (talk) 15:18, 13 September 2011 (UTC)
A lot of anon users seem to create pages that just contain one or more instances of the text "[[File:Example.jpg]]" (perhaps they are accidentally clicking the delayed-loading JavaScript toolbar?). A filter for this might be worthwhile. Equinox 13:02, 17 September 2011 (UTC)
Alternatively, we could push to have the toolbar fixed. ;-)   Personally I have it turned off, because it's just too annoying to try to click in the textarea and suddenly have inserted something random. —RuakhTALK 14:49, 17 September 2011 (UTC)
I would like there to be a fixed-sized empty space on the page until the toolbar loads and replaces it. Whom do we nag? Equinox 14:54, 17 September 2011 (UTC)
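Equinox's suggested filter could key on pages whose whole text is one or more copies of the toolbar's sample image link. A minimal sketch, assuming the toolbar inserts exactly "[[File:Example.jpg]]" (the helper name is hypothetical):

```javascript
// Text made up only of one or more "[[File:Example.jpg]]" snippets,
// optionally separated by whitespace.
const ONLY_EXAMPLE_IMAGE = /^\s*(?:\[\[File:Example\.jpg\]\]\s*)+$/;

function isToolbarDebris(text) {
  return ONLY_EXAMPLE_IMAGE.test(text);
}
```

Requiring at least one copy means an empty page is not matched.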

Lemma entries for Japanese na type adjectives (形容動詞)

I've noticed that the policy for な-type adjectives, or keiyodoshi, is to include the な as a part of the entry. This is not, as far as I know, standard practice in any Japanese dictionary or even the Japanese Wiktionary.

For example, both 元気 and 元気な are treated as lemma entries. I believe users would be better served to have the 元気な entry read: "Attributive (連体形) form of 元気", and have both noun and adjective lemma entries listed on 元気.

The -な suffix is merely an inflectional form and should be treated as such. The most egregious example, and the one that brought this issue to my attention, is たくさんな. There is a page for the kanji version of this word, 沢山, but there isn't even a link to it from たくさんな; instead there is a broken link to 沢山な. But all of this is beside the point; the real issue is that たくさんな is a much less common form than either たくさんの or even just たくさん. All of these forms would be better served by the lemma entry たくさん, which I would be happy to write tonight after work, but that doesn't solve the system-wide problem of な-type adjectives being written with the な as part of the lemma.

The only policy on this I can find, Wiktionary:About Japanese#Quasi-adjectives (形容動詞), is not very clear on the issue. I propose that it be changed to include the ideas I've put forth, but I'm not sure exactly how to do so. Entries would still need to acknowledge that these are な-type adjectives, but this could easily be done in a header or something, right?

Also, perhaps a bot of some sort to change all of the entries made in the way I clearly find so offensive. *^_^*

MichaelLau 19:04, 9 September 2011 (UTC)

Hello Michael, thanks for chiming in --
Those of us dealing with Japanese here on the English Wiktionary have been chewing on some of these issues recently; cf. WT:BP#Preferred forms for Japanese lemmata, WT:BP#WT:About_Japanese, and a number of posts starting at Wiktionary_talk:About_Japanese#Lemma forms for keiyōdōshi and continuing further down that page. The emerging consensus is largely in line with what you describe. I'd really appreciate it if you could have a look at the other posts I've linked to here to get up to speed with what has already been discussed of late, and then it'd be great if you'd add to the discussion over at Wiktionary_talk:About_Japanese#Work_Needed. -- Cheers, Eiríkr Útlendi | Tala við mig 20:00, 9 September 2011 (UTC)

Template:ja-kanji

I'd like to update this template to handle shinjitai / kyūjitai, much as the Japanese POS templates already do (see {{ja-noun}}, {{ja-adj}}, {{ja-verb}}, etc.).

Some kanji don't get used as words on their own, and thus the individual kanji entry won't have anywhere graceful to put shinjitai / kyūjitai information. It would seem most appropriate for that information to go in the {{ja-kanji}} template itself, rather than (or possibly as well as - removing would take work) in the POS templates.

Are there any admins who could either implement this change, or change the protection level of {{ja-kanji}} to allow me to do so? -- Eiríkr Útlendi | Tala við mig 20:40, 9 September 2011 (UTC)

Looked at this again and realized I can indeed edit the template, so I did. I'll update the template documentation later to account for the new args. -- Eiríkr Útlendi | Tala við mig 21:51, 20 September 2011 (UTC)

Classical/Literary Chinese entries

Is there a correct way to add a definition of a Classical or Literary Chinese word? I've seen information about noting an etymology, but I'm not talking about an etymology for a modern word, I'm talking about defining a word as used in Classical Chinese texts. Such an entry might have the same meaning in modern Chinese, or might have a different meaning, or might not be used at all any more. I've looked for a list of "official" wiktionary languages, and found the "random entry" list. It has Old Chinese, Middle Chinese, and Late Middle Chinese, which are names of reconstructed languages (mostly phonology) from different periods. Those were spoken languages, and Classical Chinese was the most common written language used during all of those periods. How about Early Vernacular Chinese, for example words used in the novels 红楼梦 or 金瓶梅? There are entire dictionaries devoted to this language, but is the distinction appropriate on wiktionary? If so, can I just enter Early Vernacular Chinese as the language? Craig Baker 06:26, 11 September 2011 (UTC)

Such distinctions can be a bit arbitrary; I edit Old French, Middle French and French, so I'm familiar with the issue. One important note: please don't remove Mandarin headers. Mandarin is standard here, and it's also a widely accepted language name. Like you say, depending on date it could be Old Chinese, Middle Chinese, or Late Middle Chinese. There's no reason not to create an ad hoc code for Classical Chinese if editors want it. But only if editors want it. For example, we have 'ad hoc' codes {{roa-jer}} for Jèrriais and {{roa-leo}} for Leonese. Mglovesfun (talk) 13:38, 11 September 2011 (UTC)
There is already a code {{lzh}} for Literary Chinese. —CodeCat 13:53, 11 September 2011 (UTC)
Right then; in that case, definitely don't replace Mandarin with Literary Chinese, as Mandarin is a language. We don't replace English with Middle English; we include both when the word/term is used in both languages. Mglovesfun (talk) 14:00, 11 September 2011 (UTC)
Please see here for your references. Engirst 15:16, 11 September 2011 (UTC)
Regarding "replacing" Mandarin, what about in entries I've added where the words are not found in Mandarin? Or, where I have no evidence that the word is found in Mandarin? Is there an expectation that when I add a Literary Chinese definition, I will also research whether the word is found in Mandarin? Craig Baker 15:52, 11 September 2011 (UTC)
Could we enter Classical Chinese like this? Engirst 16:25, 11 September 2011 (UTC)
The transliteration in this entry is based on Mandarin, the way Classical Chinese is taught in China. There really can't be another way, as they teach the words, grammar, sentence structure but not the pronunciation. So, in short it's not 100% accurate. --Anatoli 00:09, 12 September 2011 (UTC)
Speedy deletion is only for patently wrong entries. Unlike Wikipedia, deleting one language section of an entry with more than one language section would be considered a speedy deletion, about equivalent to blanking a whole Wikipedia entry. You should likely be going to WT:RFV with these, though unless there's a pretty robust answer to the question 'what's the difference between Classical Chinese and Mandarin?' then a lot of these debates will be a waste of time. Mglovesfun (talk) 19:41, 11 September 2011 (UTC)
From what I understand from Wikipedia, Literary Chinese is an obsolete writing standard based on the Middle Chinese spoken language that was used up till the early 20th century. It would be comparable to Ottoman Turkish, but being in use a lot longer. —CodeCat 21:45, 11 September 2011 (UTC)
Mglovesfun, just to be clear, my change of two entries to "Classical Chinese" which you reverted were new entries which I added earlier that day, and initially categorized them as Mandarin because I didn't know that Literary Chinese was an option. I otherwise wouldn't have considered changing the language of an existing entry, which is what I assume you mean by "speedy deletion". What I'm more curious about is new entries for which I can provide Classical Chinese definitions, but don't have any information about Mandarin or other modern varieties. Craig Baker 03:26, 12 September 2011 (UTC)
I don't think we have contributors in Classical Chinese, and in my opinion we don't need to split Mandarin and Classical Chinese unless a specific pronunciation for a specific period is chosen. Also, given the way Classical Chinese is used in modern Mandarin, Cantonese, etc., the words can simply be classified as Mandarin, Cantonese, etc. with some {{qualifier}}. The reason is that they are borrowed into modern Chinese varieties, adjusted to the appropriate pronunciation, and quite often used in quotes. The few words that are NEVER or SELDOM used in modern languages, like classical pronouns and prepositions, have a modern usage and a modern pronunciation anyway, e.g. (), (), (zhī), etc. Numerous Mandarin chengyu are an example of how Classical Chinese is used in modern Mandarin. To understand their meaning, some knowledge of Classical Chinese grammar and vocabulary is required, but I don't think their components should have separate entries as Classical Chinese. In any case, hanzi as such are a complicated component, hard to classify as a part of speech: they often convey a meaning and only in combination become nouns, verbs, etc. --Anatoli 00:09, 12 September 2011 (UTC)
The {{qualifier}} idea sounds ok to me. As long as there is a way to note that they are Classical Chinese words, the information will not be lost, and it will be possible to use the dictionary when reading Classical Chinese texts, for example. I'm curious why choosing a pronunciation is related to splitting the languages; pronunciations are not necessary to write a dictionary, though maybe some technical limitation of Wiktionary requires it? To me, the written form seems most important in a language like Classical Chinese where the pronunciation was not really even recorded, although I do think reconstructions can be interesting and useful in some ways. I agree that a good number of Classical Chinese words are Mandarin words too, but in general I don't agree with your example of chengyu; in most cases I think chengyu should be considered to be a single word in Mandarin (etc.), but just an ordinary phrase or sentence in Classical Chinese. In such chengyu, what used to be Classical Chinese words are no longer free to act like words in Mandarin sentences, and the meaning of the chengyu has fossilized and often shifted. In the terms used on the "Criteria for inclusion" page, the chengyu is idiomatic in Mandarin, but not in Classical Chinese; and the words it is composed of are not attested in Mandarin outside of that chengyu. In the end, I suppose the "language status" is not very important to me, as long as the two can be separated in some way by the reader or perhaps by an automatic script for the reader's use, so that the dictionary is useful for reading both Classical Chinese and Mandarin texts. Craig Baker 03:26, 12 September 2011 (UTC)
I only said that chengyu in Modern Mandarin demonstrate the grammar and syntax of Classical Chinese, didn't say that one can use its components as they were then.
As a dictionary, Wiktionary deals less with stylistics and syntax, it would be really hard to define each hanzi for both modern Mandarin and Classical Chinese. 文言文 (Wényánwén) (Classical Chinese), unlike 白話 (Báihuà) (Vernacular Chinese) was almost 100% monosyllabic, each word consisting of only one hanzi, and defining the classical sense and usage of hanzi would require major work on these entries. At the moment, most definitions for hanzi are under the Han character heading. The specific CJKV language sections mainly deal with the READINGS of those characters. --Anatoli 03:47, 12 September 2011 (UTC)
I see your point about definitions for single characters currently being under the "Translingual" section. Of course it would require major work, but it's hard for the work to even begin without a language category, or to attract anyone capable of doing the work. I notice that many (most?) single-character entries already have definitions in the Japanese section, as well as etymologies (while the Translingual section has just a character etymology, not a word etymology). I would assume that the eventual goal is for definitions to be provided in the other languages/dialects too, so that we have information about how the word is used in those languages (or how it is not used—one of the most difficult things about reading Classical Chinese with a dictionary that includes both modern and ancient definitions is filtering out the modern definitions). I will continue reading around the Community Portal to try to understand the plan for this. Perhaps it would also help to note that there are already many large, good dictionaries devoted to just Classical Chinese, so they are definitely useful. Craig Baker 03:08, 14 September 2011 (UTC)
Sorry to have not seen this sooner. I have created thousands of classical chinese words on wiktionary over the last several years. I have created a number of translations at wikisource that link words back to wiktionary definitions. My long-term project is s:Romance of the Three Kingdoms. So far, the format has been largely decided by me, since I haven't come across anyone knowledgeable in the subject who wanted to contribute entries. My approach has been to view the problem through the lens of Mandarin. I'm not suggesting that this is the ideal approach, merely the most practical. Since Classical Chinese can be read in modern Mandarin, it made sense to create Mandarin entries that used either the {{literary}} or {{archaic}} labels. The {{obsolete}} label might be another potential option, although I haven't used it all that much. These context labels recently underwent a minor change. They now put the words into categories called: Category:Mandarin archaic terms in traditional script, Category:Mandarin archaic terms in simplified script, Category:Mandarin literary terms in traditional script and Category:Mandarin literary terms in simplified script. These categories should gradually replace Category:zh-tw:Archaic, Category:zh-cn:Archaic, Category:zh-tw:Literary and Category:zh-cn:Literary as well as Category:Traditional Chinese archaic terms, Category:Simplified Chinese archaic terms, Category:Traditional Chinese literary terms and Category:Simplified Chinese literary terms. See 飲酒 and 征東將軍 for some typical examples of how I format entries. Also, I used as a model of how we could do it if time and people were not limitations. Thanks. -- A-cai 01:37, 29 September 2011 (UTC)
P.S. Other pieces that I've done in this way on wikisource: s:Departing from Baidi in the Morning, s:Preface to the Poems Composed at the Orchid Pavilion, s:Song of Everlasting Regret, s:The Peach Blossom Spring and s:Touring Shanxi Village -- A-cai 01:43, 29 September 2011 (UTC)

adding-translation script

Discussion on de.Wikt: de:Wiktionary:Teestube#Übersetzung-Hinzufügen-Skript.

Hello English Wiktionary-Users,

in the German Wiktionary we would like to add the function that allows people to add translations without manually editing the code section. Could anyone explain how to do it? That would be great. Thanks in advance! Kampy 08:11, 11 September 2011 (UTC)

The German Wiktionary seems to structure translation sections completely differently from the English Wiktionary, with translations showing which senses they correspond to by having numbers next to the translations, rather than putting the translations for each sense in a separate box, so simply copying the code wouldn't really work. --Yair rand 20:17, 11 September 2011 (UTC)
The code would have to be modified, yes, but that shouldn't be too complex a task. One option: the de.Wikt programmers could code another box between the ISO box and the translation box, which would take the sense number(s) as input. Is all of the code that operates the function contained in User:Conrad.Irwin/editor.js? - -sche (discuss) 05:48, 12 September 2011 (UTC)
No, it also uses the newNode function in MediaWiki:Common.js#Dom_creation. Another issue is that the script seems to make use of the translation table glosses, which dewikt doesn't have, for locating tables. (Not completely sure about that.) --Yair rand 06:08, 12 September 2011 (UTC)
I think if we (on de.Wikt) changed
(values.qual? '{'+'{qualifier|' + values.qual + '}} ' : '') +
to
(values.qual? '[' + values.qual + '] ' : '') +
, we could use the code as-is (apart from the problem of glosses, which we could add), with users adding the sense numbers (1, 1–2) in the "qualifier" field. - -sche (discuss) 01:06, 13 September 2011 (UTC)
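The one-line change above swaps the {{qualifier|...}} wrapper for bare square brackets. Wrapped in a hypothetical helper (`qualifierPrefix` and its `style` switch are illustrative, not part of the real script; `values` stands in for the script's form-field object), the two variants compare like this:

```javascript
// Builds the text placed before a translation, in either wiki's style.
function qualifierPrefix(values, style) {
  if (!values.qual) return ""; // no qualifier or sense numbers entered
  return style === "de"
    ? "[" + values.qual + "] "                    // de.Wikt: bare sense numbers
    : "{" + "{qualifier|" + values.qual + "}} "; // en.Wikt: {{qualifier|...}}
}

console.log(qualifierPrefix({ qual: "1, 2" }, "de")); // "[1, 2] "
console.log(qualifierPrefix({ qual: "rare" }, "en")); // "{{qualifier|rare}} "
```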
About the numbers, I think it shouldn't be too much of a problem. We don't use headings restating the definition; instead we use those numbers. So there will only ever be one box. Anything added to this box just needs an additional input field for the sense number it relates to. Can anyone code this? Kampy 00:05, 14 September 2011 (UTC)
There will be more than one box (and there will be no numbers) once the translations are (per the vote) separated by sense, though...
I have copied the code to my de:Benutzer:-sche/common.js, and have copied the en.Wikt and de.Wikt translation tables into subpages of my userspace for testing (de:Benutzer:-sche/sw4), but even with classes and gloss-support added to the German translation tables (de:Benutzer:-sche/sw1c), I haven't got it to work yet. - -sche (discuss) 01:01, 14 September 2011 (UTC)
Oh, I didn't know a vote was going on. I agree that the English version is more practical. I will support a change. Kampy 10:57, 14 September 2011 (UTC)
Is it possible that the code in my de.Wikt .js isn't considering itself enabled, Yair rand? - -sche (discuss) 01:08, 14 September 2011 (UTC)
[[de:Benutzer:-sche/common.js]] has some syntax errors that will cause browsers to stop processing it. [[de:Benutzer:Ruakh/common.js]] fixes the most severe errors — you can take it as a starting point for further debugging — but it still doesn't create the form for adding translations, so there's still something wrong. :-/   —RuakhTALK 02:30, 14 September 2011 (UTC)
Thank you for catching that! I'm guessing that (among other things) I should add \ to all of the other instances of sche/, like you did to sche/sw1c (i.e. sche/sw2b → sche\/sw2b, etc.), yes? Or not to all of them? - -sche (discuss) 03:05, 14 September 2011 (UTC)
Doesn't make a difference, it's only necessary inside the regexps, not simple strings (/.../, not "..."). --Yair rand 03:22, 14 September 2011 (UTC)
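Yair rand's point about escaping can be seen in a few lines of JavaScript. In a regex literal the "/" character ends the pattern, so it has to be written "\/"; in an ordinary string, including one passed to `new RegExp`, it needs no escaping. The page name is just an example:

```javascript
// In a regex literal, an unescaped "/" would end the pattern early.
const fromLiteral = /sche\/sw2b/;
// In a plain string "/" is an ordinary character...
const pageName = "Benutzer:-sche/sw2b";
// ...so a RegExp built from a string needs no backslash either.
const fromString = new RegExp("sche/sw2b");

console.log(fromLiteral.test(pageName)); // true
console.log(fromString.test(pageName));  // true
```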
It seems that makes the script work! The "±" sign displays atop the gloss, but that's a relatively minor problem. - -sche (discuss) 03:09, 14 September 2011 (UTC)
That's because the dewikt tables don't have the show/hide button as the first node in the NavHead, and the script places the "±" after the first node. Can be fixed by replacing insertDiv.insertBefore(edit_button, insertDiv.firstChild.nextSibling); with insertDiv.insertBefore(edit_button, insertDiv.firstChild);, so that it's placed before the first node.--Yair rand 03:22, 14 September 2011 (UTC)
Other issues: The dewikt language templates leave parserfunction residue when substed. This could be fixed by modifying the language templates to have {{{|safesubst:}}} before the #if: ({{ {{{|safesubst:}}}#if:{{{nolink|}}}|Französisch|[[Französisch]]}}). Also, the use of {{t}} needs to be replaced with whatever template dewikt uses. --Yair rand 03:32, 14 September 2011 (UTC)
Thanks; that change puts the "±" in the right place! :) I'm working on replacing {{t}} with the de.Wikt counterpart {{Ü}}. I am also considering that certain functions, like "Page name:", may not be applicable to de.Wikt. (In fact, I replaced the code to input qualifiers with code to input sense numbers; it may be that I should undo that and instead use the AFAICT-unneeded-on-de.Wikt pagename-with-diacritics code as the vessel for adding sense numbers.) (No, that wouldn't work at all.) - -sche (discuss) 04:01, 14 September 2011 (UTC)
Re safesubst: actually, the necessary change isn't to the templates (although having templates that subst safely is probably a good idea); the necessary change is to the code: we don't use "Französisch" in translation tables on de.Wikt, we use {{fr}}. - -sche (discuss) 04:35, 14 September 2011 (UTC)
I've changed the code so that it does not subst language codes. However, changing {{t}} to {{Ü}} caused the function to display "Could not find translation entry for 'pt:worde'. Please reformat" when I tried to add worde (with ISO code pt given) to a section containing other translations. However, it added correctly to an otherwise empty section. I thought residual "{{t"s or a "{{Üxx" I added might be confusing the script's sorting mechanism, but it was also confused by this version of the page. (That version also shows that I/we/de.Wikt-programmers need to change how/where gender information is added.) - -sche (discuss) 05:03, 14 September 2011 (UTC)
The function getEditFunction might be the problem. It's built to look through the translation table wikitext for the translation to insert the new translation before (I think), but it's searching by first looking for * [[langname]]:, then for * langname:, and then for {{subst:langcode}}:, in case it's a newly added translation, but dewikt doesn't format translations like any of these. --Yair rand 21:21, 14 September 2011 (UTC)
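The lookup order described here can be sketched as a search that tries each pattern in turn. This is not the real `getEditFunction`: the helper name `findLanguageLine`, the line-based search, and the de.Wikt line format `*{{fr}}: [1] {{Ü|fr|chat}}` are assumptions. The final bare `{{langcode}}:` pattern corresponds roughly to the change that made the script work for de.Wikt.

```javascript
// Returns the index of the first line matching a language, or -1.
function findLanguageLine(wikitext, langName, langCode) {
  const patterns = [
    "* [[" + langName + "]]:",     // en.Wikt, linked language name
    "* " + langName + ":",         // en.Wikt, plain language name
    "{{subst:" + langCode + "}}:", // freshly added, not yet substituted
    "{{" + langCode + "}}:",       // assumed de.Wikt style: *{{fr}}: ...
  ];
  const lines = wikitext.split("\n");
  for (const p of patterns) {
    const i = lines.findIndex((line) => line.includes(p));
    if (i !== -1) return i;
  }
  return -1;
}
```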
You're right; removing subst: (so that it only looks for the language code) makes it work. Now to remove cruft... - -sche (discuss) 21:55, 14 September 2011 (UTC)
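A rough sketch of the lookup problem described above, written in Python rather than the script's JavaScript. The function name, the pattern lists and the sample table are all hypothetical and simplified (the real getEditFunction looks for the translation to insert before, not just the language's line); it only illustrates why the en.Wikt-style patterns never match a de.Wikt row, while a bare "{{langcode}}:" does.

```python
from typing import Callable, List

# Each pattern builds a search string from a language name and code.
Pattern = Callable[[str, str], str]

# Hypothetical: the three en.Wikt-style forms, tried in order.
ENWIKT_PATTERNS: List[Pattern] = [
    lambda name, code: "* [[" + name + "]]:",
    lambda name, code: "* " + name + ":",
    lambda name, code: "{{subst:" + code + "}}:",
]

# Hypothetical: the same search with "subst:" removed, i.e. only the code template.
DEWIKT_PATTERNS: List[Pattern] = [
    lambda name, code: "{{" + code + "}}:",
]


def find_language_line(wikitext: str, name: str, code: str,
                       patterns: List[Pattern]) -> int:
    """Return the index of the first line matching any pattern, or -1."""
    lines = wikitext.splitlines()
    for pattern in patterns:
        needle = pattern(name, code)
        for i, line in enumerate(lines):
            if needle in line:
                return i
    return -1


# A made-up de.Wikt-style translation table.
DE_TABLE = "*{{en}}: [1] {{Ü|en|cat}}\n*{{fr}}: [1] {{Ü|fr|chat}}\n"

print(find_language_line(DE_TABLE, "Französisch", "fr", ENWIKT_PATTERNS))  # -1
print(find_language_line(DE_TABLE, "Französisch", "fr", DEWIKT_PATTERNS))  # 1
```

Dropping "subst:" is what turns the last en.Wikt pattern into the de.Wikt one, which matches the fix reported above.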
I have adapted the code to work with de.Wikt's Ü-templates. It even nests nb and nn correctly, when neither no nor nb nor nn is already in the table. However, de:Benutzer:Yoursmile gave me feedback that the adder appears but doesn't work (Could not find translation table for 'fr:reg'. Glosses should be unique) when other scripts are around, e.g. de:Benutzer:Yair rand/TabbedLanguages.js. I thought that might be because DOM-node code is redundantly in both codes, but when I separated the DOM-node and translations codes, and imported both, I found that the trans-adder no longer appeared. (If it had appeared, I would have imported only the trans-code and TabbedLanguages, to see if the absence of redundant DOM-code allowed the two to work together.) Any idea why splitting the two seemingly discrete scripts causes them to cease funktioning (ie causes the trans-adder to cease appearing)? Any idea why the trans-adder appears but does not work when TabbedLanguages are around?
Separate issue: any idea what I did wrong when I tried to remove the "script" bit? That edit caused the adder to cease appearing. That edit also removed a bit of "gender"-code, but that wasn't problematic; I successfully removed it later. I rendered the "script" bit harmless (we don't use script templates on de.Wikt) by causing it to input nothing and removing the interface, but that leaves a lot of cruft. - -sche (discuss) 23:56, 17 September 2011 (UTC)
The top four lines of that edit are actually removing part of something completely unrelated to the "script" bit, but that part isn't what actually broke it. The edit contained an extra comma (ota:{,wsc:"ota-Arab"}) which caused a syntax error. --Yair rand 07:32, 19 September 2011 (UTC)
Thank you! I have got the script to work with Tabbed Languages; with your syntax-fix, now the unneeded "script" part has been successfully removed. I tested Tabbed Languages and the Trans-Adder together in the main namespace on de:Katze. I moved the code to de:Benutzer:-sche/uebersetzung.js, if anyone wants to see for themselves (remember that at the moment it is still oriented to {{Benutzer:-sche/sw1c}} and therefore only works on test pages or modified pages). The only issue I note now is that it wouldn't add more than one translation without me refreshing or navigating away from and back to the page; I wondered if I just didn't wait long enough (de:Katze had a lot of translations for it to sort through), but it displayed the same behaviour on my simple test page. (A minor problem I remind myself to fix is the unneeded space between * and {{langcode}}.) de.Wikt will have to adopt glosses for this to work. - -sche (discuss) 09:27, 19 September 2011 (UTC)
Of note: the code works differently in different browsers and in the main vs the user namespace. (Those interested can temporarily restore this version of Katze and try using the code on it.) - -sche (discuss) 22:54, 19 September 2011 (UTC)

Wiktionary talk:Etymology#Where to put etymologies

See Wiktionary talk:Etymology#Where to put etymologies --MaEr 10:56, 11 September 2011 (UTC)

Correlative conjunctions

How should the entries for correlative conjunctions look? Both...and, neither...nor, both-and, neither-nor or something else? I'm not asking just about English, but about other languages too. Arath 14:51, 12 September 2011 (UTC)

Making an arse of it ... ?

Interloper from Wikipedia here. While testing some new back-end scripts, I ran across template:a bum - a nonexistent template that is linked from many Wiktionary entries. Not sure if it's vandalism, preparation for some widescale future vandalism, or just a typo (album?) somewhere deep that invites same. I had a crack at finding the source, butt alas got nowhere. - Topbanana 17:24, 12 September 2011 (UTC)

Thanks! It's from {{t|ar|[anything]}} (also t+ and t-), but I can't seem to track it down farther. Incidentally, you spelled alass wrong.​—msh210 (talk) 17:47, 12 September 2011 (UTC)
[e/c] Got it. The culprit was template:ar/script, which had been vandalized. Thanks again.​—msh210 (talk) 17:55, 12 September 2011 (UTC)
(After an edit conflict) I dug through the histories of two of the entries, multiculturalism and dictionary. It seems to have been added to multiculturalism in this edit, and to dictionary in this edit. In other words, its transclusion seems to be caused by the removal of "sc=" from uses of {{t}}. - -sche (discuss) 17:54, 12 September 2011 (UTC)

I've protected template:ar/script; can some admin who knows how to write such a bot please flood-protect the [langcode]/script templates as highly visible?​—msh210 (talk) 17:57, 12 September 2011 (UTC)

highly visible templates that may need protection

User:Topbanana has generated a list of highly transcluded templates that lack full protection. Some can even be edited by non-autoconfirmed users. We may want to protect some of these. The list was at [[User:Topbanana/Template_protection]], which I've deleted so as not to advertise the list to would-be vandals (cf. w:wp:BEANS), and admins can find it now at [[Special:Undelete/User:Topbanana/Template_protection]].​—msh210 (talk) 15:16, 13 September 2011 (UTC)

Template:given name - sorting missing

Would an admin be kind enough to fix {{given name}} to allow sorting? This template currently fails to properly categorize Japanese given names, for instance. The code to tweak (not all the code, just a snippet):

<includeonly>{{#if:{{NAMESPACE}}|| [[Category:{{langname|{{{lang|en}}}}} {{#if: {{{diminutive|}}}|diminutives of}} {{{gender|{{{1}}}}}} given names {{#if:{{{from|{{{2|}}}}}}|from {{{from|{{{2}}}}}}|}}]]

Change to:

<includeonly>{{#if:{{NAMESPACE}}|| [[Category:{{langname|{{{lang|en}}}}} {{#if: {{{diminutive|}}}|diminutives of}} {{{gender|{{{1}}}}}} given names {{#if:{{{from|{{{2|}}}}}}|from {{{from|{{{2}}}}}}|}}|{{{sort|{{{skey|{{PAGENAME}}}}}}}}]]
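For anyone unfamiliar with the nested-default syntax in the proposed change, here is a small Python model of how the sort key resolves. It is only an illustration with made-up function names, and it assumes the usual MediaWiki rule that {{{name|default}}} falls back to the default only when the parameter is absent (an explicitly empty sort= gives an empty key).

```python
def given_name_sort_key(params: dict, pagename: str) -> str:
    """Model of {{{sort|{{{skey|{{PAGENAME}}}}}}}}.

    dict.get only uses its default when the key is missing, which matches
    a triple-brace default being used only for an absent parameter.
    """
    return params.get("sort", params.get("skey", pagename))


def category_link(category: str, params: dict, pagename: str) -> str:
    # Builds a [[Category:...|key]] link like the one the template would emit.
    return "[[Category:" + category + "|" + given_name_sort_key(params, pagename) + "]]"


print(category_link("Japanese female given names", {"sort": "めぐみ"}, "恵美"))
# [[Category:Japanese female given names|めぐみ]]
```

Without sort= or skey=, the key falls back to the page name, which is what the template did before the change.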

As a minor side note, {{given name}} and {{surname}} are a bit inconsistent, in that {{surname}} includes a period at the end, and {{given name}} does not. Immaterial really, but it'd look a bit more put together if these two were in agreement. -- TIA, Eiríkr Útlendi | Tala við mig 16:08, 13 September 2011 (UTC)

  • I'll change protection so you can do it yourself. Then I'll protect it again. SemperBlotto 16:12, 13 September 2011 (UTC)
    Brilliant, thank you SemperBlotto, done and sorted! (Pardon the pun.) -- Eiríkr Útlendi | Tala við mig 16:20, 13 September 2011 (UTC)
    Could somebody fix {{surname}} to allow sorting too? Thanks Haplogy 16:14, 21 September 2011 (UTC)
    Thank you Haplogy for pointing that out, and thank you Mglovesfun for changing the protection settings on the template. Done and sorted. -- Eiríkr Útlendi | Tala við mig 16:51, 21 September 2011 (UTC)
    Of course, now there's the bother of going through Category:Japanese surnames and making sure that all the romaji entries are categorized alphabetically. In most cases, this just means deleting the Category:Japanese surnames line containing the mistaken hiragana sort key; the cat line is redundant anyway since the {{surname}} template already supplies the category.
    For that matter, I don't suppose anyone has a bot handy for this? It would just need to go through these entries and delete the Category:Japanese surnames portion. The entry set is defined as:
    -- Cheers, Eiríkr Útlendi | Tala við mig 17:07, 21 September 2011 (UTC)

Question about cats

Cleaning up some Japanese given name entries, I've stumbled on a bit of a puzzle. The name 恵美 (Emi) can also be the name 恵美 (Megumi). Using the {{given name}} template and the {{{sort}}} argument for each reading on the page, I'd expect the entry to show up in Category:Japanese female given names under both え for Emi and め for Megumi -- but it only shows up under め. I'm guessing that the cat applied by the second {{given name}} call is overriding the first.

So the $60,000 question is, is there any way for a single entry that belongs in a single category to show up under two different indices in that same category? If not, am I right in guessing that the index to use is the first one alphabetically (or in this case, hiraganically)? -- Eiríkr Útlendi | Tala við mig 16:52, 13 September 2011 (UTC)

You may have to create a redirect using a zero-width non-joiner (&#8204;), as discussed here, and sort the main entry into one category and the redirect into another ... if it is possible to use sort in redirects. - -sche (discuss) 20:57, 13 September 2011 (UTC)
Interesting. What a wonderfully ugly cludge. Well, so long as it works. That might just be what I do if no one has a better idea.  :) -- Thank you, Eiríkr Útlendi | Tala við mig 21:12, 13 September 2011 (UTC)
Well, I just went ahead and created the page 恵&#8204;美 and put the second cat index there, and it works -- the entry 恵美 is now properly indexed under both えみ (Emi) and めぐみ (Megumi). Thank you, -sche! -- Eiríkr Útlendi | Tala við mig 15:52, 15 September 2011 (UTC)
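The trick works because the two titles are different strings even though they look identical. A quick Python check, illustrative only: it says nothing about MediaWiki's title handling beyond what is reported working above, and which reading goes on which page is an assumed example.

```python
import unicodedata

ZWNJ = "\u200c"  # written as &#8204; in wikitext (8204 decimal == 0x200C)

main_title = "恵美"
redirect_title = "恵" + ZWNJ + "美"

# Assumed example: each page carries its own sort key in the same category.
sort_keys = {main_title: "えみ", redirect_title: "めぐみ"}

print(unicodedata.name(ZWNJ))                          # ZERO WIDTH NON-JOINER
print(main_title == redirect_title)                    # False
print(redirect_title.replace(ZWNJ, "") == main_title)  # True
print(sorted(sort_keys.values()))                      # ['えみ', 'めぐみ']
```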

Pokemon get their own ja-noun template?

I'm going through Category:Japanese_nouns to clean up after realizing I'd left the indexing argument hidx out of a lot of entries I'd been working on, and I discovered that Appendix:Pokémon/アーボ and Appendix:Pokémon/アーボック still get categorized under katakana even after I added the hidx arg to the POS template. Then I realized that these entries get their own POS template: not the usual {{ja-noun}} template, but {{ja-noun/pokemon}} instead. Is this kosher? It seems awfully dodgy to me. If we're keeping this template, shouldn't we at least align its formatting with {{ja-noun}}? -- Bemused, Eiríkr Útlendi | Tala við mig 23:30, 14 September 2011 (UTC)

Those entries aren't actually supposed to exist anymore. They're only still there because nobody bothered to merge them into the list yet. --Yair rand 00:02, 15 September 2011 (UTC)
Okay. Merge them into what list, though? RFD? -- Eiríkr Útlendi | Tala við mig 06:56, 15 September 2011 (UTC)
There was a vote on the issue, so you don't need an RFD. --Mglovesfun (talk) 08:56, 15 September 2011 (UTC)
Is there anything I can / should do to help deal with these entries? My editor instincts for having things sorted out are getting itchy.  :) -- Eiríkr Útlendi | Tala við mig 15:38, 15 September 2011 (UTC)
The community decided these entries for fictional things should be in lists. Some appropriate lists are Appendix:DC Comics (Bat-Signal, Kryptonian...) and Appendix:The Legend of Zelda (Hylian, Ocarina of Time...).
The coverage of Pokémon is still a bloody mess; the work of making these pages with lists is half-done. You can help by creating the lists, if you are interested. Everything you need is here: Special:PrefixIndex/Appendix:Pokémon. --Daniel 17:53, 15 September 2011 (UTC)
Thanks, Daniel. Looking at that list, I notice that Appendix:Pokémon/アーボ and Appendix:Pokémon/アーボック are the only two katakana entries on that list, and both already have their romanized versions of Appendix:Pokémon/Arbo and Appendix:Pokémon/Arbok. Am I right in guessing that all other katakana Pokémon entries have been removed and/or converted to the official romanizations? If so, can we delete the Appendix:Pokémon/アーボ and Appendix:Pokémon/アーボック entries? -- Eiríkr Útlendi | Tala við mig 21:24, 15 September 2011 (UTC)
No, nobody had the idea of deleting the katakana entries and leaving only the official romanizations.
Ideally, Wiktionary can have a complete glossary of all 649 names of Pokémon in English, the 649 names in katakana, the 649 official romanized names and even the 649 romanizations from katakana. I think that would simply be two big appendices: one of English and one of Japanese; but I may be wrong. If these lists are created, then someone searching for, say, Beedrill, will at least be able to know it is the name of a Pokémon species.
We have few Pokémon listed simply because nobody bothered to make the full lists. --Daniel 01:05, 16 September 2011 (UTC)
This is straying more into Grease Pit territory, but if we're going to keep the Pokémon appendix entries in all valid scripts, how do we make sure they get indexed properly? Appendix:Pokémon/アーボ and Appendix:Pokémon/アーボック are currently indexed as Japanese nouns under ア, when they should be indexed under あ. I tried adding the hidx arg used in the normal {{ja-noun}} template, but {{ja-noun/pokemon}} doesn't implement any explicit sorting.
Plus, the formatting of {{ja-noun/pokemon}} is a bit jarring in its differences from {{ja-noun}}. My gut instinct is to unify these, but I'm not sure what the designer of {{ja-noun/pokemon}} intended, or what anyone else thinks. Any advice? -- Eiríkr Útlendi | Tala við mig 01:19, 16 September 2011 (UTC)

Appendix:DC Comics and Appendix:The Legend of Zelda don't use anything like {{ja-noun}} or {{ja-noun/pokemon}}. These lists don't have headword-lines like entries. If the appendices of Pokémon follow suit, then {{ja-noun/pokemon}} should just get deleted.

Indexing should be simple. Appendix:DC Comics is indexed under "D", because "DC Comics" starts with that letter. Appendix:Pokémon/Species would be sorted under "P".

However, the appendices should not stay in Category:Japanese nouns, because there it would be regarded as clutter. In that category, the focus is keeping the Japanese nouns of entries rather than the ones of appendices. --Daniel 02:33, 16 September 2011 (UTC)

Thanks, Daniel. It's probably easy enough to remove the Pokémon entries from Category:Japanese nouns just by editing {{ja-noun/pokemon}} for now. Some sort of header template is probably a good idea, in order to include info like official Japanese name and romanization thereof (which sometimes differs markedly from the official name in the Latin alphabet), so I'm not sure we should just delete it -- maybe tweak it to categorize things properly, and maybe move it? I'll at least make a few minor changes tonight. -- Eiríkr Útlendi | Tala við mig 05:21, 16 September 2011 (UTC)

Yes, the consensus clearly is deleting individual pages like "Appendix:Pokémon/Super Potion" in favor of having big lists like, possibly, "Appendix:Pokémon/Items" or "Appendix:Pokémon/Objects".

Now I edited {{ja-noun/pokemon}} so it doesn't categorize appendices into Category:Japanese nouns anymore.

Actually, when the appendices of English terms of Pokémon were created, they were using a version of {{en-noun}} that did not categorize them into Category:English nouns; but now it categorizes. Naturally, if all the appendices that use {{en-noun}} are replaced by lists, the miscategorization will stop.

I created Appendix:Pokémon/Species with a basic format for organizing headwords, definitions and translations together. One small problem of this system is that the translations are all redlinks to entries. This should be fixed eventually. --Daniel 06:12, 16 September 2011 (UTC)

Adjectives in the translation sections of nouns

User:DCDuring and I have been discussing how to handle adjective translations of English attributive nouns. Many languages use adjectives where English uses nouns attributively. For example, a "cork" (n.) = a "пробка" (n.), but "cork (attr. n.) insulation (attr. n.) material (n.)" = "пробковый (adj., cork) изоляционный (adj., insulation) материал (n., material)"[1]. As I see it, there are several ways we can handle this:

1. List the adjectives in the nouns' translations tables, roughly like in 'cork'. This has long been en.Wikt's general practice: to include in translation sections, where appropriate, words that are not the same part of speech as the word they translate. The translations section of the adjective abroad contains the German preposition + article + noun im Ausland; German routinely uses nouns in compounds to express things English expresses with adjectives, so the translation sections of adjectives like racial routinely contain nouns like Rassen-.

2. List the adjectives in separate tables in the translation sections of the nouns, like in 'brass'. This was Matthias Buchmeier's suggestion. It has the advantage of distinguishing those many translations, which are systematically different; it has the disadvantage of inviting confusion or duplication of languages which do not use other parts of speech but rather use nouns like English does.

3. Have separate sense lines for attributive nouns, and list the adjectives in the separate tables of those sense lines, like in 'spruce'. This would represent a significant change to our current policy and practice. It has the advantage of establishing great clarity. It has the disadvantage of inviting duplication of translations from languages which use nouns attributively like English does, and it creates duplication, or at least overspecificity, in the English section. Multiple sense lines would be required, or the attributive sense lines would hold multiple definitions: "cat paw" is a paw "of a cat", "cat booties" are booties "for a cat", a "cat addiction" is an addiction "to (having) cats"...

4. Include adjective POS sections for attributive nouns. This would represent a significant change to our current policy and (in most cases, eg insulation) our current practice. It would create some clarity (words which were used in some of the same ways as adjectives would have adjective POS sections), but would also create some unclarity or would mislead (words would be called adjectives, though they could not be graded or used alone after 'become' or in other key ways adjectives can be used). It would also have the disadvantage that users who understood that "cork insulation material" was [attributive noun + attributive noun + noun] would only find the translations of the attributive noun "cork" in the adjective section.

5. Allow foreign-language entries to host translation sections, as we do on de.Wikt. This would represent a significant change to our current policy and practice. It would have advantages in other areas (few languages have verbs with exactly the same meaning, for example), but that is for another discussion (please!): it would be a poor solution to this issue: the adjective пробковый and all other languages' same-meaning adjectives could host translation sections, but to find such a translation section, a user would have to know one of the translations already (and it would have to be a bluelink).

6. Not present the information. This would represent a significant change to our current policy and practice. It would have the disadvantage of preventing us from having complete translation sections; if brass can be used in two ways, "that metal is brass" and "the brass knob", but we only provide the translation that works in uses like "that metal is brass", we'd be missing an accurate translation.

User:DCDuring (who may have other ideas on how to handle adjective translations of attributive nouns) and I felt we should bring this up for discussion here, in the Beer Parlour, for several reasons. Wiktionary has long included translations which are different PsOS than the words they translate, but the inclusion of this giant category (words, in all languages, which translate attributive nouns) has not — AFAICT — been discussed. Should we continue to include it? If we do, should we simply list the adjectives in the noun tables without distinction, or should we introduce them with some clause, or list them in separate tables (like brass)? If we set them apart with some clause, what should it say?

How should we handle nouns (I give oak as an example, although at the moment it still has an adjective section) which are often used attributively, but which have corresponding adjectives (oaken)? Should adjectives like дубовый be in oak (for consistency, accuracy, and completeness, because the nouns are used in places in English that other languages use their adjectives for — because English, despite having oaken, still says "oak table") and in oaken, or only in oaken (since it, an adjective, exists)? - -sche (discuss) 02:23, 16 September 2011 (UTC)

Oak really shades into adjectiveness. It's used both attributively and predicatively in ways that resemble an adjective (google books:"oak furniture", "the furniture was oak"). In such cases my own opinion is that it's best to include an adjective section, even if Occam's razor would prefer we consider it solely a noun; but even if we don't, I think we could have a sense “(shading into adjective use) Made of this wood; oaken” and a corresponding translations box. Adjective translations in that box could be tagged with {{pos a}} or {{qualifier|adjective}} or whatnot.
However, oak is an extreme case. In the typical case, the ordinary translation of the noun is as a noun, and if a given language uses an adjective to translate certain cases, I think it's probably best addressed through usage notes in that language's noun entry. For example, [[étoile#French]] could explain that in many cases where English tends to use the noun star, in French the adjective stellaire is likely to be used: système stellaire (star system), amas stellaire (star cluster), etc. (I realize that explaining it in the French entry does not preclude explaining it in the English entry's translations-box as well, but I think the latter is just too hard to do intelligibly.)
At [[oaken]] and [[stellar]] and such, we'd of course give the usual adjective translations. (Well, maybe [[oaken]] would just have {{trans-see|oak}}, or vice versa.)
RuakhTALK 03:15, 16 September 2011 (UTC)
Ruakh is right - put them in "usage notes in that language's noun entry". All such long explanations will - if carried to conclusion - result in unmanageably large (and probably unreadable) translations tables.
I would also say that all synonyms, gender forms, &c are also best put in the foreign language word's entry. —Saltmarshtalk-συζήτηση 10:54, 16 September 2011 (UTC)
@Ruakh: I've continued the discussion of whether or not the specific word oak is an adjective at WT:RFV.
@Saltmarsh: I may be misunderstanding what you mean when you suggest omitting "synonyms". Our general practice has AFAICT always been to include (like other translations dictionaries) all words in the foreign language which mean what the English word means. If you would not include all words, how would you decide which word was the main word and which words were just "synonyms" of it? (For example, how would you decide whether to include священный and omit святой from holy, or include святой and omit священный?)
@Ruakh and Saltmarsh: it is possible to have the adjectives in the translations sections of nouns without long notes, if your concern is that the notes are clutter: the translations could be given without introduction, like this, or in separate tables, like this (which seems similar to what you consider doing at [[oak]]).
If you are more broadly concerned about having adjectives in the translations sections of nouns, do you think we should remove the nouns that are in the translations sections of adjectives, where languages routinely use nouns in compounds where English uses adjectives, eg German Modell- (noun) in model (adj.), Finnish yhteiskunta- (noun) in social (adj.), Swedish stjärn- (noun) in stellar (adj.), etc? Do you think we should remove all Navajo translations from our adjective entries? Navajo expresses "it (pronoun) is (verb) adjective (adj.)", eg "it (pronoun) is (verb) white (adj.)", as "łigai (verb) (it is white)".
What do you think should be done in cases where the use of a specific different POS is not routine (eg German im Ausland (article+prep. + noun) in abroad (adj.))? - -sche (discuss) 05:38, 17 September 2011 (UTC)
@-sche/synonyms - don't synonyms usually have subtle differences? And in some cases you may have 4 or 5 of them - how does the user judge which is best for their needs? Look at all five? Easier to give the most accurate translation or the most frequent. The user will follow this link, where more information for each form can be given - as under the See also head (which could have been Synonyms) for επειδή. —Saltmarshtalk-συζήτηση 06:29, 17 September 2011 (UTC)
I think that the way used in cork is good. It's understandable. Lmaltier 16:31, 16 September 2011 (UTC)
To clarify, do you mean with the introduction "corresponding to English attributive use, meaning ‘made of cork’:", or without it? - -sche (discuss) 05:38, 17 September 2011 (UTC)
With the introduction "corresponding to English attributive use, meaning ‘made of cork’:"; without it, it is very misleading. Lmaltier 08:35, 17 September 2011 (UTC)
An adjective in a non-English language should IMHO be found on the page of the noun whose attributive use it captures, whether in a usage note, in derived terms or on the headword line. Thus, I dislike the Dutch translation line in the English section of this revision of "cork": I prefer "Dutch: kurk (nl), kurklaag" to "Dutch: kurk (nl), kurklaag; corresponding to English attributive use, meaning ‘made of cork’: kurken"; I disagree with "Dutch: kurk (nl), kurklaag; kurken (nl)", as "kurken" has no place there. In this I seem to agree with Ruakh and Saltmarsh. --Dan Polansky 07:02, 22 September 2011 (UTC)

Japanese POS templates and how entries are indexed

Chewing on the issue of how entries are categorized and indexed, I re-read WT:About Japanese, in particular the section Wiktionary:About_Japanese#Sorting. As I'd suspected, romaji entries should be indexed alphabetically, not under the corresponding hiragana. However, a quick look at Category:Japanese nouns, for instance, shows many romaji entries indexed hiraganically.

Looking deeper, {{ja-noun}} has the following line towards the end:

[[Category:Japanese nouns|{{{hidx|{{{hira|{{PAGENAME}}}}}}}} {{PAGENAME}}]]

This needs to be changed to:

[[Category:Japanese nouns|{{#ifeq:{{{1}}}|r|{{lc:{{PAGENAME}}}}|{{{hidx|{{{hira|{{PAGENAME}}}}}}}}}} {{PAGENAME}}]]

This tweak will index JA noun entries lower-case-alphabetically if the first template arg value is the letter "r", which it should be for romaji entries. I just made a similar change to {{ja-verb}}, but {{ja-noun}} is locked down. I'm about to crash for the night, so if anyone unlocks the template, don't expect much for the next ten hours or so.  :) -- Cheers, Eiríkr Útlendi | Tala við mig 06:41, 16 September 2011 (UTC)
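To make the intended behaviour of the proposed line concrete, here is a small Python model of the sort key it would produce. It is a sketch, not the template itself: the function name is invented, str.lower stands in for {{lc:}}, and, as in the template line, the key is the index part followed by a space and the page name.

```python
def ja_noun_sort_key(pagename: str, params: dict) -> str:
    """Model of the proposed {{ja-noun}} category sort key:

    {{#ifeq:{{{1}}}|r|{{lc:{{PAGENAME}}}}|{{{hidx|{{{hira|{{PAGENAME}}}}}}}}}} {{PAGENAME}}
    """
    if params.get("1") == "r":
        # Romaji entry: index alphabetically by the lower-cased page name.
        index = pagename.lower()
    else:
        # Kanji/kana entry: hidx, else hira, else the page name itself.
        index = params.get("hidx", params.get("hira", pagename))
    return index + " " + pagename


print(ja_noun_sort_key("Nihon", {"1": "r"}))     # nihon Nihon
print(ja_noun_sort_key("猫", {"hira": "ねこ"}))  # ねこ 猫
```

Note that with "r" as the first argument, any hidx value is ignored, so romaji entries always fall under their Latin initial.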

Perhaps you also want to request a dump from the database, similar to Index:Russian, so that a Japanese index is updated. The current Index:Japanese is a joke. Then you'll see both red (from translations) and blue Japanese words. I don't know how this is done though. --Anatoli 07:14, 16 September 2011 (UTC)
Don't ask me why, but it does seem to be very common to sort using the hiragana. Perhaps it's WT:AJA that needs to be changed. Mglovesfun (talk) 08:50, 17 September 2011 (UTC)
@Anatoli:
Who would I make such a request of? And what does a dump do? My (admittedly limited) understanding of DB management is that a database dump is just an output of its contents. And what are the language indices used for? Just looking at Index:Japanese, I'm not sure why it would be a joke, but then again I've never had occasion before this to look at indices for a whole language. -- Ta, Eiríkr Útlendi | Tala við mig 23:11, 18 September 2011 (UTC)
@Mglovesfun:
Some of us (so far User:Haplology, User:MichaelLau, and myself) are discussing and working on editing WT:AJA, having created a working draft at Wiktionary:About Japanese/Draft. We haven't done too much yet, but the idea was that a separate draft page would make it easier to implement drastic changes on the fly and see how it all looks, without confusing folks by changing the main About page until we have something we're happy with. Please chime in at Wiktionary talk:About Japanese/Draft if you have any ideas or opinions to discuss on how to change things.
With regard to sorting, the current policy stated at Wiktionary:About_Japanese#Sorting seems to state that kanji and kana headwords should be sorted by hiragana, while romaji entries should be sorted alphabetically -- this makes the most sense from a learner's standpoint, in that someone just starting with Japanese can still find a word in the index even if they don't know kana. However, it seems that 1) the Sorting section might stand some clarification, and 2) the description of how to use the hidx sorting parameter provided in the documentation for each of the Japanese POS templates is a bit less clear than WT:AJA. Moreover, many contributors of Japanese terms seem to be ignorant of WT:AJA, or at least not fully versed in it (which isn't too hard to understand given how long the document is, and all the complexities of dealing with Japanese when writing in English).
Anyway, I'll give Wiktionary:About_Japanese#Sorting another look this week, maybe tweak our Draft version of it a bit, and also go through the Japanese POS templates and update their documentation to make things a bit clearer when it comes to indexing. -- Cheers, Eiríkr Útlendi | Tala við mig 23:11, 18 September 2011 (UTC)
I don't know who created the database dump; I think it must have been User:Conrad.Irwin. I remember reading about it, and people requested indexes for other languages. I suggest finding out who created and refreshes Index:Russian; I find it a really good index, and it looks like it has indexed tens of thousands of entries and translations (many are in red). The possible challenge I see in creating the Japanese index is, well, sorting and indexing the Japanese words. Why would it (the Japanese index) be a joke? Because it only has about a hundred KANJI, not tens of thousands of WORDS. --Anatoli 04:59, 19 September 2011 (UTC)

do not and does not

Any reason why the entries do not and does not don't exist? --The Evil IP address 13:56, 17 September 2011 (UTC)

...and will not, could not, must not, might not, may not, etc.? Or is there something special about do? Equinox 13:58, 17 September 2011 (UTC)
Nothing special about it, but I just noted that we had the contractions don't and doesn't without the spelled out versions (which are not uncommon). --The Evil IP address 14:09, 17 September 2011 (UTC)
Are they not just sum of parts? Do + not? —CodeCat 14:10, 17 September 2011 (UTC)
Yeah, I think so. We don't have the contractions in order to define them per se, but in order to explain what sequence of words they are short for. Equinox 14:14, 17 September 2011 (UTC)
I think these are an exception. The entries could be quite useful; for example, both negate a sentence. This cannot be said of most other verbs, and most verbs can't be followed directly by "not". The etymology section could also be very interesting, explaining why do can be negated this way but most other verbs cannot. --The Evil IP address 14:22, 17 September 2011 (UTC)
Maybe, although it feels more like something for a grammar book or grammatical appendix. I was thinking the other day about how isn't and don't mostly behave alike, but not always ("didn't you do..." is fine, but "didn't you be..." is impossible). Equinox 14:25, 17 September 2011 (UTC)
Why didn't you be more certain of that and look for it in Google book search? SemperBlotto 14:33, 17 September 2011 (UTC)
Ha! Interesting. Those sentences sound very odd to me, especially "why didn't you be including the bombing of civil populations amongst their crimes?". Equinox 14:38, 17 September 2011 (UTC)
Using 'not' is rare with most verbs now but it wasn't so rare in Shakespeare's time I think? This is also considered 'modern' English so we would need to include it. —CodeCat 15:18, 17 September 2011 (UTC)
I think it's sufficient to relegate this to a usage note at [[not#Adverb]] (where it is already, as the first usage note) or an English-verbs appendix.​—msh210 (talk) 18:15, 18 September 2011 (UTC)

The negative forms that most English auxiliary verbs have are just that: negative forms. They are not contractions like apostrophe d for would or apostrophe s for is or has, etc., though that's how they started. Rather, the n't is an inflectional ending. The rationale is here.--Brett 01:13, 25 September 2011 (UTC)

A new list of Latin Epithets (same suffixes together)

new list: Epithets by Suffix (contents)

I've created this new list, which I'm calling "Epithets by Suffix". It's pretty huge (300,000+ entries) and is "retro-sorted" (sorted alphabetically by word endings), running from nausicaa to tausaghyz. This allows words with the same suffixes to be worked on together (as requested by DCDuring).
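To illustrate what "retro-sorting" means (this is not necessarily how the list itself was generated), here is a minimal Python sketch. It assumes plain epithets as input; `retro_sort` is a hypothetical helper name, not part of any Wiktionary tool.

```python
def retro_sort(words):
    """Sort words by their reversed spelling, so that words sharing
    a suffix (e.g. -ata, -atus) end up next to each other."""
    # casefold() makes the ordering case-insensitive; [::-1] reverses
    # each string, so comparison starts from the last letter.
    return sorted(words, key=lambda w: w.casefold()[::-1])


epithets = ["fasciatus", "tausaghyz", "alba", "fasciata", "nausicaa", "lineata"]
print(retro_sort(epithets))
# ['nausicaa', 'alba', 'lineata', 'fasciata', 'fasciatus', 'tausaghyz']
```

The two -ata epithets (lineata, fasciata) land next to each other, which is what lets same-suffix words be handled as a batch.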

This is a follow-up to the Top 1000 list I posted recently. I created the Top 1000 list because I think it's really important that all these words have entries. For example, there are 882 species on this planet which have been given the specific epithet fasciata (including five animals and one plant which are all threatened with extinction, such as the Fiji Banded Iguana, Brachylophus fasciatus). There's no Latin entry for fasciata, nor for another 82 epithets which are each used by over 300 species.

I made these lists for Wiktionary's editors, so we can create or improve entries for any of the most common specific names as well as those of threatened species. I am interested in the diversity of life and its conservation, and would like to see the subject become less difficult for others studying or interested in biology. Scientific names are often seemingly opaque in meaning, and can be intimidating and difficult to work with when there's no easy way to understand them.

Wiktionary is becoming the go-to source for definitions. I want to encourage those who are improving it, and thus (deliberately or not) making the biological sciences less frustrating, less intimidating, and less mysterious for all. Latin is very well represented on English Wiktionary, currently having more entries than any other language (including English, counted by number of definitions). So I hope adding scientific Latin overlaps with the interests of Wiktionary's Latinist contributors, and I hope it's not too much of a stretch to sometimes delve into the less "pure" world of New Latin and scientific interlingual words. I'm trying to learn enough Latin and Wiktionary syntax to help more, but even if I were a grandmaster, I couldn't do it all on my own, so I'm hoping the top 1000 list as well as this new suffix-sorted list will encourage the Latinists here to consider looking at modern scientific Latin usage. And thanks also to those who have made some kind of start.

I've since improved the Top 1000 list so that filled entries have a strike-through, making it easier to identify blue links which don't yet have Latin or Translingual entries. Also I've highlighted epithets which belong to threatened species in order to increase their visibility (my own personal interest). These markups are on both lists.

This has all been a lot more work than I expected, and has all been done in my spare time, so I'm hoping it pays off with people actually using the lists to improve Wiktionary. If the lists get used, I'll take that as a show of their usefulness and spend the time to keep updating and improving and expanding or creating tools to help flesh out entries. Otherwise I'll leave it at this. It would be nice to move on to genera too.

TL;DR: New list of "Latin" specific epithets sorted by suffix (mostly Latin adjectives). I really do hope these lists are helpful and lead to the creation of new Wiktionary entries. Pengo 06:28, 18 September 2011 (UTC)

American vs European music terms

I don't know whether this has been discussed before, but I'm not sure what to do about some musical entries. You may or may not be aware that in America, music theory is taught with quite an array of terms that differ from those used in European (chiefly British) music theory, even though they have the same meanings.

e.g. whole note (American) --> semibreve (British)
quarter note (American) --> crotchet (British)
staff (American) --> stave (British)

From my experience a lot of American musicians have a hard time understanding the British equivalents as they often don't get taught, and vice versa. I find this hard to incorporate in definitions such as 2/2 where it may be difficult for a British reader to understand (half note would be a minim, and a measure would be a bar). Equally it would be hard for an American reader if the definition was in British English. There are quite a lot of entries where this problem arises; I often have to look up the meanings of the words because I was never taught the American terms.

I'm not sure if there's a way around it except to assume that someone will look up a word if they don't understand it. It's quite a niggling issue in the academic music world. —JakeybeanTALK 05:55, 20 September 2011 (UTC)

Perhaps a table expanded from one similar to that shown below could be added to each relevant page - or assigned to an appendix?
Type | British | American
Notes | semibreve | whole note
Notes | minim | half note
Notes | crotchet | quarter note
Miscellaneous | stave | staff
Saltmarshtalk-συζήτηση 10:07, 20 September 2011 (UTC)
It appears to be a simple AE vs. BE difference. At least in Germany, the American terms are used. -- Liliana 11:52, 20 September 2011 (UTC)
Perhaps the definition for 2/2 can read {{music}} A [[meter]] of [[two]] [[half note]]s {{gloss|[[minim]]s}} per [[measure]] {{gloss|[[bar]]}} or the like, providing the BrE and AmE terms each time. (The above looks like: (music) A meter of two half notes (minims) per measure (bar).)—msh210 (talk) 16:34, 20 September 2011 (UTC)
We don't need both "measure" and "bar" since both terms are used in American English, so in the definition we could just change "measure" to "bar" without loss of understanding to American readers. —Angr 17:13, 20 September 2011 (UTC)
But non-native speakers, like me, don't understand bar, only measure, because the former is not associated with any musical term in Germany. -- Liliana 18:34, 20 September 2011 (UTC)
We can't start trying to second-guess what words non-native speakers might know and what words they might not. The German word is Takt, which isn't obviously connected to either "measure" or "bar", and which English word Germans learn depends on which variety of English they're exposed to. I don't think we can get around the fact that non-native speakers may have to look up words in a gloss whose meaning they don't know, but it would be good if we could minimize that for native speakers. —Angr 20:28, 20 September 2011 (UTC)
Oh, I wouldn't know. I was trying to split it on BrE/AmE lines, and may have erred. My general idea though is that {{gloss}} be used when the Brits have one word and the Yanks another and never the twain shall meet.​—msh210 (talk) 18:53, 20 September 2011 (UTC)

Japanese kanji entries and classical vs. modern readings

Going through Category:Japanese_terms_needing_attention to do some mostly-mindless clean-up work, I've run across a number of kanji entries where the list of readings includes things that semantically sorta make sense, but that I've never seen. 不#Readings, for instance, lists the kun'yomi "せず (sezu), にあらず (niarazu), いなや (inaya)", which make sense since 不 essentially means "not" and all these kun'yomi are related to negativity, but I've never heard of 不 having any kun'yomi at all. Moreover, neither the Jisho.org entry nor Jim Breen's site (you'll have to enter the kanji yourself; I can't link directly) lists any kun'yomi, nor do my dead-tree dictionaries. The Weblio entry does list these kun'yomi, but various things about Weblio make me think that they include classical Japanese readings, not just modern ones. That said, classical Japanese was much more varied in terms of how things could be spelled -- imagine Chaucerian English spelling, only far looser -- and thus classical readings aren't always terribly pertinent to the modern language.

This leads me to wonder if we should mark classical readings somehow? Or should we leave them out altogether? -- TIA, Eiríkr Útlendi | Tala við mig 20:25, 20 September 2011 (UTC)

Japanese multiple readings are a pain in the butt, especially names. I'm amused at how 夜神月 is actually read Yagami Raito. Just use {{qualifier}}, I guess. I'm sure many kanji don't have a comprehensive list of all possible readings. --Anatoli 01:34, 21 September 2011 (UTC)
Hm, yes, marking non-standard readings using qualifiers or something similar seems to be the emerging consensus. However, I don't think we even could go for "all possible readings", given the flexibility of how kanji are used.
FWIW, 夜神月 seems to be a manga or anime character, in which case all bets are off as to reading - the author(s) could just as well decide that a given kanji string should be read Furī Uirī, or Ai Raiku Dōnattsu, and that would be that. Manga and anime readings are sometimes the very picture of arbitrariness.
With that in mind, I'd be more inclined to have kanji entries here limit the list of readings to attested historical readings, and leave out anything that's clearly a creative neologism of limited currency -- basically apply something like CFI to the readings themselves. :) -- Eiríkr Útlendi | Tala við mig 15:48, 22 September 2011 (UTC)
夜神月 (月 (tsuki), meaning "moon", is read in this name as Raito, from English "light"; watch Death Note, highly recommended, the best quality anime I've seen (the movie is not as good)!) is an extreme example, but this arbitrariness is not restricted to names, and not only to manga names. I see your point, but I find that listing too many readings for a kanji can also be counterproductive. Readings can be borrowed from other kanji with similar meanings, as with your example of いなや (inaya), which is normally written as 否や in kanji. --Anatoli 22:56, 22 September 2011 (UTC)

Hindi and Urdu vs Hindi-Urdu or Hindustani

I don't want to be mean and just change the headings from Hindi-Urdu to separate Hindi and Urdu as in the translations for Hindustani. I don't think there was a policy of merging the two languages, even if the Hindi and Urdu templates allow words to be displayed in both scripts. Any thoughts? --Anatoli 00:22, 21 September 2011 (UTC)

They should definitely be separate. There was no discussion on merging them (and even then, there is no code for Hindi-Urdu we could use). -- Liliana 11:37, 21 September 2011 (UTC)
I think we should at least discuss it. We could create a code, perhaps {{inc-hin}}. Currently our Hindi and Urdu templates include things like 'Hindi spelling' and 'Urdu spelling', implying that they are the same language. I have absolutely no input on whether we should treat them as the same language, but we should discuss it. --Mglovesfun (talk) 17:15, 21 September 2011 (UTC)
They are the same, yes. We only treat them separately due to two different scripts being used, so we can have all Hindi words in Devanagari and all Urdu words in Arabic script. -- Liliana 17:19, 21 September 2011 (UTC)
A small correction. There are layers of heavily Sanskritised words in Hindi that are not used in Urdu, and the reverse is true as well: there are many words of Persian and Arabic origin in Urdu that are not used in Hindi. Having said this, Urdu can be written entirely in Devanagari (written this way it is, in fact, more precise about consonants that are missing in Sanskrit, such as z, f, x, q, ġ, etc., which Hindi writers often replace with j, ph, k, g, etc.), and Hindi can be written entirely in Perso-Arabic script as well. The high-register words are falling out of use, as Hindustani, a spoken variety of both Hindi and Urdu, is becoming popular due to Bollywood, songs and media. Hindi and Urdu now borrow a lot from each other and from English, making them even closer. --Anatoli 23:39, 21 September 2011 (UTC)
Structurally they are different standardized registers of the same language, comparable to Croatian, Serbian, etc. being standardized versions of the same language (which is called Serbo-Croatian). Because they are the same language, I would be in favor of a unified header, like we have for (Roman and Cyrillic) Serbo-Croatian. Though, the damage would not be like we used to have in the case of Serbo-Croatian (three or even four identical entries on the same page), because, AFAIK, there should never be both a Hindi and an Urdu entry at the same page anyway because they use different scripts (well, at least the standardized registers, of course). --JorisvS 19:26, 22 September 2011 (UTC)
I love arguing with Indian people about Hindi and Urdu being different languages. I quote religious Urdu stuff that they understand perfectly and I'm like "really, because half those words are from Persian, so if Urdu wasn't the same language as Hindi you wouldn't understand this." Anyway. As JorisvS points out, the mess isn't as serious as Serbo-Croatian once was because the headers aren't used on the same page. If what's desired is one header, I think Hindi-Urdu is a bit odd, and Hindustani would probably be the most neutral. In translation tables (I already do this for descendants tables and in Etymology) we could have
* Hindustani 
*: Hindi: 
*: Urdu:
I'm sure (=positive) some people (mostly racists) would bitch and whine, as with Serbo-Croatian. But they're lesser people. If there are words that aren't used frequently in India, they can be marked as predominantly Pakistani in Usage notes, and vice versa. Wouldn't be a big deal. The main concern would be categorization. We have problems with Chinese (simplified and traditional) and some people worrying about Serbo-Croatian, but I wouldn't really be opposed to something like Cat:Urdu spellings of Hindustani nouns/verbs/whatever. </ideas> — [Ric Laurent] — 20:04, 22 September 2011 (UTC)
Hindustani sounds more interesting than Hindi-Urdu. I have also noticed that Hindi speakers like to say that their language is closer to Sanskrit, and Urdu speakers say Urdu is closer to Persian. In reality, both have plenty from both sources. It may be harder to find Urdu equivalents for "clever" Hindi words like प्रदूषण (pradūṣaṇ) (pollution), but otherwise most Hindi words have Urdu equivalents and vice versa. Didn't you say you were avoiding Beer Parlour? :) Thanks for your input, Ric. --Anatoli 23:05, 22 September 2011 (UTC)
Sometimes things of actual importance are discussed here, so when I see notifications of good conversations, I try to throw a few cents at it lol. (In fact, I'm considering our below-discussed Arabic problems, get some ideas out there) — [Ric Laurent] — 22:44, 25 September 2011 (UTC)

Filipino and Tagalog

I found a page (paalam) with an entry for the Filipino language, and another for Tagalog.

These are the same language.

The government of the Philippines wanted to make a national language and they decided in 1937 that it would be "based on Tagalog", the language of the capital. In 2007, the chair of the government's Commission for the Filipino Language (Komisyon sa Wikang Filipino) reported on these efforts:[3]

Are “Tagalog,” “Pilipino” and “Filipino” different languages? No, they are mutually intelligible varieties, and therefore belong to one language. [...]
The other yardstick for distinguishing a language from a dialect is: different grammar, different language. “Filipino”, “Pilipino” and “Tagalog” share identical grammar. They have the same determiners (ang, ng and sa); the same personal pronouns (siya, ako, niya, kanila, etc); the same demonstrative pronouns (ito, iyan, doon, etc); the same linkers (na, at and ay); the same particles (na and pa); and the same verbal affixes -in, -an, i- and -um-. In short, same grammar, same language.

This explains why there are no Tagalog-Filipino dictionaries, no Tagalog-Filipino translators/interpreters, and no documents or cultural goods ever produced in separate versions for each.

I can also personally confirm this, as a speaker of the language.

To fix the above-mentioned article, I removed the "Filipino" section from the page (and pasted it on the Talk: page, for reference). Gronky 11:32, 21 September 2011 (UTC)

I am not opposed to it. I always wondered why we cover Filipino and Tagalog separately. -- Liliana 16:35, 21 September 2011 (UTC)
I have no input, other than it's an important issue and should be discussed rather than individual editors working using their own opinions. --Mglovesfun (talk) 17:18, 21 September 2011 (UTC)
Is there a right place to discuss it, with an eye to setting policy?
This isn't controversial. Gronky 23:25, 21 September 2011 (UTC)
I support this. We should probably just use Tagalog. It's the most common word now in use for the official language of the Philippines and we already use Tagalog much more often than Filipino/Pilipino. The difference is subtle and there's nothing that can't be resolved with occasional {{qualifier}} tags. --Anatoli 23:29, 21 September 2011 (UTC)
Here. And it'd be nice to mention to the frequent contributors in both languages (or the language, whatever) that the discussion exists.​—msh210 (talk) 00:09, 22 September 2011 (UTC)
I support this too. Different registers of the same language should use the same header; in this case Tagalog is the name of the language and so should be used. When differences exist these can indeed be properly tagged anyway. --JorisvS 19:36, 22 September 2011 (UTC)
Do we have any? As for me, I used {{tl}} (Tagalog that is) for some translations. --Anatoli 02:49, 22 September 2011 (UTC)
I'll set up a vote, but leave enough time for the discussion to continue. Mglovesfun (talk) 20:26, 25 September 2011 (UTC)
I understand that, for the moment, it's the same language, but that Filipino should become a mix of different languages used in the country, and that a commission is working toward this objective. It seems logical to use Tagalog only for the moment but, if somebody creates a Filipino entry nonetheless, there is no reason to delete it. It might become useful in the future. Lmaltier 10:06, 2 October 2011 (UTC)
The "mix of different languages" proposal was the plan that was announced in 1937, but no effort was put into it, so it never even got off the ground. In the intervening 74 years Tagalog has been used in every situation where "Filipino" was meant to be used, and it has been taught in every school in the Philippines for the past two generations.
There are no efforts currently under way to create a "new" Filipino. The speed at which the Spanish language mostly disappeared from the Philippines is an example of how quickly things can change. (Sidenote: Spanish was ubiquitous there a century ago and most Philippine authors wrote in Spanish, but now the lack of Spanish knowledge among first-language Tagalog speakers is such that when 19th-century Philippine literature is translated into Tagalog, translators usually have to do two-step translations, Spanish->English->Tagalog.) But changing a language does take at least a generation, and no such effort has begun yet or is being proposed. Gronky 20:48, 3 October 2011 (UTC)
"Should become" makes me think of Wikipedia's Crystal Ball. We cannot do something just because we think something might happen/be in the future. If and when such a situation arises (and this is, as Gronky points out, not all too likely) we can deal with it then. --JorisvS 21:40, 3 October 2011 (UTC)

Deprecating zh, zh-cn and zh-tw in category names

That's it, really. AFAICT these always refer to Mandarin, though it's possible that in some cases zh could be used erroneously for another Chinese language such as Cantonese (NB, {{zh}} displays Mandarin). Would anyone like to expressly support or oppose this proposal? The proposal is to replace zh, zh-cn and zh-tw in topical category names with cmn, e.g. renaming Category:zh-cn:Computing to Category:cmn:Computing. Mglovesfun (talk) 20:03, 21 September 2011 (UTC)
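If the rename were done by bot, its core might look like the minimal Python sketch below. It assumes, as the proposal does, that every zh, zh-cn and zh-tw topical category means Mandarin (cmn); `rename_category` is a hypothetical helper, and note that it collapses the simplified/traditional distinction that zh-cn and zh-tw currently encode.

```python
import re

# Codes the proposal would retire in topical category names; all are
# assumed (per the proposal) to denote Mandarin, whose code is "cmn".
DEPRECATED_CODES = {"zh", "zh-cn", "zh-tw"}
TOPIC_CATEGORY = re.compile(r"Category:([a-z-]+):(.+)")


def rename_category(name: str) -> str:
    """Return the proposed new name, or the input unchanged if it is
    not a topical category using one of the deprecated codes."""
    match = TOPIC_CATEGORY.fullmatch(name)
    if match and match.group(1) in DEPRECATED_CODES:
        return f"Category:cmn:{match.group(2)}"
    return name


print(rename_category("Category:zh-cn:Computing"))  # Category:cmn:Computing
print(rename_category("Category:fr:Computing"))     # Category:fr:Computing
```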

I agree with you. Engirst 20:12, 21 September 2011 (UTC)
Err... but how are you going to separate traditional and simplified script? ---> Tooironic 21:26, 21 September 2011 (UTC)
How are we going to? Well, if we want to, with something like Category:cmn:Computing in traditional script, or something similar. Mglovesfun (talk) 21:37, 21 September 2011 (UTC)
FORTRAN?​—msh210 (talk) 00:05, 22 September 2011 (UTC)
I posted a comment related to this issue at the end of Wiktionary:Beer_parlour#Classical.2FLiterary_Chinese_entries. -- A-cai 01:09, 12 October 2011 (UTC)

Language merges

Looking at this page and WT:RFDO, there are four merges being proposed:

  1. Category:Koongo language into Category:Kongo language (very small)
  2. Category:Colloquial Malay language into Category:Malay language (very small)
  3. Category:Filipino language into Category:Tagalog language
  4. Category:Hindi language and Category:Urdu language into a Category:Hindustani language

Of course, Bosnian, Croatian and Serbian were merged into Serbo-Croatian a few months ago.

On top of that, I would personally like to see Category:Anglo-Norman language merged into Category:Old French language (which I might add, would render a few hundred of my own edits useless or worse). Interesting issue, isn't it? Mglovesfun (talk) 20:08, 21 September 2011 (UTC)

At the risk of turning this into a very, very broad topic, I've occasionally wondered if having all the 'Arabics' separately is appropriate. Mglovesfun (talk) 20:10, 21 September 2011 (UTC)
The Arabic languages are almost as distinct as the Slavic languages. They share a formal standard literary/media language but the languages of daily speech are so different as to be incomprehensible to people at the other end of the Arabic language area. —CodeCat 21:37, 21 September 2011 (UTC)
That's correct. However, it doesn't make sense to create Egyptian, Levantine, Moroccan, etc. entries for words which are identical. Most formal vocabulary and many other words are shared between the dialects and MSA, or differ only slightly in pronunciation. We don't use the pedantic case endings here anyway (e.g. غرفة ghurfa vs ghurfatun). The differences in pronunciation between j/g and q/' (Standard/Egyptian) could be ignored: the spelling is the same and the correspondence is rather consistent, even though Egyptians pronounce these Arabic words differently. So MSA قلم qalam is Egyptian 'alam, and MSA حج Hajj is Egyptian Hagg. The words that ARE different in the dialects should have separate entries, IMHO, e.g. tomorrow: غدًا "ghádan" (MSA) vs بكره "bukra" (Egyptian). --Anatoli 23:24, 21 September 2011 (UTC)
I know no Arabic so can't opine, but, assuming what Atitarev says is true, pronunciation differences can be relegated to the Pronunciation section.—msh210 (talk) 00:07, 22 September 2011 (UTC)
Contributors in the Arabic dialects have almost died out. I looked at some Egyptian Arabic nouns; many (not all) are just Arabic. Good-quality Egyptian Arabic entries with different plural forms, like يد, shouldn't be merged. We should also check with Stephen G. Brown and Dick Laurent on this. --Anatoli 02:47, 22 September 2011 (UTC)
Look at water. The translations in Arabic dialects are all quite different from each other. -- Liliana 03:25, 22 September 2011 (UTC)
Using this same example, if one looks at the translations in Chinese, they are written in the same way (Dungan excluded of course), and yet we divide Chinese into God knows how many languages. 60.240.101.246 12:49, 22 September 2011 (UTC)
Many of these are difficult to discuss, because we don't have very many people who specialize in foreign languages. Therefore, it's hard to gain any sort of consensus. -- Liliana 03:38, 22 September 2011 (UTC)
(Edit conflict.) That's the trouble with the most common words: there are very few of them, but they make speech very distinct and hard to understand without previous exposure. The same goes for the question word what. Still, the written dialects (where they are written at all; only a few ever are) tend to be much closer to each other than the spoken forms, much closer than the Slavic languages are. The fact that the dialects are for speaking, not for writing, makes creating entries for them less important.
I agree with your last message. --Anatoli 03:40, 22 September 2011 (UTC)

Yes, it may make sense to create distinct entries for words in different Arabic languages. I would allow both the macro-language (for those not willing to create distinct entries) and the individual languages (for those seeing an interest in creating them). The fact that specific words are often mostly oral and not found in the usual dictionaries makes it all the more important to include them here.

In addition, systematically accepting sections for languages with an ISO code would be a simple rule, would make things much simpler and would avoid many discussions. Lmaltier 16:45, 22 September 2011 (UTC)

I dislike putting simplicity ahead of accuracy. ISO 639 isn't really designed for our purposes; they don't care if a language is actually not a language but a dialect of another language, they just attribute a code when a code can be useful. There's a code for no linguistic content but I don't think we want Category:No linguistic content language. Mglovesfun (talk) 20:59, 25 September 2011 (UTC)
No, they try to define codes for languages, not for dialects. They created a code for no linguistic content, but this is an exception; they don't state that it is a language. It's not always obvious: they created Occitan as a macrolanguage, with codes for individual varieties, and then changed their mind. And the word language may be interpreted in different ways, so you may disagree with their decisions. But this is what they try to do. Lmaltier 09:49, 2 October 2011 (UTC)
Jesus. Arabic. So, on one hand, there are things that would be pretty smart about having separate L2s for the major dialects, but as has been pointed out there would be lots of overlap. However when you get to the details, you have variations in pronunciation, verb conjugation... these would have to be compensated for if we wanted to be complete. Without separate L2s the only logical way to represent variations in regional conjugation in an L2 Conjugation section is with several drop-down tables. We'd have a few {{a}} tags for pronunciation variance, stuff like that. Really it would be possible to treat all Arabic dialects under one header. In all likelihood, it wouldn't be pretty, and it would require a lot of tags - for example, for words specific to certain dialects, it would be very easy to just do like we do with English with regional tags before definitions. (I apologize for the scattered nature of these statements lol... there's a lot to consider.) — [Ric Laurent] — 22:57, 25 September 2011 (UTC)

Hey! If you're going to merge Hindi and Urdu (see above), then arguably Romanian and Moldovan are also candidates for merging! -- Liliana 13:54, 28 September 2011 (UTC)

Yes, let's add Category:Moldavian language→Romanian to the list of merges to consider. - -sche (discuss) 07:28, 30 September 2011 (UTC)
Definitely! --JorisvS 12:43, 30 September 2011 (UTC)
Naturally — [Ric Laurent] — 23:45, 5 October 2011 (UTC)
MRL!... There is a single (Daco-Romanian) language in Romania and the Republic of Moldova: the Romanian language. It is good to hold a discussion here, but the outcome must be clear: one single language (officially "Daco-Romanian", Romanian) for the Carpathian-Danubian area of Romania and the Republic of Moldova. A clear and decisive argument for the Romanian language: Romanians from the province of Moldavia (in Romania, i.e. West Moldavia), Romanian Moldavians, speak the same language as the Romanians of East Moldavia (the Republic of Moldova): Romanian, and they do not claim that their language is "officially" called "Moldovan" (Moldavian)!

Aside from merging "Colloquial Malay" into "Malay", for consistency and accuracy "Indonesian" and "Malaysian" should also be merged into "Malay", because these are standardized varieties of the Malay language (like Croatian etc. are of Serbo-Croatian and Hindi and Urdu of Hindustani/Hindi-Urdu). On the other hand I'd like to point out that there are several "Malay languages" that should not be merged (do we have any entries?). --JorisvS 12:43, 30 September 2011 (UTC)

Category:Banjarese language comes to my mind. But yeah, Standard Malay and Standard Indonesian are virtually the same language, so there isn't really a need to have them separately. -- Liliana 11:54, 1 October 2011 (UTC)
I think the way we handle Spanish would be appropriate in a lot of these cases, one unified language header and then context tags for meanings which are distinct to a region. Spanish has regions which conjugate differently, pronounce differently, use significantly different vocabulary, but we manage to represent all of these things without too much confusion. Obviously I don't know enough about the particular languages brought up here, but we do have a model for how we can make this work. - TheDaveRoss 12:31, 1 October 2011 (UTC)

Non-idiomatic translations

As a side-thought to the Idiomatic translations section above, what is the preferred method of handling translations out of English that are non-idiomatic phrases in the target language? I'm thinking now of disarm, where the Japanese translation of the intransitive sense "to lay down arms" could be 武器を捨てる, which is redlinked here as it should be, and which is not included in any other dictionary due to the same SOP restriction we have here. That said, is it kosher in translation tables to only use the {{t}} template for parts of a translated phrase?

Instead of:

{{t|ja|武器を捨てる|tr=ぶきをすてる, buki o suteru|sc=Jpan}}

should we have the following?

{{t|ja|武器|tr=ぶき, buki|sc=Jpan}}{{t|ja|を|tr=o|sc=Jpan}}{{t|ja|捨てる|tr=すてる, suteru|sc=Jpan}}

This is so incredibly ugly and unwieldy that I'm pretty sure it's not the way to go, but that brings me right back to the question -- does anyone have advice on how to input non-idiomatic phrases as translations of English terms? -- Eiríkr Útlendi | Tala við mig 20:26, 22 September 2011 (UTC)

Yeah, I've wondered about that, too. Probably it's better to just use {{l}}/{{onym}} for those:
* Japanese: {{onym|ja||[[武器]][[を]][[捨てる]]|tr=ぶきをすてる, buki o suteru}}
(It's not ideal — {{onym}}, unlike the {{t}} family, italicizes its transliterations — but I think we've more than used up the sensibly mnemonic t template-names. What are we gonna create, a {{t:}}?)
RuakhTALK 20:41, 22 September 2011 (UTC)
Cool, thanks for the feedback. I think I'll use more manual formatting, since {{onym}} is apparently deprecated and since it italicizes the kana. I thought about adding a lang or sc param, but it looks like these aren't implemented for {{onym}}, so there you go. So would the below be acceptable?
* Japanese: {{Jpan|武器を捨てる}} ({{Jpan|ぶきをすてる}}, buki o suteru)
It looks like the font used in translation tables is smaller, so maybe I shouldn't use {{Jpan}} either? -- Eiríkr Útlendi | Tala við mig 21:26, 22 September 2011 (UTC)
I don't think {{onym}} is deprecated. I don't know what you mean by "it looks like [lang and sc params] aren't implemented for {{onym}}"; but manual formatting seems just fine to me. (As does using {{Jpan}}.) —RuakhTALK 22:13, 22 September 2011 (UTC)
Cheers, thanks. {{onym}} is marked RFDO, which I discovered when I looked at the template page itself to figure out about params. Some templates like {{term}} have a lang param for specifying a language, and the template handles formatting differently in some cases for specific languages, like using a slightly bigger font and no italics for Japanese. The sc param shows up in some other templates as a way to specify a certain script, again so the template can select an appropriate font size and style. {{onym}} doesn't change its Japanese output when I add either lang=ja or sc=Jpan. Japanese being the odd duck that it is, typographically speaking, I may have run across some of these wrinkles more than folks working with European languages. :) -- Ta, Eiríkr Útlendi | Tala við mig 22:33, 22 September 2011 (UTC)
{{onym}} doesn't have lang=ja because it just has ja, as in the example wiki-text above; that is, {{onym|ja|foo|tr=bar}} is like {{term|foo|lang=ja|tr=bar}}. It does support sc=Jpan, theoretically, but Jpan is already the default script for Japanese, so {{onym|ja|foo|tr=bar}} implies {{onym|ja|foo|sc=Jpan|tr=bar}} anyway. Neither version applies the script template to the transliteration, though. (And {{t}} doesn't, either.) —RuakhTALK 03:34, 23 September 2011 (UTC)
Why not create a {{t-SOP}}? Matthias Buchmeier 09:09, 23 September 2011 (UTC)

Serbo/Croatian

There is NO Serbo/Croatian language. These are two languages: Serbian and Croatian. So please treat them in that way.

Why? —CodeCat 09:33, 25 September 2011 (UTC)

re-e... or ree...

I'm currently adding some Italian words that start rie... - they mostly translate as English words starting ree... - I never know whether to use ree... or re-e... for the English. (See riesposizione as an example.) We seem to use both forms (sometimes as alternative forms). Is there any sort of rule? SemperBlotto 10:00, 25 September 2011 (UTC)

In my experience, both forms usually exist. Some writers don't like the fact that it looks like a single ee vowel. Same thing as with cooperate. Equinox 10:10, 25 September 2011 (UTC)
The New Yorker would doubtless use reë.... Older works, too, for older words. So older words' reë... versions are (most of them) probably attested.​—msh210 (talk) 15:15, 25 September 2011 (UTC)

Target audience

This is in reaction to comments such as Gtroy's (talk • contribs) on WT:RFD#Cyrillic script saying "keep both are really common, a new speaker or child would find it useful." Such comments call into question our target audience. WT:CFI#Idiomaticity alludes to the same thing in saying "An expression is “idiomatic” if its full meaning cannot be easily derived from the meaning of its separate components." How easy it is to derive the full meaning depends on how good you are at deriving meanings. What level should we aim at? Educated adults, children, non-native speakers? The problem I have with aiming for non-native speakers (and above, naturally) is how far do we want to go to accommodate very weak English speakers. Someone without much experience in English may find I laughed like a child difficult to understand, but I doubt we would include that. Reading WT:FEED, many of the users seem to be unable to spell even very basic English words right, so maybe dumbing down is a good option. Mglovesfun (talk) 20:51, 25 September 2011 (UTC)

I don't think it harms us any to aim at the lower threshold. First, who is likely to be looking up something like target audience in a dictionary unless they don't know what it means and need help figuring it out? Second, what possible harm could come from having the entry? So long as we guard against becoming shills for people who are trying to market a product or make Urban Dictionary style humorous coinages, I think we should be very liberal about allowing combinations like that. bd2412 T 23:27, 25 September 2011 (UTC)
I actually doubt that [[Cyrillic script]] would be useful to a child or a new speaker, because neither group would know that anyone might consider "Cyrillic script" to be a single unit to look up. I think it's more likely to be useful to an adolescent or adult native speaker who already recognizes that "Cyrillic script" is a common phrase, and wants more information about it, but doesn't quite understand how to use a dictionary, or doesn't quite understand the difference between a dictionary and an encyclopedia. (It could be useful to translators, who would likely prefer to find the usual translation for "Cyrillic script" rather than have to assemble the translations for the relevant senses of "Cyrillic" and "script", since — for example — their target language might use different words for alphabetic scripts than for other kinds of scripts.) —RuakhTALK 23:43, 25 September 2011 (UTC)
I think it's not bad, but it can be dangerous. Wiktionary is already very big, and that means a lot of pages will be created and only visited once or twice after that, if that at all. There is very little opportunity for review, which could mean a very obscure term might have a bad entry for a very long time. Wiktionary is not paper, but its users aren't omnipresent either... —CodeCat 23:46, 25 September 2011 (UTC)
The amount of attention we give to such phrases in RFV/RFD discussions suggests that they can be adequately spotted and brought up to our standards. bd2412 T 23:51, 25 September 2011 (UTC)
That assumes that someone first RFDs or RFVs them, which might take a long time for an entry hardly anyone ever needs. —CodeCat 23:55, 25 September 2011 (UTC)
That could be said of an enormous number of our entries. Consider all the conjugations of verbs, some into forms that are almost never used or called for. bd2412 T 01:32, 26 September 2011 (UTC)
That's true, but I'm wondering if we should make it even worse. Verb form entries are usually bot generated, so they are correct as long as the bot was written correctly. —CodeCat 10:32, 26 September 2011 (UTC)
The Simple English Wiktionary is small, but it is specifically aimed at people with lower levels of English proficiency.--Brett 11:39, 26 September 2011 (UTC)
"Wiktionary is very big" ... really? I would say Wiktionary is very small, compared to the size required to adequately cover its subject matter. The scope of Wiktionary is not "all words in all languages as long as the word will get more than one or two hits from direct search." It is important to remember that only a part (and a small part, I would guess) of the usage of Wiktionary's information is in direct search. Indirect search (OneLook, Google, Yahoo, etc.) and the many places Wiktionary has been parsed and culled for particular relationships probably get a lot more attention, and we have no idea what those users really want or need. - TheDaveRoss 13:39, 27 September 2011 (UTC)

Completing the projects of User:Robert Ullmann

I can't think of a better way to honor Robert Ullmann's work on this project than to complete the many unfinished projects that he initiated. Many of us have begun projects for the introduction of substantial bodies of material into this dictionary, and any one of us could die with our work unfinished. I'd like to think that if that happens to one of us, the rest will pick up that work and carry it to its completion. Robert's work is epitomized in his many subpages - Robert Ullmann subpages, more Robert Ullmann subpages. Of these, I think the projects where we as a community can really pull through are User:Robert Ullmann/Missing, User:Robert Ullmann/Oldest redlinks, and the various pages showing use of foreign language words in news articles. Let's do this. Cheers! bd2412 T 23:47, 25 September 2011 (UTC)

I would revert his user page to this revision, from which some of the projects including "Oldest redlinks" and "Missing" are accessible. The text "Robert Ullmann passed away on March 19, 2011 in Massachusetts General Hospital in Boston at an age of 50 years. [1]", which is currently the only contents of his user page, could be placed into an infobox at the top of his original page. --Dan Polansky 06:25, 26 September 2011 (UTC)
Some user-led projects (including my own) are pretty massive, and may not be finished for years, maybe 10 years or more. It would be nice to continue to make progress, however. Mglovesfun (talk) 06:49, 26 September 2011 (UTC)
I think this is a good idea. An interesting thing about your two primary examples is that neither one will ever be finished, unless we somehow finish the entire project. Are you proposing that we finish the lists that Robert generated or ought we update those lists from the most current dump and progress from there? Also I think reverting his userpage and putting a note at the top is appropriate. - TheDaveRoss 11:26, 26 September 2011 (UTC)
OK. I shall have a go at the Italian ones. Also, since I'm probably the most aged regular contributor, I shall try to document my future projects in case I also fall off my perch in the next few years. SemperBlotto 11:37, 26 September 2011 (UTC)
While there will always be "oldest" redlinks and missing pages, my thought was to prioritize the last lists that Robert generated. Of course, the dynamic nature of language ensures that we'll never "finish" the project, but we will get it much closer to the leading edge of being a complete lexicon. I also agree with restoring a working version of his userpage, with the current note retained on it. Cheers! bd2412 T 12:51, 26 September 2011 (UTC)
We could also add Category:Tbot entries to the list of Robert Ullmann projects. --Mglovesfun (talk) 16:47, 26 September 2011 (UTC)
It is funny you should say that, because I got an e-mail from WF yesterday urging me to propose a project to continue Blotto's work after his death. This seemed like an obvious troll to get me to upset everyone by mentioning the death of somebody present, and anyway I hate community huggy-kissy stuff, so I let it go. And then within 12 hours I read SB talking about falling off his perch! My main concern is whether falling off one's perch is an idiom and whether we should have an entry. And secondarily who is going to learn Italian and fix all the vandalism. (And seriously: yes, RU had some good stuff and it would be a pity for nobody to continue with it.) Equinox 20:43, 26 September 2011 (UTC)
I restored the userpage content as some have suggested, and I also thought it was a good idea. Also, the link to the obituary is broken. —Internoob 18:18, 26 September 2011 (UTC)
I have added a working obit link. bd2412 T 17:44, 27 September 2011 (UTC)
Thanks. One more, more informative, obituary: http://www.obitsforlife.com/obituary/344565/Ullmann-Robert.php. See also W:Wikipedia:Deceased_Wikipedians#Robert_Ullmann_.28Robert_Ullmann.29. An image such as File:Nuvola grave.png or File:Nuvola grave with cross2.png could be placed in the box on his user page, as {{notice|image=Nuvola grave with cross2.png|...}}. (I cannot edit his user page.) --Dan Polansky 07:18, 30 September 2011 (UTC)
Thanks, I've switched out the obit link and made it a running text link instead of a note. I think it looks nicer and is easier to find. I've also chipped away a tiny bit at his missing entries pages. Cheers! bd2412 T 03:29, 14 October 2011 (UTC)

Adjective+noun entries.

There are many adjectives that only have a particular meaning when modifying a noun with a particular sense (or a hyponym thereof). For example, there is a sense of prime that applies only to natural numbers, so you get phrases like "prime integer", "prime and composite numbers", "numbers that are prime", and so on, but in particular you get the phrase "prime number", which passed RFD. Other similar cases — adjective-noun pairs that are SOP, but only because the adjective has a noun-specific sense — include "vintage car", "active volcano", "acute/obtuse/right/reflex/round angle", "exploitative competition", "oblique leaf", "Cyrillic alphabet/script", and others. (Actually, some of the "angle" ones are debatable; no one produced any evidence that "round", for example, is used to mean "360°" outside the one phrase "round angle".) All of these entries were created, but all came to RFD, and few were kept. (A few still appear at RFD.) All of the discussions were fairly ad hoc; although various arguments were presented for keeping or deleting specific entries, no really general principles were proposed, and the result of a given discussion generally seems to have depended largely on who participated in the discussion.

I'd like to raise this question more generally. I would posit that no single editor agrees with the result of every single one of the above discussions. Which ones do you agree or disagree with, and why? Should we have kept all of them? Deleted all of them? What criteria should we have applied? Should we strive for some sort of consistency on this, or are ad hoc discussions the way to go?

RuakhTALK 00:39, 26 September 2011 (UTC)

  • I think we should generally keep these, as they may be useful to someone looking up the term who does not know, for example, which meaning of "acute" and which meaning of "angle" are likely to be implied by reference to an "acute angle". This is a voluntary project; if some Wiktionarians want to make such entries, and the meanings provided can be backed up with sources (most of the above are clearly in widespread use), then others who don't care for them should not spend time making them, and should instead focus on adding the many words still missing from the lexicon. bd2412 T 01:37, 26 September 2011 (UTC)
  • If "number that is prime", "a number is prime", "prime, large number", "prime large number", "very prime number", "more prime a number", or "prime integer" exists, that's enough IMO to say the adjective is separable enough from the noun to delete the "prime number" entry. People shouldn't see the phrase's entry and think the adjective's tied to the noun. However, if the above phrases, and others like them, don't exist, and the only variant on "prime number" that exists is "prime or composite number" or "large, prime number" (with a comma), then I don't know (but am tending at the moment to want to keep in the former case, as the "or" doesn't really break up the phrasiness, and delete in the latter, as the "large" shows that "prime" is just an adjective). If, on the third hand, the only attested variant on "prime number" is "prime effin' number" or "large prime number" (no comma), then I'd say keep "prime number".​—msh210 (talk) 03:50, 26 September 2011 (UTC)
  • I am in favour of these (or the very great majority) being allowed (as some of our users will expect them to be here). But, as bd2412 says, we shouldn't go out of our way to create them. —This unsigned comment was added by SemperBlotto (talkcontribs) at 03:41, 26 September 2011 (UTC).
  • I agree with msh210 on this. DCDuring TALK 11:00, 26 September 2011 (UTC)
msh210 has more or less nailed it. I am thoroughly in favour of specialist adjective definitions like "(of a [noun class]) [its meaning in that context]". A pet hate of mine is palindromic prime. Equinox 20:33, 26 September 2011 (UTC)
IMHO "palindromic prime" could be deleted as sum of parts, and is unlike "prime number", in that the meaning of "palindromic" used in the phrase is not specific to primes. In fact, the meaning of "palindromic" relates to strings over an alphabet. Thus, a number (prime or not) can be palindromic only with respect to a particular system of encoding, such as decadic, binary, or using Roman numerals. --Dan Polansky 08:22, 28 September 2011 (UTC)
If the adjective has a noun-specific sense, this is a very strong clue that the adjective + noun phrase is a set phrase belonging to the vocabulary of English. But this is only a clue, this should not be the criterion. It seems obvious to me that prime number belongs to the vocabulary of the English language, that this is a mathematical term (while blue bicycle doesn't belong to the vocabulary of English), and this is the reason why prime number must be included. I also agree with Equinox, but this is not a reason to exclude prime number. Lmaltier 20:45, 26 September 2011 (UTC)
How do you feel about entries for prime integer, prime member (of a set), etc.? These are equally natural for mathematicians. I don't know whether they are "set phrases" in English but it's hard to see how they are different from "prime number" since integer and member both refer to numbers. Equinox 20:50, 26 September 2011 (UTC)
Are they really equally natural for mathematicians? This was not my feeling, because it's not the case in French (nombre premier must be considered as a word, not entier premier). I feel that prime member is built by the brain when needed (prime + member) while prime number is already available as a whole in the brain, and this is the reason why this is a word. Lmaltier 05:25, 27 September 2011 (UTC)
I suspect that you're right, but I think it would be nice to have somewhat more objective criteria, no? —RuakhTALK 11:51, 27 September 2011 (UTC)
In many cases, it's obvious from one's intimate knowledge of the language (reasoning doesn't help). For specialized terms, it may be less obvious if you are not a specialist. In such cases, I think that we must trust specialists and that, whenever the phrase is defined in a specialized lexicon, it must also be accepted here: if specialists find that a definition is needed, a definition is needed here too. Note that my Pocket Oxford Dictionary (printed in 1972) defines prime number... Lmaltier 16:44, 27 September 2011 (UTC)
  • My tentative principle is this: If a phrase "<adjective> <noun>" is such that (a) the meaning of <adjective> used in that phrase is specific to things referred to by <noun>, and (b) "<adjective> <noun>" is much more often used written together as a phrase rather than separately as in "<noun> is <adjective>", then (c) we should have an entry for "<adjective> <noun>", regardless of (d) there being a suitable definition in the <adjective> entry that makes "<adjective> <noun>" a sum of parts. Examples include algebraic number, algebraic integer, bound variable, cardinal number, complex number, free variable, imaginary number, rational number, real number, transcendental number, free software, open set, closed set, complete graph, normal distribution; see also talk:free variable. I am not sure I require (b) to hold; (a) is the crucial part of the condition of the principle. As to the rationale, I tend to store such terms under "<adjective> <noun>" in my mind, and I estimate this is also the headword under which people tend to look these things up. In German, I store "vorstellen" under "vorstellen" in spite of its also occurring in the separate position as in "stell dich noch mal vor". Thus, I deem this approach convenient for the users of the dictionary. --Dan Polansky 08:45, 28 September 2011 (UTC)

I think this is a very necessary discussion, but the problem is that I'm not sure there are really objective criteria that can be brought to bear. A lot of this is to do with subjective feelings from native speakers about the extent to which a given phrase ‘feels’ like a set unit. The tests that Msh210 mentions are definitely suggestive but not, I think, definitive. That is why the RFD discussions are a good way of settling it and why they will probably always be needed. Personally I find some terms are semantically transparent but still feel like individual lexical units (like the late lamented downloadable content), whereas other terms apparently meet our CFI but to me do not appear idiomatic or natural at all (like Egyptian pyramid). Another point I want to make is about the usefulness of these entries, which is often called into question. The point of them is not to answer the question ‘what does XY mean?’ but rather ‘do native speakers of this language actually use the term XY?’. A good dictionary should be able to say: yes, and here are citations proving it, and preferably some indication of when it was first used. Ƿidsiþ 06:44, 30 September 2011 (UTC)

A small idea for formatting discussions

I noticed some people use bullets with * instead of indenting the text with : . I think this is a lot clearer because you can easily see when the next message begins, even if they both have the same indenting level. Do you think we could make this general practice on Wiktionary, maybe? —CodeCat 11:59, 26 September 2011 (UTC)

This would get confusing if somebody actually posted a bulleted list. Equinox 12:28, 26 September 2011 (UTC)
True, but sometimes people post blockquotes also:
Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.
I'm not sure bullets are worse in that regard.​—msh210 (talk) 16:41, 26 September 2011 (UTC)
Okay, to rephrase: I don't think the current discussion formatting is perfect, but I think we need to distinguish, codewise, "semantic" bullets (intended to serve as bullets) from "pragmatic" bullets (just happen to be useful). Cf. the difference between B and STRONG elements in HTML. Equinox 19:26, 26 September 2011 (UTC)
Maybe a special template could be used instead of a bullet or indenting, when used in discussions? Something like {{*}}? —CodeCat 20:28, 26 September 2011 (UTC)
Actually, I have a better idea. Look at my monobook css and js files and you'll find a bunch of code which colors and indents talk pages to make them much more readable. It should be a cinch to make this work with pages such as the BP. -- Liliana 20:34, 26 September 2011 (UTC)

JA translations suddenly all borked

I just noticed that Japanese translations are very badly borked, suggesting that someone this morning (Pacific Time in the US) has either inadvertently broken or vandalized something somewhere. For reference, have a look at move#Translations, fill#Translations, and anything else that has Japanese included in the translation table.

Does anyone know where to look for sudden changes? Neither {{trans-top}} nor {{t}} have been changed today, and I'm not sure where else to search for such mistakes or foul play. -- Eiríkr Útlendi | Tala við mig 17:57, 27 September 2011 (UTC)

I just noticed that the translation table headers are normal when the page is first loading, and only go funny when the translations JavaScript gets applied. Alternatively, loading the page with JS disabled shows things as they should appear. I have no idea who maintains this script -- who should we talk to about this? -- Eiríkr Útlendi | Tala við mig 18:01, 27 September 2011 (UTC)
I can't reproduce that. What sort of b0rkage do you see? Does a "hard refresh" (Ctrl+F5) help? —RuakhTALK 18:21, 27 September 2011 (UTC)
Sorry, I should have been more specific. The header lines of translation tables now include the Japanese entry as well. So for move, the header for the first translation table should be:
to change place or posture; to go
Instead, the Japanese gets munged into the header, producing:
to change place or posture; to go - Japanese: 動く (ja) (うごく, ugoku)
Note that the Japanese still appears within the table where expected. Hard-refreshing doesn't help, and this issue is present on pages I've never looked up before (and thus wouldn't be in my cache), like at procrastinate#Translations.
In addition, attempting to use the assistance JS dialogs to add a translation directly would fail out earlier this morning when attempting to add Japanese, giving an error message, but this seems to be working now. Is someone editing these scripts? That's certainly what it looks like... -- Eiríkr Útlendi | Tala við mig 18:43, 27 September 2011 (UTC)
I think that's not a bug, but a feature: do you see the little "Select targeted languages" link inside each translations-table? —RuakhTALK 18:56, 27 September 2011 (UTC)
Now I feel stupid.  :) In my defense, I only clicked that because the translation edit assist was acting strangely on the full page, where hitting the Preview translations button would throw up errors about "Japanese translation [term] not found" (of course it's not found, I'm trying to add it...). I figured the two behaviors were linked, but it seems the only link was myself.  :-/
FWIW, the translation edit assist seems to be working now, so that's all good. -- Eiríkr Útlendi | Tala við mig 21:23, 27 September 2011 (UTC)

{nonstandard, rare} form of

Could somebody please make these templates:

Template:nonstandard form of

Template:rare form of --Pilcrow 01:16, 28 September 2011 (UTC)

Moved from WT:ID JamesjiaoTC 01:20, 28 September 2011 (UTC)
A good tip is to find a template that does a similar job, copy its contents and adapt it. So {{nonstandard spelling of}} might be a good choice here. Mglovesfun (talk) 19:57, 29 September 2011 (UTC)

Thames河

Why is it deleted? Cite: Google Books 2.25.211.161 13:08, 28 September 2011 (UTC)

It seems attestable to me, albeit rare. But since I don't know Mandarin, I'd like to have a second opinion. -- Liliana 13:17, 28 September 2011 (UTC)
For your reference: Nouns 2.5 - Personal and place names not in the Chinese Han language 2.25.211.161 16:05, 28 September 2011 (UTC)
See Talk:Ampere定律 and Special:WhatLinksHere/Talk:Ampere定律. Mixed-script entries are deleted if they aren't cited. On the other hand, they are allowed, if cited. - -sche (discuss) 18:33, 28 September 2011 (UTC)
If you want to learn Chinese, learn the proper script. Otherwise, don't bother. Don't go around and tell others that their script is shit and advocate that they should use blah blah blah instead, like these people. 60.240.101.246 23:20, 28 September 2011 (UTC)
I deleted it. Use 泰晤士河 or 泰晤士. Proper names are transliterated into Mandarin using Chinese characters. Most foreign names in Roman letters can be attested in Mandarin, because not everyone knows how to write them in Hanzi or can be bothered to. --Anatoli 00:07, 29 September 2011 (UTC)
To anon 2.25.211.161. Don't go creating English words with the Mandarin heading, this is wrong. --Anatoli 00:12, 29 September 2011 (UTC)
Discussion continued at Wiktionary:Requests_for_deletion#Thames.E6.B2.B3. - -sche (discuss) 02:51, 29 September 2011 (UTC)
We have a more serious matter at hand here than just this entry. We should not allow Chinglish to be spread, no matter if it's attestable or not. Chinese often use foreign words in a Chinese context (not to be confused with mixed script words) but I repeat, these words don't become Chinese if they are used in a Chinese context. --Anatoli 04:23, 29 September 2011 (UTC)
Could someone help me to set up a vote on this (Banning foreign proper nouns as Mandarin). --Anatoli 05:21, 29 September 2011 (UTC)
I didn't realize Engirst had already created a thread here. Anyway, I have commented on the talk page on Wiktionary_talk:About_Sinitic_languages. —This unsigned comment was added by Jamesjiao (talkcontribs).

A general comment: we should not create Chinese sections only for pure Chinese words, but for all words used in Chinese (and not only mentioned, this is very important). This rule applies to all languages. Take a word such as autoroute: can this word really be considered as an English word? Yet, it deserves an English section. Creating a section for a language does not mean that the word fully and naturally belongs to the language, only that it is used in the language. Lmaltier 05:34, 29 September 2011 (UTC)

What you're offering for Chinese is quite dangerous. In fact, any foreign proper name can be used in Mandarin in the Roman script. Your example is a borrowing, which can happen in any language. Thames河 is not a borrowing, it's not common and is just an example of a person not willing to write this word in Chinese. As much as many users want to see Chinese switch to Roman, this is not happening and we shouldn't promote it. --Anatoli 05:43, 29 September 2011 (UTC)
Actually, it's also a borrowing (most such proper nouns are borrowings). What you contest is only the way it's written. We should not promote anything, only describe the language as we find it is written, with appropriate comments and explanations (uncommon, etc.) Lmaltier 06:03, 29 September 2011 (UTC)
If a word is normally written in one script and transcription, and someone uses that word from its original script, I don't think that counts as borrowing into the main language of the text. If I decide to use 这个单词 instead of "this word", that doesn't turn 这个单词 into English. This is how most (all?) alphabetically spelled words are used in Chinese and Japanese -- specifically as foreign words. Sure, they're being used in a Chinese or Japanese context, but that doesn't make these words Chinese or Japanese. -- Eiríkr Útlendi | Tala við mig 17:39, 29 September 2011 (UTC)
The difference is that English speakers are generally not expected to be able to read Chinese characters. I imagine that Chinese speakers on the other hand have at least some understanding of the Latin script used to write English. The same kind of mixing of scripts is done elsewhere too, like in Russian, Greek or Arabic. Just as a language such as English is often expected to be known to non-native speakers, in the same way the Latin script name 'Thames' may be expected to be known in China, while the reverse is not true. —CodeCat 18:18, 29 September 2011 (UTC)
Yet is there anything intrinsically Chinese about using Thames in a Chinese text? If an alphabetically written word used in Chinese contexts takes on a specifically Chinese meaning, then I would be open to the idea of categorizing it as Chinese. If it never has anything but its original meaning from the source language, such as when it is only ever used as a disambig, then no, I would say that it is still decidedly not Chinese, in part as the main reason it's being used is precisely *because* it's not Chinese.
And as a side note, the times that I've seen alphabetic text used in Japanese (the non-English language that I read the most), it is again used precisely because it is not Japanese. In cases such as placenames, the non-Japanese rendering is given generally in parentheses, and is provided not necessarily because the expected audience should know it, but more to clarify the original spelling should a reader want to look into things more, such as here or here. -- Eiríkr Útlendi | Tala við mig 19:46, 29 September 2011 (UTC)
that doesn't make these words Chinese or Japanese: nobody thinks that this makes them Chinese words. Nonetheless, if they are used in Chinese, a Chinese section is useful. highway is not a French word, but a French section would be helpful nonetheless (for sense, gender, pronunciation, etc.), because it's used in French (as a foreign word, but used nonetheless). In the case of a foreign word such as psychanalyste mentioned in a sentence such as The French word for psychoanalyst is psychanalyste., it's very different: the word is not used in the sentence. Lmaltier 19:25, 29 September 2011 (UTC)
If there is no argument about whether these words are Chinese or Japanese, then what is the argument? Foreign words belong under their respective headings. vis-à-vis is listed as English because it's been accepted into the English language, and is used enough in purely English contexts that its meaning is diverging from the French meaning over time. Likewise with terms like al fresco or honcho -- they came into English as foreign terms, but have since taken on specifically English senses. I would strongly argue that Thames has no such specifically Chinese sense -- and, thus, does not belong under a Chinese header. -- Eiríkr Útlendi | Tala við mig 19:46, 29 September 2011 (UTC)
Acceptance of a word in a language is something subjective. Use of a word in a language is something objective. The meaning is irrelevant. Lmaltier 05:29, 30 September 2011 (UTC)
If we don't stop this madness, Mandarin Wiktionary space will be full of - (river name)河, (city name)市, (disease name)病, (mountain name)山, (island name)岛, etc! They are attestable all right but they are not Mandarin. Then any serious person will doubt the quality of this dictionary. --Anatoli 23:07, 29 September 2011 (UTC)
 ?? People read pages of interest to them, not other ones. It's better to keep simple principles and to apply them consistently. See KISS principles. Lmaltier 05:29, 30 September 2011 (UTC)
Mandarin has become the biggest language on the internet. A person with a pinyinisation agenda will dig out a couple of quotes out of thousands (see 泰晤士河 in Google Books [4]), passages where a place name is written in Roman letters, just to prove his point and push his agenda. It doesn't prove anything. Not to me. Only that people who can't read Mandarin will be able to read that word. --Anatoli 05:51, 30 September 2011 (UTC)
I want to ask you, Lmaltier. You seem to be very thorough about the quality of English entries, which is commendable, but why do you disregard the opinion of editors who are active in Chinese, who may know a lot about the language, and who voiced their strong opposition to these kinds of entries, created by a person known for ignoring Wiktionary rules? Don't you think that by encouraging this you may jeopardise your own reputation, and that your opinion against violators of rules set by you will not be supported in the future? For obvious reasons, I think it's only fair to have language-specific policies, and allowing entries like Thames河 will open the door to low-quality entries. --Anatoli 06:07, 30 September 2011 (UTC)
I only want to support simple, sound, easy-to-understand and consistent rules. I don't want to exclude words only because contributors think that the use of these words or of these writings should be discouraged, because it's a question of opinion (just like political opinions should not lead to the exclusion of some pages on Wikipedia). Lmaltier 17:42, 30 September 2011 (UTC)
No one is arguing that Thames or 河 are not words, and no one is trying to exclude these words -- both are already here, as clearly indicated by the blue links. The argument instead revolves around the use of two terms in two languages in a single attempted lemma entry. So far, only one IP user seems to be a strong proponent of the view that Thames河 constitutes an integral Chinese term. Most others have been arguing along varied lines that generally converge on two points: 1) this is a generic sum-of-parts phrase, and thus has no place in Wiktionary, and 2) Thames is a word in English and 河 is a word in Chinese, and using the two together does not constitute a new Chinese term, but is instead a prime example of w:code-switching.
Any discussion of human endeavors revolves around opinion to some degree or other. The opinion at the core of this particular issue is, are SOP terms that involve code-switching valid terms for inclusion in Wiktionary? The emerging consensus is that no, such terms do not belong here.
The main holdouts from this consensus are the aforementioned IP user, and apparently yourself. The behavior of the IP user has been quite trollish and stubbornly POV from my perspective, but I confess I have less of a handle on why you (Lmaltier) seem to be contrarian about Thames河 and similar terms. Are you of the opinion that mixed-language code-switching SOP phrases do indeed merit inclusion as lemmata here? Are you just unfamiliar with the phenomenon of code-switching? Do you have some other strong opinion pertinent to this issue that might help elucidate your position? I'm honestly curious and I do not understand your opposition to deleting terms like Thames河. -- Cheers, Eiríkr Útlendi | Tala við mig 18:48, 30 September 2011 (UTC)
I think that code-switching would not explain the number of attestations. My only reason is that no term in actual use should be deleted. Lmaltier 20:08, 30 September 2011 (UTC)
  1. There are only 4,580 google hits for google:"Thames河". Roughly 1,000 of these also include the 泰晤士 official Mandarin spelling of Thames (and, incidentally, also the spelling of times, as in newspaper names), reducing our pool to only 3,500 at google:"Thames河"+-"泰晤士", and this is before weeding through to exclude sources that don't meet WT:CFI. Compared to the 892,000 google hits at google:"泰晤士河", it certainly looks to me like the number of attestations is actually quite small. I find only 8 hits at Google Books here, of which the first two seem to be the same book, one uses Thames in parentheses after first using the alternate Mandarin spelling 太晤士, one is clearly Japanese, and one offers no context or snippet at all, leaving us with only four or five books using this particular combination in a way that might meet WT:CFI. It seems to me that Thames河 is quite rare, actually.
  2. You haven't actually addressed my question about mixed-language code-switching SOP phrases. Do you view Thames河 as somehow not SOP? If so, why, and by what reasoning? -- Eiríkr Útlendi | Tala við mig 22:39, 30 September 2011 (UTC)

There is not only one standard for the Chinese language. Chinese is not only for Mainland China, but also for Taiwan, Hong Kong, Macau, Singapore and overseas. For example, President Bush is written as 布什, 布殊 and Bush as well. 2.25.212.4 12:59, 30 September 2011 (UTC)

You already made this point at Wiktionary:Requests_for_deletion#Thames.E6.B2.B3. As I already stated there, you are being disingenuous. As others already pointed out there, Bush is not "standard" Chinese. Likewise, "Thames河" is not "standard" Chinese -- and as such, as well as for other reasons, "Thames河" does not belong here as an entry in Wiktionary. -- Eiríkr Útlendi | Tala við mig 18:53, 30 September 2011 (UTC)
What is the difference between a standard term and a non-standard term? Inclusion in existing dictionaries? And why do you think that only "standard" terms should be included? Lmaltier 20:08, 30 September 2011 (UTC)
I interpreted Engirst's argument as being that Bush is standard Chinese, under a particular standard for Chinese. That standard seems to be that any word in any script that is used in a Chinese sentence merits inclusion as a Chinese word -- a stance that I categorically refute. Using Bush in a Chinese context does not make it a Chinese word any more than using Москва or natsukashii in an English context makes these English words. -- Eiríkr Útlendi | Tala við mig 22:39, 30 September 2011 (UTC)
Also note that mixed script terms do exist, even in English, e.g. α-particle. Lmaltier 20:36, 30 September 2011 (UTC)
See also α粒子, alpha粒子. Engirst 20:52, 30 September 2011 (UTC)
And these are clear examples of non-SOP terms -- there is no way of deriving the meaning of α-particle etc. purely from its constituent parts. Meanwhile, Thames河 is practically the definition of an SOP term -- the meaning is baldly plain to anyone who knows what Thames and 河 both mean. The script alone is not part of my argument.
To clarify, there are two issues here that I am arguing, both of which are against inclusion of Thames河:
  1. Thames河 is an SOP phrase, and as an SOP phrase, specifically as an SOP phrase with no special idiomatic meanings, it has no place here in Wiktionary. This applies equally to other non-idiomatic SOP phrases like blue necktie, fresh apple, or 赤い花, where the meaning is plain from the meanings of the constituent parts.
  2. Thames as it is used in Thames河 has no specifically Chinese meaning -- it is being used as an English term, and therefore it is not a Chinese term, and thus should not be treated as a Chinese term. -- Eiríkr Útlendi | Tala við mig 22:39, 30 September 2011 (UTC)
Lmaltier, this IP user (2.25.212.4) is abc123, also known as Engirst, using multiple IP addresses that he automatically generates to avoid all the blocks, including range blocks. He doesn't deserve to be here and was blocked multiple times by multiple administrators. The only reason he is still here is because we can't do much about him and we don't want to stop anonymous users from contributing here because of one. You are doing a disservice by supporting his crazy ideas of pinyinisation of Chinese. He has dug up a few examples of code-switching or Chinglish, which are not typical and represent nothing. Chinese people write 泰晤士河, not Thames河, which can be easily proven by checking the internet. Non-standard terms could be included if they are typical; Thames河 is not typical at all. Mandarin also has mixed-script terms and they are already included here. Place names are always written in native scripts in any language. You can find all sorts of weird things on the internet if you try hard or have an agenda; that's what Engirst is doing. His last block expired today, so he has reappeared as Engirst. --Anatoli 21:00, 30 September 2011 (UTC)
I don't know anything about Chinese, it's difficult to argue, I just try to understand. Let me take other examples.
  • I know the author of a very good, prize-winning, thesis in mineralogy. Yet, she consistently writes Abkhazia instead of the French term Abkhazie because she was unaware of this French word. Does this make Abkhazia a word worth inclusion in a French section? Certainly not, because this use was a mistake due to ignorance.
  • Now assume that she was referring instead to a Chinese province, using the Chinese characters. It could be called code-switching. The case is closer, but this assumption is absurd, because nobody would do that (because almost no reader of the thesis would understand the Chinese characters in a French text).
  • alpha particle is used, and α-particle too, because English readers are expected to understand the α character. Is this the same kind of case?
My feeling is that the case under discussion is somewhere between the 2nd and the 3rd example. Am I right?
I also feel that this writing is used by some people because 1. Chinese people might read the name of English rivers more often in English texts than in Chinese texts (??). 2. Most Chinese are expected to recognize English letters 3. This writing of foreign proper nouns is felt by some people as less ambiguous than a transcription, because closer to the original word. This is probably truer with tiny unknown rivers or unknown people when you want to refer to them in your language. When you use the same alphabet, it's not shocking to use the original word (it's even systematic), it's more shocking when you don't use the same writing system. If there is no word in the language but there is a clear transliteration system, then this system is used (e.g. in Russian), but it's not the case in Chinese. How would you translate the ru de Marivel (a very tiny French river not visible any more) to Chinese without using the Roman script?
(true or not, I don't know:) This way of writing foreign proper nouns is uncommon, but might become less and less uncommon, and this is considered as very bad by people liking their writing system (and I understand them very well).
If my feelings are not wrong, then these writings should not be promoted, but there is no reason to delete these pages, provided that required (sound, helpful and correct) information is provided (e.g. explaining why the Roman script is used, explaining that this is not standard, explaining how you pronounce it in Chinese). People not liking them may simply ignore them. Providing information to people looking for these terms is better than a "no page found" message.
((I'm not interested at all in who writes something here, only in what is written.)) Lmaltier 05:59, 1 October 2011 (UTC)
"(true or not, I don't know:) This way of writing foreign proper nouns is uncommon, but might become less and less uncommon, and this is considered as very bad by people liking their writing system (and I understand them very well)." - It is definitely true because of globalization nowadays. Engirst 13:19, 1 October 2011 (UTC)
This is not code-switching or code-mixing, it's just (un)intentional reluctance to transcribe into Chinese characters. There is no way, for example, for "Thames河" or any other proper nouns written in the Latin alphabet to appear on news from Xinhua (official press agency in PRC). This is how Xinhua handles this in news: [5] It writes the name of the Assistant Secretary of Bureau of Near Eastern Affairs of the US as "杰弗里·费尔特曼" (Jiéfúlĭ Fèiĕrtèmàn), and that of the Syrian ambassador to the US as "伊马德·穆斯塔法" (Yīmădé Mùsītăfă), without providing the original scripts (English and Arabic) or transliteration of the latter. Although I have no way of knowing the original names just from looking at these transcriptions (the first one is probably Jeffrey Felt(e)(r)man(n)), these names are in Chinese, unlike "Jeffrey Feltman" which may appear (without a transcription given) in non-official Chinese news.
As for code-mixing or code-switching, it happens all the time in Singaporean Mandarin and Hong Kong Cantonese (and Chinese spoken overseas in general; also Singlish, to a lesser extent). Code-mixing doesn't make "office" Chinese or "tahi" ("poo" in Malay) English. 60.240.101.246 10:17, 1 October 2011 (UTC)
If Singdarin were written, I'd happily classify it as a pidgin or creole and record it as such. That whole section on Hong Kong Cantonese you point to is full of linguistic information that should be recorded somewhere, and is hardly English, like "yeah" meaning trendy and mouse being pronounced mau1-si2.--Prosfilaes 23:37, 1 October 2011 (UTC)
I agree with you. Engirst 00:10, 2 October 2011 (UTC)
The vote to ban this kind of entry is set up here: Wiktionary:Votes/2011-10/CFI for Mandarin proper nouns - banning entries not in Chinese characters. --Anatoli 01:03, 3 October 2011 (UTC)
Our "friend", who has avoided all blocks so far, is now busy editing Chinese-character entries, wow, adding examples he dug out where proper names are written in English. He'll teach us good Mandarin spelling - so "London University" in Mandarin is "London 大學". Pinyin is not enough; now Chinese will be written half in English, half in Chinese! --Anatoli 11:10, 3 October 2011 (UTC)
I have protected 英國 (Yīngguó) for his "英國 London 大學地質學博士" but there are other bad edits. Can we stop this somehow? --Anatoli 11:15, 3 October 2011 (UTC)
In the UK, there are a lot of spoken and written usages of mixed scripts, can you ban them? Engirst 11:22, 3 October 2011 (UTC)
It's called code-switching, very common with communities living outside their homeland. People have already spent too much time repeating the same thing to you, but you keep trolling. We already have London and Thames in English; they don't need a Mandarin umbrella. Everybody here knows that you are here only because nobody is able to block you completely. It's YOU who needs to be banned indefinitely, troll. --Anatoli 11:31, 3 October 2011 (UTC)
Above is talking about "英國 London 大學" (UCL) but not Thames河. Engirst 12:08, 3 October 2011 (UTC)

Categories and single entries with multiple indices

This issue has already been touched on above at Wiktionary:Beer_parlour#Question about cats, but I'm realizing this is a sizable issue for Japanese.

The underlying problem is that kanji (Chinese characters used in a Japanese context) only carry the meaning of a word, and can often be read using multiple pronunciations. Kanji entries in Japanese are generally indexed by their readings, so a single kanji compound may appear at several locations in a Japanese dictionary. The example at #Question about cats regarded the given name 恵美, which can be read either Emi or Megumi. Using WT's categories as-is only lists the entry under the final category call on the page, so 恵美 was only categorized under めぐみ (Megumi), when it should have been categorized under both the めぐみ and えみ (Emi) indices.

The solution -sche brought up was to create a redirect under a visually identical header, and categorize the header under the additional index. This does work, happily.

However, there is no dearth of kanji terms in Japanese that have multiple readings. 砂岩 can be read either sagan or shagan; 月食 can be read either gesshoku or gasshoku; 一度 can be read either ichido or hitotabi; 正直 can be read shōjiki, jōjiki, or seichoku; etc., etc. All of these should ideally be categorized under all readings. Manually going through and creating redirects for all entries that currently have uncategorized additional readings is not a tenable proposition.

Is anyone aware of any way of getting the categorization mechanism to allow multiple categorizations? I.e., is there any way of getting something like:

[[Category:Japanese nouns|しょうじき]]
[[Category:Japanese nouns|じょうじき]]
[[Category:Japanese nouns|せいちょく]]

to allow a single entry to show up under all the provided indices? Listing multiple cats as above only categorizes under the reading supplied in the last cat listed. Simply adding additional sorting indices as additional arguments, like [[Category:Japanese nouns|しょうじき|じょうじき|せいちょく]], only indexes under the first argument given. Is there a WikiMedia / MediaWiki dev we should contact about this? -- Curious, Eiríkr Útlendi | Tala við mig 21:51, 28 September 2011 (UTC)
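The behaviour described above can be modelled in a few lines. This is a minimal sketch, assuming (as observed in this thread) that MediaWiki keeps at most one sort key per page per category, so a later [[Category:...|key]] for the same category overwrites an earlier one. The function `categorize` is hypothetical and is not MediaWiki's actual code.

```python
def categorize(links: list[tuple[str, str, str]]) -> dict[tuple[str, str], str]:
    """Model category links as one sort key per (page, category) pair.

    Each link is (page, category, sort_key). A later link for the same
    page and category overwrites the earlier one, mirroring the reported
    behaviour where only the last [[Category:...|key]] takes effect.
    """
    sort_keys: dict[tuple[str, str], str] = {}
    for page, category, key in links:
        sort_keys[(page, category)] = key  # overwrite, never append
    return sort_keys


links = [
    ("正直", "Japanese nouns", "しょうじき"),
    ("正直", "Japanese nouns", "じょうじき"),
    ("正直", "Japanese nouns", "せいちょく"),
]
print(categorize(links))
# {('正直', 'Japanese nouns'): 'せいちょく'}
```

Under this model, each additional reading needs its own page (such as the redirect workaround mentioned above) to receive its own sort key.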

Translation FROM non-English language

How can I add the Norwegian version of Swedish koka soppa på en spik on that page? (koke suppe på en spiker) __meco 18:14, 29 September 2011 (UTC)

In my opinion, just like any other translation. Sometimes, translations in foreign word pages are very helpful. This is why this should be allowed. Lmaltier 19:30, 29 September 2011 (UTC)
If you're asking for an explanation of how to create it, I'd say copy the entire contents of koka soppa på en spik, paste them into koke suppe på en spiker, change every mention of sv to no and Swedish to Norwegian. If there's a policy question in there then I'm afraid I've missed it. Mglovesfun (talk) 19:31, 29 September 2011 (UTC)

I probably misunderstood. I interpreted Norwegian version as a translation of the phrase. Lmaltier 19:35, 29 September 2011 (UTC)

Good idea. Also, what if we use Category:English non-idiomatic translation targets? It may be another workaround for non-idiomatic translations, especially for English terms, which may not pass CFI. There are many foreign terms that are translated as single words where English uses two or more (e.g. fur coat) or idioms as above where there is no English equivalent. --Anatoli 00:22, 30 September 2011 (UTC)
OK, the See also solution is probably the best for this situation where we have two phrases in non-English languages but no English equivalent. Later, when we see more instances of this happening, a more integrated solution will probably have to be devised. __meco 08:26, 30 September 2011 (UTC)
  • Or instead, just create sv:koka soppa på en spik and add the Norwegian translation there. The target audience is only going to be Swedish or Norwegian speakers. --Rockpilot 10:03, 30 September 2011 (UTC)
I don't see this connection as irrelevant for the English Wiktionary; do you really? __meco 20:21, 30 September 2011 (UTC)
I do. Having said that, I'm sure we have an idiom in English that means the same as the Swedish koka soppa på en spik. --Rockpilot 12:45, 1 October 2011 (UTC)

TheDaveBot wants to tidy up a bit.

I would like to use my bot account to run a new version of AutoFormat. It is entirely new, so new that it isn't finished yet. There are a couple of "modules" which are ready for testing. Since the task is already pretty well designed and approved of (this new script doesn't do anything AutoFormat wasn't doing) and the account is already a bot, I am just posting this to give people an opportunity to make suggestions or complaints or requests for more information or whatever. I have done a few on-wiki tests, none in NS:0 yet, but that will proceed within the next few days I would imagine. Once I am reasonably comfortable with the test results I will go through the normal bot approval routine. The bot won't be going unsupervised before that time.

Now is also a good time for any other format related tasks to be brought up so they can be added, but unless they are non-controversial and conform to the ELE I probably won't add them. - TheDaveRoss 21:21, 29 September 2011 (UTC)

Thank you for offering to autoformat!! As for other things to do, well, [[user talk:AutoFormat]] is full of suggestions.​—msh210 (talk) 17:41, 2 October 2011 (UTC)
Support, I've been meaning to suggest that we need at least two auto-format bots running at a time as the workload is too much even for a bot! Mglovesfun (talk) 17:58, 2 October 2011 (UTC)
Can the old account be transferred to Dave? It would be nice because everyone still calls it AutoFormat... —CodeCat 14:13, 3 October 2011 (UTC)
I think it would be better just to continue calling the task AutoFormat but let the account names be whatever they are. Ideally this is something which would eventually migrate to the toolserver and not be run by an individual, but I don't have the skills required to make that a reality. - TheDaveRoss 21:44, 3 October 2011 (UTC)
Any valid autoformatting should be supported by all. I've missed it lately and have a bad feeling about the likely backlog. What is the entry-processing capacity of a fully functional AF-type bot, in entries per week? Does having it on the tool-server improve capacity or effectiveness? DCDuring TALK 23:09, 3 October 2011 (UTC)
Putting it on the toolserver means that it will have a much higher uptime, and be maintainable by multiple people. As far as throughput goes, it is limited by the server more than the program. I think one entry per second would be easily achievable by a single instance of the bot running, and an arbitrary number of instances of the bot can be run simultaneously. The downside is that if there are 20 edits per second happening constantly the server load would be dramatically increased. - TheDaveRoss 01:36, 4 October 2011 (UTC)
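To put DCDuring's "entries per week" question in numbers, here is a rough sketch. It assumes the "one entry per second" figure above holds continuously with no downtime, and that extra instances scale linearly (the server-load caveat suggests they may not). The function name is hypothetical.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def entries_per_week(entries_per_second: float, instances: int = 1) -> int:
    """Upper-bound weekly throughput for a bot running without pauses."""
    return int(entries_per_second * SECONDS_PER_WEEK * instances)


print(entries_per_week(1.0))     # 604800
print(entries_per_week(1.0, 3))  # 1814400
```

In practice, edit-rate limits and server load would bring these figures down.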

Template:etyl and Template:proto

{{etyl|ang}} Old English links to Wikipedia; {{proto|Germanic|qwerty}} Template:proto links to Wiktionary. Why the discrepancy? This, that and the other (talk) 12:20, 30 September 2011 (UTC)

Dunno really. It's not a problem per se. Mglovesfun (talk) 11:40, 1 October 2011 (UTC)

halfpace

"A platform of a staircase where the stair turns back in exactly the reverse direction of the lower flight." This is an archaic word; what's the modern term for this? I've always, rather awkwardly, had to describe them as "U-turn staircases". Equinox 20:58, 30 September 2011 (UTC)

I think they are called half landings, also this is the Beer Parlour not the Tea Room you silly person. - TheDaveRoss 21:04, 30 September 2011 (UTC)
Thanks. Also, I just found dogleg, which is a word I have wanted to know for years. (I always dream about that kind of staircase.) Equinox 21:11, 30 September 2011 (UTC)
Upon further research, it seems like a "half landing" can be any landing which is not at the top or bottom of the stairs. To be precise, many people are calling them "180 degree half landings", but that is not as succinct as it should be. - TheDaveRoss 00:13, 1 October 2011 (UTC)
  • I always just called it a landing. Ƿidsiþ 06:03, 1 October 2011 (UTC)
  • In my house there is a square landing at the top of the stairs, then a 90 degree turn and a single step to the landing proper. We call it "the little landing at the top of the stairs" - probably not the technical terminology. SemperBlotto 07:00, 1 October 2011 (UTC)

October 2011

Wiktionary:Votes/2011-09/Unified Tagalog

The discussion about Tagalog/Filipino quickly died out above, so I'm posting this link to keep this vote from being forgotten and made obsolete. Mglovesfun (talk) 11:38, 1 October 2011 (UTC)

Romanizations of words in languages including Gothic

In light of the comments on Wiktionary:Votes/pl-2011-08/Romanization of languages in ancient scripts, I have created two new votes:

Please give feedback before the votes start. Please vote after the votes start. - -sche (discuss) 01:36, 3 October 2011 (UTC)

Where do I give feedback? Well, I'll say it here for now:
I think this is a horrible idea. I don't see why we should use another alphabet to write Gothic when there is a perfectly good one which was created specifically for Gothic. One of the reasons the Unicode Consortium adds characters from "unused" alphabets is so that people like us can write words in ancient languages in their own script, instead of using a transliteration. Maybe you could add a heading for transliterated forms in ELE, and make the transliterated form redirect to the actual entry.
In the rationale it's written "Modern readers will most likely want to look up words in their Romanized form; these readers will not necessarily know or be able to input the words' original-script forms." To be honest, I think this applies more to modern languages such as Arabic or Russian. What kind of person would look for a word in Gothic? Someone interested in the Gothic language; such a person is quite likely to know the Gothic alphabet. But what kind of person would look for a word in Arabic or Russian? Could be anyone; could be some guy who heard the word on TV, saw it in a newspaper, or whatever. None of these are expected to know the Cyrillic or Arabic alphabets.
I'm not saying that you should add entries transliterated from Cyrillic or Arabic; I'm just trying to show that adding transliterated Gothic entries is a worse idea. Ungoliant MMDCCLXIV 15:50, 6 October 2011 (UTC)
To be clear, we're not proposing to move the content to the romanisations: entries in the Gothic alphabet (like 𐌵𐌹𐌽𐍉) will still exist and define that Gothic word, but romanisations (like qino) will exist as soft-redirects, similar to pinyin and romaji entries. This seems to be almost or exactly what you're proposing regarding redirects. The two major reasons which our Gothic contributors have given for allowing romanisations are: that users might know the Gothic alphabet but be unable to type it, and that Gothic texts and secondary sources (dictionaries, etc) are often published in romanised form (and we should have entries for the forms as they are in fact published, which means: both in the Gothic alphabet and in romanised form). - -sche (discuss) 21:51, 6 October 2011 (UTC)
Ok. I misunderstood that. Ungoliant MMDCCLXIV 23:09, 6 October 2011 (UTC)
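For a sense of how a romanised soft-redirect such as qino relates to its Gothic-alphabet entry 𐌵𐌹𐌽𐍉, here is a minimal letter-by-letter transliteration sketch. It assumes one common romanisation scheme (þ, ƕ, q, x as listed) and drops the two numeral-only letters; the scheme our Gothic entries actually use may differ, and `romanize` is a hypothetical helper, not an existing Wiktionary tool.

```python
# Romanisations for the Gothic block U+10330..U+1034A, in code-point order.
# U+10341 (NINETY) and U+1034A (NINE HUNDRED) are numeral signs with no
# sound value, so this sketch maps them to the empty string.
ROMAN = [
    "a", "b", "g", "d", "e", "q", "z", "h", "þ", "i", "k", "l", "m", "n",
    "j", "u", "p", "", "r", "s", "t", "w", "f", "x", "ƕ", "o", "",
]
GOTHIC_TO_LATIN = {chr(0x10330 + i): latin for i, latin in enumerate(ROMAN)}


def romanize(word: str) -> str:
    """Transliterate Gothic letters; leave any other character unchanged."""
    return "".join(GOTHIC_TO_LATIN.get(ch, ch) for ch in word)


print(romanize("\U00010335\U00010339\U0001033D\U00010349"))  # qino
```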
Actually, that's not why the Unicode Consortium added Gothic. For one, I think most Unicode members with an opinion would encourage the continued use of transliteration; see Don't Proliferate; Transliterate!. If you look at the historical record, approaching Unicode 3.1, Unicode had a problem. The concept of being 16-bit didn't last long; it was obvious that Unicode would need to expand, and a theoretical expansion area was added in Unicode 2.0. But most programs only supported 16-bit Unicode, so characters added outside the base 16 bits wouldn't be accessible to many users; so nobody wanted their characters to be added outside those base 16 bits. But nobody had an incentive to fix their programs until there were characters in that section. So they found some scripts that were completely useless, that were wanted by non-scholars because scripts are cool, and encoded them outside the 16-bit limits: stuff like Old Italic, the Deseret alphabet and Gothic.--Prosfilaes 20:30, 7 October 2011 (UTC)
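The "outside the base 16 bits" point can be checked directly: Gothic letters lie above U+FFFF, so UTF-16 software has to represent each one as a surrogate pair (two 16-bit units, 4 bytes) instead of a single unit. A minimal check in Python, using U+10330 GOTHIC LETTER AHSA as the example and a Chinese character from this page for comparison:

```python
import unicodedata

ahsa = chr(0x10330)  # first letter of the Gothic block
print(unicodedata.name(ahsa))   # GOTHIC LETTER AHSA
print(ord(ahsa) > 0xFFFF)       # True: outside the 16-bit Basic Multilingual Plane

utf16 = ahsa.encode("utf-16-be")
print(len(utf16))               # 4 bytes: a surrogate pair
print(utf16.hex())              # d800df30

# A character inside the 16-bit range needs only one 16-bit unit.
print(len("泰".encode("utf-16-be")))  # 2
```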
"What is the scope of Unicode?
A: Unicode covers all the characters for all the writing systems of the world, modern and ancient."
http://www.unicode.org/faq/basic_q.html
These scripts aren't completely useless. Epigraphers, medievalists, classicists, and Bible scholars find them important. Consider the Medieval Unicode Font Initiative (www.mufi.info): why would they recommend such characters if they didn't think they're useful?
Ungoliant MMDCCLXIV 23:15, 7 October 2011 (UTC)

Merging Moldavian and Romanian

A couple of editors suggested above that Wiktionary discuss merging Moldavian and Romanian. Let's discuss! I favour merging the two. The issue seems quite like that of Serbo-Croatian: that is, the distinction is politically motivated. It is also similar in that Moldavian can be written in Cyrillic, whereas Romanian is not written in Cyrillic anymore: but we could handle that just like we handle Cyrillic/Latin Serbo-Croatian. A possible vote (not started!) is here. - -sche (discuss) 06:31, 3 October 2011 (UTC)

Moldavian or Moldovan is essentially dead. I don't think anyone is trying to really revive it and we don't have active editors using it. Moldavian Wikipedia is locked. Romanian is written entirely in Roman script. Maybe Moldavian is worth keeping for historical reasons? There is still material written in Moldavian out there. It doesn't create any maintenance problems, like Serbo-Croatian, as far as I can tell. I don't have a strong opinion on this, though. --Anatoli 06:53, 3 October 2011 (UTC)
Cyrillic could be treated as an alternative but obsolete script, like Arabic for Turkish. —CodeCat 10:26, 3 October 2011 (UTC)
Obsolete? It is still used in the de-facto regime of Transnistria, I'd hardly call that "obsolete". -- Liliana 18:17, 3 October 2011 (UTC)
Right, I would keep (and add) Latin spelling entries and Cyrillic spelling entries, and just explain the use of Cyrillic (that it is no longer standard to write Romanian in Cyrillic in Romania, but that the language continues to be written and published in Cyrillic in the region of Transnistria) on WT:About Romanian. - -sche (discuss) 21:55, 3 October 2011 (UTC)
Sounds OK to me. --Anatoli 10:47, 3 October 2011 (UTC)
If Romanian templates are modified like Hindi/Urdu or Serbo-Croatian (Cyrillic/Roman), then we could always add the optional Cyrillic spelling flagged as "Moldavian spelling". Russians/Ukrainians from Transnistria have to use Russian and Romanian to communicate with Moldova. It's just my opinion; prove me wrong if you disagree. --Anatoli 06:40, 4 October 2011 (UTC)
I've asked Robbie SWE for input. :) - -sche (discuss) 23:44, 6 October 2011 (UTC)
I'm kind of torn; the Cyrillic alphabet is a thing of the past for Romanian, most certainly not something most Romanians would want to promote today. The fact that Bogdan Stăncescu - the founder of the Romanian Wikipedia project - cancelled the Moldavian ISO 639 code (mo and mol) back in November of 2008 indicates that there is no substantial difference between Romanian and Moldavian. This initiative was welcomed by Marius Sala, vice-president of the Romanian Academy.
Personally, I don't think that the solution should be in the form of "Cyrillic/Latin Serbo-Croatian". I can't however provide a solution, but will follow this discussion and see how it evolves. --Robbie SWE 10:35, 7 October 2011 (UTC)
Wiktionary including Cyrillic Romanian does not mean that we 'promote' it, as we only document. If indeed still used in Transnistria (as Liliana has pointed out), then we can't just tag them "obsolete", because we would then be misrepresenting things. However, we could tag them with both "archaic" and "Transnistrian", and problem solved. --JorisvS 12:18, 7 October 2011 (UTC)
I'm not saying that including Cyrillic Romanian promotes a regression. It just makes things problematic; I mean where do we draw the line? Will we start including runes for old Swedish? From what I've heard (might be wrong; the socio-political distance between Sweden and Moldova is quite far), most inhabitants of Transnistria speak Russian and therefore use the Cyrillic alphabet. --Robbie SWE 12:33, 8 October 2011 (UTC)
I also checked and it's being taught at schools in Cyrillic. I still don't see a problem in merging (despite being a native Russian). "Moldavian" as a name of the language is still used colloquially, but "Romanian" is used increasingly in both Moldova and Pridnestrovye (Transnistria). There are no serious efforts to separate them again (unlike, say, Serbo-Croatian). Perhaps "Moldavian spelling" is better than "Cyrillic", e.g. "România f (genitive/dative României), Moldavian spelling: Ромыния" --Anatoli 12:49, 8 October 2011 (UTC)
Of course we'd allow runic Old Swedish entries. Perhaps that's a bad analogy as it's a dead language. We have a couple of runic Old English entries. Mglovesfun (talk) 12:53, 8 October 2011 (UTC)
Ok, I'm sorry for the bad analogy. I think that the use of "alternative/variant" might work, maybe worth giving it a try. I'm not sure though that we'll be doing the same thing in the Romanian Wiktionary project; the task seems too big for two active users. --Robbie SWE 18:19, 8 October 2011 (UTC)

When reading w:Moldovan language, it's clear that this is a controversial issue, which would be a good reason to allow Moldovan as a separate language (even if the category is almost empty). This would be a good reason because we must be neutral about controversial issues, and because the definition of language may involve political issues as well as linguistic issues. Also remember that dead languages are allowed here, even when nobody is able to contribute to them and their categories are empty. However, as the use of the ISO code for Moldovan is now discouraged, I don't know. Lmaltier 19:03, 8 October 2011 (UTC)

How is Serbo-Croatian any more neutral than this? -- Liliana 19:24, 8 October 2011 (UTC)
@Lmaltier: Separating Moldav(i)an and Romanian is as neutral or non-neutral an approach as unifying the two. If those who consider there to be two languages would find it controversial if we unified them into one language, those who consider there to be one language will find it controversial if we separate it into two languages.
@Anyone who knows: do words have the same Cyrillic spellings in Transnistria today that they had in Romania in the past, when Cyrillic was used there? - -sche (discuss) 01:48, 9 October 2011 (UTC)
That's where the problem arises: the Wikipedia article states "Its structure is based on the Russian Cyrillic alphabet (excluding three Russian letters and adding another), and does not have a direct resemblance to the historical Romanian Cyrillic alphabet used from the Middle Ages until the second half of the 19th century in the Principalities of Wallachia and Moldavia[1] and until 1932 in the Soviet Union." We're basically talking about two different interpretations of the Cyrillic alphabet. --Robbie SWE 13:09, 9 October 2011 (UTC)
So we should have three entries for the same word: one in the Roman script and two in different Cyrillic scripts, one of them tagged as "obsolete" and the other as "Transnistrian", right? All, of course, in accordance with WT:CFI. I wouldn't have any problem with that. --JorisvS 14:18, 12 October 2011 (UTC)
Right, ro.Wikt may wish to wait and not add Cyrillic entries at this time because they do not have enough users to manage such an addition, but we already have entries (in Category:Moldavian language) and presumably enough users to manage them. Robbie and JorisvS, please take a look at Wiktionary:Votes/2011-10/Unified Romanian and see if anything needs to be changed. - -sche (discuss) 00:37, 13 October 2011 (UTC)
Maybe mention that there can be two different Cyrillic spellings that will be allowed, one archaic, one Transnistrian? Or is that superfluous? --JorisvS 09:00, 13 October 2011 (UTC)

Loss of usage-context categories

At one time, before the "reform" of our category system, we had categories that indicated the usage context of many specialist terms. We now have topical categories instead. I propose that we need to reinstate the usage context categories.

Topical categories for a specialist field are intended to include senses of all terms that relate to the topic in question. Context tags (of the occupational type) are intended to indicate that a given term is likely to be understood only by those with a specialized knowledge in the area.

I think that all terms in a specialist context connected with a given topic should be members of the topic category, but that not all terms in a given topical area should bear the context tag and be in the usage-context category. DCDuring TALK 11:45, 3 October 2011 (UTC)

What reform of the category system are you referring to, and when is it supposed to have occurred? From what I recall from 2006, the category for, say, physics was always a topical one. We have many usage context categories; what we do not have are usage context categories for the likes of physics, chemistry, mathematics, etc. A usage context category for, say, mathematics cannot be reinstated, as it never was there in the first place; rather, it can be newly introduced, such as "Category:English terms only used in the context of mathematics", or "Category:English terms restricted to mathematics" or the like. --Dan Polansky 11:57, 3 October 2011 (UTC)
I had always interpreted Category:Physics as a usage context, defined by a usage context label which was not applied to terms that were not so limited. I had no interest in topical categories and have little interest now at Wiktionary, as I find such categorization information at Wikipedia when I need it.
What I perceive as a reform was probably the unintended result of the actions of those who did/do not perceive there to be a worthwhile distinction between topical categories and usage contexts. The use of context tags to populate the topical categories without also creating appropriate usage-context categories is my evidence of the lack of sensitivity to the distinction. DCDuring TALK 12:37, 3 October 2011 (UTC)
I think DCDuring has either slightly misunderstood or is being sarcastic (the latter, I think). Labels like {{physics}} are allowed, but they should serve as true contexts and not just a shortcut for convenience. So boson legitimately uses physics, but it would be silly to use it for entries like solid, light, liquid, gas and so on where they are clearly not only or chiefly used in the field of physics. Ditto England shouldn't have a {{geography}} tag. Mglovesfun (talk) 12:41, 3 October 2011 (UTC)
However, solid could be in a physics category (a category of physics-related terms), without having any sense-line tag. - -sche (discuss) 13:05, 3 October 2011 (UTC)
Just as we have regional and register context tags that populate usage categories, IMO, we should also have usage categories that reflect occupational usage contexts. Maintaining a consistent distinction between topical and usage categories, while, of course, recognizing the distinction, would be quite worthwhile. For example, terms that are in a given topical category can have some senses (Type I) that are not in any topical category, some senses (Type II) that are in the topical category but properly understood outside any specialist context, and some topical senses (Type III) that are properly understood only by cognoscenti and belong in a usage context category. (The last sometimes verge on being prescriptive, but, to be included here, must show evidence of use by multiple authors.) As categories themselves are limited in usefulness for a dictionary because they do not specify a specific Etymology or PoS, let alone sense, it will always be tempting for well-intentioned contributors to apply usage-context labels to senses of Type II.
To be clear: Entries with senses of Types II and/or III should be in topical categories if folks want to maintain such things. Entries of Type III should certainly be in a usage-context category if we aspire to be useful as a dictionary.
I doubt that it makes sense at this time to have some sense labeling to indicate which sense it is that qualifies a term to be in a given topical category, though such labeling would discourage misuse of context tags that have associated topical categories. DCDuring TALK 13:42, 3 October 2011 (UTC)
Like how {{slang}} denotes a sense that is likely to be understood only by those with specialized knowledge of slang? —RuakhTALK 13:20, 3 October 2011 (UTC)
Sorry, I should have also said what -sche said, that in cases where a term is used in a context but not specialized (that is, the term retains its general-use meaning) a written category could/should be added at the bottom. So rather than tagging foul with every sport that has the concept of fouls, add the categories at the bottom by hand. I'm not sure why some users are so reluctant to add categories at the bottom, it's not particularly difficult to do. Mglovesfun (talk) 13:49, 3 October 2011 (UTC)
I thought it obvious that we have different types of usage contexts (which actually reflect reality). We have register (informal, formal) and regional. We have some contexts which indicate offensiveness and we have some that indicate media-related restrictions (colloquial, IM/internet). There may be some types missing and there are other useful ways of classifying usage contexts. Occupational contexts are another type of usage context. They are a superior approach IMHO to marking some terms as "jargon" and hoping that a user could figure out from topical categorizations the specific places in which a given term could be expected to be understood when used. DCDuring TALK 13:57, 3 October 2011 (UTC)
It is indeed obvious that we have different types of usage contexts, but I don't think any of them indicate — nor should indicate — who is likely to understand a given term. Rather, they should indicate the context in which a term is used. Hence the term "usage contexts". ;-)   If "solid" is only used in physics contexts, or has a specific sense when used in a physics context, then it doesn't really matter that it's a term everyone knows. —RuakhTALK 14:17, 3 October 2011 (UTC)
I've been thinking along these lines myself. I think it may be worthwhile. I'm really not sure, though: after all, the benefit of a jargon dictionary is that it provides all the terminology for a field, and (e.g.) solid is terminology in physics, even if it's also used by others. But if this is something we want to do, then a good way to proceed might be as follows: Keep [[Category:en:Physics]] as a topical category and restore [[Category:en:Jargon:Physics]] (or perhaps [[Category:English jargon:Physics]] or even [[Category:Jargon:en:Physics]]) as a term-of-art category.​—msh210 (talk) 16:06, 3 October 2011 (UTC)
I like this idea. - -sche (discuss) 20:28, 3 October 2011 (UTC)
I strongly object to the use of "jargon" in any category name or usage-context label. Whatever it may mean in a linguistics context (!), a few of the common senses are definitely pejorative. We have enough difficulty trying to prevent contributors (not just newbies, either) from being prescriptive without providing such encouragement. Even our inadequate entry has one of the pejorative senses, though it is not so labeled. AFAICR that is why {{jargon}} was deleted. DCDuring TALK 23:02, 3 October 2011 (UTC)
Here's how to use written categories instead of contexts: diff. Mglovesfun (talk) 12:18, 9 October 2011 (UTC)
The only trouble is that {{sports}} puts the entry in a topic category, not a context category.
Also, there is no reason for the context to be "sports" in general rather than the specific sports in which this is understood.
The usage contexts are one set of categories, which are linguistic. I think they are relatively well defined. The topical categorization is not well-defined, except as it is derived from the usage categorization. For example, bending brake could be in topical categories "Tools", "Metalworking", even "Roofing". I'm not sure about the range of usage contexts, but I doubt that it is in the vocabulary of most of the general population. "Metalworking" and "Roofing" would seem to be possible usage contexts, but not "Tools". DCDuring TALK 02:11, 11 October 2011 (UTC)

英國, Nei Mongol

Why are they locked? Engirst 13:00, 3 October 2011 (UTC)

Dubious; it is normal to lock a page if there's an edit war, but it becomes ethically dubious when one of the contributors in the edit war protects a page with their version instated. It's always better for someone outside of the conflict to lock such a page. FWIW the edit to Template:Hani looks valid to me; the fact that the citation contains some Latin-script characters doesn't make it invalid. I'll accept that Engirst is doing it to prove a point that Latin terms are used in Mandarin as 'borrowings', but that doesn't invalidate the citation. FWIW the Middle French version of 'Le Tiers Livre' I read contains some Greek citations in Greek characters, but I'd like to think that doesn't make it invalid as a source for Middle French. Mglovesfun (talk) 13:54, 3 October 2011 (UTC)
Engirst was edit-warring on Thames河 to prove his point. He never writes anything in Mandarin except when he needs to push his ideas. I removed his edit, which 1) is not synchronised with the simplified version, 2) doesn't provide pinyin and a translation into English, a long-standing convention Engirst has been violating, and, most importantly, 3) pushes his English words into Mandarin before a decision is made about the usage of English words in Mandarin. All his edits are generally considered bad by Mandarin contributors. This is not a personal attack. He has been banned for his behaviour (i.e. attempts were made to ban him multiple times). I will remove the block on 英國 when the decision is made about the usage of foreign proper nouns in Mandarin. It's a concern that a person who worsens the quality of our Mandarin entries continues editing. --Anatoli 21:30, 3 October 2011 (UTC)
英國/英国. I've synchronised the entries, added formatting, pinyin and translation, removed example with a very uncommon English name in the Mandarin sentence. Will have to lock the entries if edit warring starts. --Anatoli 22:22, 3 October 2011 (UTC)

Company names

I feel there should be a vote on confirming the Company names section of WT:CFI. As it is, too many people disagree with it, and it clearly doesn't constitute consensus anymore. -- Liliana 18:19, 3 October 2011 (UTC)

There should better be a vote on removing the section "Company names" from CFI. See also Wiktionary:Beer_parlour_archive/2011/April#Poll:_Including_company_names. Rather than not being supported by consensus any more, the section never was supported by consensus in the first place, I figure. --Dan Polansky 18:36, 3 October 2011 (UTC)
The straw poll you (Dan) link to is interesting. Five of its twelve respondents opined that "No company should have a dedicated sense line in any entry", which is at least as restrictive as our CFI and possibly (depending on how you read our current CFI) much more restrictive. The other seven opined that "Some companies should have dedicated sense lines in some entries", which is possibly (depending on how you read our current CFI and depending also on what those respondents would include in their "some") the same as our current CFI, though possibly more or less restrictive than our current CFI. So while a vote (fairly composed) might lead to some change, I suspect it will not: I suspect that the current CFI are a good compromise in this regard, where there is no consensus.​—msh210 (talk) 18:45, 3 October 2011 (UTC)
The current CFI on company names, above all, is unsupported by consensus. CFI should not contain an unsupported compromise between two positions; if no consensus has been reached, CFI should state only so much. And if there is no consensus on specific rules for inclusion of company names, then the regulation of company names can be left to the section for names of specific entities, which is achieved by removing the section WT:CFI#Company names, and by removing the second bullet item from the section WT:CFI#Names of specific entities. While the statement "Some companies should have dedicated sense lines in some entries" does not contradict current WT:CFI#Company names, I find it likely that those who support the statement would like to see WT:CFI#Company names removed as unclear and too restrictive. The critical part of WT:CFI#Company names to be removed is this: "To be included, the use of the company name other than its use as a trademark (i.e., a use as a common word or family name) has to be attested." --Dan Polansky 19:05, 3 October 2011 (UTC)

Requesting short-term block for Special:Contributions/90.205.76.53

As can be seen at http://en.wiktionary.org/w/index.php?title=%E9%AC%BC&action=history, among other places, this user is becoming a persistent nuisance. Re-reverting registered editor fixes multiple times should be grounds for a short-term block, no? -- Annoyed, Eiríkr Útlendi | Tala við mig 22:05, 3 October 2011 (UTC)

Let's try a bit harder to reach out to this person before blocking, they do seem to be editing in good faith. - TheDaveRoss 22:23, 3 October 2011 (UTC)
Is anyone but a checkuser likely to succeed at communicating to an anon? DCDuring TALK 23:17, 3 October 2011 (UTC)
I am not sure what being a checkuser might do to increase success, everyone can see what the IP address of an anonymous editor is. - TheDaveRoss 23:20, 3 October 2011 (UTC)
D'oh. DCDuring TALK 00:57, 4 October 2011 (UTC)
For future reference, the place for this is [[WT:VIP]].​—msh210 (talk) 23:24, 3 October 2011 (UTC)
Thanks msh210, I'd posted there in August about a different IP address (that seems to be the same user) and got no response, so I thought I'd try posting here instead. -- Eiríkr Útlendi | Tala við mig 23:31, 3 October 2011 (UTC)
 :-)  Good point.​—msh210 (talk) 00:20, 4 October 2011 (UTC)
What do we do with people not acting in good faith but capable of avoiding all administrator blocks like Engirst and his many-many aka's? Maintaining and fixing his entries is time-consuming and unproductive. His whole activity is about proving his points, which is otherwise called trolling. A rhetorical question, perhaps, his activity and entries have been discussed many times. --Anatoli 01:24, 4 October 2011 (UTC)
There are fancy blocks for people who change IPs frequently. Other than overt vandalism I can only think of two times we have bothered making that effort and both times there was strong community support for banning a particular person outright. - TheDaveRoss 01:28, 4 October 2011 (UTC)
Well, to me Engirst (+ many aka's and anons) is such a case where a sophisticated block might be in order or is long overdue. It was discussed too, but I think the attempt to do it failed. He is just wasting a lot of time "promoting Mandarin written in Roman letters". First pinyin, full of errors, inconsistent and out of sync with both traditional and simplified entries; now English proper names used in Mandarin in Roman letters. I agree with people saying he's clearly got some agenda (pinyinisation, converting Mandarin to the Latin alphabet?). --Anatoli 01:44, 4 October 2011 (UTC)
As for Japanese at least he doesn't have the adequate knowledge to make useful contributions in good faith in the first place even if he wanted to, which he does not seem to. Haplogy 02:10, 4 October 2011 (UTC)
@Haplogy: Just to be clear, do you mean User:Engirst, or User talk:90.205.76.53? -- Eiríkr Útlendi | Tala við mig 05:24, 4 October 2011 (UTC)
I mean the IP user. Sorry I should have been more specific. --Haplogy
Based on the conversations people have had with him on wiki I think it is clear that this is a young person who has recently become interested in Japanese. While I agree that language learners are not the most useful editors to have on the project there is certainly merit to having them. If there is any way to channel this person's energy into more useful edits we should try that rather than putting more effort into blocking someone who is probably just trying to figure things out. - TheDaveRoss 10:48, 4 October 2011 (UTC)
I generally agree with Dave here. What got up my nose about this particular IP user was their insistence on reverting my fixes, multiple times, in the same entries. Figuring things out is one thing; being a persistent nuisance is another. -- Cheers, Eiríkr Útlendi | Tala við mig 16:16, 4 October 2011 (UTC)
There is always that question, whether something is willful disregard or simply confusion or ignorance. Depending on how much experience someone has with a wiki community they may not know that reverting multiple times is taboo, or even really understand why their edits are getting undone. - TheDaveRoss 19:24, 4 October 2011 (UTC)
True enough. However, when the IP user's own edit summary is "Undo revision ...", it starts to look a lot like they're aware of the editing and history features, and are choosing to ignore other edit summaries. This is just my own perspective, which calls into doubt the user's good faith - I'd be happy to be proven otherwise.  :-/ -- Eiríkr Útlendi | Tala við mig 20:54, 4 October 2011 (UTC)

Gheg Albanian

We have a category for Gheg Albanian, and we have ten entries with Gheg Albanian as an L2 header. We also have numerous entries which handle Gheg Albanian like this/this (with context tags). Should we convert the ten Gheg entries to use an ==Albanian== header and a (Gheg) context tag, or should we move the Gheg information out of the (standard) Albanian sections like this?

The former is preferable for me. BTW, the second example seems to be missing some important categories - parts of speech. There should not be many under Gheg Albanian header. --Anatoli 03:31, 4 October 2011 (UTC)
How different are the two variants, anyway? -- Liliana 05:12, 4 October 2011 (UTC)
We don't have skilled people here but Tosk Albanian is the standard and most common, most entries/translations are in Tosk. Albanian Wikipedia and Wiktionary are not separated into Tosk and Gheg, perhaps we shouldn't separate either, like we don't separate Belarusian. --Anatoli 06:36, 4 October 2011 (UTC)
The two are sufficiently different to be mutually unintelligible, and so can be considered distinct languages. The old-Tosk derived Arbëreshë and Arvanitika are also unintelligible with Standard Albanian (Tosk), even their "dialects" are perceived as unintelligible by their speakers. --JorisvS 11:44, 4 October 2011 (UTC)
So, should we split the Gheg and Tosk entries? - -sche (discuss) 07:43, 7 October 2011 (UTC)
I'd say, therefore, yes. --JorisvS 10:37, 7 October 2011 (UTC)
Alright, that sounds reasonable to me, as they do have separate ISO and Wikt codes, and it was User:Dick Laurent (who speaks at least some Albanian, sq-1) who created some of the ==Gheg Albanian== entries. I'll start splitting. Fewer than 100 words will be affected (when split, fewer than 200). - -sche (discuss) 18:45, 7 October 2011 (UTC)
I think we should nest Albanian in translations tables (like this). - -sche (discuss) 18:52, 7 October 2011 (UTC)
Sure, why not? Though Tosk should possibly be the default, à la water. -- Liliana 19:26, 8 October 2011 (UTC) addendum: oh I see someone changed that too, hmm

We also have a good number of Albanian entries with a Gheg "pronunciation". I suspect, however, that the orthography is actually just Tosk and that it should be different when properly written in Gheg (as opposed to Tosk written by Gheg speakers). While we could add entries by using the key at Wikipedia's Gheg Albanian article, I don't know how reliable the IPA used in these entries is, nor whether the result would be how the words are actually written. --JorisvS 14:06, 12 October 2011 (UTC)

User persistently inserting examples of proper names written in Roman, using his aliases or his own user account

A user is persistently inserting examples of proper names written in Roman letters into Mandarin entries, or creating "Mandarin" entries from English proper nouns (Leeds, Hyde Park, London, Thames, etc.) with or without Chinese suffixes. He does this using aliases he has no problem generating (this time it was Special:Contributions/2.27.73.75) or his own user account (Special:Contributions/Engirst). I had to fix (convert to proper Mandarin spelling) and protect quite a few entries from him. --Anatoli 04:51, 4 October 2011 (UTC)

He has no problem generating new IP addresses: Special:Contributions/2.27.72.78. --Anatoli 04:58, 4 October 2011 (UTC) (Note: range blocks were tried before). --Anatoli 05:01, 4 October 2011 (UTC)

User:Liliana-60 has unprotected Nei Mongol with a summary "this is the wrong way to deal with single user issues".

Copying my question to Liliana-60, which may be of interest to others:

It may be wrong to protect pages because of one user, but can you suggest anything else? I've been trying not just to revert everything he does but to fix and use some of the positive information he adds. It's hard, though. He is very productive and inventive as far as avoiding blocks goes, and his whole activity is about showing that Mandarin can be written in Roman letters, proving his point and edit-warring. --Anatoli 05:35, 4 October 2011 (UTC)

Use abusefilter to record all changes to Mandarin entries, block all edits which create Mandarin sections in entries with names containing two consecutive Latin letters, block all edits which create Mandarin entries with name containing one word which doesn't contain any Pinyin diacritics (āáǎàōóǒòēéěèīíǐìūúǔùüǖǘǚǜêê̄ếê̌ềĀÁǍÀŌÓǑÒĒÉĚÈ), and block edits which add #: or #* examples to Mandarin Pinyin entries (characterised by presence of ===Romanization=== and/or {{pinyin reading of|...}} and/or {{cmn-pinyin}}). 60.240.101.246 07:23, 4 October 2011 (UTC)
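The following is a minimal Python sketch of the three filter rules proposed above. It shows what each rule would actually match. It is not written in AbuseFilter's own rule language. Several things here are assumptions made for illustration:

* the function names;
* the "==Mandarin==" header test;
* the reading of the "one word without Pinyin diacritics" rule as "a one-word Latin-script title with no tone marks".

```python
import re

# Tone-marked vowels from the proposal. ê̄ and ê̌ are written as ê plus a
# combining mark, so listing ê already catches them in a per-character scan.
PINYIN_DIACRITICS = set("āáǎàōóǒòēéěèīíǐìūúǔùüǖǘǚǜêếềĀÁǍÀŌÓǑÒĒÉĚÈ")

# Markers the proposal uses to recognise a Mandarin pinyin entry.
PINYIN_MARKERS = ("===Romanization===", "{{pinyin reading of", "{{cmn-pinyin}}")


def has_two_latin_letters_in_a_row(title: str) -> bool:
    """Rule 1: the title contains two consecutive (ASCII) Latin letters."""
    return re.search(r"[A-Za-z]{2}", title) is not None


def is_toneless_latin_word(title: str) -> bool:
    """Rule 2 (one reading of it): a one-word Latin-script title with no tone marks."""
    words = title.split()
    if len(words) != 1:
        return False
    word = words[0]
    has_latin = re.search(r"[A-Za-z]", word) is not None
    return has_latin and not any(ch in PINYIN_DIACRITICS for ch in word)


def is_pinyin_entry(wikitext: str) -> bool:
    return any(marker in wikitext for marker in PINYIN_MARKERS)


def count_examples(wikitext: str) -> int:
    """Count #: and #* lines (usage examples and quotations)."""
    return sum(1 for line in wikitext.splitlines() if line.startswith(("#:", "#*")))


def should_flag(title: str, old_text: str, new_text: str) -> bool:
    """Return True if an edit from old_text to new_text matches any proposed rule."""
    adds_mandarin = "==Mandarin==" in new_text and "==Mandarin==" not in old_text
    if adds_mandarin and (has_two_latin_letters_in_a_row(title) or is_toneless_latin_word(title)):
        return True
    # Rule 3: new #: or #* examples added to a pinyin entry.
    return is_pinyin_entry(new_text) and count_examples(new_text) > count_examples(old_text)
```

As Anatoli points out in reply, rule 1 as stated would also catch legitimate pinyin entries. Lúndūn, for example, contains the ASCII pair "nd", so a real filter would need an exception for ===Romanization=== pages.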

Pinyin entries (===Romanization===) are allowed but I see what you mean. ;) --Anatoli 10:27, 4 October 2011 (UTC)
I think that using abuse filters to enforce policy might be a bad idea, I think expanding their scope beyond pure vandalism can have potentially harmful side effects. Does anyone have links to the range blocks or discussions about blocking this user? I am always concerned about getting rid of people who have so much passion about this project, even if that passion seems (or is) completely misdirected. - TheDaveRoss 10:41, 4 October 2011 (UTC)
From memory, he generated IP addresses well outside his normal ranges. Also, there is a chance that if he knows he may be blocked somehow (that he is not so "invincible"), temporarily or permanently, he may change his attitude, stop working against the rules and consider the opinions of others. No, I'm not asking for a definite block just yet. In any case, if a complex block were used, that would be a collective decision, not an individual one. I heard something about the possibility of contacting the ISP if there is a serious attack or vandalism, but it's not that bad. Yes, passion should be controlled; otherwise it causes problems or creates work for others.--Anatoli 12:00, 4 October 2011 (UTC)

(merged with above) --Anatoli 21:52, 4 October 2011 (UTC) Planck常数 (Planck Chángshù) It is a real word, please see Google Books. 2.25.214.61 21:38, 4 October 2011 (UTC)

Yes, mixed-language examples are citable, all right, but I hope that after the vote only 普朗克常數 / 普朗克常数 (Pǔlǎngkè chángshù) will be allowed; the reasons have been explained many times and I won't repeat them. Your pinyin entries romanising mixed language will also become invalid. You are the only person pushing to write foreign words in Mandarin using Roman letters; otherwise there wouldn't be a need for the vote. Also, use your real account, Engirst; there's no need to pretend you are many.

A freshly generated IP-address - 2.25.214.61

I have started keeping records of the number of IP addresses you are using and will try to find your old user names and IP addresses, as this is a rather rare case of abuse. Nothing personal. --Anatoli 21:52, 4 October 2011 (UTC)

A recently generated and blocked (not by me) IP Address: Special:Contributions/2.25.212.83. --Anatoli 23:12, 4 October 2011 (UTC)

For what it is worth, even though there seem to be a lot of IPs it is really only one ISP. That ISP has 3 /16s, but we can narrow it down to maybe 4 or 5 /21s or higher I think. I would want to check and see how much collateral damage that would mean but blocking all of the IPs this person uses would not be hard if that was the consensus. - TheDaveRoss 01:50, 5 October 2011 (UTC)
Before the vote on pinyin entries passed, I range-blocked him using a 16-bit subnet filter. I have been trying to help him in good faith, but he's really testing my patience. His sole purpose is to make romanization of Mandarin the standard in this dictionary. Personally, I think this is simply not viable in the long run. It's the sheer amount of information in the language that will be lost and the amount of confusion that this will cause as a result, should everything be written in pinyin. Mandarin, unlike English, has a huge number of homophones and heteronyms. This is THE reason why it should not be romanized. The syllable , has over 50 known homophonous characters associated with it, each with its own set of meanings and in some cases, its own set of heteronyms. This is also one of the reasons the Japanese adopted Kanji characters to distinguish between the meanings of homophones. If things keep worsening, I will consider blocking him again. JamesjiaoTC 03:23, 5 October 2011 (UTC)
The "user with many names" seems to be following pinyin entry rules, more or less. It's a new issue. abc123 (his original name) or Engirst is ready to fight (edit war) over Thames河, Hyde公园 and many others, forces examples like "London是英国的首都。" instead of "伦敦是英国的首都。". I had to protect some pages (temporarily) from his edits. As a Chinese speaker, what's your opinion on this type of entries? --Anatoli 03:50, 5 October 2011 (UTC)

A freshly generated IP-address - 2.25.212.57 --Anatoli 09:41, 6 October 2011 (UTC)

My proposal: I will block anonymous contributions from the ranges which have recently been abused. We leave Engirst unblocked (unless a new reason for blocking arises) for the time being, at least until the end of discussions on how to handle the particular Mandarin issues currently in debate. Once those issues have been resolved, Engirst can choose whether or not to abide by the results; if Engirst chooses not to follow the community resolution, we formally ask Engirst to leave the project, modify the blocks to include logged-in users, and actively block future socks. I think this can be done with minimal collateral damage. - TheDaveRoss 21:25, 6 October 2011 (UTC)
At the moment, we are only blocking individual IPs. I propose to range block. I've done some research and it seems that his ISP (in London, UK) gives out dynamic IPs in the range from 00000010 00011000 00000000 00000000 to 00000010 00011011 11111111 11111111 with a subnet mask of 11111111 11111100 00000000 00000000. In dotted IPv4 notation, that is 2.24.0.0 to 2.27.255.255, i.e. 2.24.0.0/14: a 14-bit prefix with netmask 255.252.0.0. Blocking the whole range would mean possible collateral damage, but it wouldn't be too bad if we still allow account creation. JamesjiaoTC 22:29, 6 October 2011 (UTC)
FWIW, here is a list of IPs: 2.25.191.81 (?), 2.25.191.225 (?), 2.25.193.30, 2.25.212.57, 2.25.213.147, 2.25.214.61, 2.27.72.254, 2.27.73.75, 2.27.72.78 (I am not convinced of this one). AFAICT, few other editors have edited recently from IPs in that range. - -sche (discuss) 22:28, 6 October 2011 (UTC)
All of these are incarnations of the same entity. JamesjiaoTC 22:32, 6 October 2011 (UTC)
This is why I proposed to do the range block, I checked the potential ranges and can avoid pretty much all collateral damage as well as target the IP address space which is allocated to whatever region this user is in. - TheDaveRoss 22:46, 6 October 2011 (UTC)
I actually forgot about my previous reply to your proposal. Silly me. Well at least you have my support. JamesjiaoTC 22:50, 6 October 2011 (UTC)
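The range arithmetic in this exchange can be checked with Python's standard ipaddress module. This is a sketch, not a record of what was actually blocked. It confirms three things:

* the /14 above spans 2.24.0.0 to 2.27.255.255 (netmask 255.252.0.0);
* every address -sche listed falls inside it;
* which /23 blocks those addresses occupy.

The helper names are hypothetical.

```python
import ipaddress

# The ISP range described above: 2.24.0.0 to 2.27.255.255.
ISP_RANGE = ipaddress.ip_network("2.24.0.0/14")

# Addresses listed in the thread (including the ones marked as uncertain).
REPORTED = [
    "2.25.191.81", "2.25.191.225", "2.25.193.30", "2.25.212.57", "2.25.213.147",
    "2.25.214.61", "2.27.72.254", "2.27.73.75", "2.27.72.78",
]


def describe(network: ipaddress.IPv4Network) -> tuple:
    """First address, last address, netmask and size of a network."""
    return (str(network.network_address), str(network.broadcast_address),
            str(network.netmask), network.num_addresses)


def covering_blocks(addresses, prefix: int = 23) -> list:
    """The distinct /prefix blocks containing the given addresses, in order."""
    return sorted({ipaddress.ip_network(f"{a}/{prefix}", strict=False) for a in addresses})


if __name__ == "__main__":
    print(describe(ISP_RANGE))
    print(all(ipaddress.ip_address(a) in ISP_RANGE for a in REPORTED))
    for block in covering_blocks(REPORTED):
        print(block)
```

For the nine listed addresses, this gives five /23 blocks (2,560 addresses in all), against 262,144 addresses for the whole /14. That difference is the collateral-damage trade-off being weighed here.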

Another IP-address for the record - 2.27.72.125. Does anyone still think it's different people? --Anatoli 01:20, 7 October 2011 (UTC)

Immediately after I tried to talk to him, he "moved" to a new IP address: Special:Contributions/2.25.212.90. It must be a game of chasey for him. --Anatoli 02:06, 7 October 2011 (UTC)
Does anyone else think that the range is too wide between 2.27.... and 2.25...? --Anatoli 02:08, 7 October 2011 (UTC)
The IP hops might be intentional, they might be the way the ISP operates. Seeing as we are not blocking most of these IPs I can't imagine why they would change IPs between edits. There are three much smaller ranges (/23s) which are more realistic. - TheDaveRoss 02:13, 7 October 2011 (UTC)
Some ISPs allow a fresh IP address to be assigned when you cold restart your modem. I think Engirst might have found this trick. JamesjiaoTC 02:32, 7 October 2011 (UTC)
Some ISPs also give their users subnet IPs and force all traffic through proxies, which means that every few minutes or hours they may have a different IP presented to the outside network based on which proxy they end up on. AOL was like this for many years. What you say makes sense if we were blocking each IP, since we hardly have any blocked it makes little sense. - TheDaveRoss 04:03, 7 October 2011 (UTC)
I don't understand why, though. I have addressed him several times with no answers but every time he changes his IP address. His user account (Engirst) is not locked. He prefers the backdoor, as if nobody can see what is happening. BTW, his first account was "123abc", not abc123 as I said before. Then, there was "Ddpy". Most of his edits are now gone but there are still many to be fixed or deleted. --Anatoli 02:43, 7 October 2011 (UTC)
I'd be happy for us to delete any pinyin where we don't have the Hanzi equivalent. Perhaps we could make it a formal rule. I'm not sure such a rule is needed, as I deleted about 100 such entries last night and nobody objected. Mglovesfun (talk) 07:54, 7 October 2011 (UTC)
Would that be bot-able? Basically check all pinyin entries to see if there are any hanzi entries that list the same pinyin, and delete the pinyin entry if no such hanzi entries are found? -- Eiríkr Útlendi | Tala við mig 16:49, 7 October 2011 (UTC)
Well, a pretty good but imperfect solution is this edit to {{pinyin reading of}} which checks if the first parameter (aka tra, trad) exists. If it doesn't exist it categorizes the entry in Category:Mandarin pinyin entries without Hanzi. This of course won't work for entries that don't use pinyin reading of, and it will miss entries that exist but lack the correct language (that is, they have only Japanese/Cantonese/Korean or whatever). Mglovesfun (talk) 16:53, 7 October 2011 (UTC)
Hanzi live in a particular range of Unicode, so I think it would be possible to find all of the pinyin readings that way, regardless of template usage. - TheDaveRoss 19:56, 7 October 2011 (UTC)
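The Unicode-range idea, combined with the bot check suggested just before it, can be sketched in Python. This is a hypothetical illustration, not an existing Wiktionary bot: the helper names (`has_hanzi`, `is_mixed_script`, `orphan_pinyin`) and the shape of the input data are assumptions, and only three CJK Unified Ideographs blocks are checked, so rarer ideographs (later extensions, compatibility ideographs) would be missed.

```python
# Hypothetical helpers for a cleanup bot; not an existing Wiktionary tool.
# Only three CJK Unified Ideographs blocks are checked (an assumption).
CJK_RANGES = [
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs (main block)
    (0x3400, 0x4DBF),    # Extension A
    (0x20000, 0x2A6DF),  # Extension B
]


def has_hanzi(text: str) -> bool:
    """True if any character of text falls in one of CJK_RANGES."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in CJK_RANGES)


def is_mixed_script(text: str) -> bool:
    """True if text mixes Hanzi with ASCII Latin letters, e.g. 'Planck常数'."""
    has_latin = any(ch.isascii() and ch.isalpha() for ch in text)
    return has_latin and has_hanzi(text)


def orphan_pinyin(pinyin_titles, hanzi_readings):
    """Return pinyin page titles that no Hanzi page lists as a reading.

    hanzi_readings is assumed to map a page title to the list of pinyin
    readings given on that page; titles containing no Hanzi are ignored.
    """
    listed = set()
    for title, readings in hanzi_readings.items():
        if has_hanzi(title):
            listed.update(readings)
    return sorted(t for t in pinyin_titles if t not in listed)


if __name__ == "__main__":
    print(is_mixed_script("Planck常数"), is_mixed_script("普朗克常数"))
    readings = {"常数": ["chángshù"], "普朗克常数": ["Pǔlǎngkè chángshù"]}
    print(orphan_pinyin(["chángshù", "Pǔlǎngkè chángshù", "xyzzy"], readings))
```

A check like `is_mixed_script` would flag titles such as Ohm定律 or Planck常数 but not 欧姆定律 or 普朗克常数; the pinyin check still depends on how readings are actually extracted from each entry, which this sketch leaves open.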
New "additions" by Special:Contributions/2.25.214.239, all mixed-language items in which the foreign names are deliberately left untranslated: Ohm定律 (correct Mandarin: "欧姆定律", Ōumǔ dìnglǜ), Banach空间 ("巴拿赫空间", Bānáhè kōngjiān), Hilbert空间 ("希伯特空间", Xībótè kōngjiān); also, by another of 123abc's sockpuppets, Special:Contributions/2.27.73.100: Hausdorff空间 ("豪斯多夫空间", Háosīduōfū kōngjiān). I'm happy to block the user and delete all these entries; they are not Mandarin. Soft redirects might be considered if we have the proper Mandarin (not mixed) entries.--Anatoli 21:56, 9 October 2011 (UTC)

Planck常数 (Planck Chángshù)

Concurrent discussion: Wiktionary:Requests_for_deletion#Planck.E5.B8.B8.E6.95.B0, Talk:Planck常数

It is a real word; please see Google Books. Anyhow, it shouldn't be deleted and blocked. 2.25.214.61 21:38, 4 October 2011 (UTC)

I answered above. --Anatoli 21:55, 4 October 2011 (UTC)
All languages in the world, big or small should be equal. They are real words, only dictators want to ban real words. 2.25.214.61 22:03, 4 October 2011 (UTC)
Am I banning a language? I don't want you to insert mixed-language entries and translations, simply called Chinglish. Wiktionary is not here to show how people with poor language skills use the language. Quoted "普朗克常数" has 1,500 hits in Google Books; why would anyone want to promote "Planck常数" instead (29 hits)? You are just spreading illiteracy. No one is trying to force "Planck 상수" (Korean) or "постоянная Planck" (Russian). I have to protect pages because of you; just stop it, will you? --Anatoli 22:32, 4 October 2011 (UTC)
"poor language" is just your personal idea, but Planck常数 is used in professional books. 2.25.214.61 22:39, 4 October 2011 (UTC)
Who defines poor language? CFI does not have a quality stipulation on acceptable words, nor is it acceptable for any Wikimedia project to promote anything. If there are 29 hits of Planck常数, then it has the same justification as Planck's constant, and quite probably should have a usage note pointing to 普朗克常数.--Prosfilaes 05:12, 5 October 2011 (UTC)
I have no choice but to temporarily protect pages from you. To anyone: please contact me if you think I'm abusing my administrator rights. I really see no choice at the moment. The reasoning has been explained many times; I won't repeat it. Seems like déjà vu. "Poor language" is my shorthand for everything said before. --Anatoli 22:42, 4 October 2011 (UTC)
Yes, you are. You are a language dictator. 2.25.214.61 22:47, 4 October 2011 (UTC)
Whatever; when a person like you says it, it means I'm doing the right thing, thank you. Knowing your record, I'm 100% sure that if you had administrative rights you would impose Mandarin without Chinese characters on Wiktionary or something. I don't think you are a passionate linguist; you're obsessed with your "transition to Roman letters" ideas. If you are Chinese, you should be ashamed; I support the Chinese person who told you off. --Anatoli 22:55, 4 October 2011 (UTC)
(@2.25.214.61 etc.) Please refrain from name-calling; it will certainly not further your cause. It is very important to recognize that this website is a collaborative project. Even though we all have our own opinions about what Wiktionary should be, we agree to sacrifice some of what we want so that Wiktionary can be what the whole community wants. Please take some time and consider what your goals are, and then present them to the community for discussion, persuade us, and allow us as a community to decide which course to take. A surefire way to lose any support you may have had is to simply try to impose your ideas on the community against its will and then get defensive about it. If you continue to be as combative as you have been, then we will most likely ask you to leave for the good of the project. That may not be a bad thing; Wiktionary is not a good fit for everyone, but I would rather you decide that the community effort is worth some sacrifice and join us under the agreed-upon terms. Thanks, - TheDaveRoss 01:40, 5 October 2011 (UTC)
Thank you, TheDaveRoss. I just want to comment that similar ideas are shared by people on pinyininfo.com, and some of them do make sense to me: standardisation of pinyin and of the transliteration of Chinese names for all Mandarin-speaking countries and areas. I'm sure he will be welcome there. Another thing: he was already told to leave and blocked numerous times, only a few of them by me. Talk to User:Tooironic. Then he reappeared with no difficulty, which makes the administrators' right to block a contributor who creates more problems than value a joke. --Anatoli 01:52, 5 October 2011 (UTC)
Problems are caused by somebody abusing power. Until now, Wiktionary has had no rule banning mixed scripts. 2.25.214.61 02:12, 5 October 2011 (UTC)
I agree, problems can be caused by those who abuse power. Problems can also be caused by those who refuse to listen to others in the group. The way rules are developed on Wiktionary is very organic. We don't have a rule for something until a disagreement about it arises. Once a disagreement does arise, those who are in disagreement stop what they are doing and open the issue up for discussion, either between those who are close to the problem or the community as a whole. If it makes sense, there is a rule created based on the result of the discussion. Just because something doesn't have a "rule" doesn't mean that it is allowed. We don't have a rule about deleting the main page, yet it is something for which a person would be punished. Thank you for your willingness to discuss the issue. - TheDaveRoss 02:27, 5 October 2011 (UTC)
(@2.25.214.61 - Engirst) Really? It must be me? What about your toneless pinyin story? Remember this Wiktionary:Beer_parlour_archive/2010/May#block_list?
BTW, I added many Mandarin translations using mixed scripts, e.g. DVD player, edited 卡拉OK, created T恤衫. I have no problems with many others. Good try, but you need more correct answers. --Anatoli 02:34, 5 October 2011 (UTC)

On such topics, decisions should be based on consensus, not on votes based on personal opinions. E.g. even if a majority wants to exclude a language (for political or whatever reasons) while a minority wants to keep it, it should be kept if it can be called a language. It's the same for words. There should be a discussion between open-minded people until a consensus is reached. Lmaltier 18:05, 5 October 2011 (UTC)

Lmaltier, although I broadly agree with you here, I'm not sure you know what consensus means, and as a result, you seem to contradict yourself.
Regarding individual terms, the crux of the current issue revolves around what constitutes "Mandarin", and the majority opinion (i.e. rough consensus) appears to be that Mandarin does not include "Alzheimer's" or "Einstein" or "Thames" or "Planck". I think all the Chinese editors here would agree that 常数 is Mandarin, which makes Planck常数 a curious mixed-language hybrid term.
I see only two clear paths forward for keeping terms like Planck常数:
  1. Create a ==Chinglish== (or similar) language heading, and categorize such terms under this.
  2. Include such terms, but keep the entries extremely simple, just listing the terms as alternate spellings or misspellings and linking through to the hanzi-spelled entries that contain the definitions, usage examples, etc.
Without any general consensus (there's that word again) as to which course to take, I'm inclined to view these as truly mixed-language terms, that would thus not belong under any single-language header. -- Cheers, Eiríkr Útlendi | Tala við mig 18:42, 5 October 2011 (UTC)
In any case, there should be agreement before we allow the creation of so many hybrid entries; it's not a common practice here. Only one user (even if under different names, or anonymously under different IP addresses) pushes it so ardently. That's why I created the vote: to get a collective decision. Lmaltier, you'll have a chance to vote and express your opinion. If the vote fails (I hope not), then we need to discuss the details. I agree with Eirikr that we shouldn't keep them as plain Mandarin entries, because they are not. --Anatoli 23:12, 5 October 2011 (UTC)
What I mean is that there should be clear and simple principles, the main ones being all words in all languages and a header for a language means that the word is used in this language. A consensus is not the result of a vote, only the fact that all open-minded people agree, after discussion based on arguments, that principles are met, even if some (or the majority) would prefer not to include the word for personal reasons (because they don't like it, etc.), or agree that principles are not met. Lmaltier 05:21, 6 October 2011 (UTC)
We don't allow some SoPs, even if some people think they are words, do we? We don't need "blue sky" or "tram no. 20" or "Chinese for London is 伦敦". To me and a few others, as you can see, "Planck常数" is not one word but two: Planck + 常数, and one of them is not Chinese; even if it's used in a Chinese text, it's still English inside Chinese. There will be more and more English words used by Chinese speakers, but why do we need to include them here if they haven't become part of the language? --Anatoli 05:41, 6 October 2011 (UTC)
Anatoli, "Planck常数" is not a semantic sum of parts: its meaning cannot be obtained from the knowledge of the meaning of "Planck" and "常数". The same is true of the English "Planck constant", for which we have Planck's constant. This discussion should not be in BP anyway, but rather in RFV (if you think the term is not attestable) or in RFD (if you think the term is a semantic sum of parts or have other reasons to believe the term does not meet CFI). Furthermore, "Planck常数" is not a proper noun, so the vote you have proposed (Wiktionary:Votes/2011-10/CFI for Mandarin proper nouns - banning entries not in Chinese characters) will have no effect on the inclusion of "Planck常数". See also WT:RFD#Planck常数, which I have created based on your having tagged the term for RFD on 4 October. --Dan Polansky 06:41, 6 October 2011 (UTC)
The Chinese term for "Planck constant" is "普朗克常数", not "Planck常数". There's nothing Chinese about the name "Planck". If a person called Mark says "我叫Mark" (my name is Mark) rather than "我叫马克" (wǒ jiào Mǎkè), "Mark" doesn't thereby become the Chinese translation of English "Mark"; "马克" is. (One of the books with "Planck常数" has "Heisenberg 和 Schr6dinger"(?).) Engirst hardly contributes to the main Mandarin area, only pinyin. Why, when he does, is it "Planck常数" and not "普朗克常数"? The ratio is 1,500 to 29. Don't you see he has an agenda? Quite amusing were the examples he was forcing, such as "London是英国的首都". Is this Mandarin?! You can find "Obama总统" and "Cameron首相". So what? Do we start adding them as Mandarin? --Anatoli 08:36, 6 October 2011 (UTC)
Do you acknowledge that "Planck常数" is not a semantic sum of parts? --Dan Polansky 11:43, 6 October 2011 (UTC)
That would be recognising it as a term, no I don't recognise it as a word, there are two languages in one sentence. It's artificial and not assimilated at all. "A rare misspelling of" is the best I can give it, it could be a single unit to someone using a hybrid of languages, like the person who could say "我是America人" - "I'm American". Are you actually reading what I said before? --Anatoli 12:02, 6 October 2011 (UTC)
I am asking about whether you acknowledge that the term is not a sum of parts. I am not asking whether you acknowledge the term to be a word, whether you deem the term worthy of inclusion, or whether the term is "artificial" or "assimilated". This question of whether it is a sum of parts can IMHO be fairly objectively answered in the negative, so I am asking whether you can confirm the observation that the term is not a semantic sum of parts, disregarding for a while your goal of getting the term excluded. If you claim that the term is a semantic sum of parts, can you explain whether you deem "Planck constant" a semantic sum of parts and why? --Dan Polansky 12:10, 6 October 2011 (UTC)
普朗克常数 (Pǔlǎngkè chángshù) is a semantic term. Planck常数 is European Planck + Chinese constant. If you look up 常数, then you will have the translation of Planck常数 without any need for Planck常数. —Stephen (Talk) 12:17, 6 October 2011 (UTC)
Another example of a term that is not assimilated, but was uttered and has one citation, is "сраный ковбой" (shitty cowboy). I requested its deletion because it never caught on and was never assimilated in the (abusive) sense "American". Mixing English names and words into Chinese is not a new trend, and we already have the English names. Most famous English proper nouns can now be found in Chinese texts: rivers, cities, mountain ranges, formulas and theorems appear as the original term followed by a Chinese suffix. Will "Mont Blanc山" or "California州" become Chinese only because they are followed by 山 and 州? --Anatoli 12:19, 6 October 2011 (UTC)
@Dan Polansky. "Planck常数" is a sum of parts. --Anatoli 12:25, 6 October 2011 (UTC)
Are you saying that "Planck常数" is a semantic sum of parts, while the English "Planck constant" is not a semantic sum of parts and "普朗克常数" is not a semantic sum of parts? Is this conjunction of three assertions what you are saying? --Dan Polansky 15:48, 6 October 2011 (UTC)
@Anatoli, @Dan --
I think you two might be talking past each other. @Anatoli, by saying that "Planck常数" is not SOP, I think Dan is stating that the meaning of this phrase is not clear just from the parts -- if I only know (or only look up) "Planck" and "常数" as individual pieces, I have no idea that "Planck常数" is intended to mean h in physics.
Meanwhile, @Dan, I think what Anatoli is getting at is that "Planck" is a term in English (and other European languages), and "常数" is a term in Mandarin, and while the average Mandarin reader would understand the latter, the former would only be understood by that subset of Mandarin readers who are also at least somewhat familiar with European languages. By saying that "Planck常数" is SOP, I think Anatoli is stating that this term is comprised of two distinct parts, and only one of these parts is recognizable as Mandarin.
@Anatoli, @Dan, have I understood each of you correctly? -- Hoping this helps clarify, Eiríkr Útlendi | Tala við mig 20:05, 6 October 2011 (UTC)
Eiríkr Útlendi (or just "Eiríkr"?), you understand me perfectly well. I believe my use of the phrases "sum of parts" and "semantic sum of parts" is in perfect alignment with the customary use of the phrase in English Wiktionary, and also fits the natural reading of the phrase "semantic sum of parts". My use refers to WT:CFI#Idiomaticity and its use of the term "idiomatic", defined in CFI in this way: 'An expression is “idiomatic” if its full meaning cannot be easily derived from the meaning of its separate components.' Where CFI says "idiomatic", I say "not a sum of parts" and "not a semantic sum of parts".
Re: 'By saying that "Planck常数" is SOP, I think Anatoli is stating [...]': This would mean that Anatoli has invented a new meaning of "sum of parts" as applied to terms, a meaning that has nothing to do with WT:CFI#Idiomaticity and hence is irrelevant. I reject this new meaning as part of a meaningful discussion about inclusion-worthiness of terms; the term "sum of parts" has, in Wiktionary discussions, a specific bound meaning such that editors are not free to redefine the term as they see fit. --Dan Polansky 20:48, 6 October 2011 (UTC)
┌─────────────────────────────────┘
@Dan, one thing I haven't yet heard you articulate (or at least haven't understood) is your view on the status of the term Planck常数 with regard to language. SOP or not, the main reservation of Anatoli (and mine, to be perfectly clear) is that Planck常数 is not a single-language term. In arguing for this term's inclusion, do you view Planck常数 as in common use in Mandarin contexts, and therefore as meeting CFI as a single-language term?
Anatoli and IP user 60.240.101.246 are self-identified Mandarin speakers, and neither is happy about including this term; both state that Planck常数 as a whole is not Mandarin. James Jiao identifies as a native speaker and weighed in here regarding the term Thames河, and at least some of his points in that thread would seem to apply to Planck常数 as well. The main user(s) adding such terms and arguing for their inclusion, specifically Engirst and multiple IP users who may or may not also be Engirst, have never to my knowledge indicated whether they are Mandarin speakers, even when asked point-blank. (I'm not fluent in Mandarin nor familiar enough with writing styles to say much about specific Mandarin terms on my own authority, but I am concerned about the possible precedent and how that might affect entries classified as Japanese; hence my participation here.)
I would appreciate it if you could explain a bit about your specific reasons for wanting to include Planck常数. Your views on this term's non-SOP-ness are clarified by your post above, so what other reasons do you have? I'm honestly curious, and I do not feel like I understand your position well enough to really agree or disagree in any clear and reasoned fashion. -- Eiríkr Útlendi | Tala við mig 21:12, 6 October 2011 (UTC)
I have not said anything about whether I want "Planck常数" included. Rather, I wanted Anatoli to stop erroneously claiming that "Planck常数" was a sum of parts. This is a hard subject; thinking about it is not made clearer by fallacious argumentation that involves erroneous claims of sum-of-parts-ness, and terms such as "madness", "spread illiteracy", "have an agenda", and "Chinglish". A reason for wanting the term included would be that it meets CFI. A term that meets CFI can still be tagged as "rare" (which it definitely is) or even as "nonstandard" (which it seems to be as well). The dictionary's containing a term does not mean that the dictionary somehow endorses the term or recommends its use. The dictionary merely registers observations about the actual use of language. I have no strong feelings about "Planck常数"; it is so rare that it can be considered a rare malformation or something, not unlike a rare misspelling; I do not really know. I do admit that probably no one will want to look up the term, an indication that it could be deleted. What I am passionate about is eliminating wrong argumentation, though, wrong as far as I am able to tell anyway. Furthermore, CFI does not say anything about "common use", other than in "Any word may be rendered in pig Latin, but only a few (e.g., amscray) have found their way into common use". That sentence is in a rather poorly phrased section of CFI that was kept for lack of consensus for deletion (5:4:0 for deletion) in the vote Wiktionary:Votes/pl-2011-01/Final_sections_of_the_CFI, but it had better be deleted anyway so as not to mislead, as WT:CFI#Attestation does not say anything about "common use". --Dan Polansky 21:39, 6 October 2011 (UTC)
Ah, thank you, now I have a better understanding of where you're coming from. FWIW, I am slowly warming to the idea of inclusion with a soft redirect to the main entry at 普朗克常数 and a note about rarity, iff acceptable citations can be provided.
As a minor point, WT:CFI#Attestation does state “Attested” means verified through 1. Clearly widespread use -- notably, not as the sole limiting criterion, but "common use" would appear to be one of the criteria. -- Eiríkr Útlendi | Tala við mig 22:08, 6 October 2011 (UTC)
Above all, point 1. of WT:Attestation is an item in a disjunctive list (A or B or C or D), so it is not a necessary requirement for attestation. Point "1. Clearly widespread use" should IMHO be deleted from CFI; it just misleads. In fact, point 3. of WT:Attestation ("Usage in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year, [...]") provides a more lenient criterion than point 1., so point 1. is redundant. What point 1. currently does is make it possible for people to claim in WT:RFV that a questioned term "is clearly in widespread use", but that is IMHO a matter of procedure rather than an extended definition of "attested": CFI does not state how the attestation should be documented, in particular, whether the quotations actually need to be entered into the Wiktionary database. So again, removing point 1. would simplify things without changing the substance of CFI, IMHO. --Dan Polansky 07:16, 7 October 2011 (UTC)
That's a clear explanation, thank you. Wiktionary:CFI#Attestation_vs._the_slippery_slope suggests that attestation alone should not be the sole justification for inclusion, however; what's your view on that? (And perhaps this particular discussion should be moved into a separate thread? This is getting unwieldy.) Never mind on that second part, just saw your reply over at Wiktionary:Requests_for_deletion#Planck常数, which answers my question. -- (Updated) Eiríkr Útlendi | Tala við mig 16:54, 7 October 2011 (UTC)
My view on that is that Wiktionary:CFI#Attestation_vs._the_slippery_slope is purely informative; it provides information on interpreting the rest of the document, not actual rules for what passes CFI.--Prosfilaes 17:00, 7 October 2011 (UTC)
  • I think that Planck's constant itself is SOP. It means, after all, any constant invented by some guy named Planck. I had a friend called Martha Planck who made a constant once.--Rockpilot 22:18, 6 October 2011 (UTC)
(After an edit conflict) I do not claim to have a new definition of "sum of parts", but "Planck常数" is a sum of parts because it's not a Chinese term at all. I do have strong feelings about NOT keeping this type of entry, because such entries are simply wrong. A printed Chinese dictionary simply uses 普朗克常数 (Pǔlǎngkè chángshù), explaining that 普朗克 (Pǔlǎngkè) is the transliteration of the name "Planck". I'm not skilled at presenting my arguments in English, but allowing "Planck常数" would set a bad precedent, like "Archimedes螺线" instead of "阿基米德螺线" (Archimedean spiral) or similar (don't quote me on exactly how someone might write it in a Chinese text). I'm less worried about "sum of parts" rules than about the quality of foreign-language entries. Sum-of-parts problems are noticed quickly when an entry is English, but in foreign languages many get through unnoticed. Are you angry with me because I used "madness", "spread illiteracy", "have an agenda", and "Chinglish"? It's madness to try to convince everyone that "Thames河" is Mandarin for "Thames"; it's also illiterate, although overseas Chinese who don't know how to write a foreign name in Chinese are often forgiven. "Madness" is a strong word, but I do have strong feelings about it. I'm not calling Engirst (he doesn't want to use this account any more?) mad, but I DO think he has an agenda. His agenda (it's only one male user, not many) was confirmed many times by Chinese-speaking contributors; let me call it "Mandarin in Latin script". Next term: "Chinglish", among other things, means "mixed Chinese and English" or a hybrid language, and is not offensive. People do use Chinglish, Japlish, Runglish, Konglish, etc., but we don't have CFI for them. I don't think I was offensive to anyone, but if I was, I apologize. I had an argument with a Russian Wikipedia editor about whether "Bluetooth" is a Russian term; he quoted sources, as in our case with "Planck常数", but "Bluetooth" still hasn't become a Russian word in this spelling. 
Speakers of languages not written in Roman letters each have their own perception of what IS part of their language, especially when something is written in a different script. Generally, in 99.9% of cases, if a word is not in the native script, it's not part of that particular language, with very few known exceptions. Shall we agree to disagree at this point? You are welcome to take part in the vote and present a summary of your reasons.
After reading Eirikr's new comment: yes, rather than deleting, having a soft redirect could be a compromise I would accept. Chinese speakers themselves struggle to know how to transliterate a foreign name, and there could be variants, not just between China, Taiwan and Hong Kong, but even within one country. --Anatoli 22:33, 6 October 2011 (UTC)
Re: '"Planck常数" is a sum of parts because it's not a Chinese term at all': If this is not a redefinition of "sum of parts", then I do not know what a redefinition would look like. It does not seem to have anything to do with WT:Idiomaticity: 'An expression is “idiomatic” if its full meaning cannot be easily derived from the meaning of its separate components'. --Dan Polansky 07:16, 7 October 2011 (UTC)
  • Should we decide if we are willing to accept soft redirects before starting the vote? - -sche (discuss) 07:19, 7 October 2011 (UTC)
I already expressed my acceptance. --Anatoli 21:59, 7 October 2011 (UTC)
Right, but if the vote passes, it will ban soft redirects for entries that contain or are proper nouns. If we're OK with making the entries "soft redirect" (point in an explanatory way to the main entries, like 'ave points to have), we shouldn't necessarily hold that vote; we should just make mixed-language/mixed-script entries into soft redirects. - -sche (discuss) 22:35, 7 October 2011 (UTC)
We have some time before the vote. We need to see the reaction of opponents of the proposal first. --Anatoli 22:42, 7 October 2011 (UTC)
Besides, I don't see a contradiction of the vote and redirects. Banning will not disallow redirects like Mockba. I may add a clause. --Anatoli 22:45, 7 October 2011 (UTC)
I am going to oppose the vote Wiktionary:Votes/2011-10/CFI for Mandarin proper nouns - banning entries not in Chinese characters. It is essentially prescriptivist ('"Thames河", "Planck常数", "Alzheimer病", etc. could be made soft redirects to the correct Mandarin entries.', emphasis on "correct" mine). Furthermore, it seems fairly incoherent at this point. The vote seems to modify CFI for Mandarin, yet the sixth reason stated in the vote claims that the discussed terms such as "Alzheimer病" already fail CFI as sums of parts. The vote seems to be predicated on the assumption that it is the business of a dictionary to "prevent spreading illiteracy", whereas the business of a descriptivist dictionary is to document what can actually be observed, and to mark it as "rare" and "nonstandard" if that fits the observation. By including a term, a dictionary does not promote the term, especially when the term is marked as nonstandard. In particular, by including vulgar terms, a dictionary does not promote their use; by including terms marked as obsolete, a dictionary does not promote their use. By analogy, a library of all books ever published on Earth contains all books, regardless of how objectionable the books may seem to the librarians or majordomos of the library. There are some further issues with the vote. --Dan Polansky 07:54, 8 October 2011 (UTC)
It's normal to have language-specific policies, especially if they are more restrictive than CFI for English (not breaking the existing rules). People who know the language they are editing will know better; other people may offer decisions that are wrong or not followed. Dan, the languages you work in are mostly Roman-based, and I'm not sure you understand that words like iPhone, for example, can be used in a Russian, Mandarin or Japanese text, and you'll find millions of citations of them, but they are not part of that language even if they are the official forms: a sign will say "iPhone", not "айфон". Does that make sense? The same goes for "Planck常数": only Chinese living overseas and mixing languages will know what it means. I'm starting to suspect that you too have some agenda. Why so much enthusiasm towards Mandarin all of a sudden, when we are dealing with a mixed script? You are even being aggressive towards me, calling my arguments "fallacious". Also, why does it worry you personally what is included as a Mandarin term in Wiktionary? Do you work with Mandarin? No Mandarin dictionary in any country, no matter how large, would include such terms. Anyway, the discussion in Talk:Planck常数 seems to be leading to a possible compromise. If we don't reach it, we'll decide with the vote. BTW, I don't think extending it to one month is needed; two weeks will suffice. --Anatoli 11:13, 8 October 2011 (UTC)
"Have an agenda" and concern with personal motivation are fallacies of irrelevance; my enthusiasm is no one's concern. I find many of your arguments fallacious ("characterized by fallacy; false or mistaken"), and feel entitled to say so without considering it a personal attack. I am worried about a proliferation of prescriptivist inclusion criteria, about the spread of prescriptivist thought in Wiktionary, and about incorrect use of the term "sum of parts" AKA "nonidiomatic", incorrect with respect to WT:Idiomaticity.--Dan Polansky 13:08, 8 October 2011 (UTC)

I don't understand what this discussion is about. This is a non-Mandarin word (Planck) combined with a Mandarin word (常数). Mandarin speakers don't perceive "Planck常数" to be a Mandarin word (ask any native speaker if in doubt), so even if this is not a SOP, it shouldn't be kept. "Москва" is used in English but English speakers don't regard it as an English word, hence it was deleted unanimously. 60.240.101.246 13:21, 8 October 2011 (UTC)

Re: 'Mandarin speakers don't perceive "Planck常数" to be a Mandarin word': What evidence for this assertion do you plan to provide? Are you saying that not a single Mandarin speaker considers "Planck常数" to be a Mandarin word, or that most Mandarin speakers do not consider "Planck常数" to be a Mandarin word? --Dan Polansky 13:33, 8 October 2011 (UTC)
With your capabilities, you will not be able to find one who does (excluding, possibly, Engirst, who doesn't seem to be Hanzi-literate). 60.240.101.246 13:36, 8 October 2011 (UTC)
Care to answer my questions? What evidence? Not a single does or most don't? --Dan Polansky 13:39, 8 October 2011 (UTC)
OK. Here is what you wanted. I can't speak for everyone, of course, but I do understand the mindset and language perceptions of native Chinese speakers better than you do. There are plenty of references on this issue, e.g. 《直用原文──现代汉语外来语运用中的一个新趋势》,《試論漢語文字和中國人的傳統思維方式》,《原形借词——现代汉语吸收外来语的新发展》,《论外来语对现代汉语的冲击》,《关于外来语及其周边概念的考察》,《关于汉语文字的几点认识》,《2010年中国语言生活状况报告》,《现代汉语中字母词研究综述》,《外来语在汉语中的使用及对汉语的影响》, and they basically all comment that: the increase in loanwords needs to be noted and watched; such words don't fit into Chinese phonology and sound very foreign; a recent trend is for words in other languages to be used directly, without being transcribed or translated; these words are used to avoid confusion or for convenience; they do not appear in formal situations, where transcription and translation always occur, and the general public doesn't regard these words as being assimilated into the Chinese lexicon; phonologically adapted loanwords tend to be replaced by native calques eventually; this tendency contrasts starkly with the Japanese and Korean cases, where massive and indiscriminate importation is currently occurring; and in conclusion the import of loanwords damages the structural integrity and purity of Chinese, although some young people view this as fashionable, it should be regulated and discouraged. 60.240.101.246 14:32, 8 October 2011 (UTC)
Essentially the Wiktionary community is a miniaturised version of the general public. Out of those who actively participated in the deletion discussion of these mixed script entries, people who know some Mandarin (Anatoli, Tooironic, Jamesjiao, me) all voted against the inclusion, and people who don't know the language (Lmaltier, Dan Polansky, Prosfilaes, -sche (initially)) tended to keep these. The chance of this occurring assuming equal probabilities for the two cases is 1/256, or 0.4%, low enough to be considered statistically significant. 60.240.101.246 14:40, 8 October 2011 (UTC)
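The 1/256 figure in the comment above can be checked as follows, under the commenter's own assumption that each of the eight named participants independently had an equal (1/2) chance of landing on either side, so the observed split is one specific outcome out of 2^8 equally likely ones:

```latex
\[
  P = \left(\tfrac{1}{2}\right)^{8} = \frac{1}{2^{8}} = \frac{1}{256} \approx 0.0039 \approx 0.4\%
\]
```

The figure is only as meaningful as that independence-and-equal-odds assumption.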
Thus, you do not plan to provide any evidence; instead, you offer yourself as a witness.
Let me highlight this quotation: "[...] the import of loanwords damages the structural integrity and purity of Chinese, although some young people view this as fashionable, it should be regulated and discouraged". This quotation is outright prescriptivist. A prescriptivist lexicographer sees it as a goal of a dictionary to protect "the structural integrity and purity" of a language. Such prescriptivism is typical of language academies around the world. By contrast, the English language has no such central regulatory body; an Anglo-American descriptivist dictionary does not see it as its aim to protect the purity of language but rather aims at documenting the use of language as it actually occurs, regardless of whether language authorities approve or disapprove of its use. Moreover, it is still possible in a descriptivist dictionary to note that some authorities consider a term incorrect, whether by means of the template {{nonstandard}} or by means of a usage note. The current entry "Planck常数" leads the user of the dictionary to synonyms: 普朗克常数, 浦朗克常数, 卜朗克常数. If this entry is deleted, the fact that '"Planck常数" is a nonstandard term whose standard and widely accepted synonyms include 普朗克常数, 浦朗克常数, and 卜朗克常数' remains undocumented in the dictionary, an unfortunate circumstance. --Dan Polansky 14:53, 8 October 2011 (UTC)
Re: "[...] they do not appear in formal situations": Neither do ain't and gonna; do you propose to delete these as improper English? Should Category:English informal terms be deleted? And what about such foreign importations as English háček, which threatens the purity of the English language? --Dan Polansky 14:59, 8 October 2011 (UTC)
Dan, I find your doggedness in this issue to be a bit odd. I understand your concerns about prescriptivism versus descriptivism; that part makes sense to me. That said, IP user 60 here, Anatoli, and James Jiao, among others, are basically making the point that terms like "Planck常数" are about as intelligible to Chinese readers as "Москва" is to English readers. If that is the case, and if "Москва" has been deleted as "not English", why are you apparently so opposed to deleting "Planck常数" as "not Mandarin"? I confess I'm confused by your stance, and I must assume it's because I don't fully understand your perspective. -- Eiríkr Útlendi | Tala við mig 03:14, 9 October 2011 (UTC)
"Москва" is perfectly intelligible to English readers; or at least as intelligible as pemoline, ironbark or votator. Furthermore, we didn't delete Москва because it wasn't an English word; we decided it was Russian in English and deleted the English section and not the Russian section. You want to delete Planck常数 as a whole and act like this attestable word doesn't exist just because it doesn't fit your constraints.--Prosfilaes 04:33, 9 October 2011 (UTC)
@Prosfilaes: Are you speaking for yourself, or on Dan's behalf?
That aside, your comment here comes off as disingenuous. English readers unversed in Cyrillic will not find "Москва" at all intelligible, certainly not as "moskva". Moreover, "we decided it was Russian in English and deleted the English section" sounds an awful lot like "Москва" has been deleted as "not English", leaving me uncertain what distinction you are making. Is your intended point that, since some headword "Москва" still exists, removing the English is acceptable?
Regarding "you want to delete Planck常数 as a whole and act like this attestable word doesn't exist just because it doesn't fit your constraints" -- my only constraint is that a term be filed under the appropriate language. "Москва" is not English, so I support that term not being listed under an English heading. Planck常数 doesn't appear to fit under any of our existing language headings, so I support that term not being listed under any of our existing language headings.
Regarding attestation that a particular string exists in use somewhere, ittyshay seems like it might be attestable given the number of hits at google:ittyshay, but WT:CFI, as it's currently written, counsels against including pig Latin. In a similar vein, google:"my+natsukashii" suggests attestability for natsukashii in English contexts, but it is not included here under an English heading, ostensibly as it is not recognized as English. Attestation alone appears to be insufficient for inclusion -- which strikes me as reasonable, for it is unreasonable to argue that attestation in a given language context alone makes a term that language -- which is kind of the whole point of this thread, that Planck常数 is not Mandarin. -- Eiríkr Útlendi | Tala við mig 05:27, 9 October 2011 (UTC)
@Eiríkr Útlendi: CFI does not advise against pig Latin but rather says that pig Latin can be included as long as it is attested, giving amscray as an example; you should read the relevant "slippery-slope" section again, and read again my response that the "common use" and "general use" used in that section are misleading and match neither the current practice nor WT:Attestation. Again, in case of doubt, you can create a new thread here in Beer parlour in which we clarify whether people agree that "common use" should be required for pig Latin. google:"my+natsukashii" searches the World Wide Web, which does not count for attestation; google books:"my natsukashii" finds nothing and google books:"ittyshay" finds nothing; the relevant point of CFI is "Usage in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year". From what I can tell, you still have a poor grasp of how CFI usually gets applied, especially the attestation section. Instead of focusing on idiomaticity and attestation as specified in CFI and as usually applied, the supporters of deletion variously claimed that "madness" must be stopped, that importation needs to be regulated, that we must not "spread illiteracy", or, now, that the phrase is unintelligible to many native speakers. But unintelligibility to many native speakers is not a concern per WT:CFI; "Planck常数" or "Planck 常数" seems attested in Google Books. There are many specialist terms entered into Wiktionary as English that are not readily understood by the majority of the English-speaking population. Wiktionary registers attested terms rather than terms that are readily understood. The assertion that 'Planck常数 is not Mandarin' seems implausible, as the term seems attestable in running Mandarin text; the presence of Latin characters alone does not exclude the phrase from Mandarin, as then "AA制" and "T恤" would not be Mandarin either.
As regards "Planck常数" vs "Москва", "Москва" can be claimed to be Russian embedded in English and is borderline-attestable in Usenet, none of which holds for "Planck常数", which seems attestable in Google Books. As regards my motivation described above as "doggedness in this issue", that again is no one's concern and has no bearing on the correctness of my arguments, and thus, again, is a fallacy of irrelevance. I do not see why my stubborn attempt to defend CFI and lexicographical descriptivism is "doggedness", while the stubborn attempt to import lexicographical prescriptivism into English Wiktionary (most conspicuously documented in one of the responses of the anon 60.240.101.246 above in this thread) should be considered non-dogged or reasonable. I also don't see why your repeated responses, the last of which mostly ignores points made in my post, should be considered non-dogged; you had the option of not butting into the conversation between me and 60.240.101.246, now disclosed as a marked prescriptivist who wants to protect the purity of language. In any case, "have an agenda", "madness", "doggedness", and similar non-concerns are best avoided in the discussion. --Dan Polansky 07:04, 9 October 2011 (UTC)
I speak for myself, of course. Москва was not deleted; someone looking it up will still find the word. As a practical matter, you haven't changed the entry at all for users. If Planck常数 does not fit under any of our existing language headers, then we need to create one that it does fit under.
I disagree hugely on your reading of WT:CFI. Even ignoring my argument that the whole "Attestation vs. the slippery slope" section is informative, not prescriptive, it starts "This is not a problem, as each term is considered on its own based on its usage, not on the usage of terms similar in form." ittyshay, looking at Google Groups, is fair game. We don't use the Web for attestable materials, and the first dozen pages on Google Books and Groups for "natsukashii" don't show many examples that are clearly uses and clearly English.
"it is unreasonable to argue that attestation in a given language context alone makes a term that language"? What? Can you offer a general rule to figure out what language a word is from then? I was going on the general rule of thumb that "Platonic" was English because it's always used by English speakers in English sentences, but apparently that's not good enough.--Prosfilaes 07:30, 9 October 2011 (UTC)
(after edit conflict) @Prosfilaes: I would argue Москва is less intelligible than pemoline (or is not intelligible) because it isn't in the same script as English readers know. One of my English friends, after living in Kyiv for a year, told me how carefully he had guarded slips of paper with the addresses of his destinations on them, because he could read no Cyrillic and had no way other than those slips of paper to tell taxi drivers where he wanted to go. In contrast, if his destination had been "Pemoline-Ironbark Station on Votator Street", he could have lost the paper and still pronounced for the taxi driver where he wished to go, even if he had no idea what the words meant. However, I appreciate your underlying argument about the difference between Москва and Planck常数, even if I don't entirely agree with it. I trust you wouldn't object to usage notes in a [[Planck常数]] entry explaining that it is nonstandard or proscribed? If you wouldn't, I'm trying to advance soft redirects (after others suggested them) as a compromise to banning mixed-script entries, precisely so as to address that concern, which you and Dan and Lmaltier express well.
@Eirikr: what would you think of soft redirects, like this? Are you warm to them? I want to know if this compromise is acceptable to more people than a ban is.
If it is... we should all stop arguing XP hehe. - -sche (discuss) 05:39, 9 October 2011 (UTC)
I'm okay with soft redirects as a general idea, provided the sinophone editors are on board (since they're more the ones to say about Chinese entries anyway :). As an interesting wrinkle, google:allintext:+"planck常数"+は shows some use in Japanese contexts, albeit only seven hits, which I haven't gone through to evaluate. -- Eiríkr Útlendi | Tala við mig 06:35, 9 October 2011 (UTC)
Historically, there may have been a few English speakers who only knew the w:Deseret alphabet. (Okay, historically they're probably outnumbered by the number of bilingual English/Russian children who only knew the Cyrillic alphabet.) I would find usage notes on Planck常数 almost essential. As a compromise, I'd have no complaint with soft redirects.--Prosfilaes 07:30, 9 October 2011 (UTC)
(after edit conflict) I removed the controversial point about SoP. It wasn't the main reason, anyway. I agree that mixing words from English and Mandarin when speaking or writing in Mandarin is always brushed off as Chinglish by native speakers, no matter how educated the speaker or writer who uses it is. I stand by what I said before. Leaving people's or place names untranslated is only done when either the writer or the reader may not be able to read or write that name, with no exceptions made for small or rare names. @60.240.101.246 Well, if we had an agreement, it would be deleted immediately, but obviously we don't. So we have to go through the vote or decide in favour of a soft redirect option (the latest version suggested by -sche). To me it's obvious that "Planck常数" or "Alzheimer病" are not Mandarin just because common Mandarin words are attached to them, but I don't want to argue about this forever; let the vote decide, and hopefully common sense will prevail, not the desire to include everything for which there is some attestation. Honestly, it's tiring. --Anatoli 13:52, 8 October 2011 (UTC)
Then what are they? I don't care if they're Mandarin or not; they're words. Label them how you will, but don't just delete them because you don't like them.--Prosfilaes 04:33, 9 October 2011 (UTC)

Italian Wikipedia

The Italian Wikipedia is closed in protest over far-reaching plans by the Italian government that threaten its independence see Jcwf 02:51, 5 October 2011 (UTC)

How does an Italian law apply to a website edited by users from around the globe, even if it is in Italian? JamesjiaoTC 02:57, 5 October 2011 (UTC)
One of the many open questions I suppose. I do not know. But if all governments start doing this I do think we are in trouble. Jcwf 03:02, 5 October 2011 (UTC)
"This proposal, which the Italian Parliament is currently debating". Wait, so it's not actually a law yet? Wikipedia has jumped the gun. The European court of human rights might throw it out, no? As it would seem to contradict the laws protecting freedom of expression. Mglovesfun (talk) 09:15, 5 October 2011 (UTC)
As Jamesjiao says, who does the law apply to? I thought the Wikipedia server was in the US. Would it only apply to Italian citizens? If so, why only in Italian? And if it doesn't only apply to Italian citizens, and I wrote something on the Italian Wikipedia, could I hypothetically break Italian law and get extradited to Italy? Also, nobody is 'in charge' of the Italian Wikipedia, so if Wikipedia fails to conform with a ruling against it, who gets charged? It doesn't say defamatory statements should not be made, just that any such statements should be removed if requested. So the person making the original statement isn't guilty of anything. I'll look into it. Mglovesfun (talk) 09:19, 5 October 2011 (UTC)
This seems to be a voluntary action of Italian Wikipedia in protest over a planned law. If this is so, I would like to see the vote on Italian Wikipedia that has led to this decision. it:W:Pagina_principale now redirects to W:it:Wikipedia:Comunicato_4_ottobre_2011, as does W:it:Portale:Comunità, so we cannot even read a discussion in the community portal that could have led to that decision. --Dan Polansky 14:24, 5 October 2011 (UTC)
This seems to be the vote: it:W:Wikipedia:Bar/Discussioni/Comma_29_e_Wikipedia. --Dan Polansky 14:36, 5 October 2011 (UTC)
...which says the notice is up for but a day (according to Google Translate, anyway).​—msh210 (talk) 15:39, 5 October 2011 (UTC)
Read on, and the community decided to make it indefinite, with post-notice discussion w:it:Wikipedia:Bar/Discussioni/Sciopero:_il_punto_della_situazione (here). - -sche (discuss) 18:46, 5 October 2011 (UTC)
The protest action does not seem to affect the mobile version of Italian Wikipedia, so the page is available for reading here: http://it.m.wikipedia.org/wiki/Wikipedia:Bar/Discussioni/Sciopero:_il_punto_della_situazione. --Dan Polansky 06:11, 6 October 2011 (UTC)

m:Wikimedia Forum#Italian Wikipedia -- Liliana 14:40, 5 October 2011 (UTC)

Now m:Wikimedia Forum/Italian Wikipedia. —Angr 06:46, 6 October 2011 (UTC)

biblical quotes as example sentences

Should long-winded biblical quotes be used as example sentences? I'm asking because an anon IP (probably User:123abc) has been adding them to many different Mandarin entries (e.g. 肚子, 一切, 什么, etc.), but none of them are really practical for learners nor really relevant to the words themselves. Do we have a policy on example sentences? It's also possible that the translations are copyrighted. ---> Tooironic 03:34, 5 October 2011 (UTC)

Yes, it's abc123/Engirst. I have wikified and fixed his examples in 肚子. He just copies them from one entry to another. The other issue is that only simplified is given (if the entry is for both) and the traditional version is out of sync (甚麼 or 什麼). It's not answering your question but I wanted to mention this as well. --Anatoli 04:01, 5 October 2011 (UTC)

His contributions (Special:Contributions/2.25.214.61) are also discussed here. --Anatoli 04:06, 5 October 2011 (UTC)

(after an edit conflict) There is nothing wrong with quoting the Bible to illustrate the use of a word, or to attest to its existence; I've added lines from the Bible to entries. We prefer sentences which illustrate the usage of a word well, and therefore we shorten overly long sentences by using ellipses and move sentences that do not illustrate words' usage well to the Citations namespace, but we generally do not remove accurate quotations of literature, because these attest to the existence of the word. The sentences in the entries you link to fail to acknowledge their source, however, which is indeed a copyright/credit issue. The sentences also fail to bold the portion of the English translation that corresponds to the Chinese headword. I would remove the sentence in 一切 because it fails to acknowledge its source and is badly formatted and opaque; if the source were added, I would just format it correctly and move it to the Citations namespace (because it is still not good as an illustration of the use of the word). I will try to format and source the sentence in 肚子, and shorten it to "你必用肚子行走,終身吃土", because that is a good example sentence. - -sche (discuss) 04:11, 5 October 2011 (UTC)
I think it is helpful and good to wikify/linkify the individual words in Chinese example sentences, because it is otherwise unclear where the word-separations are, but I think it is our policy not to linkify any words in example sentences. (Do we make an exception for Chinese? That would be fine by me.) - -sche (discuss) 04:13, 5 October 2011 (UTC)
"I think it is our policy not to linkify any words in example sentences." Oh, I didn't know that! If that's true I wasted my time, but I'll seek confirmation. I think it's very useful too (like Wikibooks), and you can also see what's missing. The word forms could link to lemmas. I'll just wait for others to comment on the quotes. --Anatoli 04:18, 5 October 2011 (UTC)

All of these overly promotional or propaganda-like quotes should go. Adding a few quotes from Bible is fine, but adding tons of content from the Bible to entries which barely have any citations is unacceptable. 60.240.101.246 07:33, 5 October 2011 (UTC)

I disagree; if an entry barely has any citations, what it needs is citations! Removing the only current citation seems perfectly counterproductive. We use Bible quotes for other languages, notably English and Hebrew. Can't we just treat the Bible like any other book? I'd be happy for Qur'an, Torah, etc. quotes to be used as well. Anyway, these are citations, not example sentences; an example sentence is 'made up' for convenience, as it's quicker than finding an actual quote. Mglovesfun (talk) 09:02, 5 October 2011 (UTC)
How about using the Qur'an for common English or Japanese entries, like "all", "what", "belly" (using translations of course)? It would be weird, wouldn't it? I'm happy with Buddhist quotations - that's something acceptable and deeply ingrained in Chinese culture. But Christian quotations? No. And those sentences (e.g. 一切, 肚子) - they are not how Chinese sentences are normally constructed. They just sound so - "preachy". 60.240.101.246 09:47, 5 October 2011 (UTC)
I'm not saying not to replace the citations with other, better citations. Mglovesfun (talk) 09:57, 5 October 2011 (UTC)
As Mglovesfun said: entries without many citations are the ones that need citations! Citations which do not show normal, fluent sentence construction should be moved to the citations page, though. - -sche (discuss) 18:54, 5 October 2011 (UTC)
I have only seen quotes from Genesis, which would make them equally Hebrew and Christian, but that is not the point. I am not sure what the problem is here; it almost seems like we are looking for reasons to get mad at Engirst. I have used quotes from the Bible (Old and New Testaments) and have not heard a thing about it; it is a seminal text in Hebrew, Greek, Italian and English. I do understand that it doesn't have the same cultural weight in Chinese, but that doesn't have any bearing. If you said "these are bad usexes because they don't accurately or readily convey the usage of the word" I would be on board. As it is, it seems more like you are either against Engirst or against the Bible, and neither of those stances makes a compelling argument. —This unsigned comment was added by TheDaveRoss (talkcontribs).
It's quite obvious that Engirst is here to preach and to promote Pinyinisation of Chinese. Do we really want to see all basic Mandarin entries accompanied by nothing but one or more quotes from the Bible, and the Chinese category dominated by Pinyin rather than character entries? It's madness really. I'm sure if I were a user who continually added advertising quotations, or a user who constantly wrote Chinese Communist Party propaganda by adding English-language quotes from the official PRC press, I would have been banned instantly. There really is no difference. What's more, in the two quotes added to 一切, for example, both have errors in their Pinyin somewhere. 60.240.101.246 10:53, 5 October 2011 (UTC)
Since pinyin entries are valid, there's no need to 'promote' them, no more than Russian in Cyrillic script needs 'promoting'. Mglovesfun (talk) 11:02, 5 October 2011 (UTC)
Pinyin IS promoted and preached by him and by some other people. Please read this site. I agree with the standardisation movement but not with the replacement of Chinese characters with pinyin. Mao planned this too. A few Westerners took it literally, including the owner of pinyin.info and Engirst. Some of the material on the site caused outrage among Chinese people. Anyway, this transition is not happening, and writing purely in pinyin is only done for educational purposes, but we may end up in a situation where we have more pinyin than Chinese characters. --Anatoli 05:15, 6 October 2011 (UTC)
Pinyin entries are at present allowed iff the corresponding character entry exists and no quotations should be included in Pinyin entries (Wiktionary:Votes/2011-07/Pinyin entries). Both rules were made to control the Pinyin enthusiasm of Engirst, but neither rule is obeyed by him [6][7][8]. 60.240.101.246 11:13, 5 October 2011 (UTC)
I personally delete pinyin when there's no corresponding traditional or simplified. I mentioned this on Wiktionary talk:About Sinitic languages but nobody's supported me as of yet. Mglovesfun (talk) 13:05, 5 October 2011 (UTC)
I did express my weak or tentative support (Sounds like a reasonable suggestion...), read my reply @03:11, 5 October 2011. We only check randomly the pinyin entries, many wouldn't SoP by any standards and we wouldn't create Mandarin entries to match. There are so many of them, he could have spent more time creating the matching hanzi. On the other hand, toned pinyin entries can be good if they are correct, follow the rules, we may not catch up fast enough on creating Mandarin entries, besides, I do a lot of translations, many of them are red-linked, anyway. E.g. qíshǒu is missing at the moment, we don't have 騎手 and 骑手 (rider, horseman) yet but there is nothing wrong with the term. Not to sound like we are "bullying" him, perhaps the pinyin editor should be invited to the discussion. --Anatoli 05:05, 6 October 2011 (UTC)
Does 123abc speak much Mandarin? I think if he were a native speaker he'd be able to write in Chinese characters and also, I hope, would make fewer mistakes in pinyin. The thing is he's immune to blocks; you can block him as much as you like and he just comes back with a new IP address. He's put himself above the rules. Mglovesfun (talk) 19:29, 6 October 2011 (UTC)
If folks have identified his (assuming this user is male) ISP, it's just a matter of blocking everything from that ISP. Or possibly contacting that ISP and getting the user banned at that level. This single user's disruptiveness is wasting a considerable amount of time and energy, so much that I'm beginning to think that losing the potential contributions of other anons by blocking the ISP's whole IP range would be more than offset by the actual savings made by getting rid of this one user.
Unless they can somehow be persuaded to change... except they seem immune to any attempt at two-way communication. < sigh. > -- Eiríkr Útlendi | Tala við mig 19:55, 6 October 2011 (UTC)

I just checked a couple entries on my watchlist that IP user Special:Contributions/2.25.212.57 edited today, specifically and . Both edits added biblical usexes that didn't actually show very clearly how the word in question is used, so I reverted both edits. Looking at this user's contributions shows what can only be described as a crapflood. Would someone please block this IP? The time to assume good faith is long since past. -- Eiríkr Útlendi | Tala við mig 20:16, 6 October 2011 (UTC)

  • My instinct is that having extensive Biblical quotations in many Mandarin entries is a poor idea. What remains unclear is whether having a Biblical quotation in a Mandarin entry is better than having no quotation at all. If someone starts removing these Biblical quotations, I do not think I will object. --Dan Polansky 21:04, 6 October 2011 (UTC)
I don't object to the idea of including biblical quotes. To me they are just example sentences. However, what I do have a problem with is the fact Engirst is not attaching such a quote to an existing definition (in the cases I've seen - ); as a result, it renders the effort meaningless as users will likely be more confused than enlightened. JamesjiaoTC 21:51, 6 October 2011 (UTC)
Right. The quotations are sometimes OK, and sometimes great examples of the figurative usage of terms (presuming other Mandarin texts use them figuratively, not just the Bible), but other times they are not good illustrations, and should be moved to the Citations: page for that reason (if sourced). Quotations should be removed if unsourced, as incompatible with the GFDL (because they appear to be quotations created by the user and released under the GFDL, but are in fact quotations created by another person and possibly not released under such a licence). Other times there is no definition, and adding incorrectly formatted quotations is unhelpful. - -sche (discuss) 21:57, 6 October 2011 (UTC)
I just went through a slew of them out of curiosity -- most didn't clearly show the word in question, most appeared to be copy-pasta of the same few quotes, and many were for words where the entry doesn't even have a def and the usex doesn't really provide one either. What a waste of time and effort. -- Eiríkr Útlendi | Tala við mig 06:44, 7 October 2011 (UTC)
abc123/Engirst strikes again as Special:Contributions/2.27.72.125 with his biblical examples. --Anatoli 01:17, 7 October 2011 (UTC)
More biblical examples by a fresh IP address Special:Contributions/2.27.73.100. --Anatoli 00:35, 10 October 2011 (UTC)

Japanese and Korean affixes

Japanese Wiktionary hasn’t been using a hyphen for Japanese affixes, and they decided officially not to use it (→ ja:Wiktionary:編集室/2011年Q3#日本語の接頭辞・接尾辞). Korean Wiktionary has already decided not to use a hyphen for Korean affixes either (→ ko:위키낱말사전:자유게시판#접두사 및 접미사 and ko:위키낱말사전:자유게시판/2010-12#접미사에 하이픈을).

The affixes with a hyphen in the following categories must be renamed, except the ones written with Latin letters.

Although page names must follow the rule strictly for the sake of interwiki links, entry names in a page can have a hyphen. — TAKASUGI Shinji (talk) 04:29, 5 October 2011 (UTC)

Note: we also have the option (if our Japanese and Korean editors prefer to include the hyphens in the page titles) of creating unhyphenated pages as redirects, and asking the Japanese and Korean Wiktionaries to create hyphenated versions as redirects. This is how en.Wikt and de.Wikt (which use l') link to and from fr.Wikt (which uses l’). - -sche (discuss) 05:03, 5 October 2011 (UTC)
FWIW, I prefer hyphenless headwords, but that might just be me.  :) -- Eiríkr Útlendi | Tala við mig 05:10, 5 October 2011 (UTC)
I have no preference. - -sche (discuss) 05:23, 5 October 2011 (UTC)
I have no preference either, and I made most of them and hyphenated a lot of them. I can go back and make the necessary changes if we decide to go without hyphens. That's cool. The other languages linked to Category:Japanese suffixes, such as fr:Catégorie:Suffixes_en_japonais, have no hyphens. Only English. There's only one complication I can think of--Template:suffix and Template:prefix automatically add hyphens. Redirect from [[-affix]] to [[-affix]]? Stop using them? Anyway let's vote at Wiktionary:About Japanese. Another newbie like me might make the same mistake. Haplogy 05:32, 5 October 2011 (UTC)
As you know, I’m talking only about Japanese and Korean affixes. Japanese and Korean Wiktionarians use a hyphen for affixes in languages written in the Latin alphabet, just like English Wiktionarians. — TAKASUGI Shinji (talk) 05:50, 5 October 2011 (UTC)
We could add a nohyphen= parameter to {{suffix}} et al, or create {{ja-suffix}} etc. - -sche (discuss) 05:53, 5 October 2011 (UTC)
{{suffix}} already has a language switch, we could easily add ja and ko to it. Mglovesfun (talk) 11:04, 5 October 2011 (UTC)
{{suffixcat}} would need to be changed as well. —CodeCat 11:14, 5 October 2011 (UTC)

So it looks like the consensus is Japanese and Korean affixes should have no hyphens. Wiktionary:About Japanese does not address the issue, so are there any objections to making a vote to add a section called Affixes with this information? The page says any changes must be put to a vote so I guess I can't just change it myself without a vote. I assume this means that counters should not be hyphenated as well. AJA is unclear--change that too? Haplogy 13:36, 5 October 2011 (UTC)

I'd suggest that no vote is needed, since everyone seems to agree. Mglovesfun (talk) 13:41, 5 October 2011 (UTC)
In that case, I'd like to make that change if there are no objections. Please take a look and change the wording or whatnot if necessary. The sole alterations in extant text are that I removed the hyphen from " e.g., -" under Counter word (助数詞) and added "Do not use a hyphen" to Counter word, and changed Counter word to Counter words since every other POS header is plural there. @Dan: The argument on the Japanese beer parlour is mainly that hyphens are not customarily used in Japanese, and that other languages should follow suit for consistency. If there was anything else, I didn't get it, but consistency is good enough for me. Haplogy 15:18, 5 October 2011 (UTC)
Things were more complicated: counter words are traditionally classified as suffixes in Japanese but as nouns in Korean, even though they function quite similarly. Now we don’t have to show the disagreement. — TAKASUGI Shinji (talk) 00:43, 7 October 2011 (UTC)

What is the reason to use no hyphens for Japanese and Korean affixes, while we customarily use hyphens for English affixes? I speak no Japanese, so I cannot read any rationale provided in the Japanese Wiktionary. --Dan Polansky 14:09, 5 October 2011 (UTC)

I urge that no action be taken yet: The 'consensus' referred to above is over the span of but half a day! My initial instinct is that ja and ko be treated the same as en, but I await an answer to Dan's question.​—msh210 (talk) 15:53, 5 October 2011 (UTC)
Haplogy added an answer above to Dan's question, but I'll chime in too and note that Japanese does not use hyphens at all -- those *exceedingly* rare situations where I've seen a hyphen used in Japanese text, it was used precisely because it looks unusual and out of place. No monolingual Japanese dictionary that I've ever seen uses hyphens. Bilingual dictionaries that I've seen appear to be a bit more varied, suggesting no hard-and-fast convention but rather editor preferences. My gut instinct is to follow the JA WT decision, partly for consistency and partly from my perspective that hyphens in Japanese just seem wrong somehow. -- HTH, Eiríkr Útlendi | Tala við mig 16:32, 5 October 2011 (UTC)
I don't really support either option over the other one at this point, but if I had to defend the support of hyphens of Japanese affixes in English Wiktionary, it would be thus: The use of a hyphen before or after a term is immediately understood by English speakers as indicating an affix. Thus, it makes sense to use hyphens with Japanese affixes in English Wiktionary, even if Japanese Wiktionary decides not to use them. A notable feature of the decision of Japanese Wiktionary (ja:Wiktionary:編集室/2011年Q3#日本語の接頭辞・接尾辞) is that only two people voted in support (Mtodo, and Goat), with presumably TAKASUGI Shinji (talkcontribs) having proposed the whole thing and thus implicitly having voted in support, making up only three people in total. --Dan Polansky 17:01, 5 October 2011 (UTC)
Dan makes a good point here -- EN WT is targeted at readers of English, something some of us (myself included) occasionally lose sight of when getting our heads deep into our other languages. Consider me back on the fence for now regarding this issue. -- Eiríkr Útlendi | Tala við mig 17:09, 5 October 2011 (UTC)
Before I started working on affixes, most of them did not have hyphens, and that leads me to think that I have been the only person to use them with Japanese. Eirikr and I are the only particularly active editors right now in Japanese that I know of, and Eirikr is more knowledgeable than me, so I thought that was consensus enough. I've already changed AJA, too early it seems. I noticed that Goat cited EN WT's decision to delete トランス-, but it was deleted for a completely unrelated reason. By the way, Category:Mandarin_suffixes uses hyphens most of the time. Hyphens or none are both okay by me, but not half and half as they are right now. Haplogy 18:04, 5 October 2011 (UTC)
Just for clarification, I didn’t propose to stop using a hyphen; it was already a de facto rule not to use it. I just proposed to make it official on Japanese Wiktionary. I don’t think the number of voters matters a lot. — TAKASUGI Shinji (talk) 00:14, 7 October 2011 (UTC)
In light of Eiríkr Útlendi's comments, I lean towards deleting the hyphens (from entries and headwords), so that users of Wiktionary have the correct impression that hyphens are not used in Japanese. We could use etymology sections or usage notes to note that the prefixes etc are prefixes etc, like this. - -sche (discuss) 19:07, 5 October 2011 (UTC)
Is there some alternative to a hyphen that would make sense? One comment above is that the hyphen "looks wrong" in Japanese. Would U+FF0D FULLWIDTH HYPHEN-MINUS look better; for example, Template:Jpan instead of Template:Jpan? (This is similar to what we do with Hebrew, using e.g. Template:Hebr instead of Template:Hebr, though we keep the latter as a redirect.) —RuakhTALK 19:45, 5 October 2011 (UTC)
Hmm, my comment about hyphens looking wrong in Japanese is simply because no one in Japanese uses them. They look about as out of place as using Japanese punctuation in English would look、 a bit like this 「sample」 here。  :) I don't think using these different types of hyphen fixes the "wrongness", simply because they're still hyphens, and still look out of place in a Japanese context. -- Eiríkr Útlendi | Tala við mig 21:26, 5 October 2011 (UTC)

I agree that Japanese and Korean affixes should not bear hyphens. The same should be applied to Chinese affixes as well. (btw, there are many more practices in the Japanese Wiktionary which are potentially beneficial here. They disallow romaji, pinyin, or any other romanisation entry; combine Chinese into one header; write wago with kana and kango with kanji, etc. Their entries do look a lot clearer than ours: ja:字, ) 60.240.101.246 20:23, 5 October 2011 (UTC)

It bears noting that JA WT doesn't need to use romaji because they can safely assume that everyone using JA WT already knows at least kana. We can't make that same assumption here on EN WT with regard to kana, kanji, hanzi, Devanagari, Hebrew, Khmer, what-have-you.
Whether we should allow or encourage the creation of Latin-alphabet entries for languages that traditionally use other writing systems is a different question, but the persistence of many editors suggests that there is a demand for such entries, perhaps in part because of the limitations of the MediaWiki software. For instance, I may know that Hindi and Urdu for formal second-person plural is āp, but if I don't know how to write this using the Nastaliq or Devanagari scripts and can only search for the Latin-alphabet rendering, I am instead directed automatically to a page about Tocharian A and B, with no hint that the pages for آپ or आप even exist. Similarly, if I know that the Mandarin for stone is pronounced shí but I don't know how to input 石, a search for shí would show me just Irish and Navajo, leaving me confused and frustrated, were it not for the editor(s) who added the romanized Mandarin entry to that page.
Until such serious usability shortcomings are addressed, Latin-alphabet renderings are an easy workaround. -- Cheers, Eiríkr Útlendi | Tala við mig 21:26, 5 October 2011 (UTC)

Another complication: so we are going with no hyphens. But is that no hyphens in romaji too? For example, for the suffix do we have {{ja-pos|k|suffix|hira=かい|rom=kai}} or {{ja-pos|k|suffix|hira=かい|rom=-kai}}? User TAKASUGI seems to think that Japanese character pages (kanji and kana) should not have a hyphen but Roman character pages should. TIA Haplogy 04:54, 7 October 2011 (UTC)

I just think they are separate. My understanding is that the use of hyphens is not language-dependent but character-dependent, like spaces, which Japanese don’t use when they write with kanji and kana but they use when they write with Latin letters. Anyway the community should decide it. — TAKASUGI Shinji (talk) 09:00, 7 October 2011 (UTC)
I understand now. That makes sense, using hyphens with romaji but not using hyphens with kanji or kana. AJA has been updated to reflect this and most affixes have been updated per policy as well. Haplogy 17:07, 10 October 2011 (UTC)
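The character-dependent convention settled on above (hyphen on romaji, none on kana or kanji) is mechanical enough to state as a small check. What follows is a minimal Python sketch, not anything in AJA or the {{ja-pos}} template: the function names are hypothetical, and it assumes "romaji" simply means that every letter in the form is a Latin-script letter according to its Unicode character name.

```python
import unicodedata


def is_latin_script(form: str) -> bool:
    """Return True if every letter in `form` belongs to the Latin script (e.g. romaji)."""
    letters = [ch for ch in form if ch.isalpha()]
    # unicodedata.name() yields names such as "LATIN SMALL LETTER O WITH MACRON"
    # or "HIRAGANA LETTER KA"; checking for "LATIN" separates romaji from kana/kanji.
    return bool(letters) and all("LATIN" in unicodedata.name(ch, "") for ch in letters)


def affix_headword(form: str, kind: str) -> str:
    """Hypothetical helper: attach the affix hyphen only to Latin-script forms."""
    if not is_latin_script(form):
        return form  # kana and kanji headwords are written without a hyphen
    if kind == "prefix":
        return form + "-"
    if kind == "suffix":
        return "-" + form
    raise ValueError(f"unknown affix kind: {kind!r}")


print(affix_headword("かい", "suffix"))  # かい
print(affix_headword("kai", "suffix"))   # -kai
```

In template terms this corresponds to the second option from the question: hira=かい with rom=-kai.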

Edit tools for search

Would it be possible to have edittools for the search bar? Right now, we can use them to type special characters in entries, but not when they appear in the title of an entry. If I want to create new Gothic or Proto-Germanic entries I first have to edit an existing page, use the edittools to type the name there and then copy it into the search bar. It's not very convenient that way. —CodeCat 11:18, 5 October 2011 (UTC)

You don't have to use the search bar to create a new entry, though. You can just create a new redlink in your sandbox and then click on it. —Angr 11:36, 5 October 2011 (UTC)
And that's what I said is inconvenient... —CodeCat 12:35, 5 October 2011 (UTC)
Well, you said you had to then copy the name into the search bar. Clicking the redlink is slightly less inconvenient than that, but admittedly still more inconvenient than having the edittools right there at the search bar. I just wonder how much clutter that would create, considering the search bar is present on every page, regardless of whether it's being edited or not. —Angr 12:46, 5 October 2011 (UTC)
Maybe the edit tools could appear in a small menu to the left of the bar, and only appear in a small window below when you click on it? —CodeCat 13:01, 5 October 2011 (UTC)

We used to have a preferences option for this, but IIRC it broke a while ago with nobody having fixed it to date. -- Liliana 14:39, 5 October 2011 (UTC)

My preferred edittools character set does not appear under the search box by default for me, but can be made to appear and persist if I select a different character set and then select the one I prefer. It would be nice not to have to bother, but this is just two clicks. I am unsure how long the edittools characters persist. DCDuring TALK 15:57, 5 October 2011 (UTC)
Apparently it disappears after each save. DCDuring TALK 15:58, 5 October 2011 (UTC)
Yeah, we really need a way to type special characters in the search bar. I'm thinking a little popout keyboardy thing next to the search bar. Started working on a script at User:Yair rand/keyboards.js. --Yair rand 00:44, 7 October 2011 (UTC)

Gtroy sockpuppets

Does the community want me (or any other sysop) to continue to block sockpuppets of the permanently blocked User:Gtroy? His latest ID was User:Totallynotfairbro, most of whose contributions seemed reasonable (but he still forgets basic formatting issues from time to time). SemperBlotto 07:22, 7 October 2011 (UTC)

Not a very useful comment, but I don't know why he needed sockpuppets. He seemed to me to be slowly gaining respect after a bad start, then decided whilst not blocked (though has been blocked since) to create a load of supplementary accounts, even working off two accounts simultaneously. Mglovesfun (talk) 07:27, 7 October 2011 (UTC)
I suggest we allow him to edit, for now. His pronunciation of beefcake is interesting; many entries are borderline SOP, but... we have RFD for those, and his other pronunciations are OK. - -sche (discuss) 07:38, 7 October 2011 (UTC)
I suggest we unblock Gtroy (talkcontribs), his primary account, give him a stern warning, and indef block him if said warning is not sufficiently adhered to! Mglovesfun (talk) 07:49, 7 October 2011 (UTC)
I agree with Gloves. He's not a perfect editor, but does more help than harm. Also, chasing sockpuppets can last for ever. --Rockpilot 08:03, 7 October 2011 (UTC)
I agree, I think his entries are mostly quite good, and most of the problems seem to be typos or relating to complex formatting issues. He’s new here and does not know how seriously Wikimedia views legal threats. He should be warned about that. He seems to be allergic to Ric (like Ric was allergic to Razorflame). I don’t really understand this personality friction very well, but I suspect if they knew each other just a little better, they would be friendly. Gtroy takes Ric entirely too seriously. The pronunciation at beefcake, while interesting, is not, I think, very useful and probably should be replaced with a plain vanilla model. —Stephen (Talk) 09:58, 7 October 2011 (UTC)
Haha just heard this, sounds like the death metal interpretation to me. Yeah, should be deleted. Mglovesfun (talk) 10:08, 7 October 2011 (UTC)
I agree, unblock and mentor. bd2412 T 20:26, 13 October 2011 (UTC)
Just to be clear on this, Gtroy = Wonderfool, right? Or at least, Totallynotfairbro = Acdcrocks = Rockpilot = Wonderfool (whether or not Gtroy = Totallynotfairbro). - -sche (discuss) 08:20, 12 October 2011 (UTC)
Nope, Gtroy appears to be another user entirely. -- Liliana 10:04, 12 October 2011 (UTC)
I am 99% sure WF doesn't have an American accent, and GT was recording new audio with one, so no, he isn't. Equinox 20:32, 13 October 2011 (UTC)
My suspicion was aroused because (Rockpilot=Wonderfool) and (Acdcrocks=Totallynotfairbro=?) both nominated words on the WOTDN talk page (Wiktionary_talk:Word_of_the_day/Nominations). I suppose it's possible that Gtroy could have seen WF do it and decided it was a good idea (since neither could edit the semi-protected WT:WOTDN page itself). - -sche (discuss) 21:11, 13 October 2011 (UTC)
User sent yet another email to info-en(a)wikitionary.org about this block; ticket 2011101210018795. I explained the problem of legal threats to him before but I'm not getting involved in this. I'm going to be blunt and state that en.wiktionary is poorly set up for me to suggest avenues with which to request consideration of an unblock by anyone but the original administrator. There is no {{unblock}} template at all and if you compare MediaWiki:Blockedtext to w:MediaWiki:Blockedtext, it's pathetic. The stated advice to email the OTRS team is not the correct course of action. Your email address is not necessarily monitored by admins at Wiktionary to handle this sort of thing and you leave me no options to provide to the user. Adrignola 03:52, 13 October 2011 (UTC)
How bout a vote of confidence on whether I should be blocked or not by all the admins that takes everything that both I and Dick have said and done into account with a public comment period? Catch22 09:11, 15 October 2011 (UTC)
I've updated our MediaWiki:Blockedtext a bit, and recreated the (previously deleted!) {{unblock}} template. - -sche (discuss) 05:57, 13 October 2011 (UTC)
Thanks. Adrignola 16:09, 13 October 2011 (UTC)
This is Acdcrocks/Gtroy. I got blocked by Dick again but with no cause; he seems to have willfully ignored this entire discussion and only blocked me for "sockpuppetry" even though I have not created any new accounts. I would like to maintain the ACDCrocks account and be able to maintain a contributions history and watchlist in one place. I can't place unblock on my talk page as me because I am blocked from editing my own talk page, and I can also not e-mail any users as I am blocked from doing that too. 71.142.74.66 21:07, 13 October 2011 (UTC)
  • For the record, I blocked Troy (Gtroy/ACDCrocks) indefinitely because he started making weird legal threats at me. No matter how furious I get with other editors (and it does happen, believe it or not) I never lose my mind enough to threaten them. Troy doesn't handle criticism well, constructive or otherwise, doesn't seem to take direction well, and I don't know if he'll ever quite understand the criteria for inclusion - the sum of parts issue in particular. In my opinion, the pros of letting him stay are outweighed by the cons. The quality of his editing weighed against the content of his apparent character... doesn't inspire me. Say what you will about my personality, but I do kickass work, and I listen when you say "hey you did this wrong asshole". (PS Troy, I wasn't ignoring this topic - I just didn't know it existed. I don't frequent the BP. I tend to have more constructive things to do.) — [Ric Laurent] — 22:58, 13 October 2011 (UTC)
I made one such claim after Dick made some very offensive and vulgar insults at me, and he continues to use the most uncouth and incendiary rhetoric about me whenever possible. Not much class there. He is using his admin powers despotically and is insincere in his claims of being the victim in this situation. I handle criticism very well; what I didn't do at first was understand how Wiktionary differed from Wikipedia, but I did figure that out over time, and learned a lot from the suggestions of others, particularly SemperBlotto and Equinox. Dick's comments here really just show he does not like the items that I have added, and instead of taking them to verification and deletion he is justifying blocking me for not harboring his opinion of sum of parts and his exclusionary worldview by blocking me for the restraining order comment. But from his own narcissistic comment preceding this one, it's clear to me that his true reason for blocking me was the ulterior motive of disliking my lexicographic style and my person. I think when there is clearly just a personality conflict it should be left to the community to decide what to do, not either of the parties involved. Catch22 09:11, 15 October 2011 (UTC)
  • I only created this account because acdc rocks has been blocked. I am not trying to be a sockpuppet, and I in no way deny I am Troy McCormick / gtroy / acdc rocks. Catch22 09:13, 15 October 2011 (UTC)
    • I'm as good as my word, I warned you, you reoffended and I blocked you with an expiry time of infinite. Mglovesfun (talk) 11:44, 15 October 2011 (UTC)

Twice-borrowed terms

We have categories for twice-borrowed terms, which are words that were borrowed into another language and then later borrowed back from that language into the language it originated from. I've been adding Dutch words to this category but there is a question I have. At what point can you consider something 'the same language'? I would consider Frankish (the source of many French words) a form of Dutch, so any of the French words of Frankish origin that were borrowed into Dutch later would be twice-borrowed terms. But is a word that was borrowed from Old Norse into Norman French and then from French into modern Norwegian a twice-borrowed term? What about words that were borrowed from Proto-Germanic into Latin and then from Old French into Middle English? —CodeCat 12:52, 7 October 2011 (UTC)

Usability of translation tables

Translation tables are currently actual tables in HTML, but they don't actually contain tabular data. The two-column layout is nice for people with wide screens, but for those who have less width available it's not really convenient. I also noticed that the 'mobile view' feature still shows the translations in two columns, like here. This is obviously less than ideal for people using mobile phones. I'm not quite sure how this could be improved, but I would like there to be at least some kind of option to show the translations in one column (and in <div> if possible). —CodeCat 13:13, 7 October 2011 (UTC)

Hmm, yes, it's been a while since I've messed around with CSS and such, but isn't there some way of specifying the minimum and maximum widths of a display element? Would it be possible to rework things like {{translations}} and {{der-top}} to allow for dynamically resizing these lists into however many columns fit best on the user's screen? -- Suddenly feeling the urge to break open my HTML references, Eiríkr Útlendi | Tala við mig 16:47, 7 October 2011 (UTC)

...-based pidgins or creole languages

Look at the beginning of Category:Pidgins and creole languages and you'll see what I mean. I never put a high value on these "pidgin/creole by source language" categories, and this proves excellently why they're pointless: some languages have so many conceivable sources that you can put them in five or six of these categories. (heck, Category:Gullah language has 15 source categories!) Therefore, I propose to delete them, and no longer categorize creoles by source languages. -- Liliana 16:33, 8 October 2011 (UTC)

I think that we should only categorize by superstratum languages. Creoles have so many substratum languages that it's very hard to identify them all. —Internoob 21:25, 10 October 2011 (UTC)

Vote on banning Latin-containing Mandarin

Some thoughts on Wiktionary:Votes/2011-10/CFI for Mandarin proper nouns - banning entries not in Chinese characters, in a separate thread.

The vote seems to be a response to the reckless activity of 123abc (talkcontribs) aka Engirst (talkcontribs). The vote seems unneeded to me, going overboard. The reckless activity of the user can be checked by changing the RFV procedure for Mandarin terms containing Latin as follows:

  • A term that contains Latin letters and is marked as "Mandarin" can be speedy deleted without RFV process unless the citations namespace of the entry already contains attesting citations.

This would be a change of procedure rather than definition of what is included in Wiktionary, a change concerning only a well-defined subset of would-be Mandarin terms, many of which are unlikely to be attestable. The process simplification would be major: instead of sending terms created by Engirst to RFV one by one, admins could speedy delete such terms. The only place in which the citations would be collected for these entries would be citations namespace, so the mainspace entry could remain deleted until the citations are provided. --Dan Polansky 07:53, 9 October 2011 (UTC)
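Because the proposed criterion depends only on the title, the language header, and whether Citations: already holds attesting quotations, it can be stated as a single predicate. A minimal Python sketch under those assumptions follows; the function names and inputs are hypothetical, and this is not an existing bot, MediaWiki feature, or deletion tool. It reports eligibility only, since the wording says such terms "can" be speedily deleted, leaving the decision to an admin.

```python
import unicodedata


def contains_latin_letter(title: str) -> bool:
    """True if any character of the page title is a Latin-script letter."""
    return any(ch.isalpha() and "LATIN" in unicodedata.name(ch, "") for ch in title)


def eligible_for_speedy_deletion(title: str, language: str, has_attesting_citations: bool) -> bool:
    """Hypothetical check mirroring the proposed wording: a Mandarin entry whose
    title contains Latin letters, with no attesting citations already in the
    Citations: namespace, may be speedily deleted. Eligibility only."""
    return language == "Mandarin" and contains_latin_letter(title) and not has_attesting_citations


for title, cited in [("Ohm定律", False), ("卡拉OK", True), ("α粒子", False)]:
    print(title, eligible_for_speedy_deletion(title, "Mandarin", cited))
```

With these inputs only Ohm定律 is flagged: 卡拉OK escapes because it is cited, and α粒子 escapes because the test looks for Latin letters only, so Greek letters are not caught by this wording.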

This appears to be a good idea. Introducing exceptions to the CFI rules is unwise, because it makes things more complex (see KISS principle) and less neutral. Banning a whole class of entries because of a user is very unwise (it's like closing the project because of vandalism). Also don't forget that users specializing in some categories of entries help very much, especially when they specialize in uncommon terms, less likely to be addressed by other editors.
I would propose the same procedure change for all infinite series such as numbers or the like. Lmaltier 08:10, 9 October 2011 (UTC)
I don't think you should delete 卡拉OK. Fugyoo 08:21, 9 October 2011 (UTC)
卡拉OK would be kept as soon as it would be attested in Citations:卡拉OK. Attesting those few Latin-containing Mandarin entries that we already have and are genuinely attestable should be a manageable amount of work, don't you think? --Dan Polansky 08:28, 9 October 2011 (UTC)
(after 3–4 edit conflicts, haha) We should possibly say "non-Hanzi" in place of "Latin letters" (to exclude Cyrillic, Greek etc), but amending procedure in this way is a good, practical idea. Should we generalise it to all languages? (Ie, speedily delete any mixing of scripts? I can think of arguments in both directions, though the arguments in favour of generalisation are more hypothetical: someone could create a flood of inيظ#English entries.) - -sche (discuss) 08:15, 9 October 2011 (UTC)
Good point, Fugyoo. Maybe we should just be direct (without a vote to make it any formal part of policy or procedure, just using our common sense) that it is a single editor whose contributions we have reason to doubt, while we would allow a month at RFV for doubtful terms from other editors? We'd use the same common sense to speedily delete any flood of inيظ#English entries. - -sche (discuss) 08:38, 9 October 2011 (UTC)
I would keep the procedure as narrow and non-generalized as possible, tailored to check Engirst. Thus, I would go for "Latin letters" and for "Mandarin". I would not oppose a generalized procedure, though. A more general procedure needs more testing and is more likely to have unexpected side effects. --Dan Polansky 08:53, 9 October 2011 (UTC)
You want to ban riemannsche ζ-Funktion? -- Liliana 14:16, 9 October 2011 (UTC)
No, of course not. English and Mandarin use foreign letters, if they have to. Both α粒子 and α-particle are perfectly OK but when they are transliterated, they are transliterated using native scripts - Template:Hans (ā'ěrfā lìzǐ) and alpha particle. --Anatoli 23:42, 9 October 2011 (UTC)
I think Liliana was directing that comment at me, anyway, for asking if we should make the rule apply to all languages. I was only asking, though, and I see the arguments against making it apply to all languages are convincing. - -sche (discuss) 23:53, 9 October 2011 (UTC)
卡拉OK and OK#Mandarin and a few others will be kept, they are legitimate exceptions and they are common nouns. The vote is about proper nouns, not common nouns. The common noun containing Latin proper nouns in full, in particular Planck常數, will not be allowed either. Proper nouns containing Latin or other letters invented by Chinese will be allowed as well. It's all on the page. If we all agree to soft-redirect, there won't be a need for the vote. --Anatoli 09:12, 9 October 2011 (UTC)
Re: "The vote is about proper nouns, not common nouns": Wrong. From the vote: "This vote only affects proper nouns and common nouns using non-Chinese proper nouns as part of a common noun [...]". "Planck常数" and "Alzheimer病" are common nouns.
Are there any people who oppose having the soft-redirects?
What do you think about the speedy-delete procedure for Latin-containing mixed-script Mandarin terms? --Dan Polansky 09:49, 9 October 2011 (UTC)
If you reread my comment, I mention common nouns containing proper nouns in full; Planck and Alzheimer are proper nouns. We have one person, a native Chinese speaker, opposing soft-redirects. Speedy-delete procedure? Good idea. Forms like Thames河, London市 should be deleted on sight. If we agree on soft redirects, then, where the common and standard Chinese term exists and somebody insists on having such forms, they could be converted to soft redirects. This practice should not be encouraged; native Chinese people don't consider them Mandarin. Borrowings are transliterated or translated into Chinese characters; exceptions are abbreviations. --Anatoli 10:10, 9 October 2011 (UTC)
You would do well to ensure that every sentence you say is true. It is a poor practice to expect me to correct one of your sentences from a later sentence. The sentence "The vote is about proper nouns, not common nouns.", ending in a full stop, is false, and you should acknowledge as much.
If the only person who opposes soft-redirects is 60.240.101.246, there is nothing to worry about: he is a self-proclaimed prescriptivist, who wants to protect the purity of language. --Dan Polansky 10:41, 9 October 2011 (UTC)
I disagree. 60.240.101.246 is a native speaker. It's not prescriptivism, it's common sense. There is no real equivalent of "Alzheimer病" in English I can quote but think of errare humanum est. Is it attested? Yes. Is it used by English speakers and writers? Yes, a lot. Is it English, though? No. You and Engirst are using citations as a weapon to introduce words into Mandarin, which don't belong there. --Anatoli 23:42, 9 October 2011 (UTC)
As I've said many times in the course of the debate, I don't care what language they're listed under, so long as someone can look them up. You want to rip them out of the dictionary as a whole, which is against the spirit of a multilingual descriptive dictionary. I'll see your errare humanum est and raise you noli illegitimi carborundum. Is it Latin? Certainly not. It doesn't look like English. So should we delete it from our dictionary and screw over all the users who might want to look it up?--Prosfilaes --70.180.206.122 09:29, 10 October 2011 (UTC)
Of course, errare humanum est is English. Of course, if it's used in English, it should also get an English section. It's useful, because it's an indication that it's used in English, and for pronunciation (I suspect it's not pronounced the same in English and in French, but I don't know how it's pronounced in English). The most popular French dictionary (Petit Larousse) has a famous section (pink pages) about these foreign phrases used in French. Including a section for a language whenever the term is used in that language is a very sound principle (and the only possible principle if we don't want to be subjective). Lmaltier 17:13, 10 October 2011 (UTC)
I would be careful about generalizing and banning all mixed scripts in all languages. Some modern languages mix scripts as standard practice. Examples include some of the Caucasian languages that prefer to use Latin I instead of Cyrillic Ӏ; Ossetic prefers Latin æ to Cyrillic ӕ; and Chuvash prefers ă/ĕ/ç to Cyrillic ӑ/ӗ/ҫ. I know that some of us think we should force everyone in the world who uses a non-Roman script to adopt the recently devised Unicode Consortium ranges to write their languages, excluding all exceptions, but really, the native speakers and writers of each language do have a right to come to an agreement with each other to use the letters and code points that they decided upon. And in technical usage, it is not uncommon to find terms such as u-bend translated into some non-Roman script languages with the Roman letter u. There are many, many valid exceptions to a rule to ban all mixing of scripts. —Stephen (Talk) 09:48, 9 October 2011 (UTC)
The vote doesn't concern languages other than Mandarin, especially if it's the norm for these languages to mix scripts. There are valid exceptions in Mandarin (and other languages) as well. 三K黨 and 三K党 (Ku Klux Klan) are perfect examples of Mandarin proper nouns containing Latin letters. They are Chinese inventions. --Anatoli 10:10, 9 October 2011 (UTC)
Yes, I know, but -sche suggested making this a blanket ban against all mixing of scripts, asking, "Should we generalise it to all languages?" —Stephen (Talk) 10:21, 9 October 2011 (UTC)
The vote is complicated as is, no need to generalise. I won't agree to generalisation. I think -sche meant Japanese. There is no current controversy there. The few exceptions are known and no-one is pushing unwanted mixed-script terms. --Anatoli 10:29, 9 October 2011 (UTC)
Good points. Keep it specific to Chinese (perhaps even use our common sense not to speedily delete but to RFV existing Chinese entries which we know are good but which are not cited, as the vote does say they "can" be deleted, not that they "must" be). - -sche (discuss) 20:21, 9 October 2011 (UTC)
Why are Banach空间, Banach空間 and Hilbert空间 deleted? They are cited. Please see here, here and here. 2.27.73.100 22:41, 9 October 2011 (UTC)
Why should we reply to you when you never reply to anyone? Anyway, for others, the only compromise the majority of Chinese speaking editors except for one native speaker - Special:Contributions/60.240.101.246 (he is outright against such entries), could reach is a soft redirect, like this one Planck常数, provided the correct Mandarin entry exists. That, of course excludes, city, park, state, people, whatever names entirely in Roman letters with or without qualifiers, "London#Mandarin" or "London市#Mandarin" will be deleted on sight. As your entries are all bad - no value in them, close to 100%, we may use bulk delete of all your entries, under any IP-address you use for the sanity of Mandarin entries. I don't think there will be strong opposition to expelling you completely and deleting all your "work" in one go. --Anatoli 22:55, 9 October 2011 (UTC)
@2.27.73.100/Engirst: FYI, citations have to be formatted correctly and placed in the entry or the Citations: page, it isn't enough to link to a Google search. Raw Google results are not acceptable citations, anyway; citations (for any word in any language) must be durably archived, which in practice means you should look on Google Books and Usenet (which you can access via Google Groups, but notice that not all Google Groups are Usenet groups). WT:" tells you how to format a citation of a Book, and you can look at entries like rainburn to see a common format for citing Usenet posts. - -sche (discuss) 23:12, 9 October 2011 (UTC)
Understanding of what is a good Chinese entry will now unfortunately differ, as we now have advocates of 123abc's entries with no knowledge of Mandarin. "Ohm定律" and "Planck常数" are bad enough, but next will be place and personal names in Roman letters, either entirely or with place name qualifiers. In 123abc's point of view, "London" or "London市" is also a Mandarin word. --Anatoli 22:26, 9 October 2011 (UTC)
Nah, we'll delete "London" and "London市" (unless someone proves it means something non-SOP); there's been overwhelming consensus on both of those points, because we have London#English already, and because "London市" is sum-of-parts, just like "London city" or "the city of London" would be, except in the narrow, uncommon sense of that term. The "existing Chinese entries which we know are good" I referred to above are entries like "卡拉OK", which Fugyoo brought up. - -sche (discuss) 22:51, 9 October 2011 (UTC)
Alright, to codify the two ideas we've reached agreement on (soft redirects, and speedy deletion), I will start a Wiktionary:Votes/ page for the soft-redirect policy vote, and then invite everyone to tweak and improve my wording. Dan, would you set up the vote on changing RFV procedure? :) - -sche (discuss) 00:02, 10 October 2011 (UTC)
Wiktionary:Votes/pl-2011-10/Mixed script Mandarin entries. Discuss the vote's wording on the talk page, please, where I ask several questions. - -sche (discuss) 02:49, 10 October 2011 (UTC)
I answered some questions, renamed to "Mandarin", added some comments and made some changes. We need to describe the criteria for the established and standard Mandarin terms containing Latin, Greek, etc. letters. --Anatoli 03:21, 10 October 2011 (UTC)

@-sche and vote: I would create a vote for my proposal, but I want to let it sit in Beer parlour a bit longer, so people can comment on it, oppose it, and propose changes in wording. I think the discussion should better sit from 3 to 5 days in BP before I create a vote. An updated proposed wording is this:

A term that contains Latin letters and is marked as "Mandarin" can be speedy deleted without RFV process unless the citations namespace of the entry already contains attesting citations. Such a term can but does not have to be speedy deleted: each admin can decide to avoid deleting "卡拉OK" in spite of there being no citations in "Citations:卡拉OK".

A deletion summary, which is not part of the vote-to-be, could be this: "Mixed-script Mandarin entry that is not yet attested by quotations in citations namespace; see also WT:Attestation" Anyone please feel free to create a vote if I forget to do so in a couple of days. --Dan Polansky 09:59, 10 October 2011 (UTC)

I don't like "Such a term can but does not have to be speedy deleted". An admin can choose not to delete any file they want, and this sentence gives no protection for when an admin walks by and does delete 卡拉OK. It provides no guidance and doesn't change the rules at all.--Prosfilaes 13:20, 10 October 2011 (UTC)
The second sentence merely highlights the use of "can" rather than "should" in the first sentence, as such distinctions get easily overlooked. It emphasizes that a deletion is not a necessary consequence of missing quotations. The second sentence could be dropped, but it seems to me that it makes the first sentence clearer. --Dan Polansky 13:31, 10 October 2011 (UTC)
I'm not a huge fan of that. I think it better if admins are mop wielders, not deciding whether or not a page is "good enough" to stick around. I also think it provides at best illusionary protection to 卡拉OK; whether that says can or should, an admin can walk by anytime and be fully justified in deleting it. If you want 卡拉OK to stick around, cite it; otherwise accept the fact that your new rule will make it speedyable.--Prosfilaes 13:42, 10 October 2011 (UTC)
Here's an alternative for you:

A term that contains Latin letters and is marked as "Mandarin" should be speedy deleted without RFV process unless the citations namespace of the entry already contains attesting citations.

If there's going to be a vote, both alternatives can be offered for consideration. --Dan Polansky 13:51, 10 October 2011 (UTC)
An example of a cited mixed-script Mandarin entry could look like this: Banach空间 (This cited example has been deleted by Anatoli):
Mandarin
Noun

Banach空间 (simplified, Pinyin Banach kōngjiān)

  1. Banach space

2.27.73.173 12:30, 10 October 2011 (UTC)

Oh you can talk? No, they will be deleted on sight in this format. That's a general consensus. --Anatoli 12:36, 10 October 2011 (UTC)
Engirst AKA 2.27.73.173, feel free to collect three properly formatted quotations at Citations:Banach空间. However, chances are the entry will be restored only days later: you have been evading blocks and showed very little cooperation with other editors, so restoring entries that you have created is no priority for Wiktionary editors. --Dan Polansky 13:43, 10 October 2011 (UTC)
The entry should be formatted exactly as Planck常数 ("mixed language") - a soft redirect to 普朗克常数 ("correct term"), Dan Polansky, you and all editors agreed to this. Banach空间 will not be created before 巴拿赫空间 exists. Some didn't agree to this condition. We may need to go ahead with the vote - Wiktionary:Votes/pl-2011-10/Mixed script Mandarin entries. --Anatoli 22:00, 10 October 2011 (UTC)
  • Alright, let's go ahead and make "卡拉OK" speedily-deletable. (Admins should use common sense in deciding what to delete, and defer to Mandarin-speaking editors when uncertain, but let's accept for the moment the presumption that they will not.) As Dan said, "卡拉OK would be kept as soon as it would be attested in Citations:卡拉OK. Attesting those few Latin-containing Mandarin entries that we already have and are genuinely attestable should be a manageable amount of work, don't you think?" I'll start citing some of them. - -sche (discuss) 22:42, 10 October 2011 (UTC)
Yes, we should save genuine "mixed script" Mandarin entries and make a clear distinction between "mixed script" terms and "mixed language" (code-switching). --Anatoli 23:39, 10 October 2011 (UTC)
An administrator shouldn't use double standards. Please see here. 2.27.72.128

Attestation vs. the slippery slope

I would like again to get the section WT:CFI#Attestation vs. the slippery slope removed from CFI. A previous attempt at Wiktionary:Votes/pl-2011-01/Final_sections_of_the_CFI ended 5:4:0 for deletion.

I argue that the section is needless and misleading.

The section is needless, as, if it gets removed, the following dialogue covers the case:

  • Alice: Adding the entry for the particular term "ttt" will lead to entries for a large number of similar terms. Thus, we should delete "ttt".
  • Bob: That is not a CFI consideration. CFI mandates that a term should be included if it is attested and idiomatic.

Done; no need to list every wrong argument for deletion in CFI.

The section is misleading, as two of its bullet points refer to "common use" and "general use" in contradiction with "Attestation" section, implying that a term in pig Latin should be included only if it "has found its way into common use". My understanding of how CFI should work is that a term in pig Latin should be included only if it is idiomatic and attestable, regardless of whether it "has found its way into common use".

Do any opposers of the vote find any of this convincing? Are there any new supporters of the removal of the section? --Dan Polansky 08:25, 9 October 2011 (UTC)

For anyone who would want to respond in a poll-like fashion, in which discussion is of course also welcome, here are some templates: {{subst:support}}, {{subst:oppose}}, {{subst:agree}}. --Dan Polansky 08:34, 9 October 2011 (UTC)

I supported (and still support its removal) as it's not criteria for inclusion, but rather more of a discussion about what to include and what not to. If anything it's more suited to Wiktionary talk:Criteria for inclusion! Mglovesfun (talk) 11:46, 9 October 2011 (UTC)

fr:Template:en-nom-rég2

Hello, I propose to merge this template with {{en-noun}}. It automatically displays the plurals and their pronunciations. JackPotte 13:49, 9 October 2011 (UTC)

We don't indicate the pronunciations of words in the headword line of our entries on en.Wikt, though; we indicate pronunciations in the ===Pronunciation=== section. A very large number of English words have at least two different pronunciations (UK and US); some words have eight or more possible pronunciations (of the singular alone!), like pecan. That would require the headword line to be more of a headword paragraph! - -sche (discuss) 20:33, 9 October 2011 (UTC)
What -sche said. JamesjiaoTC 03:24, 10 October 2011 (UTC)
Clever thing mind you, it attempts to work out the pronunciation of the plural based on the IPA inputted and it attempts to work out the plural using only the PAGENAME. --Mglovesfun (talk) 12:03, 10 October 2011 (UTC)

I've just finished fr:Template:fr-accord-rég2. JackPotte 18:51, 15 October 2011 (UTC)
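The template discussed above guesses the plural from the page name and derives the plural's pronunciation from the singular's IPA. A minimal sketch of that kind of guessing, assuming only the regular English rules: the helper names are hypothetical, this is not the code of fr:Template:en-nom-rég2, and irregular plurals (mouse/mice, leaf/leaves, potato/potatoes) are not handled.

```python
"""Sketch: guess a regular English plural and its pronunciation.

Hypothetical helpers; only the regular -s / -es / -ies rules are covered.
"""

# Spellings after which the plural takes -es (box -> boxes, church -> churches).
SIBILANT_SPELLINGS = ("s", "x", "z", "ch", "sh")
VOWELS = set("aeiou")

# Final sounds (IPA) that select each form of the plural suffix.
SIBILANTS = ("tʃ", "dʒ", "s", "z", "ʃ", "ʒ")  # -> /ɪz/
VOICELESS = ("p", "t", "k", "f", "θ")         # -> /s/; anything else -> /z/


def guess_plural(singular: str) -> str:
    """Guess the plural spelling from the page name alone."""
    if singular.endswith(SIBILANT_SPELLINGS):
        return singular + "es"
    # Consonant + y -> -ies (city -> cities), but vowel + y keeps -s (day -> days).
    if len(singular) > 1 and singular.endswith("y") and singular[-2] not in VOWELS:
        return singular[:-1] + "ies"
    return singular + "s"


def guess_plural_ipa(singular_ipa: str) -> str:
    """Append /ɪz/, /s/ or /z/ depending on the final sound of the singular."""
    core = singular_ipa.strip("/")
    if core.endswith(SIBILANTS):
        suffix = "ɪz"
    elif core.endswith(VOICELESS):
        suffix = "s"
    else:
        suffix = "z"
    return "/" + core + suffix + "/"
```

For example, `guess_plural("city")` gives "cities" and `guess_plural_ipa("/bɒks/")` gives "/bɒksɪz/". As -sche notes, a headword line would still need one such guess per listed pronunciation.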

Making 'see also' clearer to users

We use the template {{also}} to show links to other pages that are written with the same letters but with diacritics or capitals. The recent discussion at Wiktionary:Feedback#Prestige shows that this can be very confusing to new users. It has to be added to every page and it's easy to miss a few possibilities or even just to forget to add it. And compared to fr:Prestige, it's just too small and doesn't stand out. It's very easy to miss. The 'see also' text itself isn't really always confusing to users, only if the difference is just capitalisation. When you edit a new page beginning with a capital, like Nonsenseword, the wiki software warns you that the title might not be correct. But there is no warning if the page already exists. So for that reason I think it would be nice if warnings about capitalisation could be automatically added to every page, perhaps even outside the wikitext. —CodeCat 11:59, 10 October 2011 (UTC)

maybe 'the title of this page is {{PAGENAME}}, see also [] '. --Mglovesfun (talk) 12:01, 10 October 2011 (UTC)
That isn't really any clearer at all, it just repeats the name of the page. The problem isn't that users don't see the name of the page, it's that they don't understand the significance of the capitalisation. The current system with {{also}} helps somewhat to clarify this, but it's not very obvious to users and it's not used consistently enough either. —CodeCat 12:05, 10 October 2011 (UTC)
It might help to provide a more visible contrast between say Fish and fish. Mglovesfun (talk) 16:27, 10 October 2011 (UTC)
I'd like to add something like what Wikipedia does for ambiguous titles, e.g. "This entry is a name, for other senses see fish" or "This entry is about a German noun, for other languages see prestige". I think a bot could do it. Fugyoo 22:23, 11 October 2011 (UTC)
Maybe have {{also}} display "Entries for similar words:" or similar.​—msh210 (talk) 00:44, 12 October 2011 (UTC)
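Whether the notice is added by a bot or generated automatically, the first step is finding which page titles differ only in capitalisation or diacritics. A minimal sketch, assuming the list of titles comes from somewhere like a database dump: the folding rule (Unicode NFD decomposition, dropping combining marks, then casefolding) is an illustrative choice, not an existing Wiktionary rule, and the function names are hypothetical.

```python
import unicodedata
from collections import defaultdict


def fold_title(title: str) -> str:
    """Key that ignores capitalisation and diacritics ('Résumé' -> 'resume')."""
    decomposed = unicodedata.normalize("NFD", title)
    # Drop combining marks such as the acute accent left over after NFD.
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.casefold()


def also_targets(titles):
    """Map each title to the other titles sharing its folded key ({{also}} candidates)."""
    groups = defaultdict(list)
    for title in titles:
        groups[fold_title(title)].append(title)
    return {
        title: sorted(t for t in groups[fold_title(title)] if t != title)
        for title in titles
    }
```

With the titles `["resume", "résumé", "Resume", "fish", "Fish", "pecan"]`, the entry for "fish" gets `["Fish"]` and "pecan" gets nothing. Other kinds of near-matches, such as hyphenation or spacing differences, would need extra folding rules.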

Why is Banach空间 deleted?

Discussion moved to Talk:Banach空间.

Bot generation of Portuguese verb forms

I have noticed that there are quite few entries for Portuguese verb forms (mainly some forms generated by WF's bot) and currently no bot dealing with their creation. So I have modified my User:BuchmeierBot code in order to be able to deal with Portuguese verb conjugation tables. I would like to generate the forms of verbs that already have a conjugation table (of course after checking for correctness of the conjugation). Should I start a vote? Matthias Buchmeier 16:10, 10 October 2011 (UTC)

  • No. We trust you. Just go for it. SemperBlotto 16:12, 10 October 2011 (UTC)
    • Seconded. Just start slowly and build up speed (I speak from experience). Mglovesfun (talk) 16:24, 10 October 2011 (UTC)
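To illustrate the kind of data such a bot works with, here is a minimal sketch that produces present-indicative forms of regular Portuguese verbs and groups them into form-of descriptions. It is not BuchmeierBot's code: the proposal above reads forms from existing, checked conjugation tables, whereas this sketch regenerates them from rules, covers only one tense of regular -ar/-er/-ir verbs, and ignores spelling changes (conhecer -> conheço) and stem-vowel changes (dormir -> durmo). The function names are hypothetical.

```python
# Present-indicative endings of regular Portuguese verbs, by infinitive ending.
PRESENT_ENDINGS = {
    "ar": ["o", "as", "a", "amos", "ais", "am"],
    "er": ["o", "es", "e", "emos", "eis", "em"],
    "ir": ["o", "es", "e", "imos", "is", "em"],
}

PERSONS = [
    "first-person singular", "second-person singular", "third-person singular",
    "first-person plural", "second-person plural", "third-person plural",
]


def present_indicative(infinitive: str) -> list:
    """Return the six present-indicative forms of a regular verb."""
    stem, ending = infinitive[:-2], infinitive[-2:]
    if ending not in PRESENT_ENDINGS:
        raise ValueError(f"not a regular -ar/-er/-ir verb: {infinitive}")
    return [stem + e for e in PRESENT_ENDINGS[ending]]


def form_entries(infinitive: str) -> dict:
    """Map each form to the descriptions a form-of entry would list for it."""
    entries = {}
    for person, form in zip(PERSONS, present_indicative(infinitive)):
        # setdefault merges descriptions if two persons ever share one form.
        entries.setdefault(form, []).append(f"{person} present indicative of {infinitive}")
    return entries
```

For falar this yields falo, falas, fala, falamos, falais, falam. Grouping by form matters in other tenses, where one form can serve several persons.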

We shouldn't use double standard

- -sche said: "deleted, per the precedent and discussion of WT:RFD#Москва". pizza#Mandarin is deleted, so OK#Mandarin should be deleted as well. Actually, their meaning can be found in the English entries, so they are not necessary. 2.27.73.173 18:56, 10 October 2011 (UTC)

You're right, we shouldn't. We don't want to have a double standard for you as opposed to anyone else who won't listen to what people say. —CodeCat 19:01, 10 October 2011 (UTC)
  • I propose that this post by a banned user who does not cooperate with Wiktionary editors, rarely answers questions but feels himself entitled to start a new BP discussion whenever he sees fit, and possibly cannot even read Chinese characters, be left without any further response. --Dan Polansky 19:02, 10 October 2011 (UTC)
You have NO STANDARDS. That's the problem. You keep inventing stuff AND not being consistent with what you do. You have been blocked again for 3 days for doing this shit: [10]. JamesjiaoTC 21:28, 11 October 2011 (UTC)
Japanese Romaji uses "-" for suffixes as well; please see here. So you do use a double standard. Alexando 07:16, 12 October 2011 (UTC)

Phrasebook, again

I think Equinox pretty much got it at Talk:I'm transsexual - our phrasebook, as is, is a sick joke. Pretty much half the phrases are about sex (some of them as silly as I'm horny - I mean c'mon, who actually says that?), while actual phrases that you would find in a printed phrasebook (what day is it, can you give me directions, etc.) are curiously absent. This shows it needs some kind of reform, and most importantly a radical pruning. -- Liliana 22:07, 11 October 2011 (UTC)

Agreed. Having travelled to countries with languages that I speak very little of, I'd say most of the sex-related phrases should be removed, unless you travel for the sole purpose of fornication. JamesjiaoTC 22:14, 11 October 2011 (UTC)
Being transsexual has nothing at all to do with sex, though. —CodeCat 22:31, 11 October 2011 (UTC)
Is it necessary, though? I mean, if you were a real transperson, the last thing you would do is disclose it to strangers... no? -- Liliana 22:33, 11 October 2011 (UTC)
Yeah, it's hardly something you'd just drop into a conversation, is it? One that made me laugh earlier was "I'm mute", as though a mute person could actually say it. BigDom (tc) 22:39, 11 October 2011 (UTC)
I'm illiterate is a good one as well. -- Liliana 22:42, 11 October 2011 (UTC)
Well, a mute person could write the phrase. - -sche (discuss) 22:44, 11 October 2011 (UTC)
The pronunciation section is rather pointless though. -- Liliana 22:46, 11 October 2011 (UTC)
True.. but how hard is it for a mute person to express this notion via body language? I'd bet body language (point at mouth, wave hands) can convey this more swiftly. JamesjiaoTC 22:48, 11 October 2011 (UTC)
It could be said over the internet? —CodeCat 23:11, 11 October 2011 (UTC)
Someone who is recognizably foreign might be misunderstood as trying to convey their inability to speak the local language, rather than their inability to speak at all. (But I agree with DCDuring, below. To the extent possible, we shouldn't be asking "Could this be useful?", only "Is this useful?") —RuakhTALK 20:13, 12 October 2011 (UTC)
Does a phrasebook require more constancy of purpose and contributor discipline than we can sustain? Subtle aspects of policy don't seem sustainable for very long here. We seem to be susceptible to anarchism.
If we had users constantly asking us how to say or write phrasebook-type expressions, we could at least focus on meeting the needs of real users. But we have only the vaguest notion of what we are trying to do. Though pruning might be necessary, I doubt that it is the key breakthrough that a phrasebook needs to achieve success at Wiktionary. DCDuring TALK 23:25, 11 October 2011 (UTC)
Agree to the clean-up. Also agree that "I'm mute" is useful. We are a written dictionary. You can write or print the translation in another language. --Anatoli 06:50, 12 October 2011 (UTC)
I'll agree as well then; sometimes I think our phrasebook is about who can create the silliest entry without it being deleted. I might go with I'm fucked meaning I'm drunk, I'm tired, I'm disabled/crippled, I'm in trouble (etc.). Mglovesfun (talk) 10:47, 12 October 2011 (UTC)
For the record, Liliana, I say "I'm horny" all the motherfucking time. Frequently there are some qualifiers between the subject/verb and adjective. But yeah. All the time. I'm horny right now, even. — [Ric Laurent] — 11:25, 14 October 2011 (UTC)
Do you frequently feel the need to say it in languages that you don't even speak well enough to construct a simple sentence in? —RuakhTALK 13:17, 14 October 2011 (UTC)
Oh I am so glad you're unable to hit on me. On a more serious note, what about other phrasebooks? I know it's not a CFI rule, but still a good guideline. -- Liliana 19:53, 14 October 2011 (UTC)

123abc, again???

See http://en.wiktionary.org/wiki/Special:Contributions/Christofo -- it appears that the many-times-banned user is back, now creating hyphenated pinyin entries as if that is a normal thing to do. Please direct attention to this development. 71.66.97.228 01:13, 12 October 2011 (UTC)

Yes, your diagnosis was correct. Nuked all his entries. --Anatoli 06:46, 12 October 2011 (UTC)
Japanese Romaji uses "-" for suffixes as well; please see here. So you do use a double standard. Alexando 07:18, 12 October 2011 (UTC)
This has been mentioned multiple times before. Japanese entries have nothing to do with Mandarin entries. If there are issues, they need to be treated separately. Obviously you don't listen. So... I am gonna block you again.. this time, I will not allow you to create new accounts. JamesjiaoTC 03:44, 13 October 2011 (UTC)
He is very good at avoiding all blocks and generating new IP-addresses whenever he wishes. He was blocked multiple times including range blocks. He doesn't have a lot of linguistic or communication skills but he's got that skill.
As for the issue with Japanese, first of all, it's a language policy. If most editors agree to do it one way, it goes, if not, then there's a vote. Japanese editors may be happy to discuss the issues of triplication related to Romaji entries. The Romaji entries usually contain the minimum information, so no one complained and Romaji entries were created ONLY when Kana/Kanji entries were also there. --Anatoli 06:42, 13 October 2011 (UTC)

User 123abc again, again

Blocked, and still edits?

He's now assiduously adding bible verses (and links) to Mandarin entries. What is wrong with this project that this has happened several dozen times now, over a period of nearly a year? 71.66.97.228 19:54, 12 October 2011 (UTC)

I guess the problem is that {users with enough knowledge of Chinese to deal with his edits} and {users with enough technical knowledge to deal with his edits} seem to be two mutually exclusive sets. (The overlap of these sets with {users with enough time and patience to deal with his edits} may also be relevant.) Previously, I've tried to address this by starting a vote that would reduce the amount of knowledge of Chinese that was necessary — in fact, my goal was to make the formatting for Mandarin pinyin entries so restrictive that it could be enforced by a bot — but Chinese-speaking editors' responses to the vote, while positive in tone, just left me more confused than ever. So maybe I should work on it from the other angle: trying to reduce the amount of technical knowledge that is necessary, in the hopes that that will enable the Chinese-speaking administrators to cope with his edits better. —RuakhTALK 20:09, 12 October 2011 (UTC)
The pinyin entries are now as simple as can be, see yánlì; all entries in Category:Mandarin pinyin should be formatted as per Wiktionary:Votes/2011-07/Pinyin entries. If they are automatically created by a bot and the job is good, we should revisit it. The Chinese entries are indeed a bit complicated, notably the "rs" value (radical sort for the initial character), but this info is available in Wiktionary. Anatoli 21:47, 12 October 2011 (UTC)
The main problem is that some people don't know the function of Pinyin entries, especially for learners, but oppose Pinyin just because they don't like Pinyin. For a good example of making use of Pinyin entries for learners, please see here. Afex 20:41, 12 October 2011 (UTC)
The problem is your unwillingness to engage in dialogue unless things are going your way. You're happy to engage in dialogue when people are agreeing with you, and when people stop agreeing with you, you just clam up. Mglovesfun (talk) 20:46, 12 October 2011 (UTC)
  • Record of fact cannot be deleted. Please see here. Afex 21:27, 12 October 2011 (UTC)
Do you intend to engage in dialogue? Mglovesfun (talk) 06:26, 13 October 2011 (UTC)
Why not, if you like, but don't block me and try to shut me up first. Sundy 12:26, 13 October 2011 (UTC)
If you want to engage in dialog, use Engirst. As it is, you are abusing multiple accounts, which is against the rules. All other accounts will be indefinitely banned on sight. - TheDaveRoss 20:13, 13 October 2011 (UTC)
123abc talked on Mglovesfun's talk page, for the first time I see more than one sentence at a time. --Anatoli 22:37, 13 October 2011 (UTC)

Colloquialisms and nonstandard terms

Are colloquialisms considered nonstandard terms? My take is that they are not, hence my edit to Template:lexiconcatboiler/colloquialism. --Dan Polansky 10:04, 12 October 2011 (UTC)

Sort of; I suppose all slang terms, informal terms and colloquial terms are nonstandard. Mglovesfun (talk) 10:53, 13 October 2011 (UTC)

New administrator nomination - User:Haplology and User:Eirikr

Please don't ignore the new nomination - Wiktionary:Votes/sy-2011-10/User:Haplology for admin. He has been very active in Japanese and works quite professionally - Special:Contributions/Haplology.

I also nominated User:Eirikr, another Japanese editor but he is not available at the moment, the vote will start as soon as he accepts it. --Anatoli 06:58, 13 October 2011 (UTC)

Linking to a particular sense within an entry

Is there any way to link to a particular sense within an entry, rather than to the entire entry? I know how to use the pound sign to link to a section, but for most words (I'm working with Chinese entries) this will only go as far as the section for a particular language, not to the individual senses. I have read about MediaWiki's "subpage" feature, but I don't know if that would work. In particular, I would like to be able to link words in a Wikisource document to the particular sense used in that context. If this is not currently possible, where would I start in proposing this feature, or perhaps in helping to implement it? Craig Baker 20:52, 13 October 2011 (UTC)

You can use {{senseid}} to link to a particular sense. - TheDaveRoss 21:10, 13 October 2011 (UTC)
There is no documentation for {{senseid}}. How does one link to a properly formatted sense? DCDuring TALK 23:45, 13 October 2011 (UTC)
Essentially it just sets up a span id which you can then refer to like any other anchor. The formatting is as follows (taken from peach): # {{senseid|en|fruit}}. The first parameter is the language code (en, for the English section) and the second parameter is a gloss, unique on the page, which is also the name of the anchor when referring back to the sense. To refer back, link to the language name and gloss: [[peach#English-fruit|peach]], which renders as peach. This certainly should be documented at the template too. - TheDaveRoss 01:36, 14 October 2011 (UTC)
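The linking convention described above can be sketched as a small helper, for example for a script that links words in a Wikisource text to individual senses. This is a sketch under stated assumptions: the code-to-name table is a hypothetical subset (the template does its own lookup), glosses containing spaces or punctuation may be encoded differently in the real anchor and are rejected here, and "wikt:" is assumed as the interwiki prefix when linking from another Wikimedia project.

```python
from typing import Optional

# Hypothetical subset of language code -> language section name.
LANGUAGE_NAMES = {"en": "English", "de": "German", "cmn": "Mandarin"}


def senseid_anchor(lang_code: str, gloss: str) -> str:
    """Anchor assumed to be set by {{senseid|lang_code|gloss}}, e.g. 'English-fruit'."""
    if " " in gloss:
        # How the real template encodes spaces is not covered by this sketch.
        raise ValueError("glosses with spaces are not handled in this sketch")
    return f"{LANGUAGE_NAMES[lang_code]}-{gloss}"


def sense_link(page: str, lang_code: str, gloss: str,
               text: Optional[str] = None, prefix: str = "") -> str:
    """Wikitext link to one sense, e.g. [[peach#English-fruit|peach]]."""
    anchor = senseid_anchor(lang_code, gloss)
    return f"[[{prefix}{page}#{anchor}|{text or page}]]"
```

`sense_link("peach", "en", "fruit")` reproduces the example link above, and `prefix="wikt:"` would give the form usable from Wikisource.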

An idea... wanted languages

We have pages for wanted entries, but so far we're lacking a list that shows which languages are in the most need of improvement. For example, our Old Norse coverage is quite bad given its popularity, and there aren't many Estonian entries either. It would be nice to see at a glance which languages need the most work, so that editors (also potential new editors) can see if their skills would be especially needed on Wiktionary. —CodeCat 21:04, 13 October 2011 (UTC)

Some kind of easily available statistics per language would also be good. It won't show the quality of entries or translations, but it would be somewhat informative. Also, it may sound harsh for small languages, but what do people think about ratings, or a list of "languages in bad need of contributions"? Well, we have few entries in Old Norse, but how important is it? We also have very little Burmese, Lao, or Malay content, let alone Sinhalese. These are state languages with millions of speakers, but we have very few contributions in them. --Anatoli 23:10, 13 October 2011 (UTC)
Re: "Some kind of easily available statistics per language would also be good": We have Wiktionary:Statistics; and if there's anything that you want that isn't already there, I bet you can convince Conrad to add it. —RuakhTALK 03:10, 14 October 2011 (UTC)
Thanks for the advice. After posting, I actually found Wiktionary:Statistics. That's useful. --Anatoli 03:41, 14 October 2011 (UTC)
Why not go ahead and start a draft somewhere? -- Liliana 03:46, 14 October 2011 (UTC)
Perhaps worth discussing first what we want to achieve. Will a new policy attract new editors? Having a list of languages in need of improvement is a good start or something (better than nothing). Statistics may show only the quantity, not quality.
If the statistics are accurate as of last year, look at the number of entries for some official languages:
  • Sinhalese - 75
  • Malagasy - 84
  • Kazakh - 173
  • Burmese - 134
  • Kyrgyz - 152
  • Malay - 409
  • Lao - 558
Do we need to advertise? --Anatoli 04:15, 14 October 2011 (UTC)
I adore your Russian bias. Belarusian is very much in need of improvement. -- Liliana 04:17, 14 October 2011 (UTC)
Not sure whether you were sarcastic, I actually didn't mention Russian or any Slavic languages. Yes, that's right. Belarusian needs improvement but Belarusians themselves do not seem to be worried about their language loss. --Anatoli 04:39, 14 October 2011 (UTC)
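A first version of the "wanted languages" list proposed above could be as simple as sorting per-language entry counts. This sketch uses only the counts Anatoli quoted (which may be out of date). The 200-entry threshold and the function name are arbitrary illustrations, and, as noted in the thread, raw counts say nothing about entry quality or about how much a language is in demand.

```python
# Entry counts quoted above from Wiktionary:Statistics (possibly out of date).
ENTRY_COUNTS = {
    "Sinhalese": 75, "Malagasy": 84, "Kazakh": 173, "Burmese": 134,
    "Kyrgyz": 152, "Malay": 409, "Lao": 558,
}


def wanted_languages(counts: dict, threshold: int) -> list:
    """Languages with fewer than `threshold` entries, fewest first (ties by name)."""
    return sorted(
        (lang for lang, n in counts.items() if n < threshold),
        key=lambda lang: (counts[lang], lang),
    )
```

With a threshold of 200, this lists Sinhalese, Malagasy, Burmese, Kyrgyz and Kazakh, in that order.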

Not the kind of jumper that makes you itches

[11] "He said, I know a little Latin, man a cus man a kai / I said I don't know what it means; he said neither do I". Do any of us know? Sounds more like Greek. Equinox 21:05, 13 October 2011 (UTC)

Maybe it is w:Manacus, Manacī. —Stephen (Talk) 22:45, 13 October 2011 (UTC)
I think it's a garbled version of amicus amici. Fugyoo 00:23, 14 October 2011 (UTC)

Block page spoiled with JavaScript

When you have to block someone, the page now has an extra dropdown box that disappears or reappears depending on your selection. It disappears with a stupid JavaScript "delayed fade" effect. This means you cannot efficiently use the Tab key to move from one UI control to the next. Who makes these retarded decisions? Equinox 21:39, 13 October 2011 (UTC)

I dunno, the tabbing works pretty O.K. for me. Even if I tab to the control right before it disappears, Firefox remembers my position in the tab order, so if I hit tab again, it moves me to the next field. How does it behave in your browser? —RuakhTALK 01:46, 14 October 2011 (UTC)
If I tab while the "tabbee" is in mid-fade, the focus apparently vanishes. It could be a problem with Opera, since the focus should certainly never be on an invisible thing, but I can't be sure exactly where the focus is, and anyhow given the general awfulness and incompatibility of browsers you'd hope that stuff like this would be tested thoroughly. My main objection is that the "fading" is purely a cosmetic gimmick, offering nothing useful (modally hiding controls is nasty anyway — why not disable them?), and yet manages to get in the way. Equinox 01:52, 14 October 2011 (UTC)
I see. What happens if you hit tab after the fade-out? By the way, I think that if — if — you're going to hide controls this way, then the fading is actually a good idea, since it gives the user time to register what's happening. Otherwise they'll just catch that something changed, but they won't understand what. But yeah, I agree with you that it would be better to just disable the control. We can probably override this somehow with site-wide JavaScript, though I don't know if it's a good idea to do so, since I doubt it's intended to be messed with. —RuakhTALK 03:00, 14 October 2011 (UTC)
Duh! I knew something annoying had happened, but couldn't quite figure out what it was. If it ain't broke, don't fix it! SemperBlotto 07:18, 14 October 2011 (UTC)

Brand names and physical products

WT:CFI (WT:BRAND in particular) says this: "A brand name for a physical product should be included if it has entered the lexicon". Some people in RFV (DCDuring, Equinox, and others) have been acting as if the part "for a physical product" were not there, arguing that WT:BRAND is intended to cover banking services, among other things. I have repeatedly argued that, whatever that part of CFI is intended to do, what it actually does is speak only of physical products, which are tangible, space-extended objects with non-zero mass, such as food, clothing, footwear, consumer electronics, and cars, but not software, databases (data collections), books, movies, and the like.

Please, let those who want WT:BRAND to apply to all brand names including "Citibank" and "Lufthansa" create a vote that removes "for a physical product" from CFI's section for brand names. Then the repetitive discussions in RFV are over.

By contrast, I would like to see WT:BRAND removed from CFI. There is IMHO no serious risk of commercial spam relating to inclusion of brand names. Above all, single-word brand names can host interesting lexicographical material, including pronunciation and etymology. --Dan Polansky 07:59, 14 October 2011 (UTC)

Our entry physical doesn't cover it, but I think there's a difference in two senses of physical here. For example is a table physical in the same way that wind or heat is physical? So a website isn't a 'physical product' like a table is, but it can be considered physical in terms of bits on a server, which correspond to electricity (um, I think, I'll let the experts explain it).
Specifically in response to Dan Polansky, I agree that some products are non-physical. Cartoon characters like Mickey Mouse are non-physical. They may have physical representations (toys, etc.) but are by nature non-physical. It would be nice to clean up WT:BRAND and WT:COMPANY. Mglovesfun (talk) 09:57, 14 October 2011 (UTC)
Thanks for raising this issue. I agree that editors have been wrongly trying to enforce WT:BRAND's rules for things that are not physical products — just because something has some physical reality, that doesn't make it a "physical product" — but I support resolving the issue by removing the "physical product" bit. —RuakhTALK 11:17, 14 October 2011 (UTC)

Patrolling enhancements now on by default, and now include deletion.

Admins —

I've been bold and made two big changes to the patrolling enhancements. If anyone disagrees with either of them, please either revert, or let me know and I'll revert.

The changes are:

  • A "delete" button is now added for each newly-created page that has not yet been marked as patrolled. A text field also appears at the bottom of the page; whenever you click an edit's "delete" button, the current contents of the text field will be used as the deletion reason (the edit-summary-like message that appears in the deletion log). For example, if you are an administrator who knows Chinese, you can just visit http://en.wiktionary.org/wiki/Special:NewPages?hidepatrolled=1 every day or two, type something like "Engirst cruft" in the text-field, and go to town.
    • The text field looks kind of crappy, and is probably confusing. I welcome any improvements.
    • There's no drop-down to choose one of the predefined deletion reasons at MediaWiki:Deletereason-dropdown. Anyone who's better at UI design than I am, please feel free to add this. :-)
  • The patrolling-enhancement Gadget is now turned on by default for anyone with the "patrol" right.
    • If you dislike it, you can turn it off via Special:Preferences: in the "Gadgets" tab, uncheck "Patrolling enhancements – makes it faster and easier to mark edits as patrolled.".
    • Also, if you dislike it, please comment here. If it turns out that multiple admins dislike it, then we should probably de-defaultize it.
    • Edited to add: Of course, it would be even better if we could improve it so that all admins do like it, if that's possible.

I welcome any questions, comments, suggestions, concerns, threats, . . .

RuakhTALK 14:52, 14 October 2011 (UTC)
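Each "delete" button ultimately has to send a deletion request to the MediaWiki API, with the text-field contents as the reason. A minimal sketch of the request body, assuming the Action API's `action=delete` parameters (`title`, `reason`, `token`, `format`). The gadget itself is JavaScript and may build its request differently, fetching the token (via `action=query&meta=tokens` in current MediaWiki versions) is not shown, and the function name is hypothetical.

```python
from urllib.parse import urlencode

API_URL = "https://en.wiktionary.org/w/api.php"  # where the body would be POSTed


def delete_request_body(title: str, reason: str, token: str) -> str:
    """Form-encoded POST body for an action=delete request."""
    if not token:
        # Without a token the API refuses the deletion, which is the kind of
        # "token must be set" error reported in this thread.
        raise ValueError("a token must be set")
    return urlencode({
        "action": "delete",
        "title": title,
        "reason": reason,  # e.g. the contents of the gadget's text field
        "token": token,
        "format": "json",
    })
```

For example, a reason such as "Engirst cruft" is form-encoded as `reason=Engirst+cruft`.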

I suspect it just isn't working properly, but shouldn't individual new pages have a 'delete' button next to them, not just a single delete button? Otherwise, how do I know what I'm deleting? A small 'delete' button next to every new page that's also an unpatrolled edit sounds fine to me. But currently, that isn't what this is. Mglovesfun (talk) 15:12, 14 October 2011 (UTC)
For me, in Firefox 7, in IE 8, and in Chrome, I do have a small "delete" button next to individual new pages. What browser are you using? I can try to debug . . . —RuakhTALK 15:26, 14 October 2011 (UTC)
I've just tried to delete "super-calli-frage-listic-epi-ali-doctus" with delete reason of "tosh" and I get a message saying that a token must be set. SemperBlotto 15:31, 14 October 2011 (UTC)
Yup, bug. (Introduced during migration from my personal JS to the Gadget's JS.) I noticed and fixed it a moment ago. Sorry about that. :-/   —RuakhTALK 15:35, 14 October 2011 (UTC)
I use Firefox. Will clear my cache now to see what the current version is like. Mglovesfun (talk) 15:50, 14 October 2011 (UTC)
I have two patrol buttons and two delete buttons. Using the deletion summary didn't work, it just displayed the default. Mglovesfun (talk) 15:54, 14 October 2011 (UTC)
Re: two patrol buttons and two delete buttons: keeping up my string of excessive boldnesses for the morning: http://en.wiktionary.org/w/index.php?title=User:Mglovesfun/vector.js&diff=14073538&oldid=14039687. Re: deletion summary not working: Oops, thanks, you're right, it doesn't work for me anymore, either. It worked yesterday, though, so hopefully it's a quick fix. —RuakhTALK 16:00, 14 October 2011 (UTC)
O.K., that's working now. Thanks again. :-)   —RuakhTALK 16:08, 14 October 2011 (UTC)
I just tried the delete button in Firefox and in Opera; it worked in both. :) Is the "mark" button intended to be used in conjunction with another feature? If not, it just seems to allow marking as patrolled with checking, which seems odd. (Nonetheless, it works in both browsers.) - -sche (discuss) 17:03, 14 October 2011 (UTC)
I'm sorry, I don't understand the question. What do you mean by "marking as patrolled with checking"? :-/   —RuakhTALK 17:43, 14 October 2011 (UTC)
Oops, I mean "without checking". In the past, I had to click on "diff" and look at the diff to find the "mark as patrolled" button. Now, I could just click "mark" in Recentchanges, without checking the diff to see if it was vandalism or not. Why would I do that...? - -sche (discuss) 18:19, 14 October 2011 (UTC)
Ah, I see. You're right, of course, but there are a number of cases where it's useful:
  • Whitelisting (whereby the button gets "clicked" automatically when you load the page).
    • A number of pages in the Wiktionary: namespace are whitelisted. These are pages that are so high-traffic that we don't really have to worry about vandalism going unnoticed and unreverted. Similarly, all pages in the User talk: namespace are whitelisted, as are users' edits to their own user-pages and sandboxes (e.g., in my case, User:Ruakh and User:Ruakh/Sandbox).
    • An IP address can be whitelisted, which has roughly the same effect as granting a user the "autopatrolled" privilege (except that it's mediated by this Gadget, rather than being built-in).
    • When granting the "autopatrolled" privilege to a user, we can also whitelist him/her temporarily, so that their existing unpatrolled edits can be quickly marked as patrolled.
  • If there are a bunch of edits to a single page, I can just go to its history, view the overall diff of edits, and if the overall result is O.K., then I don't need to view each individual diff to mark all the edits as patrolled.
  • If there are a bunch of similar-looking edits by a single editor (e.g., creating thirty Khmer nouns in an hour, with the automated edit-summaries that show you the initial page contents), then I can just look at a representative sample of edits to confirm that there's no funny business going on, then mark a bunch of edits as patrolled in short order.
    • I also have code in my own common.js that applies the patrolling enhancements to user-contributions pages, which makes this a bit easier for me. I haven't added it to the Gadget, though, because I'm not sure if it's ready for prime-time.
In addition, you were right that there's another feature that someone (maybe Connel MacKenzie?) intended for it to be used in conjunction with:
  • If I have Lupin's Popups turned on, then I don't have to actually click on the diff to see what changed. (That's the Gadget whose description reads, "Navigation popups, page previews and editing functions popup when hovering over links".)
but I find that feature very annoying, so I almost always have it turned off. (Still, you might as well try it out and see what it does. Even if you find it as annoying as I do, you still might find uses for it.)
RuakhTALK 18:53, 14 October 2011 (UTC)
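The whitelisting cases listed above can be summarized as a single yes/no check per edit. This is a hypothetical JavaScript sketch, not the Gadget's actual code: the `isWhitelisted` function, the shape of the `edit` object, and the `whitelistedPages`/`whitelistedUsers` sets (with their example entries) are all invented for illustration, and real title handling (namespace aliases, underscores vs. spaces) is ignored.

```javascript
// Hypothetical sketch: would an edit be auto-marked as patrolled under the
// whitelisting rules described above? Names and data shapes are assumptions.
function isWhitelisted(edit, config) {
  const { title, user } = edit;

  // All pages in the User talk: namespace.
  if (title.startsWith("User talk:")) return true;

  // Specific high-traffic pages (e.g., some in the Wiktionary: namespace).
  if (config.whitelistedPages.has(title)) return true;

  // A user's edits to their own user page and its subpages (sandboxes, etc.).
  const ownPage = "User:" + user;
  if (title === ownPage || title.startsWith(ownPage + "/")) return true;

  // Whitelisted IP addresses, or accounts whitelisted temporarily
  // (e.g., right after being granted "autopatrolled").
  if (config.whitelistedUsers.has(user)) return true;

  return false;
}

// Example entries only; not the real whitelist.
const config = {
  whitelistedPages: new Set(["Wiktionary:Beer parlour"]),
  whitelistedUsers: new Set(["192.0.2.1"]),
};

console.log(isWhitelisted({ title: "User:Ruakh/Sandbox", user: "Ruakh" }, config)); // true
console.log(isWhitelisted({ title: "house", user: "Someone" }, config)); // false
```

Any single matching rule is enough, so the order of the checks does not affect the result.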
The red and blue stuff is too large and garish for me. It leads to more scrolling and visual annoyance. Small icons instead of the large words, or just a lesser font size, would be good. Equinox 11:51, 15 October 2011 (UTC)
How about now? —RuakhTALK 13:14, 15 October 2011 (UTC)
That's definitely better for me. Equinox 13:15, 15 October 2011 (UTC)
Is it possible to allow default deletion summaries? Perhaps by specifying something in one's JavaScript? Mglovesfun (talk) 16:54, 15 October 2011 (UTC)
Done. Actually, doubly done. You can set either a default value named GPE.initialDeleteReason that gets put into the input-box initially, but which you can override by clearing out that box, or a default value named GPE.deleteReasonIfBlank that gets used when you click the delete-button if the input-box is blank. Or you can even set both, in which case the latter is used if you explicitly clear out the former. To set them, you would put something in your common.js (or vector.js or whatnot) that looks like
GPE.initialDeleteReason = "I forgot I could specify a deletion reason!";
GPE.deleteReasonIfBlank = "I couldn't think of a deletion reason to enter!";
RuakhTALK 21:42, 15 October 2011 (UTC)
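A minimal sketch of how the two settings described above might interact. It is not the Gadget's actual code: the helper functions `initialBoxValue` and `resolveDeleteReason`, the plain `settings` object standing in for `GPE`, treating a whitespace-only box as blank, and returning an empty reason so that MediaWiki falls back to its default summary are all assumptions.

```javascript
// Hypothetical sketch only: NOT the Gadget's real implementation.
// `settings` stands in for the GPE object; the helper names are made up.

// Value placed in the deletion-reason input box when it is first shown.
function initialBoxValue(settings) {
  return typeof settings.initialDeleteReason === "string"
    ? settings.initialDeleteReason
    : "";
}

// Reason actually submitted when the delete button is clicked.
// Assumption: a box containing only whitespace counts as blank.
function resolveDeleteReason(settings, boxValue) {
  if (boxValue.trim() !== "") {
    return boxValue; // whatever is in the box (typed or pre-filled) wins
  }
  if (typeof settings.deleteReasonIfBlank === "string") {
    return settings.deleteReasonIfBlank; // fallback for a blank box
  }
  return ""; // assumption: an empty reason lets MediaWiki use its default summary
}

const settings = {
  initialDeleteReason: "I forgot I could specify a deletion reason!",
  deleteReasonIfBlank: "I couldn't think of a deletion reason to enter!",
};

// Leaving the pre-filled box alone submits initialDeleteReason.
console.log(resolveDeleteReason(settings, initialBoxValue(settings)));
// prints "I forgot I could specify a deletion reason!"

// Clearing out the box submits deleteReasonIfBlank instead.
console.log(resolveDeleteReason(settings, ""));
// prints "I couldn't think of a deletion reason to enter!"
```

With both settings present, this reproduces the behaviour described: the pre-filled reason is used unless the box is explicitly cleared, in which case the blank-box fallback takes over.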

Straw Poll: each section of our CFI

I apologise if I have chosen a poor format, but (following comments on WT:RFV#Finnair) I propose a straw poll to gauge the community's opinion of each section of CFI. (This is broader than just the necessary changes to BRAND CFI that Dan has a section for, above.)
  • If you think the section is good pretty much as-is, vote "keep as-is" (or "support").
  • If you think we should change a section, but still have a section (for example, change our criteria for including brand names, but still have criteria for brand names that are different from our general criteria), vote "change". If you can explain what you would change briefly (not three paragraphs), please do so.
  • If you think we should remove a section (for example, remove our specific criteria for including brand names, so that only general criteria apply to them), vote "remove" (or "oppose").
  • If you want to add a section (for example, to handle taxonomic names), add it under its own header in the "Sections to add" section. If your proposed section's text is very long, consider posting it in your userspace and simply putting a link under the header. Sign your section so we know who added it.
This way, we develop a clear idea, all in one place, of which sections are liked as-is, which (if any) a majority of editors would put on the chopping block, and which (if any) a majority would change. I used fake==== for some subsections so this page's TOC wouldn't explode, but I left some real headers so everyone could edit one section at a time and perhaps avoid edit conflicts. There's a section for general discussion at the bottom, if you'd rather comment there that you would "remove sections A and B, and change C, but keep the rest". - -sche (discuss) 19:08, 14 October 2011 (UTC)
PS: Where CFI has a section, followed by text, followed by subsections, I have commented here on the subsections in their own (well) subsections, and my comments on the general section only apply to the text that is not part of any of the subsections. For example, my comments on the section "Attestation" are about the bit "“Attested” means verified [...] include the ISBN." My comments on the subsection "Conveying meaning" are in a subsection for that subsection. - -sche (discuss) 19:08, 14 October 2011 (UTC)

Sentences 1, 2, 3

"As an international dictionary, Wiktionary is intended to include “all words in all languages”. A term should be included if it's likely that someone would run across it and want to know what it means. This in turn leads to the somewhat more formal guideline of including a term if it is attested and idiomatic."

  • My vote: keep as-is, a good statement of purpose, clarified by subsequent sections. I'm not opposed to changing it, though, if that's what a majority wants. - -sche (discuss) 19:08, 14 October 2011 (UTC)
    • I'd like to change it in some way, not sure how. Mglovesfun (talk) 20:17, 14 October 2011 (UTC)
  • Keep as is, except to suggest that it should guide the drafting of other sections of specific application, not be a substitute for them. DCDuring TALK 20:43, 14 October 2011 (UTC)
  • Keep as is. I think that you should be able to take any text (in any language), wikify it, and get no red links (I'm still not sure about red links due to capitalization at the beginning of sentences). SemperBlotto 07:15, 15 October 2011 (UTC)
    • What happens if one of the words is the name of a small, local shop? Mglovesfun (talk) 09:18, 15 October 2011 (UTC)

"Terms" to be broadly interpreted

  • Keep as-is. I would support changing it if someone showed that to be needed. Comment: "A term need not be limited to a single word in the usual sense. Any of these are also acceptable: [...] multiple-word terms". Hurrah, self-referential definition! - -sche (discuss) 19:08, 14 October 2011 (UTC)
  • Keep as-is. Wording improvements OK. DCDuring TALK 20:40, 14 October 2011 (UTC)
  • Keep as-is. Not perfectly written, but not a problem IMHO. —RuakhTALK 02:10, 15 October 2011 (UTC)
  • Minor change to emphasise that a term of multiple words needs attestation whereas a single word just needs to exist in the real world. SemperBlotto 07:17, 15 October 2011 (UTC)

Attestation

  • Keep as-is for now, continue to change as necessary (there have been several successful and unsuccessful votes to change this section). Perhaps refine the paragraph which follows the list, and which could be argued to be more explanation and discussion than criteria. - -sche (discuss) 19:08, 14 October 2011 (UTC)
  • Keep as-is basically. DCDuring TALK 20:45, 14 October 2011 (UTC)
  • Change 4th. It would be nice to have an exact definition of "extinct" in this sense. Also, does a transliterated form count as a contemporary source? To be honest, I'd like this criterion removed, but since there was a vote for it, there is nothing I can do :-( Ungoliant MMDCCLXIV 22:02, 14 October 2011 (UTC)
  • Keep concept. I'm not sure about all of the details, though. And we're really misusing the term "attested", a fault which we compound by structuring the section as a definition of the term! —RuakhTALK 02:24, 15 October 2011 (UTC)
  • Keep as-is - I'm reasonably happy with this (and related) section(s). SemperBlotto 09:54, 15 October 2011 (UTC)
  • Change to explain what extinct means and whether transliterations for the purposes of study count as attestations. —CodeCat 10:12, 15 October 2011 (UTC)
Conveying meaning
  • Keep as-is, or refine (like Attestation). - -sche (discuss) 19:08, 14 October 2011 (UTC)
  • Keep as-is basically. DCDuring TALK 20:45, 14 October 2011 (UTC)
  • Keep as-is, I think. I'm not sure that the use-mention distinction is exactly the relevant criterion here, because there are some cases that seem to me to be technically "uses", but that we exclude as though they were mentions. For example, the section says that we exclude "made-up examples of how a word might be used", but don't those made-up examples actually use the word, rather than merely mentioning it? But I think we all have a general sense of what this is supposed to mean, and it doesn't seem worth getting hung up about what we call it. (I doubt we could get consensus for any specific clarification of it, anyway.) —RuakhTALK 02:24, 15 October 2011 (UTC)
Independence
  • Change. As has been said in RFV, we only partially, unclearly define "independent". - -sche (discuss) 19:08, 14 October 2011 (UTC)
  • Completely rewrite. I started a beer-parlour discussion about it back in February — see Wiktionary:Beer parlour archive/2011/February#Independence. — and feedback was mostly positive (in that people mostly agreed with me about what it should say), but I let it drop without ever proposing a specific wording. —RuakhTALK 02:03, 15 October 2011 (UTC)
Spanning at least a year
  • Keep as-is, or refine by removing the last sentence, which is a comment, not a criterion. - -sche (discuss) 19:08, 14 October 2011 (UTC)
  • Keep as-is basically. DCDuring TALK 20:45, 14 October 2011 (UTC)
  • Keep as-is. —RuakhTALK 02:24, 15 October 2011 (UTC)

Idiomaticity

  • Change. This section is messy. Its passing mention of the Phrasebook should be a separate section, establishing the Phrasebook with clear purpose and different CFI (especially with regard to idiomaticity). - -sche (discuss) 19:08, 14 October 2011 (UTC)
  • Change and also add some information about how to add single-word terms that are not idiomatic. This may seem strange for English, but in Finnish there are suffixes like -kin that can be added to almost any word, and this would be considered idiomatic in Finnish. Similar cases would also apply for unusually long compounds in German like the name of that law, or the name of that very long protein. —CodeCat 22:28, 14 October 2011 (UTC)
  • Change. Mglovesfun (talk) 08:56, 15 October 2011 (UTC)
Spellings
Formatting
  • Change: "page the" to "page, the", and add a link to ELE(?); possibly remove the subsection header or remove the subsection entirely. I do not feel strongly about this; I would not mind keeping it as-is. - -sche (discuss) 19:08, 14 October 2011 (UTC)
Inflections

Idiomatic phrases: Pronouns, Articles, Verbs

Proverbs

Languages to include: Natural languages

  • Change. I would keep the first sentence; the second sentence is more explanation than criteria, and is unclear: "a proposed language is considered a living language, or a dialect of or alternate name for another language" — I would at least remove "living" (surely there are debates over whether dead tongues were languages or dialects). - -sche (discuss) 19:08, 14 October 2011 (UTC)
  • Change. Should give some parameter as to what is a language and what is a dialect (ISO codes?), and make it clear that dialectal forms/pronunciations are also allowed (because some people might think "dialect" means "non-standard"). Ungoliant MMDCCLXIV 22:02, 14 October 2011 (UTC)
Sign languages
Constructed languages
  • Keep as-is, or clean up. - -sche (discuss) 19:08, 14 October 2011 (UTC)
  • Change. From the 2nd criterion (unapproved languages), move Brithenig and Láadan to the 4th criterion (restricted to some literary works), and approve the rest, because they are languages intended for general use; the other criteria should deal with whether or not they are actually used. Ungoliant MMDCCLXIV 22:02, 14 October 2011 (UTC)
    • That would need a vote. I am all for approving LFN, but I have serious doubts about the other languages. -- Liliana 16:48, 15 October 2011 (UTC)
  • change: delete the last sentence, because language names, like all other words, can be entered if they meet the CFI, so that sentence makes the page contradict itself. -- Liliana 10:34, 15 October 2011 (UTC)
Reconstructed languages
  • Keep as-is. - -sche (discuss) 19:08, 14 October 2011 (UTC)
  • Change. Just a little rewording. If I was reading that for the first time I'd interpret it as hostility against reconstructed words. Ungoliant MMDCCLXIV 22:02, 14 October 2011 (UTC)

Exclusions: Vandalism

  • Remove. This isn't a criterion for inclusion. Vandalism is vandalism because it does not meet the other criteria for inclusion: the new entry is not an attestable word, or the addition of "rxjgfrr" as the Russian word for "hair" is clearly wrong, etc. - -sche (discuss) 19:08, 14 October 2011 (UTC)
  • Keep as is. It doesn't hurt to have another reminder against vandalism. Ungoliant MMDCCLXIV 22:02, 14 October 2011 (UTC)
  • I'm happy for this to stay or go - whatever the consensus is. SemperBlotto 10:18, 15 October 2011 (UTC)
    • Perhaps if we keep this, we should also say that we remove information which is wrong. I mean... Mglovesfun (talk) 10:22, 15 October 2011 (UTC)
  • Remove. This does not regulate the inclusion of a term or a sense of a term. Vandalism, which includes replacing the content of a page with "eerwerjhewkrkew" and other sorts of edits, gets removed or reverted without reference to CFI. --Dan Polansky 11:00, 15 October 2011 (UTC)
Protologisms
  • Remove, as this isn't a criterion for inclusion, or move the link to WT:LOP to the ===See also=== section with a short note like "for words that do not meet CFI". - -sche (discuss) 19:08, 14 October 2011 (UTC)
  • As previous. SemperBlotto 10:18, 15 October 2011 (UTC)
  • Remove. Protologisms get excluded as unattested, so the attestation section already handles this. The other way around, if something is attested, then it is not a protologism. --Dan Polansky 11:00, 15 October 2011 (UTC)

Fictional universes

  • Keep as-is. I would support changing it if someone showed that to be needed. - -sche (discuss) 19:08, 14 October 2011 (UTC)
Wiktionary is not an encyclopedia
  • I have no strong opinion on this section; lean keep as-is. (Why is "the successor of Saul" allowed a sense-line at David under this section, as it stands?) - -sche (discuss) 19:08, 14 October 2011 (UTC)
  • Change - I would be happy with short encyclopaedic content if it helps to explain the meaning of a term. SemperBlotto 10:18, 15 October 2011 (UTC)
  • delete as is. This is better handled by specific criteria for specific types of terms, and has long contradicted common practice at Wiktionary (note how Houdini is listed as an example of what not to include, yet this very sense passed RFD!) -- Liliana 10:41, 15 October 2011 (UTC)
  • Remove; if not that, remove the Houdini paragraph. --Dan Polansky 11:00, 15 October 2011 (UTC)
Language-specific issues

Names

  • I have no strong opinion on this section; lean keep as-is. - -sche (discuss) 19:08, 14 October 2011 (UTC)
  • Change (this and related sections) to emphasize that single-word entries are acceptable, but multi-word ones (e.g. "Greater Manchester") need attestation. SemperBlotto 10:22, 15 October 2011 (UTC)
Company names
  • I have no strong opinion on this section; lean keep as-is. Mglovesfun and Ruakh have pointed out that it basically says "company names shall not be included", which I tend to support. - -sche (discuss) 19:08, 14 October 2011 (UTC)
    • But it contradicts all words in all languages when the company name is a word. Mglovesfun (talk) 08:55, 15 October 2011 (UTC)
      • Well, to be clear, Ruakh's full comment was "it's saying that if a company name is also a family name, then that's included; and if a company name is also a common word, then that's included" (but the company name is not included as such). - -sche (discuss) 18:43, 15 October 2011 (UTC)
  • Remove. Keep most attestable single-word company names, at least for their pronunciation and etymology. Let company names be regulated by the section on the names of specific entities. --Dan Polansky 11:00, 15 October 2011 (UTC)

Brand names

  • Change. Per many RFV discussions, "a physical product" should be changed to either "a product" (if it is meant to include all products), or something like "a tangible/three-dimensional product" (if it is meant to exclude non-tangible products). - -sche (discuss) 19:08, 14 October 2011 (UTC)
  • Change to include all commercial names, advertising, and political slogans. DCDuring TALK 20:52, 14 October 2011 (UTC)
  • Change to include commercial names and advertising, e.g. Internet service providers and banks as well as tangible items, and commercial creations like toy brands and cartoon characters. I'm not so sure about political slogans; they are not brands per se and I imagine most of them would fail CFI for other reasons. Equinox 20:59, 14 October 2011 (UTC)
  • Remove; keep all single-word attested brand names of pharmaceuticals, at least. --Dan Polansky 11:00, 15 October 2011 (UTC)
Given and family names
  • Keep as-is, or change by settling the status of patronymics (and, I presume, matronymics). - -sche (discuss) 19:08, 14 October 2011 (UTC)
Genealogic content
Names of specific entities
  • Change. The section admits that it is incomplete. ("Among those that do meet that requirement, many should be excluded while some should be included, but there is no agreement on precise, all-encompassing rules for deciding which are which.") We should complete it. - -sche (discuss) 19:08, 14 October 2011 (UTC)
  • Change with largely exclusionary intent, possibly allowing for phased inclusion of types, (eg, in "populated places": countries, then provinces/states, then cities with greater than 100K population). DCDuring TALK 20:52, 14 October 2011 (UTC)

Issues to consider

Attestation vs. the slippery slope
  • Remove. This section is more discussion than criteria. - -sche (discuss) 19:08, 14 October 2011 (UTC)
  • Remove but it's useful on a separate page that discusses the rules more in-depth, and in a way that it can be edited without a vote. Other points could be discussed there too. —CodeCat 22:23, 14 October 2011 (UTC)
  • Remove; off-topic material. Mglovesfun (talk) 08:54, 15 October 2011 (UTC)
  • Remove; needless and misleading by its reference to "common use" and "general use". --Dan Polansky 11:00, 15 October 2011 (UTC)
See also
  • Keep, of course. - -sche (discuss) 19:08, 14 October 2011 (UTC)
  • Remove - quite interesting, but not really relevant. SemperBlotto 10:24, 15 October 2011 (UTC)

Sections to add

Translingual entries

Note: there is Wiktionary:About Translingual, but it is not formal policy.
  • I propose that we develop criteria for including translingual or might-be-translingual entries such as taxonomic names and Latin phrases such as caveat emptor: specifically, we should have criteria for determining which language(s) to consider them: Latin? Translingual? English? German? - -sche (discuss) 19:08, 14 October 2011 (UTC)
    • That is kinda covered by WT:AMUL, isn't it? -- Liliana 19:51, 14 October 2011 (UTC)
      • Kinda. That page clarifies that taxonomic names are translingual, but it isn't a formal policy. Also, it isn't easy to find. - -sche (discuss) 20:24, 14 October 2011 (UTC)
  • Change to clarify that a phrase from a language does not become translingual unless it assumes a meaning inconsistent with its meaning in that source language, eg, two-part species names are Latin. DCDuring TALK 21:02, 14 October 2011 (UTC)
    • I do remember a discussion about considering pizza a Translingual entry. This suggests we need a definition of "Translingual" to exclude that. -- Liliana 16:46, 15 October 2011 (UTC)

Phrasebook

Note: there is Wiktionary:Phrasebook, but it is not formal policy.
  • Per my comments above about the section "Idiomaticity", I think we should have formal Phrasebook criteria. We might have them on a separate page and only link to that page from a section on the main CFI page. - -sche (discuss) 19:08, 14 October 2011 (UTC)
  • Delete There needs to be a project with active participants and serious intent. There is no evidence of such interest. DCDuring TALK 21:02, 14 October 2011 (UTC)
  • Delete In its current state it's a shame for Wiktionary, and I doubt it has any chance of improving fast enough. Maybe reopen in half a decade or so. Ungoliant MMDCCLXIV 22:02, 14 October 2011 (UTC)
  • Keep as long as we can find proper criteria. —CodeCat 22:35, 14 October 2011 (UTC)
  • Keep in some form, deleting this section entirely will simply mean that the phrasebook will have no rules. Mglovesfun (talk) 08:52, 15 October 2011 (UTC)
  • delete Similar to a company which is running losses monthly, we need to concentrate on our core topic, which is building a dictionary. The phrasebook can come back later once there is interest. -- Liliana 10:47, 15 October 2011 (UTC)
    You assume that Wiktionary is a single company that can focus on only one project at a time. But there are many Wiktionary users who can do many different things at a time. If they want to help, let them help in whatever way they feel is best, as long as it is an improvement. I would agree with you if I felt that a proper phrasebook would not improve Wiktionary, but I think it would. I don't think it's really our job to tell users what to focus on by banning everything else. —CodeCat 11:03, 15 October 2011 (UTC)

Placenames

  • I will also propose that we consider (restoring) some CFI of placenames. - -sche (discuss) 19:08, 14 October 2011 (UTC)
    • Ha ha. Good luck getting any sort of consensus on that! -- Liliana 19:55, 14 October 2011 (UTC)
  • Which ones? I disagree anyway. Placenames are words; they have pronunciations, they have translations (some quite unexpected: Aachen/Aquisgrão, Florence/Firence), and they have etymologies. The etymology of placenames is very important because they often come from a rare language substrate. (Etruscan placenames in Italy, Gothic placenames in Portugal, etc). Ungoliant MMDCCLXIV 22:02, 14 October 2011 (UTC)
  • I would like to allow place names as they are words in the usual sense. The etymologies of place names are often hard to find and this would definitely be useful information. But because it's hard to find proper criteria, I propose allowing them only in a separate namespace. —CodeCat 22:38, 14 October 2011 (UTC)

Discussion

I have a question. Can anyone vote? Ungoliant MMDCCLXIV 20:09, 14 October 2011 (UTC)

This isn't a formal vote, so go ahead. -- Liliana 20:10, 14 October 2011 (UTC)
Right! Everyone should give input. :) - -sche (discuss) 20:12, 14 October 2011 (UTC)
Usually any registered user can vote. DCDuring TALK 21:04, 14 October 2011 (UTC)

Thanks to User:-sche for this. DCDuring TALK 21:04, 14 October 2011 (UTC)

I'd like a more holistic approach, though I know that's not easy at all. The document shouldn't contradict itself and should be clear. It should define any potentially ambiguous terms; for example, what is a 'word', and what is a 'language'? As for contradictions, "all words in all languages" can contradict the rules on fictional universes, the rules on brand names and the rules on company names. I find such contradictions are a natural product of wikis, where one editor edits one part of the page and another editor edits another part independently. Mglovesfun (talk) 16:35, 15 October 2011 (UTC)
I like what User:DCDuring wrote on his user page about that. -- Liliana 16:44, 15 October 2011 (UTC)
I'm flattered. It is just cautionary, though. DCDuring TALK 18:05, 15 October 2011 (UTC)
In legal drafting, there are usually clauses beginning with notwithstanding that indicate that a given clause is to be read as superseding the ones mentioned in the "notwithstanding" clause. There are also standard rules of construction for interpreting apparent contradictions in the absence of their explicit resolution. Obviously, it is best to be as explicit as possible about conflicts that are noted at the time of drafting and to attempt to identify as many of them as possible at that time. For example, in our case, attestation seems to override other considerations in that an absence of attestation (at least for lemmas) is deemed to be fatal to includability. DCDuring TALK 18:05, 15 October 2011 (UTC)

Chinese radical changes

What the hell is going on? See http://en.wiktionary.org/wiki/Special:Contributions/213.79.124.126

Also, please archive this page so it doesn't take forever to load. It's just simple common sense.

71.66.97.228 07:38, 15 October 2011 (UTC)

I oppose radical changes (lol). Mglovesfun (talk) 08:51, 15 October 2011 (UTC)

Did you look at it? 71.66.97.228 23:20, 15 October 2011 (UTC)