Wiktionary:Beer parlour/2010/September

This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.
Beer parlour archives

September 2010

MediaWiki:Category header

Currently, category pages have two sections: ==Subcategories== and ==Entries in category “ [] ”==. Since the latter section can contain not just entries, but also appendices, templates, discussion pages, etc., I think we should change the latter to ==Pages in category “ [] ”==. We can do this by deleting MediaWiki:Category header. What do y'all think? —RuakhTALK 12:52, 10 September 2010 (UTC)

Agreed. --Bequw τ 13:41, 10 September 2010 (UTC)
Done. RuakhTALK 09:35, 13 September 2010 (UTC)

ISO 639-3 only, instead of 639-1 and 639-3

This idea has been floating around in my head for a long time, because they use ISO 639-3 a lot more on the Romanian wiktionary. Anyway, what reaction do we have to the idea of using, for example {{eng-noun}} instead of {{en-noun}}, Category:spa:Animals instead of Category:es:Animals, etc? We already use ISO 639-3 codes where there are none in ISO 639-1. It's just that it's slightly odd to me to use some two-letter codes and some three. (I know this would be a lot of work fixing if it were implemented, but a few bots could get the job done in a reasonably swift fashion, I imagine.)[ R·I·C ] opiaterein — 15:08, 29 September 2010 (UTC)

I think the 639-1 codes are easier to memorize. We certainly don't want to make contributing to Wiktionary even harder than it is now. -- Prince Kassad 15:46, 29 September 2010 (UTC)
I agree with Kassad that the 639-1 codes are easier to memorize. Additionally, ISO 639-3 does not cover Wiktionary's scope completely. Even if this hypothetical conversion of every Wiktionarian 639-1 code into a 639-3 code came into effect, certain non-ISO codes would still exist, like aus-gun and roa-grn. (So, apparently, the closest way to achieve exactly the same number of characters in all language codes would be converting en to gmw-eng, pt to roa-ptg, etc., which I would oppose, of course.) --Daniel. 16:21, 29 September 2010 (UTC)
I'm pretty neutral on this issue, but I would strongly support using {{etyl|ell}} to code for Modern Greek instead of using {{etyl|el}} to code for Greek; I've seen too many miscategorisations using el in place of grc. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 15:58, 29 September 2010 (UTC)
That would require the creation of a {{etyl:ell}} (not difficult). Mglovesfun (talk) 16:12, 29 September 2010 (UTC)
Since we don't have "Modern English" (or "Modern Japanese", "Modern Spanish", etc.), but have "English" (and "Japanese", "Spanish", etc.), I think el as being Greek but implying Modern Greek is clear enough... to me, at least. If using {{ell}} can possibly improve the proper distinction of Ancient/Modern for other users, then by all means I support using the 639-3 code for this language. --Daniel. 16:21, 29 September 2010 (UTC)
Oh, let's, please; it's a perennial pain in the arse. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 16:23, 29 September 2010 (UTC)
Erm, how would it change anything? I don't quite understand. -- Prince Kassad 16:24, 29 September 2010 (UTC)
Most of Category:Greek derivations is miscategorised, with most of its entries properly belonging in Category:Ancient Greek derivations; most of those erroneously categorised have been there since {{Gr.}} was transcluded in their entries, which AutoFormat then converted to {{etyl|el}} without fixing them. However, more entries have since been added by editors using {{etyl|el}} instead of {{etyl|grc}}. Using {{etyl|ell}} would deter mistaken uses because the language shown would explicitly be Modern Greek; moreover, I envision retaining {{el}}, uses of which as {{etyl|el}} categorising to Category:Greek derivations could then be treated as a clean-up category. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 16:42, 29 September 2010 (UTC)
Yes, Greek and Ancient Greek are mixed up in etymologies all the time. Modern Greek would be a better name. Ancient Greek is an important classical language, and it's called Greek in many dictionaries. But I've never seen English mixed up with Old English. I'm not interested in changing other two-letter language codes.--Makaokalani 16:22, 30 September 2010 (UTC)
I suppose {{el-noun}} will be renamed to {{ell-noun}}, Category:Greek language will become Category:Modern Greek language, and so on... --Daniel. 16:46, 30 September 2010 (UTC)
No, I don't think anyone is suggesting that; Doremítzwr is specifically talking about etymologies. —RuakhTALK 16:53, 30 September 2010 (UTC)
Whilst consistency is always a desirable quality, all I'm interested in with this proposal at the moment is preventing miscategorisations. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 17:40, 30 September 2010 (UTC)
ell and el are absolutely synonymous; I don't think we should introduce a distinction between them whereby el has a different meaning in etymologies from everywhere else. The ideal would be for {{etyl|el}} to generate "Modern Greek" — and that's easy to effect, BTW — but if you're certain that we can't stop human editors from erroneously adding that when they mean "Ancient Greek", then we should go for something like {{etyl|el-GR}} (GR being the country code for the modern country of Greece) or {{etyl|Modern Greek}}. —RuakhTALK 16:53, 30 September 2010 (UTC)
What I think happens is that inexperienced editors see "Greek" in other dictionaries' etymology sections, and then seek to reproduce that in our etymology sections; they see that we require ISO codes, so they look up "Greek" or "Greek language" on Wikipedia, find el and then use that. Very probably, if other dictionaries uniformly used "Ancient Greek" for grc, we wouldn't have this problem. I prefer Martin's idea of using {{etyl:ell}} because it isn't any more laborious in terms of numbers of characters than typing {{etyl|ell}}. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 17:40, 30 September 2010 (UTC)
It's the standard thing to do; Wikimedia projects are labeled by ISO 639-1 first then ISO 639-3, for one nearby example. I think it will be very problematic for many of us; French is fr, Spanish is es, German is de. I'd have to look up the ISO 639-3 equivalents.--Prosfilaes 21:09, 29 September 2010 (UTC)
There is a template {{wikimedia language}} to deal with this issue - not that I support usurping ISO 639-1, mind you. Mglovesfun (talk) 16:59, 30 September 2010 (UTC)

Modern Greek

Split off from above, as not strictly relevant. I've created a {{etyl:ell}} (which may need to be renamed) which displays and categorizes as Modern Greek, indeed we've had Category:Modern Greek derivations for years, and I've never heard of it. Thoughts? Mglovesfun (talk) 17:15, 30 September 2010 (UTC)

I think we need to use Category:Modern Greek derivations instead of Category:Greek derivations, because otherwise we lack a clean-up category for continued (mis)uses of {{etyl|el}}. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 17:40, 30 September 2010 (UTC)
Yes, that's what the template uses. —RuakhTALK 18:25, 30 September 2010 (UTC)
Note that we already have {{etyl:el-GR}}, which does the same thing, and which IMHO is a better name. (If we want our etymologies to use a more explicit code for Modern Greek, then we should use a code that's actually more explicit, rather than a code that's completely synonymous.) As for the overall concept — I've proposed it several times before, hence the existence of {{etyl:el-GR}}, but various editors objected (including Atelaes (talkcontribs) and Vahagn Petrosyan (talkcontribs), IIRC). But if they've changed their minds, then I'm definitely on board. It's important that the approach we take have the support of the editors who actually work on these etymologies, else we're not going to accomplish anything. —RuakhTALK 18:25, 30 September 2010 (UTC)
I'm for renaming {{el}}, or at least {{etyl|el}}, to “Modern Greek”. But first we should manually sort Category:Greek derivations between {{el}} and {{grc}}. If someone is willing to help me, we can do that in a week. --Vahag 13:41, 1 October 2010 (UTC)
Capitals and letters from a to h are done. --Vahag 18:14, 1 October 2010 (UTC)
The Etymology sections of entries that improperly have el for grc usually have other problems, which should not be neglected in the haste to accomplish this apparently worthwhile goal. Missing {{term}}, omitted lang= in {{term}}, and omission of "-" as second parameter in {{etyl}} for cognates are typical. DCDuring TALK 14:19, 1 October 2010 (UTC)

OK, Category:Greek derivations is now cleaned up. All entries in it are from Modern Greek. The rest are in Category:Ancient Greek derivations and Category:Byzantine Greek derivations. Now let's:
1. Pat me on the head
2. Make {{etyl|el}} display "Modern Greek" and categorize into Category:Modern Greek derivations
3. Delete {{etyl:el-GR}}
4. Redirect {{ell}} to {{el}}
5. Do something with Category:Greek derivations. I don't know what. --Vahag 19:36, 5 October 2010 (UTC)

Well done! To clarify: by Category:Greek derivations do you mean everything in Special:WhatLinksHere/Template:etyl:el? Or just the English entries? —RuakhTALK 20:15, 5 October 2010 (UTC)
Everything, yes, in all languages. --Vahag 21:23, 5 October 2010 (UTC)
Vahag: pat, pat. Seriously: great job. DCDuring TALK 23:19, 5 October 2010 (UTC)
Awesome, thank you! —RuakhTALK 21:53, 5 October 2010 (UTC)
Oh — and I think we should delete {{etyl:ell}} and {{ell}} as well as {{etyl:el-GR}}. None of them will serve any purpose once {{etyl:el}} behaves correctly. —RuakhTALK 22:17, 5 October 2010 (UTC)
I suspect that {{etyl|el}} will continue to be misused for {{etyl|grc}}; IMO, we should stick with {{etyl:ell}}. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 14:54, 8 October 2010 (UTC)
How can it be misused? When a user sees it says "Modern Greek" he will look for the code of Ancient Greek or would ask a more experienced user for help, no? Are you suggesting to disable {{etyl|el}}? --Vahag 14:59, 8 October 2010 (UTC)
I'd not thought about deleting {{el}} to disable {{etyl|el}}; that could work... Re your second question, that would seem the most sensible thing to do in such a situation, but I simply don't trust all our inexperienced users to do the most sensible thing in a given situation. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 15:04, 8 October 2010 (UTC)
Support merging the {{etyl:ell}} and {{etyl:el-GR}} into a new {{etyl:el}}. {{el}} and {{ell}} don't need to be deleted at all IMO. Mglovesfun (talk) 15:10, 8 October 2010 (UTC)
We could just keep {{etyl|el}}, any uses of which can automatically be considered cases for clean up (categorised to Greek derivations). IMO, {{ell}} ought to redirect to {{etyl:ell}} (or whitherever) instead of to {{el}}, since I've never seen {{ell}} misused. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 15:51, 8 October 2010 (UTC)
Odd you should say that, as hypothetically it should never be used, as ISO 639-1 codes supersede ISO 639-3. The few cases where {{etyl:ell}} is used, it's me testing it. Mglovesfun (talk) 16:12, 8 October 2010 (UTC)
IMO, any principle dictating consistent use of ISO codes should take a back seat to preventing etymological miscategorisation. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 16:19, 9 October 2010 (UTC)
O.K., but what's your evidence that misusing ISO codes will prevent etymological miscategorization? —RuakhTALK 18:39, 9 October 2010 (UTC)
What evidence do you mean? Evidence of misuse of {{etyl|el}}, {{etyl|ell}}, or {{etyl:ell}}? — Raifʻhār Doremítzwr ~ (U · T · C) ~ 12:30, 10 October 2010 (UTC)
Well, you seem to believe that (1) if we change {{etyl|el}} to display and categorize it as "Modern Greek", then there will nonetheless be editors who misuse it for Ancient Greek etyma; and (2) if we start treating Category:langcode:Greek derivations as a cleanup category, replacing instances of Modern-Greek {{etyl|el}} with {{etyl|ell}}, then there will not be editors (or at least, not as many editors) who misuse {{etyl|ell}} for Ancient Greek etyma. Is that a correct expression of your beliefs? If so, then I assume that these beliefs must be based, at least in part, on some sort of evidence, and I would like to know what that evidence is, so that I may judge for myself whether I agree with them. —RuakhTALK 12:48, 10 October 2010 (UTC)
I think that what Wikipedia and other references say will guide usage, and since they uniformly give el for Greek (because it's the ISO 639-1 code), that's what people will use; changing the displayed language name may make some users think again, but certainly not all or, I would guess, even most of them (I've seen Onondaga misused for Old Norse, which, unlike Greek, Modern Greek, and Ancient Greek, could in no way be confused except by mixing up code names). My objection is necessarily speculative, because the change is so new, meaning errors will not have had time to accumulate. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 12:21, 18 October 2010 (UTC)
FWIW, I thought that the use of "Greek" for Modern Greek was okay, and that no change was needed. Most of the misuses of "Greek" to mean "Ancient Greek" were by my estimation traces of early Wiktionary history, not something done in recent edits: but in this I may be wrong. Most of these would result from hastily taking over etymologies from Webster 1913, as that only involved thoughtless copy-and-paste. The situation was corrigible almost robotically, as most Greek roots for English terms are actually Ancient Greek ones; a bot or quasi-bot supervised by a human would have corrected the issue. Messing with {{el}} seems rather unfortunate to me. I would like to see some evidence that people confuse "Greek" for "Ancient Greek" in recent edits. --Dan Polansky 08:30, 11 October 2010 (UTC)
The argument "we need to use Category:Modern Greek derivations instead of Category:Greek derivations, because otherwise we lack a clean-up category for continued (mis)uses of {{etyl|el}}" looks as implausible as anything: a clean-up category should be named like one, on the model of "Category:Greek derivations that should be Ancient Greek". But there does not even need to be a cleanup category; there can be a bot-generated cleanup list, such as the lists created as subpages of WT:TODO. --Dan Polansky 08:33, 11 October 2010 (UTC)
Re the evidence, off the top of my head: this and this, both less than a year old. --Vahag 15:41, 11 October 2010 (UTC)
Two edits almost nine months ago are not a sign of a flood of errors that cannot be corrected by notifying the editors of the difference between Greek and Ancient Greek, possibly using a boilerplate template. The deviation from the common practice in case of "Modern Greek" seems unfortunate: there are English and Old English, French and Old French, Swedish and Old Swedish, Chinese and Old Chinese, Dutch and Old Dutch, Japanese and Old Japanese, Spanish and Old Spanish, etc. The problem of confusion of Greek and Ancient Greek seems not urgent enough to justify an unvoted-on change of practice. I wonder what is planned to happen with the L3-heading, namely whether it would be renamed to "Modern Greek". In case this is planned, I oppose renaming L3-heading "Greek" to "Modern Greek". Given the absence of solid evidence to support the change in practice, I also oppose the existence of Category:Modern Greek derivations. And I request restoration of status quo before the change announced in this thread until it is shown in a vote that this change has a consensus support. --Dan Polansky 18:44, 11 October 2010 (UTC)
Dan, please appreciate how difficult and tedious it is to find these errors; how many do you need to see before you'll consider it a problem? I favour changes insofar as they clearly distinguish Modern from Ancient Greek. I agree that we oughtn't change the level-two language header from Greek to Modern Greek; ISO 639-3's septempartite division of Greek is not reflected here (in that we treat Ancient, Koine, Byzantine, and other pre-1453 forms of Greek all as "Ancient Greek" and everything post-1453 as "Greek"), so appeals to consistency don't apply to language-header names. I also agree with you that the clean-up category shouldn't just be named Category:Greek derivations; how about Category:Greek derivations from unspecified periods (clean-up category)? — Raifʻhār Doremítzwr ~ (U · T · C) ~ 12:21, 18 October 2010 (UTC)
(unindent) I am not convinced. These errors seem to be fairly easy to find: all you need to do is look at Category:Greek derivations and see whether there is anything suspect; the category now has 51 entries, and its members look rather typical of Modern Greek derivations: "bouzouki", "rebetiko", etc. --Dan Polansky 14:18, 18 October 2010 (UTC)
That's because Vahagn Petrosyan very recently cleaned up that category (see his post hereinbefore, timestamped: 19:36, 5 October 2010), so it's quite unlikely that it contains any miscategorisations at the moment; however, that certainly does not mean that no new miscategorisations will occur if we revert to the status quo ante. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 17:07, 19 October 2010 (UTC)
(unindent) So what? I thank Vahag for having done the cleanup[1]. Nonetheless, the entries that he has cleaned looked unlike Modern Greek derivations, so were easy to spot from looking at Category:Greek derivations. By contrast, you maintain that it is "difficult and tedious ... to find these errors", a statement that sounds implausible to me. --Dan Polansky 18:02, 19 October 2010 (UTC)
Post-archival: I responded to Dan Polansky on his talk page. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 13:19, 24 October 2010 (UTC)

Esperanto/Ido uppercase nouns/adjectives

Are Esperanto adjectives relative to a country or other proper noun with an uppercase letter or not? We seem to have nederlanda and Nederlanda. The same goes for names of languages and demonyms - uppercase letter, or not? Mglovesfun (talk) 13:59, 4 September 2010 (UTC)

Go by whichever is/are attested per WT:CFI. (I suspect that in most cases, that will be neither.) —RuakhTALK 14:02, 4 September 2010 (UTC)
Having said that, when converting atento from uncountable to countable, atentojn gets five Google Books hits, and atentoj gets many more than that. Mglovesfun (talk) 10:33, 5 September 2010 (UTC)

2 000 000 entries!

Congratulations to Zeitlupe (talkcontribs) for creating the entry betragen, which is, unless I'm quite mistaken, the 2 millionth entry on Wiktionary! Sorry, I mean: to Prince Kassad for Jubiläum.

Go team! —Internoob (DiscCont) 19:04, 7 September 2010 (UTC)

Wait, now Special:Statistics says 2 000 002 entries, but in the meantime, 6 more entries have been created, as per Special:NewPages. It appears I'm quite mistaken. —Internoob (DiscCont) 19:19, 7 September 2010 (UTC)
Goddammit I deserve one of those milestones by now :) Equinox 19:25, 7 September 2010 (UTC)
You are quite mistaken. I use the count on Recent Changes (which should be the most accurate) and according to it, my entry on Jubiläum is the two millionth entry. -- Prince Kassad 19:35, 7 September 2010 (UTC)
(after edit conflict) Even if one of the other entries around Jubiläum is the real 2000000th, I think we can look the other way, just because of the suitability of Jubiläum. :-) —msh210 (talk) 19:43, 7 September 2010 (UTC)
Yes, you're probably right. I was going by Special:Statistics and the one on the main page. Congratulations! —Internoob (DiscCont) 19:41, 7 September 2010 (UTC)

Untidy pages viz. plurale tantum - Pronunciation section

Do we need these large boxes asking for pronunciation information? With a TOC and a Wikipedia link, all we lack is a picture or two. Until the majority of articles have this pronunciation information, it seems unnecessary to have these requests; a similar one is available for Etymology.
If we want users to be able to make a request there are other ways. —Saltmarshαπάντηση 16:26, 9 September 2010 (UTC)

Would something the size of {{slim-wikipedia}} be acceptable? The plurale tantum case had two {{rfap}}s where one would do. (I have merged the two.) I don't see why we would dispense with in-entry displays. Big ones are warranted for RfV and RfD at a level higher than sense level, but smaller ones should suffice for requests and challenges to lesser components of the entry than PoS, inflection line, and definitions. DCDuring TALK 17:27, 9 September 2010 (UTC)
BTW, is it possible to force such a section request template to occupy space to the right of the associated header but to the left of the right-hand-side ToC, images, and sister-project boxes? Frankly, directly to the right of the header with a little padding would be fine with me. DCDuring TALK 17:31, 9 September 2010 (UTC)
We do have some big, boxy templates that can hardly ever be used in entries without disrupting the layout. {{rfquote}} is one, I think. Mglovesfun (talk) 23:08, 9 September 2010 (UTC)
But even here, I think the example I have put at flux#Verb is much neater. It would point to a suitable help page.
The trouble is that when most entries are incomplete these boxes could proliferate everywhere. —Saltmarshαπάντηση 04:33, 10 September 2010 (UTC)
A box wouldn't hurt to draw attention to what is missing. The fact is that we are "under construction". To attempt to create the illusion that we are not is counter-productive and deceptive. I believe that we need a great many more attention-drawing notices to improve the appallingly bad quality of English definitions (often using century-old wording) and the use of obsolete, archaic, and rare English words as glosses in FL sections. Similar specialized notices are warranted for defective sections such as synonyms.
Reducing the size of such notices seems like a good thing, especially the vertical space taken. For example, your boxless design to rfquote is one line less wasteful. A boxed notice that appeared at the end of the definition would be even less wasteful. DCDuring TALK 12:08, 10 September 2010 (UTC)
Certainly the small box at the end of a definition is a step in the right direction. I also agree that there are many places where editing is needed - I am just highly cynical (in the absence of stats) that they encourage the casual user to be brave and add data. And I worry that further proliferation might put users (not editors) off. My point doesn't attract support so I'll shut up! —Saltmarshαπάντηση 17:40, 10 September 2010 (UTC)
If normal users don't see that we mark problems, how can we get some of them to mark them or correct them? If we can't get more occasional users to tag and correct problems, how are we going to recruit more regular users to do the work? Should we make it easier for users to query specific aspects of an entry that bother them? Opening up an edit box seems slow and takes the specific focus of the user's problem out of view. Would a small pop-up window be better? Would it be feasible? Limited to registered users?
There might be support for steps in the direction you suggest.
  1. Why can't RfP and RfAP be combined, with RfAP being replaced by a switch within {{rfp}}? Double boxes in a section seem silly.
  2. Why can't all request boxes be smaller, which should not be highly controversial?
  3. Could certain boxes appear just to the right of the applicable section header?
-- DCDuring TALK 18:20, 10 September 2010 (UTC)
Combining rfp and rfap sounds grand. —msh210 (talk) 18:53, 16 September 2010 (UTC)
Those three suggestions of DCDuring answer all the points I originally made. Of course I want to attract users to improve the project - I didn't think we were going about it in the right way - and some pages were becoming very busy down the right hand side. —Saltmarshαπάντηση 04:59, 11 September 2010 (UTC)

Classification of abbreviations, initialisms and acronyms

IMO abbreviation, acronym and/or initialism should be in the etymology, not as a part of speech. Often it's a good 'lazy' solution to avoid splitting by part of speech to lump everything under 'abbreviation'. MILF for example is a noun, USA is a proper noun and FUBAR is an adjective. While these are abbreviations in one sense, we lose valuable information if we don't put a proper part of speech header. Perhaps that's why {{abbreviation}} exists, so you can put a proper part of speech template below it, instead of {{infl|en|abbreviation}}. Mglovesfun (talk) 23:06, 9 September 2010 (UTC)

Additionally, I think the distinction between acronym and initialism is one of pronunciation, not necessarily POS level. --Bequw τ 01:46, 10 September 2010 (UTC)
I agree, except that I think it should go in the sense line rather than the etymology section, using {{abbreviation of}} and its ilk. In many cases we'll need to specify dot=: and add more information besides just the expansion, but even then it's the logical starting-point for the definition. —RuakhTALK 01:55, 10 September 2010 (UTC)
I agree with everything said so far. How's our entry for ind. as an example of a standard format? (There's a better one that I created at some point, but I can't remember where it is.) — Raifʻhār Doremítzwr ~ (U · T · C) ~ 10:29, 10 September 2010 (UTC)
Looks good to me. And for a non-English example, usw. —RuakhTALK 12:54, 10 September 2010 (UTC)
Should these entries have pronunciation sections like usw. or not like ind.? DCDuring TALK 18:40, 13 September 2010 (UTC)
I think they should (I added the pronunciation to usw. since when a text containing this abbreviation is read aloud, it's pronounced just like the spelled-out form rather than letter-for-letter, which might be different for other abbreviations). Longtrend 18:56, 13 September 2010 (UTC)
I hadn't thought of this before. I see that the OED has fubar as an adjective and a verb - all points lost with the Abbrev... heading. —Saltmarshαπάντηση 05:27, 11 September 2010 (UTC)
What are the asterisks on the verb inflections at ind.? Equinox 08:54, 13 September 2010 (UTC)
Click on them and you'll see. They link to another term which explains it. —CodeCat 09:03, 13 September 2010 (UTC)
No, they don't. They just link to [[*]], without any indication of which sense is meant. Honestly, I find it strange — asterisks in linguistics usually mean either "unattested" or "ungrammatical", and in either of those cases we shouldn't list any inflected forms at all, rather than listing them under erasure. —RuakhTALK 09:33, 13 September 2010 (UTC)
I agree strongly that these should not be L3 headers. I'm fine with having them in an etymology section as proposed by Martin or in a definition line as proposed by Ran, even allowing both depending on which makes more sense for any given entry. (For example, if an abbreviation is much better known than its long form (e.g., perk#Noun) then the fact that it's an abbreviation should probably be in its etymology section, with the definition line stating the definition.) —msh210 (talk) 16:35, 13 September 2010 (UTC)
What should the entry for perk look like? How many etymologies? I think we need to have some well-structured examples to understand where this might go. DCDuring TALK 18:29, 13 September 2010 (UTC)
Procedurally, this seems to be a matter that is vote-worthy once we have achieved some kind of consensus. I am reasonably sure that we don't want full pronunciation sections, even for initialisms and acronyms, and dead certain that we won't want to have separate etymology sections for each of the words for which "ind." is an abbreviation. AFAICT the usual layout our entries follow is nowhere explicitly sanctioned. Wiktionary:Entry_layout_explained/POS_headers#Acronyms.2C_Abbreviations.2C_and_Initialisms is terse. I suppose someone versed in the art of reading legal documents could infer that it permits whatever our prevailing practice was at the time it was voted on. Do we need to make some kind of explicit recognition that our rules are intended to cover only some set of languages, say, those using the Roman letters without diacritics for abbreviations?
More substantively, it seems useful to have some kind of distinguishing notation for senses that normally only appear in writing and highly specialized speech vs those that are commonly spoken short forms. For example, AFAICT and vs are not spoken as far as I can tell, whereas some senses of arb are (e.g., the finance sense arbitrageur, and arb used to mean arbitration in some contexts).
Should this kind of thing be explicit or implicit? Leaving such to be inferred from the absence of a Pronunciation section will mislead even knowledgeable users until all entries have been fully addressed. It may mislead users even after we have achieved an adequate level of compliance with our standard for these entries.
Under what circumstances would we not want an etymology section? All cases of true initialisms, acronyms, and truncations of the definiens seem sufficiently transparent that we should proscribe any etymology section. DCDuring TALK 18:22, 13 September 2010 (UTC)
What about attestation? These terms may be quite difficult to cite in each individual sense especially if we depend on Google and even more so if we are to distinguish between forms with and without punctuation, usually handled as a style-guide question, not descriptively AFAICT. DCDuring TALK 18:40, 13 September 2010 (UTC)
So, how do people propose we set up the entry for AFC? If we mark by part of speech, then we'll need at least one pronunciation section and nine etymologies. --EncycloPetey 14:59, 20 September 2010 (UTC)

Random author attribution with no other data

As in this version of localize. Were these imported from a particular source? Is there any pattern that can be used to find and fix them? Nadando 20:27, 3 September 2010 (UTC)

That looks like one of Webster's argumenta ad verecundiam. See, for example, secret#References, where Webster’s Unabridged Dictionary (1913) has "To keep secret. [Obs.] Bacon."; in that case, it cites the same source as our 1625 quotation at secretted#Verb. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 20:52, 3 September 2010 (UTC)
The ones from Webster usually have specific works associated with each one. See Wiktionary:Abbreviated Authorities in Webster. Adding the works could be bot-able. --Bequw τ 04:48, 4 September 2010 (UTC)

Poll: Deleting "/more" pages from Wikisaurus

This year some people wanted to get the "/more" subpages in Wikisaurus deleted, such as Wikisaurus:penis/more. That would also involve deleting or obsoleting the template {{ws more}}, whose text reads "Only words that meet criteria for inclusion can be included. You can post poorly attested or unattested words to Wikisaurus:ws more/more". Deleting the "/more" subpages would overrule the vote Wiktionary:Votes/2006-09/Wikisaurus semi-protection.

Please indicate whether you support or oppose the deletion, or whether you do not care. If no substantial opposition arises here, I would send this proposal to a formal vote. --Dan Polansky 09:50, 4 September 2010 (UTC)

An addendum days later, on the intended purpose of the slash-more subpages: The slash-more subpages of Wikisaurus have been proposed in this post:

'I have an additional suggestion. For each of the problematic Wikisaurus pages, that would be semi-protected under this proposal, put a simple note at the top saying, "This page is full. Please add additional synonyms to [[/overflow]]." That way the kids can continue to have their fun thinking of new words for penis and breasts, but nobody else has to look at them. —scs 00:38, 16 October 2006 (UTC)'[2].

User scs (talkcontribs) had, in the same place, given a longer (some 520 words) exposition in support of "/overflow". Another quote of user scs:

'Or, in a nutshell, we have to (sometimes) act as a repository for any swill that the twelve year olds can come up with so that we can be as open as we have to be to also attract the editors who will actually write the open dictionary.' —scs 02:54, 16 October 2006 (UTC)[3] --Dan Polansky 08:02, 8 September 2010 (UTC)

Yet another addendum: the slash-more subpages were proposed even earlier by Richardb:

"A possible compromise between the "tough criteria for WikiSaurus", and the "Don't lose even the least valuable "synonyms". Introduce, in WikiSaurus, a xxx/more subpage for the problem pages. Cull the trash from the main page (by whatever criteria), but don't just delete it, put it in the /more page. In the main page indicate that new entries not meeting the tough criteria have to be put in the /more page, and there can be researched for verifiability, and perhaps later promoted to the main page. With this I would then suggest we might even protect the main WikiSaurus page. Admin's would then be responsible for checking the /more pages every so often to see if there are any terms that could be promoted to the main page, as they meet the criteria. Thus we would meet two purposes. The main WikiSaurus page would be kept up to our "standard" (which I have to point is very subjectively applied), whilst the /more page would capture every possible synonym, and would in effect be a specific protologism page.--Richardb 23:26, 10 May 2006 (UTC)"[4]

--Dan Polansky 07:38, 9 September 2010 (UTC)


  1. I support deletion; however, I'd be willing to leave them as they are, as long as the requirement for listing on those /more pages were made more stringent (i.e., something like one citation of the word's use in a durably archived medium). — Raifʻhār Doremítzwr ~ (U · T · C) ~ 12:49, 4 September 2010 (UTC)
  2. Support Mglovesfun (talk) 12:53, 4 September 2010 (UTC) per Doremítzwr
  3. Support -- Prince Kassad 13:18, 4 September 2010 (UTC)
    Support. (I don't suppose we could dump LOP at the same time?) --Yair rand (talk) 05:54, 5 September 2010 (UTC)
  4. Support Internoob (DiscCont) 17:49, 5 September 2010 (UTC)
  5. Support Equinox 19:25, 5 September 2010 (UTC) I think talk pages are good enough.
  6. Support The uſer hight Bogorm converſation 06:44, 13 September 2010 (UTC)


  1. Oppose DCDuring TALK 10:36, 5 September 2010 (UTC) I might favor the proposal if there were some presentation of the consequences.
    Well, they're not linked to from many pages (almost none in fact) so the consequences would be not having lots of explanations for entries that don't meet CFI. Mglovesfun (talk) 17:52, 5 September 2010 (UTC)
    1. How many are there? Where can I find a listing of them?
    2. Is their intended purpose obsolete? Can anyone demonstrate that they have not succeeded in their intended purpose? Is there a way to make them serve their intended purpose? Is there a superior way of achieving their intended purpose? DCDuring TALK 19:20, 5 September 2010 (UTC)
    Do you know what their intended purpose is? Mglovesfun (talk) 19:23, 5 September 2010 (UTC). Oh and Special:WhatLinksHere/Template:ws_more seems to answer the first question. Mglovesfun (talk) 19:24, 5 September 2010 (UTC)
    I would have expected someone proposing deletion not to be so dismissive of all who have labored in these vineyards before as not to trouble to find out the purpose. Clearly the pages are intended to provide a home for the vast number of synonyms in fairly widespread use for the terms involved, to wit, the most common vulgarities, sex, drunkenness, etc. No rationale for deletion has been presented. I think this is perhaps the best home we could have for these, as the lack of attestation is mostly because no one wants to wade through the muck of usenet to find it. IOW, prudish distaste. Hardly befitting our slogan. DCDuring TALK 20:14, 5 September 2010 (UTC)
    The /more pages in Wikisaurus are the following: Wikisaurus:penis/more, Wikisaurus:sexual intercourse/more, Wikisaurus:breasts/more, Wikisaurus:prostitute/more, Wikisaurus:buttocks/more, Wikisaurus:anus/more, Wikisaurus:homosexual/more, Wikisaurus:masturbate/more, and Wikisaurus:vagina/more; these are 9 pages. The rationale for deletion is that this is non-CFI-meeting material, mostly unattestable. Unattestable material is presumably what most people who want Wiktionary to become as professional as possible do not want to keep in Wiktionary. All CFI-meeting content can be hosted directly in the main (non-/more) Wikisaurus pages, regardless of how vulgar or obscene it is: Wikisaurus is not filtered by obscenity, only by its meeting CFI. Randomly picking from Wikisaurus:vagina/more: there is *"full meathole jacket", which has 2 or 4 Google hits (depending on how you search), half of which are in Wiktionary, and it has zero Google Groups search hits (searching all groups, not only Google groups)[5]. Let me emphasize that the proposal is to delete Wikisaurus:vagina/more; Wikisaurus:vagina should naturally be kept. The discussion that led to the creation of Wikisaurus "/more" pages: Wiktionary:Beer parlour archive/October 06#Stop me if this sounds familiar. --Dan Polansky 08:30, 6 September 2010 (UTC)
    If volunteers don't care to do the work to attest terms that they find distasteful, that is their prerogative. That we should allow this crypto-proscriptivism to grossly contradict our collective commitment to the principles and slogans we claim to hold so dear says a great deal about the depth of our actual commitment to the principles. I think we would be jettisoning a bit of our stated principles by jettisoning these pages, which, hidden and silent though they may be, stand as a reminder of the implications of "all words". If we do delete these pages, I will not hesitate to remind of this precedent those who would invoke our superordinate slogan in contradiction to CFI and in defense of whatever entry they happen to be favoring. DCDuring TALK 18:54, 6 September 2010 (UTC)
    Many of the terms on these pages fail the attestation criterion. They do so, not by failing to have an actual attestation entered into Wiktionary, but by having no potential attestation found on the web using Google. Put differently, many of the terms in the "/more" pages are utter crap, as the anons who have entered them cared for no standard whatsoever. There is no way that the unattestable terms are somehow "words" to be included. The proposal to delete these "/more" pages is one to keep the whole Wikisaurus to roughly the same standards as the mainspace rather than letting part of Wikisaurus be a free-for-all crap-wiki. I see no stated principles that this proposal discards. This proposal says that all Wikisaurus pages and subpages should abide by the stated principles, while the status quo says that "/more" subpages are exempt from all inclusion criteria. --Dan Polansky 19:17, 6 September 2010 (UTC)
    On another note, if you need these subpages to 'stand as a reminder of the implications of "all words"' (I don't think they really are an implication of "all words", anyway), these pages could be moved to your user space instead of deleted. --Dan Polansky 19:28, 6 September 2010 (UTC)
    I need little reminder of the prudish and elitist tendencies that keep us from actually living up to our stated principles. DCDuring TALK 14:11, 7 September 2010 (UTC)
    @DCDuring: See w:Wikipedia:Verifiability. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 13:09, 7 September 2010 (UTC)
    What's your point? DCDuring TALK 14:11, 7 September 2010 (UTC)
    You seem to be giving that list of barely-attestable and unattestable terms undue weight. I was reminding you that it's pretty unimportant whether "[that] vast number of synonyms [are] in fairly widespread use" if we cannot prove it. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 14:19, 7 September 2010 (UTC)
  2. Oppose. I support doing away with the concept of /more pages, but seeing as there are currently only nine, we might as well examine them for terms that might actually meet the CFI, and delete them individually rather than collectively. —RuakhTALK 18:34, 6 September 2010 (UTC)
    So what text of the vote would you support? What about "Voting on: Deleting "/more" subpages of Wikisaurus after their content is examined for CFI-meeting terms, and the CFI-meeting terms are copied to the main Wikisaurus pages"? Any other text that you would support? --Dan Polansky 18:51, 6 September 2010 (UTC)
    Maybe "Voting on: Doing away with Wikisaurus '/more' subpages"? —RuakhTALK 12:03, 7 September 2010 (UTC)


  1. Abstain Yair rand (talk) 18:57, 6 September 2010 (UTC)


We currently have sections in Edittools for both languages and scripts (e.g. "French" and "Latin/Roman"). I think we should just organize by script. It would be simpler, as there are far fewer scripts than languages. Currently it seems like we play favorites with languages, as some choices aren't even driven by need ("Yoruba" has a section but under 100 entries). Organizing by script would also reduce redundancy and therefore speed up editing. By my count we could easily merge all of the following into "Latin/Roman": "Baltic", "Catalan", "Esperanto", "Estonian", "French", "German", "Hawaiian", "Icelandic", "Italian", "Maltese", "Old English" (maybe), "Pinyin", "Portuguese", "Romaji", "Romanian", "Scandinavian", "Slavic Roman", "Spanish", "Turkish", "Welsh", and "Yoruba". We could, of course, still make some layout decisions based on languages, but overall I think it would be much better if the broad grouping were by script. --Bequw τ 18:56, 5 September 2010 (UTC)

This has come up before, and I supported the merge to avoid having a few dozen sections. Mglovesfun (talk) 19:22, 5 September 2010 (UTC)
Fully support. Also, Sorani Kurdish should really be part of Arabic. -- Prince Kassad 19:44, 5 September 2010 (UTC)
I agree as well. Though, given that administrators seem to find the "Latin/Roman" section unwieldy, we may want to duplicate a small number of subsets in their own sections — maybe a "Western European", for example, modeled after ISO-8859-1? Certainly we don't need a separate section for each language, though.
By the way, we can also merge "Greek (Modern)" with "Greek (Ancient)", of which it's currently a subset.
But the edit-tools need a lot of technical work as well. We've made a heck of a lot of progress on this front, but we still have quite a ways to go.
RuakhTALK 19:49, 5 September 2010 (UTC)
I'd rather go ahead and remove some of the infrequently used symbols. For example, does anyone need the obsolete Irish lenited consonants? They're not used in current orthography, and apparently not in Old or Middle Irish orthography either. -- Prince Kassad 05:30, 6 September 2010 (UTC)
I have no comment on Irish specifically, but yeah, if a symbol is not seeing much use, then it doesn't need to be in the edit-tools (until we improve them in a way that makes a surfeit of symbols less costly). —RuakhTALK 18:25, 6 September 2010 (UTC)
I devoted a few hours of my time to creating this list. I hope it does not contain major inaccuracies. -- Prince Kassad 20:44, 6 September 2010 (UTC)
Perhaps a section for linguistic use/proto-languages and such would be nice. For Proto-Germanic I'd like to request the characters listed here: Wiktionary:About Proto-Germanic. —CodeCat 21:05, 6 September 2010 (UTC)
For reconstructed languages it makes sense having a separate section since they have a large set of unique characters. We already have an Indo-European section so maybe its scope could be broadened. -- Prince Kassad 21:08, 6 September 2010 (UTC)
Good work on the list. This info could be saved to the entries for each letter. --Bequw τ 03:29, 7 September 2010 (UTC)
Yes, that's a useful list. Note that vowels with tildes are often used in Mediæval texts to denote "vowel + 'm'" or "vowel + 'n'", which includes Ũ (common for the -um endings of Latin's second-declension neuter nouns); also, vowels with breves were formerly used extensively to mark explicitly short vowels in Latin, too. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 13:15, 7 September 2010 (UTC)
Not sure which are the Irish lenited consonants, but here's the usage frequency of all characters appearing in Edittools. --Bequw τ 04:04, 8 September 2010 (UTC)
Hmm, it shows that they're only used once, namely on the entry on themselves. This proves that no one really needs them. -- Prince Kassad 06:36, 8 September 2010 (UTC)

I'm still not sure what to do with the Sign languages section. I mean it's pretty nice to have all possible positions listed, but it seems to me that the Edittools is the totally wrong place for that. -- Prince Kassad 07:56, 9 September 2010 (UTC)

apocope & apheresis

I believe that we should import these categories from fr:Catégorie:Apocopes and fr:Catégorie:Aphérèses. JackPotte 21:16, 6 September 2010 (UTC)

What would those categories contain, though? I can't make it out. —CodeCat 21:20, 6 September 2010 (UTC)
Well, with wordings like Category:English examples of apocope and Category:English examples of aphaeresis, they can have contents similar to Category:Examples of syncope. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 21:38, 6 September 2010 (UTC)
If they are like Category:Italian apocopic forms, they will contain the apocopic forms of words. SemperBlotto 21:50, 6 September 2010 (UTC)
Yes, we might add a {{DEFAULTSORT:apocope}} into Category:Apocopic forms. JackPotte 23:31, 6 September 2010 (UTC)
So, are we looking at Category:English apocopic forms and Category:English aphaeretic forms? — Raifʻhār Doremítzwr ~ (U · T · C) ~ 13:22, 7 September 2010 (UTC)
Given the prior existence of the former, I've created the latter. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 13:29, 7 September 2010 (UTC)

blue-plate special

See Blue-plate special. I want to add the image of the plasticky blue plate (captioned "A typical blue divided plate used for a Blue-plate Special"), but when I copied the Image: tag and added it to our entry, I got a totally different plate, a yellow ceramic one. What's going on? Equinox 20:10, 9 September 2010 (UTC)

The image is uploaded locally to English Wikipedia. Therefore, we can't use it. -- Prince Kassad 20:19, 9 September 2010 (UTC)
I think there is a mechanism for requesting that it be moved to Commons. I don't know how to be notified here when that process is complete or otherwise effectively monitor the image's availability. DCDuring TALK 21:16, 9 September 2010 (UTC)
Not only that: there's a mechanism for moving it to Commons.  :-)  I've done so, and added the picture to our entry.​—msh210 (talk) 16:49, 13 September 2010 (UTC)

Fundraising 2010

Hello Wikimedians,

As many of you are aware, we are now two months away from the Fundraiser for the Wikimedia Foundation, 2010. We have lofty goals, and we can meet them and exceed them!

The meta translators are already actively engaged in the annual drive to distribute our messages and we encourage you to do the same, but we would like to point everyone to the developments we've made in banner messages, from creation to commentary to the ones that will go live for testing and for the drive itself in November. It's one of our goals to make sure that all volunteers know that there is a place for them in the Fundraising drive. We've started the setup on meta for banner submission, statistical analysis, and grouping together volunteers who would like to find a specific focus and work in that area.

This year the Wikimedia Foundation is taking a proactive stance in reaching out to each and every Wikimedia project and volunteer to find innovation, collaboration, and collation of ideas from the community-driven process. The staff working on this is made up of long-time Wikimedians with as much care and concern for the success of this drive as the volunteers, and we want you to actively participate and have a voice.

Use the talk pages on meta, talk to your local communities, talk to others, talk to us. Engagement is what we strive for; without each other we would never have made Wikimedia succeed. Everyone is welcome to contact any of us on staff at any time, and a timely response will follow. We actively encourage focusing discussion on meta so we can all work together.

Please translate this message into your language if you can and post it below.

See you on the wiki! Keegan, WMF Fundraiser 2010 02:39, 13 September 2010 (UTC)

Attestation for pronunciations

Are there criteria for attestability of pronunciations here, or should there be? If I added the pronunciation /hɛnhaʊs/ to fox, would there be a reasonable reason to revert (besides the fact that "fox" contains none of the sounds in "henhouse"...) — lexicógrafo | háblame — 18:45, 13 September 2010 (UTC)

Nitpick: Fox does contain /s/, which is in henhouse. And if you pronounce henhouse /hɛnhaʊs/ (I don't), then fox shares /a/ with it also.​—msh210 (talk) 19:42, 13 September 2010 (UTC)
We lack criteria for inclusion in Wiktionary of pronunciation, etymology, and, I think, everything else, except senses.​—msh210 (talk) 19:42, 13 September 2010 (UTC)
I guess three independent people would need to record other people pronouncing the word and upload their videos to YouTube. But in all seriousness, I have no idea how you would want to attest pronunciations. -- Prince Kassad 19:51, 13 September 2010 (UTC)
I was just thinking something like "I can name five unrelated people in different places who say it this way", but perhaps five is too many. — lexicógrafo | háblame — 19:56, 13 September 2010 (UTC)
I don't think that "three independent people would need to record other people pronouncing the word and upload their videos to YouTube" is so ridiculous; after all, we have Commons to upload recordings to. And I assume it would be legal to 'quote' from other recordings (of native speakers) as long as the quoted segments were single words. Nadando 20:43, 13 September 2010 (UTC)
A single audio file on Commons per accent is sufficient. Supporting references from YouTube should be easy to find, given that one can search the closed captioning/subtitles. That might help when an accent is tagged {{rfv-pronunciation}}. --Bequw τ 03:16, 14 September 2010 (UTC)
How would one link to such videos, then, from the entry page? Or would one have to download and re-upload the audio to Commons? — lexicógrafo | háblame — 12:19, 14 September 2010 (UTC)
I've added a small box at crawfish. Is that ok-looking? — lexicógrafo | háblame — 14:19, 14 September 2010 (UTC)
I don't like it. I think what we want is edited audio uploaded to Commons, with just the relevant word. The source can be linked to on the description page for the file. Nadando 16:20, 14 September 2010 (UTC)
Are YouTube videos free such that it would be acceptable to upload a snippet of their audio to Commons? — lexicógrafo | háblame — 17:09, 14 September 2010 (UTC)
I don't know. I've uploaded a test file and posted the question on their village pump. Nadando 18:49, 14 September 2010 (UTC)
I've also posted a question at the YouTube help forum. — lexicógrafo | háblame — 21:49, 14 September 2010 (UTC)
Here's how crawfish looks if the links were formatted with just normal <ref> tags. --Bequw τ 19:22, 14 September 2010 (UTC)
I thought about that too, but there are some entries that already have a bucketful of references and it might be confusing to add more. And these aren't really references, more like citations, aren't they? — lexicógrafo | háblame — 19:38, 14 September 2010 (UTC)
The =References= header may be a bit loose for these sources. But the <ref> tags are actually just footnotes, so they don't need to be shown in a =References= section. We can use a different "group" parameter and show them somewhere else. They could go in a subsection of =Quotations= (they can't be citations, as that's a different page), a partitioned area of =References=, a special dropdown in =Pronunciation=, or we could skip footnotes entirely and do something like you first did. Also, you can deeplink to a specific time index of a YouTube stream. I've created {{youtube}} to facilitate those links. --Bequw τ 23:19, 14 September 2010 (UTC)
Taking snippets of the audio is "explicitly forbidden", according to YouTube's terms of service, so it'll have to be links to the videos. — lexicógrafo | háblame — 00:17, 15 September 2010 (UTC)
Come to think of it, putting <references/> in =External links= seems like a good idea. --Bequw τ 04:22, 15 September 2010 (UTC)
WT:References says, "Pronunciation information can rely on recorded audio of actual speech". —Internoob (DiscCont) 22:47, 13 September 2010 (UTC)
Good to know. — lexicógrafo | háblame — 22:56, 13 September 2010 (UTC)
We'll probably want videos though. Otherwise, there's no way to know if it is one and the same person, the contributor himself, or maybe even a synthesized voice. -- Prince Kassad 16:22, 14 September 2010 (UTC)
This just occurred to me re: Castlegregory. It met CFI when I added a pronunciation, provided my IPA is correct. Though CFI doesn't say anything about correct etymologies and pronunciations, it would be ridiculous to assume that incorrect information is sufficient. Mglovesfun (talk) 16:33, 14 September 2010 (UTC)

ListenToYouTube.com seems to be an excellent website for acquiring YouTube audio, if YouTube's audio is ok for us to use. — lexicógrafo | háblame — 18:20, 14 September 2010 (UTC)

CFI for -ing form nouns and adjectives

I propose that we not have noun and adjective sections for -ing forms of English verbs unless some stringent tests are met.

For adjectives, ordinary tests of adjectivity should be sufficient. (See Wiktionary:English adjectives, which is not exhaustive on sufficient tests, but has the most common tests.)

For nouns, the stringent test would be the existence of a non-gerundial sense of the -ing form. The attestable-plural test that we have been using seems to me to be too inclusive in one sense and too exclusive in another. On the one hand, the formation of a plural for a previously unattested -ing form is no more remarkable than finding an instance of an uncommon inflected form of an uncommon word in an inflected language. Excluding a term is misleadingly exclusive. On the other hand, too many -ing forms have attestable plurals but are found as gerundial nouns with meanings that are completely predictable from the senses of the verb from which they are derived. In principle, we might include noun senses for each of the verb senses. If we require attestation of each sense, we will exclude some of them, but they will nevertheless probably find use. A user might be misled by the absence of a sense into thinking that the word could not be properly used in the missing sense, though its meaning would be readily derived from the senses of the verb. In fact, one could argue that each verb sense should have both an uncountable sense ("much radioing") and a countable sense ("many radioings").

Many -ing forms have developed meanings that are semantically distinct from gerundial noun meanings. (Eg, coping, building, printing, booking) The older of these are so distinct as to be obvious. But generally, semantic arguments are less clear-cut than syntactic arguments. Consequently, there are likely to be cases that lead to marginally productive disputes. I nevertheless believe that this is a small price to pay for generally preventing the creation of trivially redundant, but hard-to-maintain PoS entries for virtually every verb. If we can have a bot rapidly generate all the generic noun and adjective entries in a consistent way and we can maintain the cross-entry linkage between the senses of each gerundial noun and the verb from which it derives, the maintainability argument will no longer apply. DCDuring TALK 22:53, 13 September 2010 (UTC)

My main issue with this proposal is that it treats occurrences of -ing forms that have several features of nouns as verb forms. Even if an -ing form (a) has an attestable plural ("perusings"), (b) is attestably used with an indefinite article ("a perusing"), (c) is modified by an adjective ("illegal downloading"), you would still exclude a noun section or one of the noun definitions for the form. Yet another test of nounhood: in "swimming pool", "swimming guide", "perusing fee", "reading guide", "reading glasses", "shopping mall", and "cooking recipes", the -ing forms seem to be (d) nouns used attributively to modify other nouns. And a further test is that (e) the -ing form is modified attributively by another noun, as in "horseback riding" or "night swimming". Of course, (f) -ing forms typically occupy noun-phrase positions in sentences, as in "swimming is fun" and "I like swimming". The discussed -ing forms that would probably be denied a noun section or a noun sense by your proposal are those for processes, patterns of processes, acts, actions, and activities (some of these are likely synonyms) rather than those for results of processes. You would probably include -ing forms that refer to results of processes, such as "building" (a physical object) or "booking" (a reservation). It seems that none of the noun-behaviors listed above of -ing forms for processes convince you to include a noun section or a noun sense for the process. IMHO, a verb section does not do justice to all these noun-behaviors. --Dan Polansky 06:31, 15 September 2010 (UTC)
It's true, however, that Category:English plurals says nothing about 'noun plurals'; we already have plurals of proper nouns in there. Mglovesfun (talk) 11:03, 15 September 2010 (UTC)
A lexicon cannot do justice to many matters of grammar, pragmatics, etc. It probably serves its users best by maintaining a distinction between what it covers lexically and what it does not, one that is somewhat intelligible or, at least, explainable to users. Category:English grammar appendices has some useful information. It would not be difficult to have some specific appendices or Wiktionary pages to address the non-lexical subjects that explain the grammar of the parts of speech and the apparent omissions. We could attempt to come to some kind of agreement on how such appendices could be more obviously linked to from the entry instead of being relegated to "See also", included in Usage notes, or omitted altogether.
Proper noun plurals, attributive-only use of nouns, attributive and predicate use of past participles which are not otherwise adjectival, and the various uses of -ing forms are all cases where a lexical treatment seems to be a waste of time. Perhaps a bot can create the entries. Perhaps some enhancement of our search-result capability can pull information from our core entries to display pseudo-entries. Perhaps some other technical innovations can economize on the effort English-language contributors spend creating and maintaining redundant entries.
Some useful things to do with respect to -ing forms would be to:
  1. revise the displayed text in {{present participle of}}
  2. include plurals in the inflection line with appropriate text and labels
  3. include multiple (4-5) usage examples under the form-of "definitions" that show the main usages of the -ing form (gerundial noun (including plural), adjective, progressive aspects (2 tenses))
  4. generate a list of -ing form entries that have fewer than 2 usage examples.
  5. insert a link to an appropriate appendix in every -ing form entry.
I believe this would make -ing-form entries useful without creating a maintenance nightmare that would confuse readers by presenting rule-type information lexically. DCDuring TALK 12:35, 15 September 2010 (UTC)
I do not think there is any maintenance nightmare of -ing forms. There are other groups of forms that have regular formation and rather regular definitions: <adjective>-ness: The quality or state of being <adjective>; <verb>-able: Capable of being <verb-past-participle>; <verb>-er: A person or a thing who <verb>s. While the very existence of the forms <adjective>-ness and <verb>-able is non-trivial, as not all adjectives take this suffix and not all verbs are attestably -able, the definition is often trivial. I admit that -ing forms are particularly regular, more so than the given example groups. But I do not quite see the problem that you are trying to solve. The noun definitions that you are trying to get deleted are of the form "The act or activity of <verb>ing", which looks much like the -ness definitions in terms of regularity. --Dan Polansky 13:50, 15 September 2010 (UTC)
But there is a caveat: "The act or activity of <verb>ing" defines an -ing form in terms of an -ing form. Hmm. --Dan Polansky 14:05, 15 September 2010 (UTC)
A coordination page for the subject, for reference: Wiktionary:English -ing forms. --Dan Polansky 09:44, 17 September 2010 (UTC)
  • This maintenance nightmare is solved in some languages by using a ===Participle=== PoS header, which conflates several distinct usages (adjectival, nominal, verbal, e.g. to form compound tenses) of words derived from the same original verbal root by well-established and predictable patterns of morphological derivation. I voiced my dissent back when it was discussed and I voice it now: if something acts as a noun, it should be formatted as a noun; if something acts as an adjective, it should be formatted as an adjective; if something is used to form a compound verbal tense, it should be formatted as a ===Verb===. If several of these usages coincide in a single word form, there is no problem in having a separate PoS header for each of them. English -ing forms are just another case of this. The adjectivity criteria listed at Wiktionary:English adjectives are a bit stretched: the first and the second cannot be applied to relative (those meaning "of or pertaining to") and possessive adjectives, but only to those denoting some kind of quality (and only quality that can be graded, not the "binary" types of adjectives denoting a quality that can only be present or absent: the adjectives present and absent are in fact perfect examples); the third criterion requires that the adjective denote a change in state; and only the fourth criterion is valid IMHO (though the "before the noun" part should be removed because some English adjectives are postpositional). As for the English -ing verb forms, they can all obviously be used as both adjectives and nouns, just as past participles in -ed can also be adjectives. If English had preserved a bit more morphology from ancient times this would have been more obvious. That these are true adjectives can be seen from the fact that both of them can regularly form adverbs ending in -ingly and -edly. The same goes for verbal nouns (gerunds): they are true nouns, not "participles", "verb forms" or whatever.
That they are constructed in a regular manner, and that their meaning can usually be trivially inferred ("act or process of doing X", "the outcome of doing X" etc.), is irrelevant. Since, unlike paper dictionaries, we have no space limitations, I see no reason why all -ing forms shouldn't have all three PoSes. Yes, it means a lot more work, and if it can be automated, great; but if, as I suspect, it can't be in some cases, better roll up your sleeves and get to work. --Ivan Štambuk 08:40, 18 September 2010 (UTC)
Re: "roll up your sleeves": No, thank you. It is a fantasy to believe that we will recruit enough people to do the work manually. I believe our best hope is consistent presentation of -ing forms until we have a different data structure or magic-performing bots or templates. The conceptual ideal of largely redundant rewording of multiple senses of a verb for identically formed nouns, adjectives, and participial verb forms is absurd. The presentation of separate noun, adjective, and participial PoS sections without separate senses is useless for translations if a FL uses different words for the senses. I look forward to the result of the joint efforts of our technical magicians and linguistic theorists to practically solve the conceptually simple problem of rewording verb senses by transformation as participle, noun, and adjective.
Re: adjectivity tests: As we have been interpreting them, it is only necessary for one test to be passed (beyond attributive use). DCDuring TALK 11:41, 18 September 2010 (UTC)
It is certainly preferable to initially use an automated system of templates, but full coverage with separate PoS sections should also be encouraged wherever possible. This is a free wiki system and we have all the time in the world. Whether it's "absurd" or not is irrelevant. We're already doing much more absurd things manually that could be automated by relatively simple, low-cost adjustments to the MediaWiki software (for example, adding string manipulation functions would reduce the number of templates that we use by at least one order of magnitude, and adding lemmatizer functionality to the search box, indexing specially marked inflected forms in inflection tables, would eliminate all the inflection bot effort and the clutter in the main namespace). Recently, lots of Latin participles have been getting the "full" treatment instead of being merely templated redirects to the base verb in the definition lines, which is no doubt a very positive development despite the considerable duplication that occurs (essentially of every original verbal sense).
The presentation of separate noun, adjective, and participial PoS sections without separate senses is useless for translations if a FL uses different words for the senses. - No, because each of these nouns and adjectives would get a definition line for each of the base senses, and each sense would get a different translation table.
Why is "beyond attributive use" necessary? All those other criteria are dependent on the semantics of the adjective, i.e. whether it denotes some kind of quality or change in state. There are lots of "real" adjectives that can only be used attributively. English present and past participles are like that by definition. --Ivan Štambuk 15:40, 18 September 2010 (UTC)

"Synchronic" and "diachronic" etymologies

I've seen a few etymology sections where a synchronic derivation of the word in question was given, rather than an actual etymology based on historical forms. For example, consider the German word schlaflos ("sleepless"), which is synchronically derived from Schlaf + -los (which is exactly what is in the entry's etymology section). Let's assume the word existed back in the times of Middle or Old High German, in a slightly different form of course (it likely did, I just don't have any proof right now). That is, it was already derived from the Old/Middle High German forms of Schlaf and -los. Shouldn't that early form be listed in the etymology section then rather than the synchronic derivation (which puts the reader under the impression that the word was not derived before Modern German times)? Or is the way it is now acceptable as well?

And what about "Derived terms" sections? Should there only be terms that were actually historically derived from a word, or those that can be synchronically described as being derived (such as schlaflos from Schlaf)? Longtrend 21:36, 14 September 2010 (UTC)

Affixed and compound words are pretty open to interpretation, depending on how the speakers perceive such words. Speakers of a language may or may not be aware of a regular process to derive one from the other. If they are not generally aware of it (as is probably the case with e.g. dawn or lord) then listing the older etymology would be a good idea. However, a word like schlaflos is still readily analysable by a modern German speaker, because both the stem schlaf- and the -los suffix are still recognised and in productive use today. So I think if modern speakers are known to recognise the word's constituents and are able to 'peel it apart' into them, then a modern etymology should certainly (also) be listed. —CodeCat 22:15, 14 September 2010 (UTC)
I have recently noted this problem in English with the suffix -arian, but it seems quite general. -ism is a productive English suffix formed by simplifying the endings of words ending in -ismus and -isme. But many of the words ending in "-ism" were apparently borrowed prefabricated from other languages (Fr., De., Lat., etc.).
A problem arises in the presentation of the derived terms of a suffix like -los. Those terms formed in Modern German (deemed "synchronically derived") belong under the Derived terms header. Those terms we presume to exist that were formed in an early vintage of German (deemed "diachronically derived") from a suffix spelled differently but that now have the same spelling ought to appear under the Related terms header. It would be very convenient if we could use Category:German words suffixed with -los to generate one or the other list reliably. At present we have a hodge-podge because the affix templates are used for both synchronic and diachronic derivations.
I think we need two categories for many suffixes to reflect the two possibilities. We would need {{suffix}}, {{prefix}}, and their relatives to place entries for diachronically and synchronically derived terms into different categories. DCDuring TALK 23:47, 14 September 2010 (UTC)
The corresponding Swedish word sömnlös (sleepless) was first observed in 1635 although both sömn (sleep) and -lös (-less) are known from Old Swedish (i.e. before 1526). Another compound word, boklös (bookless) was first used in 1716, indicating that lack of sleep was considered a problem worthy of attention 81 years earlier than lack of books. This is interesting information and should be recorded where available. But if all you got for now is {{compound|schlaf|-los|lang=de}}, is this worthy of its own Etymology heading? In similar cases, I have taken the liberty to make this a parameter to {{infl}}, see for example isflak. --LA2 09:40, 15 September 2010 (UTC)
Even a simple etymology like is + flak should really go in an etymology header though, as in beeldscherm. We currently only have exceptions for terms made of multiple words, where the individual words are then linked using the head= parameter of {{infl}}. But even that not everyone agrees on. —CodeCat 10:49, 15 September 2010 (UTC)
Oh, and another thing. Affixed words should use {{suffix}}, {{prefix}} etc. So sömnlös should have {{suffix|sömn|lös}}. —CodeCat 10:52, 15 September 2010 (UTC)
There should be an aesthetic rule that requires more content than headlines. Since is+flak is just 7 characters, I find it repulsive to introduce the 9 character Etymology heading with nothing more than this underneath. --LA2 14:00, 15 September 2010 (UTC)
I'd say we don't want to get too 'anal' about this. One approach is to use both, such as:
{{etyl|enm}} {{term|manhede||quality of being a man|lang=enm}}, corresponding to {{suffix|man|hood}}
Mglovesfun (talk) 11:01, 15 September 2010 (UTC)
I think it is a bad sign for a lexicographer to be concerned about "going anal". I am not sure what disciplines are more intrinsically "anal" than lexicography. DCDuring TALK 15:29, 15 September 2010 (UTC)
The markup "{{etyl|enm}} {{term|manhede||quality of being a man|lang=enm}}, corresponding to {{suffix|man|hood}}" looks good to me. --Dan Polansky 11:40, 15 September 2010 (UTC)
I suppose each language can do what it wishes according to its stage of lexicographic development, but it degrades the value of the "[language] words affixed with [affix]" categories for a single category to include both diachronic and synchronic derivations. I had been so enamored of the diachronic approach that I had been purging derived terms of the synchronic related terms and eliminating hard-coded DT lists in favor of category listings (for which there is a specific template to introduce the listing to the entry without requiring a jump to the category page itself). I now appreciate the value of the synchronic/morphological derivation and believe that it should enjoy the same presentational advantages as the diachronic in the entries for the affixes, at least in English. DCDuring TALK 12:06, 15 September 2010 (UTC)
Manhood should at least link to -hood in some way. Mglovesfun (talk) 12:27, 15 September 2010 (UTC)

This dichotomy is false: every single compounded word has a "real", diachronic etymology. It was (usually) coined (or borrowed) by a single source (if you go far enough back in time, every word was a neologism once) with a specific meaning, whence it spread. The fact that the word was formed as a morphologically transparent combination and that its meaning can be derived effortlessly from the semantics of the component morphemes is irrelevant. The sad state of things is that etymologists (and etymological dictionaries) largely ignore the origin of such words, as they are considered "unworthy" of study because they don't play a prominent role in comparative (historical) linguistics (they rarely undergo interesting semantic shifts, there are no cognates to compare them to, etc.). There is also the possibility of such words often being coined independently at various points in space and time, given how intuitively easy it is to form them, but IMHO we should simply ignore that problem, focus on the earliest attestation, and assume that all later attestations, unless there is hard evidence to the contrary, stem from the same source. ===Etymology=== sections should contain both of these, in a format similar to the one Mglovesfun outlines above. There is no problem if only the morphological etymology is given at first (which is likely to be the case for most words), as those are very useful to language learners. --Ivan Štambuk 07:50, 18 September 2010 (UTC)

I agree with Ivan here. I have one quibble: Let's use "equivalent" instead of corresponding to; that way, we can also have "equivalent to the Latin / Ancient Greek [common etymon of most of the term's 'cognates']". — Raifʻhār Doremítzwr ~ (U · T · C) ~ 09:18, 18 September 2010 (UTC)
Our current category-naming convention for affix derivatives and its use by the affix templates needlessly confounds the language-learner/"synchronic" and the "diachronic". To take Category:English words suffixed with -arian as an example, it is unclear as to whether it refers to:
  1. suffixation as a historical occurrence in the lexicon of a given language (possibly arbitrarily distinguished from its predecessor) ("diachronic") OR
  2. suffixation as a notional reconstruction ("synchronic") or a translingual diachronic phenomenon, treating, say, the formation of totalitario#Italian < totalità#Italian + -ario#Italian as somehow equivalent to totality#English + -arian#English. (egalitarian < égalitaire#French + -ian is similar.)
One advantage of having two categories for words with a given affix (one for true derivatives of the affix, another for the otherwise derived terms) is that we could provide better-quality information on the derivation of productive suffixes. Each existing category of this kind should be split into two categories. I think the names might be "Category:English words derived [using|from] -arian" and "Category:English words with suffix -arian". Members of the "derived" category would appear under "Derived terms". Members of the "with" category would appear under "Related terms".
The second category is, as Ivan says, useful to language learners, though many false trails remain in our English entries. Our entry for -arian actually showed barbarian and Bulgarian as derived terms. The first category is essential to show the actual productive use of an affix in a given language (or vintage of a language, eg, Old French). DCDuring TALK 11:11, 18 September 2010 (UTC)
That distinction is quite odd IMHO, and should be ignored. Every word having the suffix X is by definition a word derived by using the suffix X. Whether the base lexical stem used for the derivation is a native one, functioning by itself as a full-blown word, or a hypothetical one, adapted from a foreign language but only as a bound morpheme, does not seem to me relevant enough to warrant introducing the distinction. --Ivan Štambuk 15:15, 18 September 2010 (UTC)
The distinction in the case of -arian#English is between words ending in "arian" that are the source of the now-productive suffix -arian and those that were produced in English by suffixation with "-arian" as a matter of historical "fact". A simpler case is -ism < words borrowed from other languages that came to be spelled in English ending in "ism" (eg words ending in -isme or -ismus). DCDuring TALK 15:33, 18 September 2010 (UTC)
I'm aware of the distinction that you mention, but I think that that distinction is not important. In each of these cases the source (the author who borrowed a word from the foreign source) used an already productive native English suffix on the basis of a native, or foreign but adapted, lexical stem. The author used the native and productive English suffix -ism, not the foreign -ismus or -isme. IMHO they should all be treated equally. --Ivan Štambuk 15:48, 18 September 2010 (UTC)
This is true for these particular instances of late incorporation into English (totalitarian (20th C) and egalitarian (18th-19th C)). But where did the productive suffixes -arian (and -ism) come from? A high point of productivity for -arian seems to have been in the 18th and 19th centuries. I have not completed my examination of the cases (eventually to be presented in an Appendix [working title: English words ending in "arian"]). I expect that there will be a number of words that, before any use of -arian as a productive suffix, were:
  1. borrowed whole into either Middle or Modern English ending in "arian" (or a near equivalent) or
  2. borrowed with a suffix -ary (or near equivalent) to which -an was affixed in English (eg, contrary, library)
Other routes into English from Latin or Old French are likely as well.
I find it implausible that the productive use of these suffixes derives from imitation of another language's use of, say -isme or -ismus, rather than imitating English words ending in -ism, whatever their source. DCDuring TALK 16:22, 18 September 2010 (UTC)
Most of these words, especially learned words with classical (Greek, Latin) origins, were introduced by sources that were quite conversant with the language(s) of origin and mediation, as well as (either intuitively or through formal grammatical study) with the general principles of English grammar, word-building in particular. These were not true "borrowings" in the sense of words being borrowed in a spoken language by people communicating with each other, but rather scholarly adaptations to fit the structure of the destination language. So I find it quite plausible that these suffixes derive from imitation: they were introduced and later reinforced by imitation, and eventually became productive on native stems too. Of course, words derived in -arian are not the same type of derivation as those built with -an on a stem ending in -ary, and these should be treated separately. But that's a different issue altogether: what I'm saying is that there is no need for a distinction between derivations with the same suffix depending on whether the base stem is a "native" word or adapted from a foreign word. --Ivan Štambuk 18:02, 18 September 2010 (UTC)
Although it may in principle be true that "every single compounded word has a 'real', diachronic etymology" (Stambuk), it is not true that this diachronic etymology is always known. In fact, most of the 4000 (or so) languages of this planet have such a poor historical record that one could say that the diachronic etymology is usually not known. Jcwf 17:01, 18 September 2010 (UTC)
re: Ivan: "scholarly adaptations": I think you may overestimate the importance of scholars in such word formation, unless you include pamphleteers, newspaper journalists, university students, and secondary-school students. But I don't have the facts at my disposal to make a convincing argument. I am equally unfamiliar with the facts that support your argument. You seem to be suggesting a trickle-down theory of word formation, from highly educated elites to non-elites. DCDuring TALK 18:35, 18 September 2010 (UTC)
There are about a dozen globally relevant languages that have a chance of survival by the end of this century, and they all have an excellent historical record, undoubtedly reflecting a very resistant and cohesive culture associated with them. The rest will eventually go the way of the dodo, and until they do we can do our best to provide as thorough etymologies as possible for them too. For less documented languages in particular, this will entail doing lots of original research for morphologically transparent derivations, in their older attestations, if they exist. That work would be very boring and time-consuming. If they don't exist because the language was documented only recently - well who cares, it's probably not worth the effort anyway, and the chances of it being of use to somebody are infinitesimal. --Ivan Štambuk 18:02, 18 September 2010 (UTC)
Much of that is true Ivan. My point was merely that for those poorly attested cases and even for better attested ones where no one has done that boring and time-consuming effort (yet?), the best we can often provide is a structural analysis of the current word. I would not even call that synchronous etymology. In fact this is why at nl.wikt we have changed the heading "etymology" into "afkomst en opbouw", roughly: "origin and structure". That covers the kind of info we provide under the header a bit better. Jcwf 22:30, 18 September 2010 (UTC)

How's this for the combined format? — Raifʻhār Doremítzwr ~ (U · T · C) ~ 23:38, 18 September 2010 (UTC)

Is there any evidence that δωδεκαρχία (dōdekarkhía) existed in Ancient Greek? Even if it existed in a later vintage of Greek, would that be a likely source for an 18th C writer? Whereas I am reasonably sure that, at that time (assuming the coinage to be from around then), both the suffix and prefix would have been productive among well-educated Britons. DCDuring TALK 01:04, 19 September 2010 (UTC)
And the ambiguity of "prefixed by" and "suffixed by" in the category name remains. DCDuring TALK 01:12, 19 September 2010 (UTC)
That was the information the OED gave; my knowledge of Ancient Greek is vastly insufficient to comment on the reliability of its assertion. For the category names, would you prefer Category:English words which feature the prefix dodeca- and Category:English words which feature the suffix -archy? — Raifʻhār Doremítzwr ~ (U · T · C) ~ 13:37, 19 September 2010 (UTC)
When the OED says such a thing, do they mean that they found it attested; that the English coiner said that he coined the term on that basis; or that such a coinage pattern was "in the air" at the time?
If dodecarchy were formed in English, then my ideal category name would be something like "English words formed using the suffix -archy". If the word were borrowed whole from Ancient Greek, then I would rather "English words with the suffix -archy". Both would need some explanation at the category page to make the distinction I would like to make between historic formation in English and other modes of formation now correctly analyzed as if formed in English and having the same meaning. This kind of distinction would neatly correspond to a distinction between Derived terms and Related terms. DCDuring TALK 16:25, 19 September 2010 (UTC)
The OED's etymology section for dodecarchy reads "f. as prec. + Gr. -αρχία rule…"; the "prec." entry is dodecarch ("dodecarch, dodek-"), whose etymology section reads "ad. Gr. δωδεκάρχ-ης, f. δώδεκα twelve + -αρχης ruler."; interpret that as you will. Your category name for borrowings is, IMO, ambiguous, implying that terms included in that category were formed with that suffix. I tend to agree that the distinction you wish to maintain is a useful one; perhaps we need different templates from {{prefix}}, {{suffix}}, {{confix}}, &c. for synchronic equivalents. BTW, see hoplochrism and heterarchy for the same combined format, but vice versâ. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 18:38, 19 September 2010 (UTC)
I am loath to put in more effort on such "etymologies" until this is resolved. The existing approach fails to provide categories that match the needs of our headings, requiring increasingly laborious manual entry of derived terms and related terms for affixes. Unlike those who, seemingly expecting literal immortality, are phlegmatic about such matters, I am hoping to see en.wikt achieve some milestones of competitive accomplishment during my lifetime. Oh well, base words need a lot of work, too. DCDuring TALK 19:41, 19 September 2010 (UTC)
That outlined in #Extending etymological autocategorisation may be a means of achieving what you want. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 21:19, 19 September 2010 (UTC)
  • Would it make sense to use our existing templates {{prefix}}, {{suffix}}, {{compound}}, and {{confix}} and their associated categories for "diachronic etymology", and to use a specialized morphological-component template to display morphemes ("synchronic etymology", aka "morphology")? (See referentiality for an example, which lacks categorization of the second term, "-ial".) This would be better for a gradual roll-out than revising these widely used templates. DCDuring TALK 15:54, 24 September 2010 (UTC)
See {{morph}} for a straightforward approach to presenting morphology and referentiality and referentially for a live demonstration, but with some categories red-linked. Reactions, thoughts, opinions, criticisms, recommendations welcome. Brickbats, vituperation, etc, accepted and returned in kind. DCDuring TALK 17:54, 24 September 2010 (UTC)
To add my 2¢ – thank you all for your efforts!
…and having distinct etymology/{{morph}}-ology (as DC proposes & demonstrates) seems v. valuable for providing both these data and making this distinction clear.
DC & I have had some discussion here: (UT:Nbarth: affix entries) (archive)
…of which one useful point is: novice contributors/lexicographers usually intuitively give surface analyses (synchronic/morphological, not diachronic). Even though we also want history, as a social matter we should expect many surface analyses, and accommodate this.
Also, as a complicating matter, many classical compounds are formed in English on the basis of Ancient Greek/Latin terms, using Anglicized forms which are not recognizable as distinct morphemes. A good (hybrid) example is genocide: -cide (kill) is recognizable, but *geno- (in the sense “type, genus”) is not recognizable independently in English.
It may be worth thinking about how to phrase and code these (e.g., “On model of (Greek) + (Latin)” [Sorry for the hybrid example.]); perhaps {morph} can handle these?
—Nils von Barth (nbarth) (talk) 18:43, 24 September 2010 (UTC)
Note that a morpheme is "The smallest linguistic unit within a word that can carry a meaning, such as 'un-', 'break', and '-able' in the word 'unbreakable'." [emboldenment is my emphasis]. In the case of those referential- words, refer can be broken down further to re- + fer; however, in that case, only re- is an English morpheme, the other is a modified Latin verb. (Well, fer may be a morpheme after all: The OED lists †fer, v., but states that it is "App. meaningless"; it gives no etymology, but cites two uses, of which the texts are "Boy. He sayes his Name is M. Fer. Pist. M. Fer: Ile fer him, and firke him, and ferret him." [1599] and "I..could haue ferd and ferkt y'away a wench As soon as eare a man a liue." [1611]. Even if all the morphemes in the referential- words turn out to exist in English, there will be some words for which that isn't true, so the example is still useful.) We must either find a way to handle these foreign-language morphemes or use a different word from morpheme. In the case of genocide, the OED states that it was formed as γένος (génos) + -cide; in that case, adding a "synchronic" etymology to that "diachronic" one would add nothing (whilst any entry formed by affixation should eo ipso be a member of the corresponding "with this morpheme" category). As a minor point to end, I don't like the Category:English words with morpheme: … formatting; something like Category:English words featuring the morpheme would be better; at the very least, the definite article is necessary, for without it, morpheme appears to be a mass noun, and seemingly some kind of unpleasant disease, at that. All the foregoing criticisms notwithstanding, I congratulate you on your hard work, and wish to note how pleased I am with the progress that has been made thus far. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 11:00, 25 September 2010 (UTC)
In the case of "classical" morphemes, it is clear that they now often have a life of their own. Gen, -gen, and possibly gen could be considered English morphemes, whereas fer cannot. The difference seems to be in whether speakers and writers can ascribe meaning to them. In any event, there is no reason to let the issue prevent implementation. The main objective has been to produce useful categories for derived terms and related terms. A "Related terms" category for "refer" is useful. One for "fer" that prevented us from having one for "refer" would not be. I welcome suggestions for an alternative term to morpheme. One possibility is that we have a third template, perhaps called {{base}}, for terms not properly called morphemes. Said template could assign the term to a category named like those created by {{derv}}. I do not want to make changes to category names that have been created until there is more input. But the omitted "the" needs to be changed ASAP. DCDuring TALK 11:35, 25 September 2010 (UTC)
The one concern I would have with this is the intuitiveness of the category names: the simpler the better, and easier to remember for the average user. Category:English words not directly derived from, but containing: refer seems a little long to me, although I don't know how exactly to shorten it while retaining the same meaning. — lexicógrafo | háblame — 12:28, 25 September 2010 (UTC)
At this prototyping stage, I'd rather have a verbose name, but it is clearly not ready for prime time. I have a feeling that there is a way of radically simplifying this.
The names of the templates need to be short, 4-8 keyboard characters, and should map to something that makes sense to a new contributor. {{derv}} is the only one of the three new ones that seems adequate. {{base}} is not good because it is not to be applied to all "bases", only those not better covered by {{derv}}. Perhaps {{relative}} or {{rltv}} would convey the purpose of generating membership in a category suitable for use under "Related terms". This however presupposes that the contributor grasps the importance of that.
The original {{morph}} seems unsatisfactory as it cannot work for many morphemes. Some morphemes will have already been covered by {{derv}} or {{base}} (or a new alias). In addition, at least for a transition period, {{prefix}} and {{suffix}} (with only one argument) are superior to any of the new templates as they feed the existing category structure. Lastly, it is not at all clear that there is any advantage to having category membership for "referentiality" in categories for "-ence", "-ent", and "-ial". It might be possible to allow use of a non-categorizing multi-argument version of {{morph}} to kick off a conversion process for an entry, with its transclusion listing being treated as a cleanup list. It could assign the entry to hidden categories to avoid massive redlinking and cluttering of the bottom-of-the-page category section.
The category names don't have to be short, but 48 characters (excluding the word and the language name) seems too long. One possibility would be to use the words we have long used for headers: "Derived terms" and "Related terms". A normal human might expect "derived terms" to be a subset of "related terms", but we have always tried to treat membership as disjoint. My experience tells me the contributors haven't always grasped the distinction. (Let's ignore the overuse of "See also".) DCDuring TALK 15:09, 25 September 2010 (UTC)
Derived terms does indeed read like it would be a subset of related terms, much like you'd expect your children to be a subset of the persons you label your family.
I suppose the question, as far as category names, would be, what will Joe Average be most inclined to look for and/or recognize as the information he is looking for. To me, Category:English terms containing but not derived from: refer (intentionally sans punctuation) would be somewhat simpler. Another question — will Joe Average know the term 'morpheme'? Probably, but is there another, simpler, more familiar word that can be substituted there and still retain the same meaning? "element"?
The template names should ideally be intuitive as well, even if they exceed the 4-8 character limit; people will more readily recognize {{morpheme}} or {{derived}} for what they are than something cryptic like {{mrphm}} or {{drvd}}. — lexicógrafo | háblame — 15:35, 25 September 2010 (UTC)
On the next iteration, I will implement what can be implemented, including your suggested rewording.
I think we can have long template names that redirect to shorter ones so that contributors would soon learn the faster-to-type short names.—This comment was unsigned.
Cool. — lexicógrafo | háblame — 16:18, 25 September 2010 (UTC)
  • {{derv}} has been field-deployed in a hundred entries or so, together with {{base}} and the now-nearly-non-functional {{morph}}, and {{prefix}} and {{suffix}}. In this deployment, occasionally {{compound}} and {{confix}} have been replaced. I have edited but not replaced {{prefix}} and {{suffix}} as they function adequately in a one-blank-argument mode. Thus, the set of categories fed by those templates has been added to.
{{morph}} is used only for morphemes (clitics, prefixes, suffixes, prepositions, conjunctions) that are never (?) bases and are not part of the direct derivation of the entry. No categorization of the entry occurs with respect to such morphemes. {{base}} could be used as a model for a template to replace {{morph}} in cases where someone is interested in the full set of terms that contain the morpheme and not just its direct derivatives.
The next step is to modify {{derv}} and {{dervcat}} to allow categorization into categories designed specifically for Etymology- or PoS-specific Derived terms headers. DCDuring TALK 18:32, 28 September 2010 (UTC)


In the article carbon, some definitions are marked "informal" (e.g. slang for carbon copy). One user remarked that some translations of these definitions don't appear to be equally informal. So is that an error? Should these translations be removed? And what is the point of adding such HTML comments, where nobody will see them? --LA2 13:23, 15 September 2010 (UTC)

Before the innovation that allowed form-based additions of translations, translators would have seen them when adding a translation via the edit box. Now only braver souls such as you see them. As there is another entry that is the appropriate home for such a term, eg, carbon copy, the translations belong there. It is quite possible they were copied from there. It is possible that younger contributors might not be aware of dated informal terminology, even in their native language, and of their lack of awareness. Instead of deletion, marking the term {{a|formal}} might suffice. DCDuring TALK 15:42, 15 September 2010 (UTC)


I just built an Instant version of Wiktionary today: http://Instantionary.com

I've already spoken to a lot of folks who had never heard of Wiktionary before but who are now using it thanks to Instantionary.com! I'm very interested in all your feedback :) If you like it, spread the word (blogs? Twitter? Facebook?)! We need this thing to go viral and get more folks to visit Wiktionary.

  • Rubbish. Search for "druse" gives the German Drüse rather than the English druse. SemperBlotto 15:37, 15 September 2010 (UTC)
    • Will fix that soon with an Exact Search Only toggle below the search bar (like PMinstant.com has). Drüse is the autosuggest popping up.--MeProtozoan 15:49, 15 September 2010 (UTC)
I don't get it. How is this different from just adding a search engine lookup in Firefox? ---> Tooironic 12:13, 16 September 2010 (UTC)
It works across browsers, needs no install, and is easier for users :)--MeProtozoan 12:20, 16 September 2010 (UTC)
See also http://ninjawords.com — they are much faster but only show definitions (and not only from Wiktionary). Conrad.Irwin 23:27, 23 September 2010 (UTC)
(I didn't mean to belittle what you've done — I think it's great). Conrad.Irwin 23:30, 23 September 2010 (UTC)

Swadesh lists on main page — a "WikiVocab" project

Hello, I'd like to request placing Appendix:Swadesh lists (it contains vocabulary word lists for many world languages) along with "Appendices • Abbreviations • Thesaurus • Rhymes • Frequency lists • Phrasebooks," or on some other part of the main page. I'd just like to find some way to increase traffic to the area so that more people can know about this great resource. It's a tremendously important part of Wiktionary that many users have found to be really helpful. I'd also be OK if the Wiktionary community opposes; however, it's one of the best resources on the web for learning vocabulary, comparing/preserving/promoting languages — simply indispensable.

My dream is for there to be a big database on the Internet where anyone can access the basic vocabulary words (in standardized topical lists) of all the world's languages. Wikipedia has information on the grammar and demographics of languages, but does not often include vocabulary, which is the core and essence of language. The closest things we have to a massive comparative database on world languages are the Austronesian Basic Vocabulary Database, the Intercontinental Dictionary Series, and of course, Wiktionary's Swadesh lists. As a side note, even though this is basically the Rosetta Project's goal, its website is still quite unwieldy for ordinary users, has a very low Alexa site ranking, and does not allow wiki-style contributions. The Rosetta Project has also taken down the Swadesh lists that used to be there, and does not have any searchable vocabulary databases as of now. Such a database would help in language preservation, comparative linguistic studies, language learning, and more.

Or perhaps we can even create a separate "WikiVocab" website, similar in style to WikiSpecies! If we do create a big, unified, and searchable database for all the world's languages — all in one place — I believe it will be one of the greatest human achievements in modern times.

Thanks for your considerations! — Stevey7788 08:06, 16 September 2010 (UTC)

Note: The main Swadesh list page looks a lot different now. You can see for yourself. — Stevey7788 08:37, 16 September 2010 (UTC)
See also http://meta.wikimedia.org/wiki/WikiVocab
I could see a link being added to that list "Appendices • Abbreviations • Thesaurus • Rhymes • Frequency lists • Phrasebooks" that appears in the box right at the very top. -- Prince Kassad 11:52, 16 September 2010 (UTC)
A noble endeavor, but given that most of the world's languages are going extinct almost as fast as the remaining non-microbial species, its long-term prospects remain grim. With fewer than a dozen languages you can get by on >90% of the globe, and these should be the focus of any multilingual basic-vocab projects. --Ivan Štambuk 09:03, 18 September 2010 (UTC)

More Fundraising

Greetings, Wiktionarians! I'm back.

The Wikimedia Foundation is getting ready for our annual fundraising drive, and we need Wiktionary to help! We're currently testing messages on the English Wikipedia each Thursday until November, working with the global communities to provide the best and most successful messages, written by us all. This year we will be able to localize messages for specific projects, such as Wiktionary. We urge everyone to have a look and suggest banners for Wiktionary. You can also help by translating messages, or by getting together with other Wikimedians on other projects to localize messages.

Let's look forward to an actively engaged fundraising season! Keegan, WMF Fundraiser 2010 19:45, 16 September 2010 (UTC)

Good stuff! Keep it up. — Stevey7788 06:49, 17 September 2010 (UTC)

SAMPROSA (pronunciation)

Does anyone know anything about this? It's used on a few Vietnamese entries (che, cam, ) and could use a template. Nadando 23:46, 16 September 2010 (UTC)

See SAMPROSA. As yet, Wikipedia has no article which treats the subject, and SAMPROSA is mentioned in only one place: w:X-SAMPA. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 23:58, 16 September 2010 (UTC)


Prince Kassad and I were discussing language families and their codes on IRC today.

Specifically, family codes are used in etymologies and may be used in categories too. However, most codes come from ISO 639-5, which is incomplete in comparison with the scope of Wiktionary.

In LANGCODE, there are some exceptional codes, made up from ISO codes, followed by a hyphen and additional letters, such as cel-gae for Gaelic and cel-bry for Brythonic.

In that conversation, I had the idea of creating codes for all the remaining families, and he had the idea of using "qfa" when no prefix can be derived directly from ISO. One example of an affected family would be Torricelli, with the code qfa-tor. --Daniel. 19:08, 17 September 2010 (UTC)

What does this qfa mean? --Ivan Štambuk 08:51, 18 September 2010 (UTC)
The entire range from qaa-qtz is reserved for private use. qfa was chosen for easy memorization (fa = family). -- Prince Kassad 09:22, 18 September 2010 (UTC)
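To make the code shapes discussed in this thread concrete, here is a minimal Python sketch. It checks only the *form* of a code (two letters, three letters, or a three-letter prefix plus a hyphen plus three letters, as in cel-gae or qfa-tor) and whether a three-letter part falls in the qaa-qtz range mentioned above. It does not know which codes are actually assigned, and the function names are hypothetical.

```python
import re

# Hypothetical helper: classifies the *shape* of a code, not whether it is assigned.
TWO_LETTER = re.compile(r"[a-z]{2}")            # e.g. en, es, pt
THREE_LETTER = re.compile(r"[a-z]{3}")          # e.g. eng, spa, nic
EXTENDED = re.compile(r"([a-z]{3})-[a-z]{3}")   # e.g. cel-gae, gmq-osw, qfa-tor


def is_private_use(prefix: str) -> bool:
    """True if a three-letter lowercase prefix lies in the range qaa-qtz."""
    return THREE_LETTER.fullmatch(prefix) is not None and "qaa" <= prefix <= "qtz"


def classify_code(code: str) -> str:
    if TWO_LETTER.fullmatch(code):
        return "two-letter"
    if THREE_LETTER.fullmatch(code):
        return "private-use" if is_private_use(code) else "three-letter"
    match = EXTENDED.fullmatch(code)
    if match:
        prefix = match.group(1)
        return "extended, private-use prefix" if is_private_use(prefix) else "extended, regular prefix"
    return "unrecognized"


for code in ["en", "eng", "qfa", "cel-gae", "qfa-tor", "aus-gun"]:
    print(code, "->", classify_code(code))
```

Plain string comparison is enough for the range check because every candidate is exactly three lowercase ASCII letters.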
After some quick research, I believe the codes to be created include:
--Daniel. 10:59, 19 September 2010 (UTC)
Volta-Congo can use nic (Niger-Congo) as a prefix. We would also need a code for Undetermined languages. -- Prince Kassad 12:06, 19 September 2010 (UTC)
You're looking for {{und}}: undetermined. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 13:30, 19 September 2010 (UTC)
I have created them all, including the Template:etyl:nic-vco as suggested by Kassad, by using Template:etyl:cel-bry as the model. I intend to create these later:
--Daniel. 04:02, 20 September 2010 (UTC)
This list of codes is done. As a courtesy to other people, the final results include Template:etyl:qfa-min for Misumalpan and Template:etyl:sai-car for Cariban. New family codes still have to be created. However, I'm not going to list them here anymore: they will naturally be listed at WT:LANGCODE eventually. --Daniel. 10:46, 23 September 2010 (UTC)

Greek adjectives - degrees of comparison

The comparative form in Modern Greek gives the relative superlative when preceded by the definite article. It is undesirable to create separate entries for all relative superlative forms; I think these should be indicated in the comparative entry. Alternative ways of illustrating this are shown at περισσότερος and at ακριβότερος and γεματότερος; I prefer the latter. Are there any views on this preference? —Saltmarshαπάντηση 07:05, 18 September 2010 (UTC)

As a speaker of Portuguese, a language that also forms the relative superlative by means of a separate word, I can roughly make a comparison with Greek (a language that I virtually don't speak) and opine that separate entries for περισσότερος and ο περισσότερος are indeed not desirable. Yet, the relative superlatives would look better if listed in the lemma form of each Greek adjective. --Daniel. 07:32, 18 September 2010 (UTC)
It will appear in the declension table, as these are created. —Saltmarshαπάντηση 12:10, 18 September 2010 (UTC)
I think it would be useful to indicate the relative superlative form in both the lemma and comparative form entries. I suppose that if I were an English native speaker learning Greek, I'd like to see two translations in the comparative form entry. E.g. in περισσότερος
  1. more
  2. (relative superlative) most
    Οι περισσότεροι άνθρωποι ...
    Most people ...
--flyax 11:14, 18 September 2010 (UTC)
If all Greek comparatives can be used as relative superlatives, perhaps this information should be displayed in simply one definition line, like this possibility for περισσότερος:
  1. Comparative (more) or relative superlative forms of πολύς.
--Daniel. 12:29, 19 September 2010 (UTC)

I would favor a "Usage notes" section, possibly with a templated message to explain this issue. --EncycloPetey 14:54, 20 September 2010 (UTC)

I think perhaps I am making things too complicated. (1) A basic knowledge of the language should be assumed (2) All the forms are to be (or will eventually be) found displayed at the lemma (positive) entry which will show in simplified form:
positive απλός
comparative πιο απλός or απλούστερος
relative superlative ο πιο απλός or ο απλούστερος
absolute superlative απλούστατος
(1) SO: entries should only be created for the word forms (shown as linked above).
(2) When a word form exists, any synonymous terms should be shown, as we do for simpler; e.g. for απλούστερος, give πιο απλός.
(3) This means that in most (or all?) cases the relative superlative will be ignored, except in the declension table.
Any views, please? —Saltmarshαπάντηση 15:49, 20 September 2010 (UTC)
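Saltmarsh's scheme can be modelled as a small data structure. Below is a minimal Python sketch. It assumes three things: the synthetic comparative and absolute superlative are supplied by hand; only the masculine nominative singular article ο is used; and "word forms" in point (1) means single-word forms. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Degrees:
    """Degrees of comparison for one Modern Greek adjective (masculine nominative singular only)."""
    positive: str
    comparative: list[str]
    relative_superlative: list[str]
    absolute_superlative: str


def degrees(positive: str, synthetic_comparative: str, absolute_superlative: str) -> Degrees:
    # Periphrastic comparative: the particle πιο before the positive
    comparatives = [f"πιο {positive}", synthetic_comparative]
    # Relative superlative: the definite article (masculine ο) before each comparative
    superlatives = [f"ο {form}" for form in comparatives]
    return Degrees(positive, comparatives, superlatives, absolute_superlative)


def entry_candidates(d: Degrees) -> list[str]:
    """Single-word forms only; multi-word forms go in synonyms or the declension table."""
    forms = [d.positive, *d.comparative, *d.relative_superlative, d.absolute_superlative]
    return [form for form in forms if " " not in form]


apl = degrees("απλός", "απλούστερος", "απλούστατος")
print(apl.relative_superlative)
print(entry_candidates(apl))
```

Under these assumptions, the relative superlatives never become entries of their own, which matches point (3).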

Vote on number words

FYI, the vote Wiktionary:Votes/pl-2010-06/Number vs. numeral started yesterday. I am posting this notice here in case some readers of Beer parlour do not monitor WT:Votes. --Dan Polansky 09:59, 18 September 2010 (UTC)

Finding a language's ISO code

One of the major problems with trying to edit Wiktionary: We use ISO codes for everything, yet we have no way at all for newbies to figure out what the ISO code of a language is, making it very difficult to contribute. I think we need to have some way to translate language names into ISO codes, perhaps with a Javascript tool in the sidebar, or a specific "language-to-code translating page" with an easy-to-find link, or some basic (and easy to find) procedure for finding out a language's ISO code. Thoughts, ideas, potential problems? --Yair rand (talk) 06:33, 19 September 2010 (UTC)

For codes used in the edit box when editing a non-English language section, a comment containing the code for the language in question could appear right after the L2 header. This could be bot-inserted.
If that is too inelegant, could we have context-sensitive toolbox links to newbie-oriented help pages and to a language-code lookup tool or an Appendix table or even a WP table?
The latter suggestion would enable us to offer various kinds of help to newbies and to more experienced editors, including lists of relevant templates, language-specific "about" and discussion pages, lists of contributors in the language, pages (existing or new) with help for specific L3/L4 headers etc. DCDuring TALK 08:58, 19 September 2010 (UTC)
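DCDuring's first suggestion, a bot-inserted comment after each non-English L2 header, could look roughly like the Python sketch below. The comment format and the three-entry name-to-code table are assumptions for illustration only; a real bot would load the full list (for example from WT:LANGCODE). The sketch touches only lines of the form ==Name== and skips headers it has already annotated.

```python
import re

# Hypothetical sample table; English is deliberately absent, so its header is left alone.
LANG_CODES = {"Latin": "la", "Old Swedish": "gmq-osw", "Moldavian": "mo"}

# Matches exactly two '=' on each side, so ===Noun=== and deeper headers are ignored
L2_HEADER = re.compile(r"^==\s*([^=]+?)\s*==\s*$")


def annotate_l2_headers(wikitext: str) -> str:
    lines = wikitext.split("\n")
    out = []
    for i, line in enumerate(lines):
        out.append(line)
        match = L2_HEADER.match(line)
        if not match or match.group(1) not in LANG_CODES:
            continue
        comment = f"<!-- language code: {LANG_CODES[match.group(1)]} -->"
        # Idempotent: don't add the comment again if a previous run already did
        following = lines[i + 1] if i + 1 < len(lines) else None
        if following != comment:
            out.append(comment)
    return "\n".join(out)


page = "==English==\n===Noun===\n==Latin==\n===Noun==="
print(annotate_l2_headers(page))
```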
When I'm wikifying a translation table, I quite often have to look up at least one or two languages. Usually if you look up the name of the language, such as Middle High German, it tells you on the page what its code is. So that relies on me opening a separate page. Mglovesfun (talk) 09:04, 19 September 2010 (UTC)
One way is using categories for that. If you read Category:Latin language, Category:Old Swedish language and Category:Moldavian language, you'll find la, gmq-osw and mo, which are not always found in ISO. --Daniel. 09:11, 19 September 2010 (UTC)
For experienced users it helps if there is a "guaranteed" way to find a code, preferably one step. MG's way is close. If one knows the language name or a covered hypernym of the language, one should be able to find the code, usually in one step. But newer users might not know to look under the external links section of the English language portion of the entry to find the code.
Almost all of us (a subset of all regular contributors) have ways of solving this problem and others like it. The problem for a new would-be contributor is more severe and extends beyond not knowing the language code. If a user expects contributing to be easy, disappointment awaits. We should be able to help new contributors with some tasks without insulating them too much from the edit box. I think the way we handle translations has enormous net benefits, but it may keep users from learning about the edit box. DCDuring TALK 09:51, 19 September 2010 (UTC)
Have we got a "list of supported languages" somewhere? I think I've seen a special page listing every language and how many entries it has. This could be more prominently linked and include the ISO codes. Equinox 11:10, 19 September 2010 (UTC)
You mean Wiktionary:Statistics? -- Prince Kassad 11:16, 19 September 2010 (UTC)
Wiktionary:Statistics displays the warning "Of the 744 languages on Wiktionary, only the 400 with 10 or more entries are shown." So, in its current state, the page is incomplete anyway. Even if it is changed to include language codes in the future, it would not be reliable for editors of Kenyan Sign Language and Komi-Permiak. --Daniel. 12:13, 19 September 2010 (UTC)
  • WT:Index to templates/languages (and WT:Etymology/language templates if you want those + the etyl: codes). The former was bot-updated by RU, but as he hasn't been around for a while maybe I'll update it so it has the more obscure, recent additions (and I could add the alternate names). Maybe we can link to this from the edit screen? --Bequw τ 17:31, 19 September 2010 (UTC)
    I've updated these lists. Is there good way to link to them from the edit window? They are quite large (as we have >6k codes). --Bequw τ 04:57, 25 September 2010 (UTC)

I'd actually direct the readers to Google. It seems to be a very effective way to search for language codes if you include the keywords "ethnologue" or "linguist list". -- Prince Kassad 18:34, 19 September 2010 (UTC)

A Wiktionary or Wikipedia search for the language's name usually does it for me. —CodeCat 18:52, 19 September 2010 (UTC)
The question is how newbies should find the ISO code. If going to the language name's Wiktionary entry is the proper way, we should give those instructions when "ISO code" is mentioned. If there's a specific list that should be used we should link to that. Without any clear instructions on how to find an ISO code, we're blocking about 85% of possible contributors from helping in any area that requires a code. --Yair rand (talk) 19:02, 19 September 2010 (UTC)
(One possible solution: Something like User:Yair rand/languagenametocode.js (available in PREFS).) --Yair rand (talk) 04:32, 20 September 2010 (UTC)
I took a quick look at it but had to cut short my experiment as my computer required a hard reboot for some reason. I realize that this is a demo. It should be placed in the experiment section to discourage folks from using it prematurely. I am not willing to test it further until I have comfort that my hard reboot was probably unrelated.
Specific observations:
  1. I found that I could not remove the box from the position it assumed on the page.
  2. Such items as "Old Swedish", "anglosaxon", "Anglo-Saxon", and "Medieval Latin" yielded "not found". As experienced editors may already have ways of finding codes that are not problematic, covering such items would greatly increase the chances of rapidly and fully testing this in a variety of user environments. It would also prove more useful when fully deployed. The source of the data may have to be some kind of ad-hoc master table rather than a pre-existing one. The table your look-up now uses excludes alternative names and does not provide assistance with items that {{etyl}} supports, such as the various vintages of Latin. DCDuring TALK 10:49, 20 September 2010 (UTC)
I've put a small "X" icon on the right that closes the box and I've added the contents of WT:LANGNAME to the script, allowing it to work for additional language names. I highly doubt your hard reboot was related. --Yair rand (talk) 06:11, 22 September 2010 (UTC)
A data point for what it's worth: To find a langcode, I usually check the page [[category:Langname language]].​—msh210 (talk) 15:58, 20 September 2010 (UTC)
That would not yield a good result for the legacy codes supported by {{etyl}} for vintages of Latin, for example. Nor for some alternative language names. It does offer the advantages of having the language code, if provided, at the top of the page and providing clues, eg, for specific hyponymic codes in the form of names of subcategories, and also linked categories or supercategories. One might eventually look at a sample of entries to find codes if all else fails. DCDuring TALK 16:37, 20 September 2010 (UTC)
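Several of the misses DCDuring reports come from spelling variants and alternative names. A lookup can tolerate those if it normalizes names before comparing them. Here is a minimal Python sketch, assuming a hand-built table in which alternative names map to the same code. The table is a tiny hypothetical sample. "Medieval Latin" is deliberately left out to show a miss: {{etyl}}-only codes would need their own rows.

```python
import unicodedata
from typing import Optional

# Hypothetical sample; a real tool would load the full tables (e.g. WT:LANGCODE, WT:LANGNAME).
NAME_TO_CODE = {
    "Latin": "la",
    "Moldavian": "mo",
    "Old Swedish": "gmq-osw",
    "Old English": "ang",
    "Anglo-Saxon": "ang",  # alternative name for the same language
}


def normalize(name: str) -> str:
    """Case-fold and drop spaces, hyphens, and apostrophes so 'anglosaxon' matches 'Anglo-Saxon'."""
    folded = unicodedata.normalize("NFKC", name).casefold()
    return "".join(ch for ch in folded if ch not in " -'\u2019")


# Index the table once by normalized name
INDEX = {normalize(name): code for name, code in NAME_TO_CODE.items()}


def lookup(name: str) -> Optional[str]:
    """Return the code for a language name, or None if the name is unknown."""
    return INDEX.get(normalize(name))


for query in ["anglosaxon", "Anglo-Saxon", "old swedish", "Medieval Latin"]:
    print(query, "->", lookup(query))
```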

1, 2, 3...

Can someone please tell me what is the script of the following ten characters?

1, 2, 3, 4, 5, 6, 7, 8, 9, 0

Would it perhaps be Arabic? Devanagari? Mathematical notation? Undetermined? Zsym? --Daniel. 07:44, 19 September 2010 (UTC)

I don't suppose there is such a thing as a 'multilingual' script... —CodeCat 09:04, 19 September 2010 (UTC)
I'd say Zinh (inherited script). -- Prince Kassad 09:05, 19 September 2010 (UTC)
Definitely not Arabic - that would be ١ etc. SemperBlotto 09:07, 19 September 2010 (UTC)
Which, ironically, Unicode considers "common", which suggests they're used in something other than Arabic too. -- Prince Kassad 09:21, 19 September 2010 (UTC)
Aren't they usually called the Hindu-Arabic numerals? — Raifʻhār Doremítzwr ~ (U · T · C) ~ 13:28, 19 September 2010 (UTC)
Well, there isn't a code for "Hindu-Arabic" that I know of. These numbers seem to be a subset of "inherited" as commented above. However, Wikipedia says at Mathematical notation that "Mathematical notations include relatively simple symbolic representations, such as numbers 1 and 2, function symbols sin and + [...]", which seems, to my mind, not totally fit for the goal of organizing characters because it blatantly uses Latin letters for its symbol sin and apparently includes ١ and 十, and everything used individually in mathematics.
A possible solution for organizing the ten symbols from 0 to 9 (and, by extension, the fullwidth 1234567890) and formatting them, if this distinction is desirable, could be creating a new script code, like Numb. Or Qanm. --Daniel. 06:03, 20 September 2010 (UTC)
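For reference, Unicode encodes the ASCII, Arabic-Indic, and fullwidth digits as distinct characters that share the same numeric values. The Python sketch below uses the standard unicodedata module to show this for the digit one. It also shows that NFKC normalization folds fullwidth digits to ASCII but leaves Arabic-Indic digits unchanged. unicodedata does not expose the Unicode Script property. Unicode's Scripts.txt assigns the ASCII digits to Common (ISO 15924 Zyyy), not Inherited (Zinh).

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, str, int]:
    """Code point, Unicode name, general category, and digit value of one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch), unicodedata.digit(ch))


for ch in ["1", "\u0661", "\uff11"]:  # ASCII, Arabic-Indic, and fullwidth digit one
    print(describe(ch))

# NFKC folds fullwidth digits to ASCII; Arabic-Indic digits have no compatibility mapping
print(unicodedata.normalize("NFKC", "\uff11\uff12\uff13"))  # 123
# int() accepts any Unicode decimal digits, not just ASCII
print(int("\u0661\u0662\u0663"))  # 123
```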
The only use for the script codes (Latn, Cyrl, etc.) that I'm aware of is that they format the characters specific ways: they specify font faces and, when used in inflection lines, boldfacedness or font-size. That is, the script templates are practical, not inherently beneficial (AFAICT. Perhaps someone can correct me by pointing out some inherent benefit). The characters '1', and '2', then, don't need a special script template: we can use {{Latn}} (or {{unicode}}, or {{Zsym}}, or none).​—msh210 (talk) 15:30, 20 September 2010 (UTC)
There is the purpose of organization, as explained at WT:SCR. Please compare these specific examples:
I would prefer another, more natural title, like "Hindu-Arabic numerical script characters". --Daniel. 19:20, 20 September 2010 (UTC)
A better category would be nice, but I would oppose new script codes if there is no practical (eg rendering) reason. --Bequw τ 01:35, 21 September 2010 (UTC)
Please see Category:Latin script. It includes a description and subcategories that share common characteristics for the goals of organization and consistency. You could, for instance, look at Category:Cyrillic script to find out that its code is Cyrl.
The related templates can be extended to create categories without codes, or their behavior can simply be templatelessly copied into Category:Hinduarabic numerical script and its future subcategories. However, what practical reason do you expect for the creation of a code for 12345, other than fulfilling the set of script names and their codes? As stated above, their formatting is expected to be identical to Latn, which can be achieved simply by having them share the same CSS class.
WT:SCR and MediaWiki:Common.css display various scripts created especially for use in Wiktionary; upon reading them, you may notice that Persian and Kashmiri have identical but separate codes, fa-Arab and ks-Arab. As an analogy, I personally think it is easier to remember using fa-Arab for one language and ks-Arab for other, than hypothetically remembering that fa-Arab can format two or more languages, which would also require certain technical savviness. --Daniel. 02:40, 21 September 2010 (UTC)
IMO, it would be worth having a specific script template for these numerals if having one would allow us to cause them to display like they do in the second, third, and fourth lines of this image; when written like that, they're more readable and æsthetically pleasing than their ordinary counterparts of uniform width and height (especially when they occur in running text). — Raifʻhār Doremítzwr ~ (U · T · C) ~ 12:13, 22 September 2010 (UTC)
Yes, formatting the discussed numbers as shown in File:Nicolaus Kesler, about 1486.PNG is possible, by choosing the right fonts for the right script code. More specifically, I'm not certain if I would agree with using the second or third fonts from that image as the standard for Hindu-Arabic numerals in Wiktionary; however, if such a code exists, you may always bypass any standard Wiktionary font by choosing how you want them to be displayed on your own computer. --Daniel. 13:13, 22 September 2010 (UTC)
It depends on the context. What about if we were using the script template to format numbers in quotations, so as to better reproduce the form in the source? — Raifʻhār Doremítzwr ~ (U · T · C) ~ 14:09, 22 September 2010 (UTC)
Can you please provide examples of quotations where the use of certain fonts would result in better reproducing how the numbers are formatted in the respective source? --Daniel. 15:41, 22 September 2010 (UTC)
Here are a couple: [6], [7]. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 12:14, 24 September 2010 (UTC)

I am very much against this. Why does it matter if the original numerals were w:text figures or lining figures? Typesetting aspects such as font, line spacing, margins, and hyphenations generally shouldn't concern us. We aim to remove most unsemantic textual differences since they are unnecessary for dictionary making and would just bloat our site. Linking to the original source is much superior, as someone can see all these textual variations and more. However, if you would like your numerals to be text figures, then install one of the fonts mentioned on the wiki page and style normal ("Latn") text accordingly. --Bequw τ 19:25, 24 September 2010 (UTC)

If we allow text figures at all, then we may as well reproduce the distinction if it exists in a source text (despite its being non-semantic).

High-quality typesetting prefers text figures in body text: they integrate better with lowercase letters and small capitals, and their greater variety of shape facilitates reading. They help accomplish consistent typographic colour in blocks of text, unlike runs of lining figures, which can distract the eye. (Such a distraction becomes necessary when numbers are to stand out, for example in scientific text.) Lining figures are called for in all-capitals settings (hence the alternative name titling figures), and may work better in tables and spreadsheets.

That paragraph from w:Text figures#Design explains why we should prefer text figures in our running text (given that we should aim for "high-quality typesetting"). If we choose not to follow source texts' choices of numerals, then the numerals used will be text figures by default in most cases, because we tend to quote running text (rather than tables or spreadsheets). — Raifʻhār Doremítzwr ~ (U · T · C) ~ 09:56, 25 September 2010 (UTC)
The standard fonts we can rely on most users having don't include text figures. This is why we shouldn't have it be our default, and that's why other websites don't do this either. --Bequw τ 14:41, 25 September 2010 (UTC)
I agree with Bequw and I support his reasons. That is, I prefer all numbers equally formatted in all quotations, not reflecting individual "line spacing, margins, and hyphenations", etc. that may be better seen at their sources. However, the single font used for numbers in quotations does not necessarily have to be one with digits of uniform width and height. Georgia, for example, is a common font that displays numbers as text figures. --Daniel. 23:55, 30 September 2010 (UTC)
Would it be possible to have a PREF for displaying our numbers as text figures? — Raifʻhār Doremítzwr ~ (U · T · C) ~ 14:58, 8 October 2010 (UTC)
Yes, the PREF would be possible. Or, alternatively, you may edit User:Doremítzwr/vector.css (or User:Doremítzwr/monobook.css, etc.) with relevant CSS information to choose a particular font. --Daniel. 15:42, 8 October 2010 (UTC)
I'd quite like it to be an option available generally, if that's possible. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 15:45, 8 October 2010 (UTC)
It doesn't really make sense for that to be a distinct PREF; it's an aspect of the font. We might as well have a PREF to choose between the two lowercase-A glyphs, or the two lowercase-G glyphs. —RuakhTALK 18:55, 8 October 2010 (UTC)
In terms of script templates, we usually use sc=unicode for miscellaneous characters. I'm guessing this doesn't help a lot. Mglovesfun (talk) 15:37, 22 September 2010 (UTC)

Extending etymological autocategorisation

I've been thinking for a while about how to extend the kind of autocategorisation that {{prefixcat}}, {{suffixcat}}, {{prefixsee}}, {{suffixsee}}, {{prefix}}, {{suffix}}, and {{confix}} bless us with. The kind of templature I've come up with is as follows:

They're the basic ones, and they're enough for now. Augmentations, problems, and sundry comments are welcome. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 21:17, 19 September 2010 (UTC)

I note that you have excluded bases in your example of {{native}}. I would think we would want to include them. This line of autocategorization thinking could and should be extended to include the bases in {{prefix}}, {{suffix}}, and {{compound}}. DCDuring TALK 00:01, 20 September 2010 (UTC)
I do not mean to suggest that coverage of bases is essential, merely that it is highly desirable. It would greatly enhance the value of the family of etymology templates by allowing the content of Derived terms and Related terms in many entries to be at least partially rendered dynamic. It is conceivable that a bot could even eliminate redundancy between static and dynamic items under those (and other) headings. DCDuring TALK 10:58, 20 September 2010 (UTC)
Your first template would need a way to distinguish prefixes from suffixes, since we have no string functions. Nadando 00:08, 20 September 2010 (UTC)
Indeed. {{confix}}, for example, seems to assume only that the first- and last-positioned items are affixes suitable for auto-categorization. As most normal-language terms for which this approach is applicable are constructed by sequential affixation, the assumption allows autocategorization in many cases. It is with more fancifully or normatively constructed terms that, according to its documentation and my experience, {{confix}} would require supplementation by manual work. Normatively constructed complex English words, such as the pharmaceutical names that are like Hungarian words, would require a different approach which may have already been developed for Hungarian. DCDuring TALK 11:12, 20 September 2010 (UTC)
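Nadando's point is that templates cannot inspect their arguments as strings. Outside MediaWiki, for instance in a bot, telling prefixes from suffixes by hyphen placement is straightforward. Here is a hypothetical Python sketch. The labels, and treating unhyphenated items as bases, are assumptions.

```python
def affix_kind(morpheme: str) -> str:
    """Classify a morpheme by hyphen placement (hypothetical labels)."""
    leading, trailing = morpheme.startswith("-"), morpheme.endswith("-")
    if leading and trailing:
        return "infix/interfix"  # e.g. -o-
    if trailing:
        return "prefix"          # e.g. thromb-
    if leading:
        return "suffix"          # e.g. -ectomy
    return "base"                # e.g. not, withstanding


parts = ["thromb-", "end-", "arteri-", "-ectomy"]
print([(part, affix_kind(part)) for part in parts])
```

Unlike a purely positional rule ("first item is a prefix, last is a suffix"), this rule also classifies the middle items of a chain like thromb- + end- + arteri- + -ectomy.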
DCDuring, by "base" do you mean something like sense II. 14. of the OED's entry for base, n.¹, viz. "Gram. The form of a word to which suffixes are attached; the theme." (a sense we don't have)? If so, please note that {{native}} is meant to cause the same autocategorisation for affixed bases and compound words (the example I gave did not show that, sorry). For a compound word, the example used is the English notwithstanding:
  • {{native|en|adv|not|verb|withstanding}}
This would autocategorise the entry into Category:English words formed with the adverb not and Category:English words formed with the verb withstanding (as well as Category:English compound words, presumably).
Nadando, when you say that "we have no string functions", I take that to mean that our software doesn't recognise leading and closing hyphens as denoting affixes. It seems that it is unavoidable that we'd have to note the POS if we want it included in the category name (that would go for most of them, even if the software did recognise the structure of affixes). In the light of that, I propose that we alter the above templates to: {{native|en|pfix|thromb-|pfix|end-|pfix|arteri-|sfix|-ectomy|gl1=thrombus|gl2=inside|gl3=artery|}}, {{foreign|en|grc|noun|βαπτισμός|dipping|t2=baptism|tr=baptismos|sc=polytonic|||||}} (ISO-lang.-code order switched for regularity), and {{hybrid|en|la|adj|amnicola|en|sfix|-ist|gl1=dwelling by a river|||||}}; the derivations categories changed would be Category:English words derived from the Ancient Greek noun βαπτισμός and Category:English words derived from the Latin adjective amnicola.
Two more things have come to me. The first is that this kind of templature allows a great deal of detail to be given by our derivations categories. In the same way that synonyms are separated by {{sense}}, terms could be shown to be derived from particular senses. At the very least, we should provide the level of detail of distinguishing derivations from those of their homographs. For example, the OED lists three different verbs spelt alighten, derived from three different verbs spelt alight (given in respective order); we could distinguish and autocategorise them according to this scheme:
This scheme requires that we develop stable ordering of our senses. The only principle for such ordering that we can rely upon, to my knowledge, is a historical one (i.e., from earliest to latest date of first attestation), supplemented by the semantic grouping that subsense structure requires (which is what the OED does). ¶ The second is that we should use the kind of code Wikipedia uses to italicise certain article titles to italicise all the Latin-script mentioned terms in our derivations categories, so that we have, for example, Category:English words formed with the adverb not, Category:English words derived from the Latin adjective amnicola, Category:English words formed with the suffix -en², &c. Without that kind of italicisation, some category titles will initially cause parsing difficulties; italicisation will improve readability.
That's all for now. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 11:45, 20 September 2010 (UTC)
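The category names produced by the proposed {{native}} and {{foreign}} templates follow a fixed pattern. Here is a minimal Python sketch of that mapping. It assumes named parameters (gl1=, tr=, sc=) have already been stripped and that pfix/sfix/adv/adj expand as in the examples above. The helper names and lookup tables are hypothetical; this is not an existing template or module.

```python
# Hypothetical sketch of the proposed {{native}} / {{foreign}} categorisation.
LANG_NAMES = {"en": "English", "la": "Latin", "grc": "Ancient Greek"}
POS_NAMES = {"pfix": "prefix", "sfix": "suffix", "adv": "adverb", "adj": "adjective"}


def pos_name(pos: str) -> str:
    return POS_NAMES.get(pos, pos)  # labels such as "verb" or "noun" pass through unchanged


def native_categories(params: list[str]) -> list[str]:
    """params: positional arguments of {{native|lang|pos1|term1|pos2|term2|...}}."""
    lang, rest = params[0], params[1:]
    if len(rest) % 2:
        raise ValueError("expected pos/term pairs after the language code")
    lang_name = LANG_NAMES[lang]
    return [
        f"Category:{lang_name} words formed with the {pos_name(pos)} {term}"
        for pos, term in zip(rest[0::2], rest[1::2])
    ]


def foreign_category(lang: str, source: str, pos: str, term: str) -> str:
    """Category for {{foreign|lang|source|pos|term|...}}."""
    return f"Category:{LANG_NAMES[lang]} words derived from the {LANG_NAMES[source]} {pos_name(pos)} {term}"


print(native_categories(["en", "adv", "not", "verb", "withstanding"]))
print(foreign_category("en", "grc", "noun", "βαπτισμός"))
```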
The quality and therefore stability of our senses is poor, especially for bases (OED sense, yes), which are often highly polysemic. I would think we should be happy indeed if our Etymology numberings were stable for such words. IMO, any new construct that depends on something like the success of the SenseID should wait on the successful use, over a year or so, of the product of the SenseID effort in other uses. I think we get value from the approach without requiring SenseID. In addition the approach needs its own field testing, which may as well proceed in parallel to SenseID's field testing. DCDuring TALK 12:08, 20 September 2010 (UTC)
Agreed; it's too early to include sense-specific derivation categories yet. However, I think we can include etymology- and POS-specific derivation categories. There aren't very many English homographs and it isn't too much work to find dates of first attestation per etymological root; consider secrete, sophy, and reparate, for the sake of example. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 12:22, 20 September 2010 (UTC)

Great ideas, Doremítzwr. Some thoughts: (1) Categorizing by etymon might be too confusing to readers, not allowing them to find what they want. English essence comes (according to our etymology) from Latin esse, form of sum, whereas English future comes from Latin futurus, form of sum. Presumably (although we don't list the etymology of these Latin words) these Latin words come from different etymons (even if the same PIE word, presumably via different Greek, or whatever, words). So future and essence would go in different "derived from Greek 'foo'" categories. But since (unlike myself) enwikt is enamored with treating all forms of a word as the same, future and essence would go in the same "derived from Latin 'foo'" category. (2) We'd have thousands and thousands of "derived from" categories for English words alone; a good many, I'm guessing, would have one entry (for example, presumably the only English word derived from Middle English notwithstandinge is notwithstanding). Do we want this? (3) I especially like the idea of {{hybrid}}. We have no template for this now AFAIK. (The -fix templates don't do it, right?)​—msh210 (talk) 15:49, 20 September 2010 (UTC)

re: One- or few-member categories. What is the resource cost of having categories? The benefit is that we can have dynamic, low-maintenance categories for Related terms, Derived terms, and Descendants. Having the approach be nearly uniform over large classes of terms (eg all English terms, all English words spelled solid, or even all English nouns spelled solid) might facilitate all sorts of operations, including the use of CatScan to generate specific lists for various contributor and user purposes.
We have a very incomplete implementation of those sections and have just barely begun to pay the price of maintaining them. I, for one, wouldn't want to have to continue to do such work manually. My experience with such work has been quite unsatisfying, both in my confidence in the completeness of what I have done at a time cost I was willing to pay and in my sense that the result was worth the cost of doing it properly using currently available resources (eg wiki search-generated lists, what-links-here lists). On the other hand, if we are to have RT, DT, and Descendant sections that we prominently display, the work needs to be done, as my experience in trying to improve some medium-to-large derived terms sections suggests. DCDuring TALK 16:23, 20 September 2010 (UTC)
Good point; cf. [[user:msh210/underived]], which I've made only a negligible dent in.​—msh210 (talk) 17:04, 20 September 2010 (UTC)

Building on the templates we already have, {{prefix/test}} categorizes the entry based on the second parameter, with options for the derived part of speech and the part of speech of the base word. {{deriv}} is similar to {{prefixsee}}, and adds a list of terms from the category generated by {{prefix/test}}. Nadando 00:45, 21 September 2010 (UTC)

I've now extended {{deriv}} to work with prefixes and suffixes ({{prefixsee}} and {{suffixsee}} can be orphaned). Any objections to merging my changes with {{prefix}}, {{suffix}}, {{confix}}? Nadando 19:14, 21 September 2010 (UTC)
Have you tested it in any live entry? I think we should avoid even temporary bad consequences for widely used templates.
Is the wording of the new class of category names what we want?
How can this be deployed to avoid an extended period with a large number of red category links? Special:WantedCategories was last updated 4:35 19 Sep. What is its update schedule? Does the change need time to propagate so that it appears in that page? How much? DCDuring TALK 19:50, 21 September 2010 (UTC)
It's been tested and generates the expected categories. Do you have any improvements on the wording? I'll wait for more comments on that.
As for red category links, we won't be able to avoid having a huge amount of them at first. As I recall Special:WantedCategories only updates every few days. Also needed is a template similar to {{prefixcat}} for use on category pages. Nadando 20:05, 21 September 2010 (UTC)
After thinking about this some more, I've added an optional 'cat' parameter, which will add the new categories. So, implementing this change won't generate a ton of red links and categorization can be optional. Nadando 01:21, 22 September 2010 (UTC)
Not a bad idea. We could then do a demonstration/test and make sure everything is as expected downstream and that no one objects or has a good alternative. Once that was OK and we had a good roll-out for the categories, we could remove the "if" in the template. I suppose we could have it include or exclude a limited number of languages if we had to. DCDuring TALK 02:03, 22 September 2010 (UTC)
@ msh210: FWIW, I would support marking derivation specifically from esse and futurus (rather than from sum) for essence and future; the lemma can always be noted without the use of complex, autocategorising templates. I understand your point about sparsely-populated derivation categories, but I think DCDuring's response answers it well; using autogenerated derivations lists cuts down our human-maintained informational redundancy massively — I think we can all agree the software is much better at keeping such lists updated than humans working manually are (and User:Msh210/underived would seem to prove it). As for the specific case of Middle English → Modern English derivations, I tend to think sometimes that marking derivation through those intermediate forms is a little pointless, and that we should instead go straight to the Old English etyma (perhaps noting the enm form with a postmodifying viâ); this would be similar to what the OED does (which treats all English (and some Scots) from AD 1000 onward), but I suspect such a proposal would attract quite a lot of opposition. Yes, AFAIK, {{hybrid}} is novel in allowing multi-source derivations in one template; I think it's best to keep it separate from {{native}} and {{foreign}} because of the need to specify ISO language codes for every term mentioned by {{hybrid}}, whilst it is only necessary to do so once, at the beginning, with the other two.
@ Nadando: Could you write documentation and/or give examples of use for your new templates, please? My understanding of wiki templature code is far too basic to follow the discussion in its present form.
@ DCDuring: Wouldn't it be possible to have a bot autocreate these categories? Judging from {{prefixcat}} &c., it's a very simple matter which should be unproblematic to automate.
 — Raifʻhār Doremítzwr ~ (U · T · C) ~ 14:28, 22 September 2010 (UTC)
re: Nadando: I would very much like a quick test of one template, preferably {{compound}}, in English only. Though {{compound}} has been transcluded perhaps ten thousand times, the number in English is perhaps 20-30% of the total. (Finnish seems to be the heaviest user. I think Hungarian has its own language-specific templates.) As there are many editors in English, including some involved in this discussion, and no Finnish contributor has been involved, I would not want to impose the massive number of redlinks on them. But I would also not want to have to change all of the uses in English of one of these templates to new template names just to get a realistic sense of the work required.
A smaller-scale test would be to just do entries that had no lang= parameter. This would enable us to clean up one source of error (uses of the template in languages other than English without said parameter) and one barrier to completeness of an English-only implementation.
re: Raif'har: I think we need to do at least a medium-scale reversible test and manually check for issues, before getting bots involved. I don't know what the issues might turn out to be, but being very cautious in large-scale implementation and bold in small-scale testing seems a good policy to me. DCDuring TALK 14:58, 22 September 2010 (UTC)
I agree with your cautiousness, DCDuring; we'll use bots only when we're fairly sure that the kinks have been ironed out. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 10:00, 25 September 2010 (UTC)
There are two documented templates, {{morph}} and {{derv}}, that can be used to explore presentation issues. They can be integrated with {{suffix}} and {{prefix}} (with just the prefix or suffix entered and an empty section for the base) to explore how the presentation might look. I have taken a run at some entries and categories with the new material centered on refer and referential. There is nothing about the presentation that requires any technical skill other than template use and our most common wikimarkup to explore. Please take a look. DCDuring TALK 10:50, 25 September 2010 (UTC)
I've made comments about {{morph}} in #"Synchronic" and "diachronic" etymologies. As for {{derv}}, am I right in concluding that that template functions more or less like a single-term version of {{native}} which I proposed above? — Raifʻhār Doremítzwr ~ (U · T · C) ~ 11:04, 25 September 2010 (UTC)
That's one way to look at it. Its genealogy begins with a copy of {{confix}}, with a crude mod by me, a radical simplification by Ruakh, and the spawning of {{derv}} and {{base}} because of limitations of the original. DCDuring TALK 14:08, 25 September 2010 (UTC)

Autocategorizing base derivation using existing templates

boozehound: an example of {{compound/test}}. See geography and zoogeography for an example of {{prefix}}. Nadando 19:39, 22 September 2010 (UTC)
For the testing of the template in an individual entry I hadn't doubted your word, though it's comforting to see the examples. I am more concerned with possible problems at the level of implementation: especially with the quality of the input, the total number of categories required, and response by contributors and users.
I am fairly sure that there will turn out to be a noticeable number of uses of these templates with the wrong lang= for the L2 they are in. Most will be omitted parameters in non-English sections, but there are other types. Yesterday I found and corrected about 20 among 1500 entries at Special:WantedCategories, from errors in the use of {{suffix}}, {{prefix}}, and {{confix}}, some due to lang parameter errors, but also due to more subtle errors. Manual addition of the missing categories may speed the identification of errors.
Is it possible to limit the use of the templates to those entries without any langcode in the template and then to limit the use of the templates to English? I know that the data in the case of prefixes and suffixes is highly problematic, so we would not quickly get a good sense of what might be possible working with them first. That is why I would like to use {{compound}} first for a limited deployment. DCDuring TALK 20:09, 22 September 2010 (UTC)
Some analysis: there are 11342 instances of {{prefix}}, 19494 of {{suffix}}, 7482 of {{compound}}, and 2779 of {{confix}}. A rough estimate of the potential number of new categories is 28,000-30,000 if changes are implemented in these four templates (without the cat parameter). There are 209 instances of the four templates that need the lang parameter (list). Nadando 04:59, 23 September 2010 (UTC)
I think my problems are two:
  1. I am reluctant to suggest doing this for languages other than English without explicit support from contributors in those languages, which has been notable for its absence.
  2. In English, I know that the confounding of diachronic and synchronic use of the {{prefix}}, {{suffix}}, and {{confix}} templates means that we cannot progress very far with the new information provided by altering the templates. The use of {{compound}} in English is relatively clean IMO.
Almost all of the categories created to turn blue the category redlinks created by the template change for compound would reduce the work required subsequently for the other templates.
I had guessed that 20-30% of the use of {{compound}} was in English. If so, there are 3000-4500 bases before considering duplicates. Eliminating duplicates might reduce the total by a factor of 2 or 3, yielding a range of 1,000-2,250 missing categories generating category redlinks in 1500-2250 entries. Using an 80/20 rule, we could eliminate both redlinks in nearly 2/3 of the entries by creating categories for the 200-450 most-populated categories. This could be accomplished within a day by a single individual. Creating all the categories with more than one member might take a week or less. Thus the redlink problem would be substantially reduced even without broad participation in the process.
Perhaps success in the process would convince other languages to participate.
Perhaps success across languages would lead to some movement on supporting both diachronic and synchronic categories for all of the general morphology templates. Alternatively, it might prove necessary to replace the existing all-language morphology templates with either language-specific or simply variant all-language morphology templates to accomplish the autocategorization necessary to support complete, accurate, low-maintenance content under Derived and Related terms headers in English.
Meanwhile, I'll work on the 200-odd missing lang parameters. DCDuring TALK 15:17, 23 September 2010 (UTC)
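The redlink arithmetic in the {{compound}} estimate above can be retraced with a short Python sketch. It takes only the inputs the estimate states (7482 uses of {{compound}}, a 20-30% English share, two bases per compound, a 2-3x reduction from duplicates, and an 80/20 rule). The function name is illustrative, and the assumption that an entry's two bases are covered independently (which is one way to arrive at "nearly 2/3") is added here, not taken from the discussion.

```python
# Sketch retracing the redlink estimate for English uses of {{compound}}.
# Inputs come from the estimate in the discussion; they are guesses, not measurements.
# Assumption added here: each entry's two bases are covered independently.

COMPOUND_USES = 7482  # transclusion count of {{compound}} reported in the thread


def estimate(english_share: float, dedup_factor: float) -> dict:
    """Hypothetical helper: rough counts for one end of the estimated range."""
    entries = COMPOUND_USES * english_share  # English entries using {{compound}}
    bases = 2 * entries                      # two bases per compound
    categories = bases / dedup_factor        # distinct base categories after duplicates
    top_fifth = 0.2 * categories             # the 20% most-populated categories
    return {
        "entries": round(entries),
        "bases": round(bases),
        "categories": round(categories),
        "top_fifth": round(top_fifth),
    }


low = estimate(0.20, 3)   # low share, strong deduplication
high = estimate(0.30, 2)  # high share, weak deduplication

# If the top fifth of categories covers 80% of base links, and the two bases of
# an entry are covered independently, both redlinks vanish in 0.8 * 0.8 of entries.
BOTH_COVERED = 0.8 ** 2

print(low)
print(high)
print(f"{BOTH_COVERED:.2f}")
```

The low and high ends land at roughly 1,000 and 2,250 categories, with 200 and 450 top categories to create, matching the ranges quoted in the estimate.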
Those missing lang= parameters are a perennial nuisance, and they'll never go away as long as we allow a blank parameter to mean en. I strongly urge that we require an ISO code to be stated at all times in any templates we create or modify. The two extra keystrokes are by far less work than sorting out the mess that is created otherwise. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 11:54, 25 September 2010 (UTC)
From the perspective of an outsider, this is awesome. I've always wished Wiktionary had some way to search more easily for words derived from one particular root. — lexicógrafo | háblame — 16:03, 24 September 2010 (UTC)
We have related terms and derived terms. But see referentiality for the start of a more category-based approach.
Also, your perspective as one not too familiar with the historical, technical and other reasons why things are as they are is valuable. You probably are closer to the perspective of casual users, but, unlike them, are communicating usefully. DCDuring TALK 16:13, 24 September 2010 (UTC)
I truly believe that this change can make Wiktionary the best free multilingual etymological resource available within two years; that is an exciting prospect. There is some difficulty regarding how to automate the display of descendants in multiple languages, as well as in the display of descendants to more than one generation. Here are two possible schemata:
1. Grouping by language (only one generation available; frugal in its use of vertical screen space; allows different alphabetical ordering for each language)
2. Nesting by word (multi-generational; takes up a lot of vertical screen space (though not much of a big deal if the display is collapsible); alphabetical ordering by language is necessary)

[The real-life (albeit incomplete) example is the Ancient Greek χρῖσμα (khrîsma, salve).]

The first schema is by far the easiest to accomplish, but it is also far more limited than the second. Which do y'all prefer? — Raifʻhār Doremítzwr ~ (U · T · C) ~ 11:54, 25 September 2010 (UTC)
A list like the one you put in a box would be better IMHO (of course, if the links weren't all red). It looks more family-tree-ish. — lexicógrafo | háblame — 12:23, 25 September 2010 (UTC)
The way to avoid a massive number of redlinks is to start at the bottom, at the leafy end of the tree.
The presentation in the box is definitely better. I wonder how parts of it can be field-tested.
I wonder how you would manage cases of multiple descent. We have many cases of "modal" descent where multiple etymologies exist, qualified by "probably", "perhaps", and "possibly" and some where the main etymology is "influenced by" something else. For a simpler case in native morphology, I have tentatively included both reference and referent in the tree referential. DCDuring TALK 14:27, 25 September 2010 (UTC)

Purepecha, pua, tsz

Apparently, in the list of languages at WT:LANGNAME, Purepecha is listed as having two codes simultaneously: pua and tsz. Is this correct and expected? Wouldn't "pua" perhaps be the code for Western Highland Purepecha instead? --Daniel. 08:33, 20 September 2010 (UTC)

Yes, you could undo your edit. This duplicate name was noted at WT:Grease pit archive/2010/July#Mwera vs. Mwera languages (along with others). Make them different, and if someone cares to unify them (as w:P'urhépecha language did) then they can bring it up later. --Bequw τ 01:42, 21 September 2010 (UTC)
Agreed, it is better to keep these two different versions of Purepecha separate. I have therefore edited Template:pua to display "Western Highland Purepecha" instead. --Daniel. 02:45, 4 October 2010 (UTC)

Forbidding the part-of-speech of numeral and number

It seems to me that the following could find consensus:

Forbidding or obsoleting the part-of-speech heading of "Numeral" and "Number" in favor of any of "Cardinal number", "Ordinal number", "Cardinal numeral", "Ordinal numeral", "Adjective", "Noun" and "Determiner" in all those languages in which cardinals and ordinals show different grammatical behavior, where grammatical behavior includes usual positions taken in sentences, inflection, and more.

This would affect at least English, Czech, and German, and, if one believes EncycloPetey's assessment, also many Romance languages.

This proposal leaves it open whether ordinals are classified as "Ordinal number", "Ordinal numeral" or "Adjective"; it only makes sure that they are classified neither as "Numeral" nor as "Number"; and similarly for cardinals.

Put differently, this proposal requires that cardinals and ordinals be assigned different parts of speech in languages where this is warranted, in contrast to both cardinals and ordinals being assigned the same part-of-speech "Numeral" or the same part-of-speech "Number".

One consequence of this proposal would be that it would forbid the combination cardinals:"Numeral"+ordinals:"Adjective", while allowing the combination cardinals:"Cardinal numeral/number"+ordinals:"Adjective", cardinals:"Determiner"+ordinals:"Adjective" and other combinations.

Thoughts? --Dan Polansky 11:17, 20 September 2010 (UTC)

In Portuguese (apparently relevantly, it is a Romance language), the approach of "Cardinal numeral" and "Ordinal numeral" seems perfect in comparison with Portuguese books, grammars and dictionaries, that officially teach about the groups of numerais cardinais and numerais ordinais. --Daniel. 11:21, 20 September 2010 (UTC)
You'd need many of them as nouns, hence the English plurals ones, twos (etc.). In French they're grammatically invariable, but they can be used in the plural, des quatre, des cinq. Mglovesfun (talk) 11:24, 20 September 2010 (UTC)
Prohibition seems unreasonable as it denies those working on a language the chance to use a natural-seeming name for a category of words whose grammar may not fit our PoS scheme. Unless the conceptual framework of linguists has reached such perfection that it can confidently state that there can be nothing new under the sun in this area. DCDuring TALK 11:54, 20 September 2010 (UTC)
There is the "in all those languages in which cardinals and ordinals show different grammatical behavior" qualifier in the proposal, anyway.
Do you want part-of-speech headings for number words to be left completely unregulated and non-uniform, or regulated on a language-specific basis? --Dan Polansky 12:04, 20 September 2010 (UTC)
I wonder how this would affect Dutch. Unlike adjectives, Dutch cardinals and ordinals are not inflected (except occasionally 'one'), and I don't know what considerations would apply with respect to syntax. I assume the situation is similar to English, though. —CodeCat 13:18, 20 September 2010 (UTC)
If Dutch cardinals and Dutch ordinals have the same grammatical properties, they would be exempt from this regulation. In any case, Dutch entry for achtste uses "Ordinal number" and Dutch entry for acht uses "Cardinal number", and this seems to be the pattern also for other Dutch number words in Wiktionary, so the current Dutch practice would be unaffected either way. --Dan Polansky 13:38, 20 September 2010 (UTC)
Picking another strand off your note: if it is right that Dutch ordinals are much unlike Dutch adjectives (in that Dutch ordinals are not inflected while Dutch adjectives are), this is a refutation of sorts of the proposed regulation to file all ordinals as adjectives across all languages. The alternative regulation to file ordinals as "ordinal numbers/ordinal numerals" in part-of-speech heading seems much less problematic, as in the worst case "ordinal number" is seen as a specific kind of adjective. --Dan Polansky 13:49, 20 September 2010 (UTC)
As a general matter, I think I favor minimal regulation. Regulation by "Language committees" alone would be fine, except for the desirability of maintaining a degree of uniformity of appearance, mostly for the benefit of passive users, some uniformity to reduce technical complexity, whatever is needed for effective interwiki cooperation, and any other legitimate common concern.
As I haven't heard of a universal grammar for number words, I would be reluctant to implicitly impose one, even though it could be subsequently overturned by vote or by refusal to comply in practice. I think it is a mistake to assume that this category structure is anything other than a default. We could accomplish a great deal with, 1., redirects of category pages to category page names consistent with the grammar of each language or, 2., with multiple categories, especially if the number words in each were autocategorized by a template more specific than {{infl}}. DCDuring TALK 14:22, 20 September 2010 (UTC)
Well, I think I should be clearer. While Dutch ordinal numbers show invariable inflection, they have the ending of inflected adjectives. All ordinals, such as tweede, have the ending -e found in adjectives. And achtste among others also shares the ending -st with superlative adjectives. The lack of inflection is only in the sense that the uninflected base forms *tweed and *achtst don't exist, as they would for regular adjectives. But the adjectival ending is unmistakeable. —CodeCat 18:45, 20 September 2010 (UTC)
Dan, you only read part of what I wrote. As the comments posted above show, this is a much bigger issue than saying unilaterally that we will/won't allow a particular header. The header of Numeral/Number is needed in most languages. What differs between languages is which groups of numerical words belong under that header. In English, cardinals qualify, and you could make a case either way for the ordinals. In Spanish, Portuguese, and other Iberian languages, there is no discernible difference grammatically between an ordinal and an adjective. Grammars often treat these words together with the cardinals (which do have unusual grammatical properties), but this is out of a convenience of presentation and has nothing to do with the grammar. Latin, likewise, has no discernible difference between ordinals and adjectives that I've been able to find, and, like some other languages, has additional groups of numerals, such as the distributive and adverbial. While some of these additional numeral types can be more-or-less placed into other parts of speech, most cannot. Thus, forbidding the Number/Numeral header would require the propagation of additional part of speech headers that would not otherwise be necessary.
So, rather than propose we throw the baby out with the bathwater, why not open a reasonable discussion with a less dramatic solution? --EncycloPetey 14:51, 20 September 2010 (UTC)
Why is the header "numeral/number" needed? What service is done by "numeral/number" that is not done by "cardinal numeral/cardinal number" and "ordinal numeral/ordinal number"? What is wrong with classifying English cardinals as "Cardinal numeral" and English ordinals as "Adjective", and doing the same in Spanish and Portuguese (given that that would be your preference over using "ordinal number")?
I did not think this proposal dramatic at all. I thought I was proposing common sense. But the discussion has shown otherwise.
You spend a lot of time in your response explaining why you need the heading "adjective", although I nowhere propose to forbid "Adjective" in ordinals. The proposal intentionally leaves the question of "adjective" vs "ordinal number" unaddressed. --Dan Polansky 15:21, 20 September 2010 (UTC)
Okay, some of my questions are probably answered by your "Latin, ... like some other languages, has additional groups of numerals, such as the distributive and adverbial". So the Latin distributive numerals and adverbial numerals won't fit into "cardinal numeral" and "ordinal numeral" pair. --Dan Polansky 15:26, 20 September 2010 (UTC)
Because "cardinal number/numeral" and "ordinal number/numeral" are not parts of speech. Parts of speech are only "number" or "numeral". It makes as much sense to use the former as ===Relative adjective=== or ===Collective noun=== or ===Transitive verb===. That additional information should be encoded in language-specific context labels ({{cardinal}}, {{ordinal}} etc.). There is no problem in "classifying adjectives as numerals" - numerals/number are themselves an overarching category (like we use participles and determiners as PoS headers) defined on the semantics of words and encompass nouns, adjectives, adverbs, miscellaneous indeclinables... depending on the particular grammar tradition. Trying to devise a universal cross-language solution won't get us anywhere. Just let individual language policies decide what constitutes a numeral/number, and simply vote on the header name. --Ivan Štambuk 17:58, 20 September 2010 (UTC)
Re: "Because 'cardinal number/numeral' and 'ordinal number/numeral' are not parts of speech. Parts of speech are only 'number' or 'numeral'": That sounds very unlikely to me; in the languages that I speak, cardinal and ordinal numbers behave nothing like each other. Do you have any evidence for that claim? —RuakhTALK 18:16, 20 September 2010 (UTC)
I'm not a linguist, but I am a wikipedian. According to w:Lexical category there are 8 parts of speech and numbers aren't listed there. Whether numbers or cardinal numbers are parts of speech, maybe they should first be listed in the Wikipedia article with good references (citation needed, no original research). In my Swedish primary school, there were 9 parts of speech and one (not two) of them was for numbers, including both cardinal and ordinal. But this is Wiktionary and not a Swedish primary school. --LA2 18:44, 20 September 2010 (UTC)
Not really. As Wikipedia says, those are the eight traditional parts of speech; it's not a good categorization scheme. (We do use that scheme to some extent here, simply because so many readers are familiar with it, but as far as linguists are concerned, it's very obsolete.) —RuakhTALK 19:03, 20 September 2010 (UTC)
Re LA2: In my Czech primary school, there were 10 parts of speech and one (not two) of them was for numbers, including both cardinal and ordinal. But this is plainly wrong, as far as I can see. Czech cardinals and Czech ordinals behave much unlike each other. Czech ordinals behave much like Czech adjectives. So having the part of speech of numerals that subsumes both cardinals and ordinals looks traditional for Czech parts of speech, but wrong. --Dan Polansky 19:12, 20 September 2010 (UTC)
What I wanted to say is that if we want to do X (say, to separate cardinals and ordinals) and we do have good reasons for this (do we really?), these reasons should first be presented in Wikipedia, and then Wiktionary's guidelines could simply point to Wikipedia. Three years later, we can still refer to Wikipedia rather than "we discussed this three years ago". For this to work, the good arguments must be good enough for Wikipedia (citation needed, no original research) and this is (in my opinion, being a wikipedian) a good test to determine if an argument is good or not. Documenting a principle in Wikipedia is a test of its authority. --LA2 12:34, 21 September 2010 (UTC)
That sounds very unlikely to me; in the languages that I speak, cardinal and ordinal numbers behave nothing like each other. - It doesn't matter. As I said in the rest of the remark (which you conveniently ignored), numerals/numbers as a PoS is a generic category encompassing words whose "underlying" PoS can be of various different sorts. Who cares that cardinals and ordinals inflect differently. The purpose of PoS labels is to facilitate lexicographical categorization. People studying a language want to learn how to say "one" together with how to say "first" and "individually" and "by one" and "singular" etc. If it's somehow "unnatural" and "asymmetric" - so what. Grammars of all of world's languages use a single PoS for numbers/numerals, and so should we. Special cases of numbers/numerals (cardinals, ordinals..) should be treated like special cases of nouns, verbs.. You can't really require separate ===Cardinal n..=== and ===Ordinal n..=== without requiring a few additional PoS headers for languages that have more types of numbers/numerals. The best solution seems to be simply treat them all as one PoS and use labels, navigation templates, and categories to mutually link them and explain their difference (in appendices, usage notes or wherever). --Ivan Štambuk 17:44, 21 September 2010 (UTC)
Re "Grammars of all of world's languages use a single PoS for numbers/numerals, and so should we": This strong claim, probably a hyperbole, would require citation I think. I very much doubt that most grammars of English use a single PoS "numeral" or "number" for cardinal numerals and ordinal numerals. I know of at least two grammars of English, one from the 19th century and one modern, that have nothing like "number" or "numeral" as PoS. --Dan Polansky 17:54, 21 September 2010 (UTC)
Do you know of any English (or of any other language) grammar that treats cardinals and ordinals as different parts of speech? I think not. --Ivan Štambuk 18:06, 21 September 2010 (UTC)
Yes. Believing Brett, CGEL has cardinal numerals as either nouns or determinatives, while ordinals as adjectives. OTOH, S:The Grammar of English Grammars seems to have both cardinals and ordinals as adjectives. --Dan Polansky 18:22, 21 September 2010 (UTC)
No, I asked you if you knew of a grammar that has cardinals and ordinals as two different parts of speech. From your descriptions above, none of those grammars even acknowledges numerals/numbers as parts of speech. --Ivan Štambuk 19:04, 21 September 2010 (UTC)
@Ivan: Ah, sorry, I didn't understand what you meant. Now that I do — I completely disagree. People studying a language want to learn how to say "eat" together with how to say "food", but that doesn't mean we should have a ===Comestible=== POS header. The POS breakdown should be based mainly on linguistic principles; the existence of crappy traditional grammars is relevant only when it means that accuracy is likely to confuse readers. In this case, I don't think it is. —RuakhTALK 17:56, 21 September 2010 (UTC)
"one" and "first" are much more deeply connected than "eat" and "food". That connection is widely acknowledged in all the world's grammars of all languages, where these two are commonly treated as a single lexical category. Linguistic accuracy is irrelevant, utilitarianism takes precedence. We already use similarly generic PoS categories such as ===Determiner=== and ===Participle=== so there is a precedent for this. I don't see how this could possibly "confuse readers" - I think they would be much more confused by having 2+ PoS headers for every type of number/numeral that the grammarians of the respective language usually define. --Ivan Štambuk 18:06, 21 September 2010 (UTC)
I don't think at all that "one" and "first" are much more deeply grammatically connected than "eat" and "food", or "eat" and "edible", or "blue" and "blueness". Ruakh's analogy seems rather fitting. --Dan Polansky 18:27, 21 September 2010 (UTC)
Them being etymologically related in Czech (in most cases ordinals being regularly derived from cardinals simply by appending the adjectival suffix) doesn't really reinforce that apparent disconnection that you imagine. --Ivan Štambuk 19:04, 21 September 2010 (UTC)
Huh? I said nothing of Czech. You have shown no grammatical connection between "one" and "first", "five" and "fifth", or "ten" and "tenth", let alone "two" and "twice", or "five" and "fivefold". Apparently, you do not care about any grammatical connection. For semantic connections, we already have a topical classification. We do not need to mess up part-of-speech headings so that they fit to, as Ruakh said, crappy traditional grammars. --Dan Polansky 19:16, 21 September 2010 (UTC)
If you cannot see the obvious connection between "one" and "first" then your reasoning skills are really, really bad, or you're simply playing dumb (more likely IMO). And I love the "crappy traditional grammars" arguments. They are "crappy" because they don't fit your point of view. You even admit that you were taught in school that Czech cardinals and ordinals are the same part of speech, "numbers/numerals". Wow, the whole generations of Czech grammarians must have been "crappy". Your arguments are worthless, Polansky. You haven't provided a shred of evidence refuting my claims of utility of grouping them all under the same header, of non-existence of a grammar tradition where cardinals and ordinals are grouped as separate, but distinctly numeral parts of speech as you propose we do here, or of the inapplicability of the generic PoS treatment when we already do the same for e.g. participles. --Ivan Štambuk 20:39, 21 September 2010 (UTC)
(unindent) Re Ivan Štambuk: You seem to misunderstand what I am saying. I say that there is no or little grammatical or part-of-speech connection between "one" and "first", "five" and "fifth", or "ten" and "tenth", let alone "two" and "twice", or "five" and "fivefold"; this claim is markedly distinct from the claim that "there is no connection" without the "grammatical" qualifier. At least one modern English grammar agrees with me on this. On another note, your move that either I admit a grammatical connection (Or any connection? Did I ever deny semantic connection?) and yield to your argumentation or else I get labeled as stupid or dishonest has no place in a decent intellectual discussion. Furthermore, it is your task to prove your claims, not my task to refute them. Your claim that "Grammars of all of world's languages use a single PoS for numbers/numerals, ..." is still lacking a proof, and looks as doubtful as ever. But if you actually meant "Many traditional grammars (not all grammars) of many (not all) languages use a single PoS for numbers/numerals", that is plausible. --Dan Polansky 09:46, 22 September 2010 (UTC)
I have to correct EncycloPetey's above statement "In Spanish, Portuguese, and other Iberian languages, there is no discernible difference grammatically between an ordinal and an adjective."
Actually, in Portuguese, adjectives are expected to be placed after nouns and ordinal numerals before nouns in all situations, with few exceptions such as special rules for monosyllabic adjectives and poetic license in general. In addition, there are laws that specifically establish which words belong to which parts of speech: I recommend searching for the text "C - Classes de palavras" on the page http://www.portaldalinguaportuguesa.org/?action=nomenclatura. --Daniel. 19:32, 20 September 2010 (UTC)
OK, there are positional limitations, but this is true for certain special adjectives in a number of languages. English itself has some adjectives that can only appear in front of the noun and are never separated by a copula from the noun they describe, but this limitation does not extend to most English adjectives. So, I don't consider positional limitations significant for distinguishing parts of speech, especially as Latin, the parent language of Portuguese, was very free about adjective position except for ordinals. This is therefore an inherited limitation and does not reflect any novelty in the development of adjectives in Portuguese grammar. --EncycloPetey 06:24, 28 September 2010 (UTC)

English appendix-only nouns

Upon a recent discussion at User talk:Ruakh#en-noun, the existence of appendix-only terms has been questioned.

Specifically, according to WT:CFI and WT:FICTION, there are terms that should be defined only in appendices. Accordingly, appendix entries have been created for terms that are used only in the context of fictional universes or minor constructed languages.

In practice, the appendix-only (or context-only) entries are formatted like ordinary entries, with templates like en-noun, etyl, etc. where necessary. Their links and categories are slightly changed to reflect the separate naming system.

Do we want the appendices and categories as displayed in Appendix:Brave New World/bokanovskify? Should {{en-noun}} also be converted to format such entries accordingly? Or, conversely, should {{en-verb}} and similar templates be un-converted and never be used in appendices? Any comments and additional ideas are welcome. --Daniel. 22:03, 20 September 2010 (UTC)

They are entries much like any other; in particular they have a headword and a part of speech. That alone already warrants the use of such templates IMO, because it seems silly to not use a template like {{en-noun}} 'just because'. —CodeCat 22:47, 20 September 2010 (UTC)
So, as I understand it, the proposal is as follows:
  • Some terms will be defined only in appendices. (Note: as Daniel. notes, this is already specified by existing policy pages; I mention it here only for completeness' sake.)
  • The appendices in question will have the same format as mainspace entries: they will be split by language, then by POS, and so on. (For example, an English term and the corresponding Japanese term will appear on the same page if they are spelled identically, and on separate pages otherwise.)
  • The appendices in question will be subpages of language-neutral main appendix pages, and will belong to identically-named language-neutral categories; for example, Appendix:Pokémon/evolving and Appendix:Pokémon/アーボ will both be categorized into Category:Pokémon.
  • The appendices in question will belong to categories like Category:Japanese appendix-only nouns.
  • The appendices in question will use the same templates as mainspace entries — {{en-noun}} and {{etyl}} and so on. These templates will automatically notice when they are used in appendices, and are what will handle the addition of entries to Category:Pokémon and Category:Japanese appendix-only nouns or whatnot.
  • Also, these templates will automatically notice when they are used in other namespaces, and will behave in arbitrarily different ways for no apparent reason. (O.K., sorry, some of my own POV is sneaking in there. I can't help it: I don't understand the reason for the arbitrary breakage, so I can't describe it sympathetically.)
Is that correct?
One thing I'm not clear on is the future of appendices for reconstructed terms in unattested proto-languages. Are they supposed to fit into the above proposal somehow? Note that they currently don't follow the above structure, and I'm not sure they should. (For example, Category:Proto-Germanic appendix-only nouns seems a bit redundant, since we don't have any mainspace entries in that language.)
RuakhTALK 22:58, 20 September 2010 (UTC)
But proto-languages, being not English, have no use for templates like {{en-noun}}, so they can have their own category structure (Category:Proto-Germanic nouns, etc.). --Yair rand (talk) 23:21, 20 September 2010 (UTC)
That's half-true: they won't use {{en-noun}}, but Daniel. (talkcontribs) explicitly said "templates like en-noun, etyl, etc." Does that ostensive list include {{infl}}? I can't tell. —RuakhTALK 23:40, 20 September 2010 (UTC)
  • Whoa. When we were discussing putting these items in Appendices, the last thing I expected was that we would have one Appendix per word. I thought that there would be one Appendix per fictional universe, or possibly a few per universe. I thought the Appendices would be formatted as glossaries along the lines of the various glossaries that we have long had. The whole premise of needing any word-level templates seems wrong to me. To treat these on a par with PIE and other proto-languages seems almost as bad as having them in principal namespace. DCDuring TALK 23:15, 20 September 2010 (UTC)
    Thanks for the explanation, Ruakh. It is correct, except that I would not be in favor of redundant category names as you described; for example, we have Category:Klingon nouns but not Category:Klingon appendix-only nouns, because all Klingon nouns lie in appendices. In addition, reconstructed languages are organized similarly, but I notice a few different details: in particular, Appendix:Proto-Germanic *haglaz is not named Appendix:Proto-Germanic/*haglaz with a slash. So it's safe to say that reconstructed languages don't fit the system of constructed languages and fictional terms, and that they could continue to exist separately, or could be merged in the future by means of additional proposals for a more convenient overall organization.
    I also would not be in favor of arbitrary behavior, so please elaborate on your experiences and expectations about how the discussed templates behave in various namespaces, if you would.
    DCDuring, there are appendices with indexes of terms: see, for instance, Appendix:Harry Potter spells and Appendix:Pokémon items. The additional format of one appendix per term is necessary, because each term has its own characteristics, like translations, pronunciations, etymologies, inflections and multiple definitions: see Appendix:Pokémon/Ditto, which does not display a proper inflection line or categorization due to the un-converted en-noun, but includes the other characteristics of a common entry. --Daniel. 23:47, 20 September 2010 (UTC)
    It is by no means necessary to glorify these terms with any support whatsoever beyond including them in a Glossary. This seems like a sandbox project that has gotten loose. I see no reason whatsoever to change the least element of or incur the slightest risk to anything whatsoever used elsewhere in wiktionary in support of a runaway sandbox project. If some good ideas emerge that support the main work of the project, that's great. Otherwise, it doesn't seem worth the least consideration except to ensure that it does not harm the project as a whole. DCDuring TALK 00:13, 21 September 2010 (UTC)
    The community voted specifically to include appendices of fictional terms. It is not a "sandbox project". --Yair rand (talk) 00:22, 21 September 2010 (UTC)
    I'd be amazed if the "community" thought this would be anything like what has been perpetrated. DCDuring TALK 00:26, 21 September 2010 (UTC)
    I wouldn't. What exactly do you think the community was assuming the appendices would be like? --Yair rand (talk) 00:34, 21 September 2010 (UTC)
  • I don't know about anyone else, but I was assuming that the appendices would be like what the vote describes. The vote uses terminology like "appendices of words from that universe" and "a Star Wars appendix" and so on. I guess there might be a terminological barrier here: as you see it, maybe all the pages in Category:Pokémon, taken together, constitute a single "appendix"? I guess that makes some sense, but to me each of those pages looks like a separate "appendix". Therefore, I was expecting a single page like Appendix:English Star Wars terms or something, with a list or table of terms, not a whole family of pages with their own categories and tree structure and whatnot. —RuakhTALK 00:50, 21 September 2010 (UTC)
  • Update: I do know about anyone else, because I've just now read the discussions that led up to the vote. The concept was Concordance:A Clockwork Orange, with the correction that it should be in the Appendix: namespace because "concordance" implied, and implies, a complete list of all words used in a corpus. (That said, Visviva did at one point say "I would suggest 'Appendix: Glossary of Foo terms' as an alternative convention, with option to create subpages ('Appendix:Glossary of Foo terms/Bar') for individual terms where warranted", and no one replied with obvious horror; but the vote itself didn't mention the possibility of subpages, and no mention was ever made of what might warrant such a subpage, so that comment alone doesn't justify putting all individual terms in subpages.) —RuakhTALK 01:13, 21 September 2010 (UTC)
Glossaries. I thought they were voting mostly on the modified CFI required, for example, "to permit the phrase to be included in a Star Wars appendix". The Appendices at the time were largely limited to glossaries. When the discussion turned to the Appendix of protologisms the discussants did not object to the form of that Appendix, but rather its content. Since that vote, we have acquired a citation space to house the citations if these are being warehoused pending such time as they meet some criteria for inclusion. Also the items are supposed to meet modified CFI to even be included in an appendix. DCDuring TALK 00:53, 21 September 2010 (UTC)
There is a Star Wars appendix: Appendix:Star Wars, which links to individual entries like Appendix:Star Wars/protocol droid. The latter includes three citations, which naturally make the page longer. It would not be feasible to include all information related to a single fictional universe on a single page without further subdivisions. In addition, as part of the group of appendices related to Star Wars, there is a glossary of terms derived from the series: Appendix:Star Wars derivations. --Daniel. 01:01, 21 September 2010 (UTC)
I was also not envisioning prolific appendix subpages (nor for that matter extensive sections for computer languages such as Appendix:Hyper Text Markup Language). Their existence doesn't bother me too much, but I would only make real concessions for proto languages. Those are valuable for linguistics. The other stuff could be moved to fan-sites such as Wikia. --Bequw τ 02:00, 21 September 2010 (UTC)
Then, for instance, should APL symbols not be defined here on Wiktionary? Also note that Wikipedia, Wikia and Wiktionary differ in their contents, presentation and scope.
Appendix:Hyper Text Markup Language links to each of the standard tags, attributes and values of HTML, while Wikipedia, despite having a page with similar purpose, has policies preventing the maintenance of "lists of indiscriminate information", which apparently involves glossaries, including the HTML one. --Daniel. 02:11, 21 September 2010 (UTC)
Having items in a glossary is much like the tabular presentation of certain information in a print dictionary. Glossaries are also useful as repositories of terms that might warrant inclusion if they are attested and otherwise meet our criteria for inclusion. I think APL was debated. I don't know about HTML tags. They seem suspect to me, but may have been debated.
WP has its policies and this community has its own because we have distinct objectives. We often are the home for dictionary material from WP just as they are the home of information too encyclopedic for us.
As I see it, the vote was trying to allow for a species of inclusion for terms from fictional universes circumscribed in two ways: 1., the elements had to meet modified CFI, and 2., the elements were to be included in the kind of appendices that existed at the time. There was no discussion or even mention of any subpages or use of templates.
No reasonable reading of the vote discussion and what preceded it would lead one to believe that the community was sanctioning a parallel Wiktionary for fictional-universe words.
What really concerns me about this is that no one involved in this effort seems to have bothered to check with the community at large. AFAICT no one in this effort was around when the vote took place, yet they assumed they knew what the 2 1/2-year-old vote was about. This only came to light because of a reversion of an odd edit to one of the most commonly transcluded templates we have.
If the participants grasped the role of the fictional-universe effort as potentially a feeder to principal namespace (as made clear in the vote discussion) the emphasis would have been on gathering citations for the fictional-universe words that might become includable and using citation space for the citations. DCDuring TALK 02:30, 21 September 2010 (UTC)

No. I have been participating in discussions. Firstly, there was the occasion when I proposed "Appendix-only terms", but nobody bothered to reply. Perhaps I wasn't persuasive enough. Still, it has been read and briefly discussed at Na'vi. More productively, at Harry Potter terms I got information about the CFI for those entries from other editors. There are also some discussions about related technical details, like this one about catlangname into en-verb and this one about how to name templates for appendix-only terms.

At the relevant vote and related discussions, people more than once commented on how this approach is not perfect; they also expressly talked about how it serves as an initial decision to be developed in the future. The voters did not specifically allow or reject the current format. This is very natural, because there were (and still are) very few entries with fictional terms to bother with.

In addition, the discussed terms are created in Wiktionary regularly by various editors. See colspan and dilithium. They indicate, fundamentally, that at least some people want them defined. Of course, we may simply discuss and vote until reaching a major decision never to define terms strictly outside the realm of reality. However, such a decision has not occurred yet.

The decision we have is of that vote, which says "[...] require that terms originating in fictional universes [...] may be included only in appendices of words from that universe, and not in the main dictionary space." Please note the plural "appendices"; I also recommend reading the complete text with comments. If the vote is not about what it says, then it should be rewritten and restarted. --Daniel. 03:39, 21 September 2010 (UTC)

I also support restarting that vote. In particular, it should explicitly specify whether works of religious fiction (which is fiction nonetheless, no different than Star Wars or Harry Potter) are also covered by it. I don't see why one should be forbidden as a source for attestation and the other not. --Ivan Štambuk 17:26, 21 September 2010 (UTC)
Broadly speaking, religious narratives don't take place in fictional universes, they take place in the real one. Zeus doesn't actually exist, but he's not part of a "fictional universe". (Well, there are some fictional universes that he appears in, but Greek mythology is not one.) —RuakhTALK 17:35, 21 September 2010 (UTC)
How is Zeus not a part of a fictional universe?! The Bible, Quran, Vedas etc. are full of imaginary sky-persons, obviously imagined (non-Earth) locations and events that are no different than the magic of Harry Potter (talking fire-shrubs etc.). Does "fictional" in this context mean "not having a certain number of devout followers"? If so, see w:Jediism. Are all religious texts, from the scriptures of Christianity and Scientology to shamanism and folk religion, to be treated as non-fiction? I find that preposterous. --Ivan Štambuk 17:58, 21 September 2010 (UTC)
Somehow Greek mythology, Tanach, the Koran, and so on also strike me as having our universe as theirs, yet I can't (at the moment, anyway) think of a difference between them and Harry Potter: both reference real places and times and people, and both have fictional elements (well, in my own belief system Tanach is nonfiction, but that's me). Ruakh, can you justify your claim?​—msh210 (talk) 18:07, 21 September 2010 (UTC)
False religions, like false scientific theories, don't have fictional elements, but mistaken ones. Do we ask that ether (the fixed physical medium through which light travels) be cited independent of reference to the belief that light travels through a fixed physical medium? (There is probably some blurring at the edges — I think it's fair to say that some religious groups take a much stricter attitude toward accuracy than others, and I suspect that even the most devout Ancient Greek adults did not literally believe that echoes are produced by a cursed nymph — but overall, I think the distinction is clear.) —RuakhTALK 18:29, 21 September 2010 (UTC)
Ah, of course. Thanks.​—msh210 (talk) 18:35, 21 September 2010 (UTC)
What is a "false religion"? To an atheist, every religion is "false". Yes, it is exactly because of your reference to ether that we should also allow terms from fictional universes (referring to fictional, real, or possibly real concepts). They are words which have a meaning, and which are used in an actual language to convey information. It is not up to us to judge whether they are "worthy" of inclusion on the basis of their semantics (or spelling, register of usage, or whatever). If they pass the primary CFI (used by enough people in a certain time frame), they should be included. Millions of hours are wasted every day in various "fictional universes", and as a modern dictionary we should describe them too. --Ivan Štambuk 18:50, 21 September 2010 (UTC)
The policy requires usage independent of reference to the universe in question for the term to count as being "real" and included in the mainspace rather than in the appendix. I think if any religious text has any terms that would actually qualify for appendix inclusion (having no usage except while referencing the universe described in the original work) then having an appendix for it makes some measure of sense. --Yair rand (talk) 18:10, 21 September 2010 (UTC)
If they have no attestation except while referencing the original work, then right, they fail the independence criterion. (Though major religious works might get a bit of a pass, under the "well-known work" rule …) But if they have no attestation except while referencing [a mistaken understanding of] reality, then I don't see the problem. —RuakhTALK 18:29, 21 September 2010 (UTC)
Right.​—msh210 (talk) 18:35, 21 September 2010 (UTC)
No, it doesn't matter whether the work is original or derived. The attestation must be outside the universe of reference. So religious commentaries are also forbidden. And why should "major religions" get a special pass? They became "major" only because of the confluence of certain historical events. It doesn't matter if the word is known to 500 or 500 million people. They should all get the same treatment. Some of the vocabulary of the currently forbidden fictional universes is known to more people than most of the obscure terminology from Judaism, Islam, Hinduism or Christianity. It appears to me that you're a bit biased here, wanting to push your belief system as having precedence over others. --Ivan Štambuk 18:42, 21 September 2010 (UTC)
Re: "And why should 'major religions' get a special pass?": They don't, exactly. Major religious works do (or some do). That's a very different matter, because it's language-specific: in English, the KJV is surely a well-known work, but I doubt that any specific translation of the Qur'an is, whereas in Fus'ha, the Qur'an is obviously a well-known work, whereas I doubt that any Christian religious work is. Re: "It appears to me that you're a bit biased here, wanting to push your belief system as having precedence over others": I don't think I'm giving that appearance. (But you're obviously trolling, so I don't know why I'm bothering to reply. Ah, well.) —RuakhTALK 21:10, 21 September 2010 (UTC)
Notably, I can infer the Wiktionarian privileges of the Bible by looking Daniel up: it is defined as "The book in the Old Testament of the Bible." and "The prophet whose story is told in the Book of Daniel." There are not many works whose official subdivisions and characters merit definitions in the main namespace. --Daniel. 22:13, 21 September 2010 (UTC)
Honestly, I'm not sure that those two senses do merit inclusion (though obviously they should be mentioned in the etymology). At one point they would have been subject to the "attributive use" clause, but currently the CFI have nothing to say about them. If someone were to list them at RFD, I would definitely be curious to see the arguments for keeping them. Daniel (the Book of Daniel) is different from titles of other works, in that it's treated as a name rather than as a title (e.g., in not being italicized or underlined or anything), but is that a reason to have an entry for it? —RuakhTALK 23:53, 21 September 2010 (UTC)
If a strict and specific policy of only mentioning the Bible in the etymology section were introduced, Daniel would effectively keep the same information, as you pointed out, but other major examples like Exodus might end up being deleted. I, personally, am in favor of not preventing readers from knowing how their favorite books of the Bible are spelled in other languages. Given the overall impact of the Bible in our history and culture, querying for information about it would conceivably not be uncommon. --Daniel. 01:22, 22 September 2010 (UTC)
And the only way to do that would be to require the words originating from religious literature (is there a term for that?) to be attested in obviously non-religious usage, which I think would exclude a great deal of them. Don't get me wrong, I'm not against their inclusion, and I think it would be bizarre to forbid the Bible as e.g. attestation for Jesus, but I want to draw attention to two things: 1) the distinction between "real" and "fictional" universes doesn't make much sense when lots of humans believe that things which some would argue to be fictional are/were real (religion being the least problematic of them); 2) the bizarre requirement that these terms be attested in usage outside their primary literature, which is impossible for most of them unless they can be used metaphorically or something. Some of these fictional universes are really huge and have a large number of "followers", and while it might make sense to exclude words appearing in e.g. only one work, excluding terms which have gained such a wide audience that they even have translations (from English) into dozens of other languages simply doesn't seem right. --Ivan Štambuk 18:36, 21 September 2010 (UTC)
Your reasoning seems correct to me: "Sorting" (meaning "The annual ceremony where each new student of the fictional European school Hogwarts is assigned to one of four different Houses by the Sorting Hat.") is strictly related to the context of Harry Potter, so it is conceivably only cited in Harry Potter books, news about Harry Potter, Harry Potter fansites and other related works. Similarly, "Flood" (meaning "The flood referred to in the Book of Genesis in the Old Testament") would likewise be cited only in the context of the Bible. If there is a way to discriminate between events of Harry Potter and events of the Bible, for the purpose of treating them differently in Wiktionary, then I would like to know it; otherwise, they probably should be defined and organized on equal terms. --Daniel. 11:17, 22 September 2010 (UTC)
I agree; I don't see a fundamental difference between specific senses of this kind, whether their context is modern or historical, whether they come from myths, folklore, fiction, religious works, etc. A lexicographer should be looking for the true meaning of words (preferably encompassing the entire scope of meaning the words may connote), but not the truth of the concepts those words describe. A theoretician may expound an entirely new theory and coin new terminology to describe it. As the theory becomes popular and/or built upon by others, this terminology becomes more widespread (and then arguably a more integral part of the language(s) it is used in), but it usually started in the writings of one author (perhaps with co-authors). Is this fundamentally different from Rowling coining or redefining terms to describe the fictional concepts of her works, and these terms subsequently spreading to other writers/speakers? In both cases the concepts only exist in the author's mind (and his readers' minds) and have not been proven to exist outside of this imaginary or conjectured context. In both cases the context may be extremely limited; people who use the terminology, at least at first, are usually familiar with the original work and the context is limited thereto. However, it would be unwise to open the door to any and all terminology of this kind: as with the biblical terminology, it should be included because it is in common use. I believe our normal criteria for inclusion (demanding cites from three separate works by three separate authors) quite suffice to weed out unnecessary clutter in this area. Then, with words such as sorting/Sorting, which are basically generic words that can potentially be used in myriad specific senses, these senses should of course be grouped together somehow and/or sorted by commonness/frequency of use, because we must keep Wiktionary pages useful (and easy to use).
An example, for clarification of my views: “Horcrux” should be included since it has become widely known and is attestable in separate sources (see Citations:Horcrux), but as a new novel introducing a new term is published, we should not jump to include it, but rather wait and collect citations such as there are (as is already being done). Also, I must clarify that, while I don't see a fundamental (lexicographical) difference between terms originating in the Bible and in Harry Potter, one must concede that the Bible terms are ahead of them in many respects: they are more widely known (also across generations, language periods, etc.), attested earlier, more frequently and in more sources, etc., and should for those reasons alone take precedence. – Krun 14:54, 22 September 2010 (UTC)
Then, a standard of common English must be defined, since the current criteria of fiction/nonfiction and constructed/nonconstructed from WT:CFI and WT:FICTION are not clear enough for religions and other issues. However, that standard would necessarily not simultaneously encompass "the entire scope of meaning the words may connote" and be "attestable in separate sources", if separate means outside the context of any fictional universe. For example, the English adjective shiny strictly means "that shines"; but in context of Pokémon, it, differently, basically means "extremely rare and of a different color". These definitions are currently elaborated in different places: shiny and Appendix:Pokémon/shiny. --Daniel. 18:03, 22 September 2010 (UTC)
  • Religious works are not the same as Harry Potter. Religious literature is not fiction, it is non-fiction that happens to be bollocks (in my view anyway), like works on homoeopathy or dream diaries. That is very different from fiction, which deliberately sets out to invent a world which readers are never expected to believe exists. Ƿidsiþ 12:03, 23 September 2010 (UTC)
    Note: I'm not saying it's the same with respect to context or reality. I'm just saying that shouldn't matter if the term has spread to other writers. I wouldn't support inclusion of Bible terms if they weren't used anywhere else (e.g. in religious commentaries, etc.). If they weren't used by anyone else, then the Bible wouldn't be so familiar, and its terms would be obscure. As it is, they are not obscure at all, but neither are Harry Potter terms. Personally, I think it would be nice if we could also include words used only by one author in one work, but I don't think it is readily manageable; therefore, I wouldn't support it. – Krun 07:23, 24 September 2010 (UTC)
    I, personally, like the approach of having two different pages for "shiny" as a way of conveying the fact that there are definitions restricted to the context of Pokémon. It does not seem very different from having two definition lines in the entry brute force, one of them restricted to computer science. It is manageable enough for me. --Daniel. 01:34, 25 September 2010 (UTC)
This is probably horrifically politically incorrect, but I will vote for anything that reduces the inclusion of clearly recently-made-up-for-a-story things in the dictionary, e.g. anything by Rowling. Equinox 02:50, 24 September 2010 (UTC)
You can have a point of view. Who cares about political correctness? – Krun 07:23, 24 September 2010 (UTC)
Regarding "religious works", and setting aside the issue of truth or fiction for the purposes of this discussion, the Bible (Koran, etc.) is to Harry Potter what Shakespeare is to Twilight. The Bible (Koran, etc.) and the works of Shakespeare have survived as "great literature" for longer than many modern nations have existed. They have strong cultural influence, have spawned an enormous corpus of literary interpretation, and have resulted in the coinage of many idioms now part of the everyday lexicon. The Bible and Shakespeare are (technically) anthologies, rather than works created by a single author with the intention of telling a single cohesive story arc. One could argue that Shakespeare's history plays fit this "cohesive fiction" concept, except that they happen to be the plays based in historical fact, but certainly not his other plays. --EncycloPetey 06:10, 28 September 2010 (UTC)

Wikipedia phrases

Many Wikipedia pages for languages include "Example phrases" (example). Is there any way that we can integrate better with those sections? --Bequw τ 05:19, 25 September 2010 (UTC)

Well, as a specific example, I've populated Appendix:Portuguese pronouns with some example phrases where I found necessary. --Daniel. 05:49, 25 September 2010 (UTC)
In my mind, part-of-speech appendices (e.g., "English nouns", "Portuguese adverbs"..., which could be easily extended to "English suffixes", etc.) would be the most appropriate places for example phrases, to illustrate their subjects. Another intuitively obvious choice could be the AXX pages ("Wiktionary:About English", etc.); however, their primary focus should be providing information about Wiktionarian conventions for individual languages, which would be better illustrated by examples of usage of templates and MediaWiki codes. --Daniel. 06:17, 25 September 2010 (UTC)
They are choosing to illustrate grammatical points, something we rarely do.
If we had a phrasebook Appendix structure for each language, presumably with content derived from our translations of the individual phrases, we might be able to offer massive supplementation of such sections, thereby helping users and increasing users. Clearly it needs to be done only with languages where we have phrasebook coverage that is competitive, not an embarrassment. We seem far removed from achieving that AFAICT.
I assume we already provide links to Wiktionary somewhere on the page, preferably in more than one section.
We could provide interwiki links, using [[wikt:word]], to each individual word (or first occurrence in each section) for which we have an entry and add the ones missing. DCDuring TALK 10:35, 25 September 2010 (UTC)

CJKV Characters in translations

Before adding a request to WT:TODO, it seemed prudent to bring this here. We have CJKV Characters (or CJKV characters) in translation tables. While I see why this is, we translate into languages, not scripts. If the entry in question has Mandarin, Cantonese, Japanese (etc.) sections, add them separately, not as CJKV characters. It's not really any better than adding a Grek translation instead of Greek and Ancient Greek. Mglovesfun (talk) 10:31, 25 September 2010 (UTC)

I should probably say that CJKV = Chinese, Japanese, Korean and Vietnamese characters (took me months to figure it out). Mglovesfun (talk) 10:35, 25 September 2010 (UTC)
As false friends exist among Chinese, Japanese, and Korean, just as among languages written in the Latin script, removing CJKV Characters when translating from English should be considered. Just translate from English to Chinese, Japanese, Korean and Vietnamese.--Jusjih 03:29, 26 September 2010 (UTC)
As a Mandarin/Korean contributor I too have no idea why we keep those CJKV characters. ---> Tooironic 22:45, 26 September 2010 (UTC)
Mainly because it would take someone to actively get rid of them. I'm specifically talking about translation tables, not our translingual entries, which at times do seem a little odd to me. Mglovesfun (talk) 23:22, 26 September 2010 (UTC)
I'll answer briefly. I have some exposure to all 4 CJKV languages but focus on Chinese and Japanese. The reason behind it is to show SINGLE characters shared by all 4 languages. It is usually a historic link: a character which may no longer be a current modern word but is understood by users of these languages. Vietnamese users may not know or remember the character; Chinese characters are not taught in Vietnam, but many characters have Vietnamese readings. More information is in the Unihan dictionary. A typical example is 水, a CJKV character for water. The Vietnamese word for water is nước, which is NOT of Chinese origin, but the Vietnamese reading of 水 is "thuỷ", which is used in Sino-Vietnamese compounds like "thuỷ tinh" (glass, crystal). Finding the origins of Sino-Japanese, Sino-Korean and Sino-Vietnamese words or names can be challenging, as the pronunciation and spelling changed greatly, have many variants and can be very dissimilar to current Mandarin or to Middle or Ancient Chinese. --Anatoli 04:43, 28 September 2010 (UTC)
I don't support getting rid of CJKV translations but some require clean up. --Anatoli 04:46, 28 September 2010 (UTC)
There are words like portar used in many different Romance languages. We currently have 4 pretty much identical entries in Catalan, Galician, Portuguese and Spanish, and I suspect the Asturian and Aragonese would also be identical. So, how would we enter this into a translation table? Romance? Latin script? What's the argument against splitting them into the individual languages, such as Mandarin, Wu, Cantonese, Japanese, Korean (etc.)? Mglovesfun (talk) 13:33, 28 September 2010 (UTC)
If you look at individual (single) character entries, they have separate CJKV sections, showing the readings (as a minimum) of the hanzi in different languages. If the entry is complete, it will have at least 4. The argument is: Vietnamese and modern Korean don't use Chinese characters, Japanese has variants or may fall back on kana for many words, and Chinese has traditional and simplified forms; modern Mandarin may differ greatly from Middle Chinese or from the MOST COMMON AND UNIVERSALLY KNOWN character. (I'm not shouting, just trying to highlight the important bits).
Example (this entry's translations need updating):

to eat (verb)

  • Chinese Mandarin (modern): 吃 (chī) (modern Mandarin uses a different character), 食 (shí) (rare, as a component)
  • Cantonese: 食 (sik6)
  • Japanese: 食べる (たべる, taberu) (食 (shoku) is used as a component only)
  • Korean: 먹다 (meokda) (식 (sik) is used as a component only, derived from CJKV 食)
  • Vietnamese: ăn (this word is not Sino-Vietnamese, so it's not derived from CJKV 食)
  • CJKV characters: 食
The CJKV translations help to determine the common link between the CJKV languages: 食 means "to eat" to all educated speakers or CJKV linguists, even if the modern word may be different, as above. The character 食 doesn't have the reading ăn in Vietnamese, but thực, tự, and some others not listed, e.g., xin. Koreans use 먹다 (meokda) as the verb, but the word 식당 (sikdang) "canteen" uses Sino-Korean (CJKV) elements and can be written as 食堂, which is immediately understood by Chinese or Japanese readers. I don't know the language code for CJKV; perhaps it doesn't exist. CJKV serves somewhat as Old Church Slavonic does for some Slavic languages. --Anatoli 00:01, 29 September 2010 (UTC)
Then the inter-language relational information should be shown in the etymology sections of the separate CJKV language entries. They can be classified as cognates or, if correct, etymons, or even vaguely as "related to:". There's no CJKV "language". --Bequw τ 03:34, 29 September 2010 (UTC)
I did my best to explain, but it obviously didn't work. The words that are related may already have etymology sections. For the CJKV languages, 食 is a Sinitic, "translingual" symbol (for East Asia only) meaning "to eat", even if they don't use 食 as a verb. Only in Cantonese and some other Chinese dialects is it the current translation of to eat. Korean 먹다 (meokda) and Vietnamese ăn have nothing to do with 食, but both have a cultural and linguistic link to this character. I suggest making a code that would nest CJKV under Chinese, even if CJKV doesn't exist as a separate language, because it is an important piece of information for CJKV language learners. --Anatoli 04:54, 29 September 2010 (UTC)
The etymology of tête is an important piece of information, but I wouldn't put it in a translation table. Translation tables are just that - for translations. Mglovesfun (talk) 08:10, 29 September 2010 (UTC)

I give up. --Anatoli 11:20, 29 September 2010 (UTC)

What is Usenet?

That's it, really. Mglovesfun (talk) 11:39, 26 September 2010 (UTC)

See [[w:Usenet]]. :-)   —RuakhTALK 14:35, 26 September 2010 (UTC)
Short version: it's the world's biggest forum, made before forums were invented. Or was, anyway. —CodeCat 15:32, 26 September 2010 (UTC)
And, more practically from the PoV of WT:RFV#Attestation, it is accessible as a subset of Google Groups. The form of the names and URLs of the component forums is the clue to distinguish the Usenet forums from Google's own. I hope that is available on WP. DCDuring TALK 15:47, 26 September 2010 (UTC)
The names are described in w:Big 8 (Usenet) and leading examples given in w:List of newsgroups. DCDuring TALK 16:00, 26 September 2010 (UTC)
Why exactly do we consider Usenet durably archived, but other sources not? It seems a bit presumptuous to assume that some websites will disappear, while some will not. I'd rather we word it more like Wikipedia's "reliable third party sources"; AFAICT, that's what we mean when we say durable. So it's a total airball, like most of our policies, which are years out of date with our actual practise, more in some cases. Mglovesfun (talk) 17:53, 26 September 2010 (UTC)
Because the institutional arrangement for keeping its plain-text content differentiates it from others. (It does not durably archive binary files.) We should know that many human institutions are fragile, eg the Library at Alexandria, printed works on high-acid paper; subject to amendment by the owner or others, eg most Internet sites; or not-yet-proven-durable and/or hard to access, eg the purportedly durable snapshots of the Web's content from time to time. DCDuring TALK 18:42, 26 September 2010 (UTC)
Stone tablets are durable. Should we accept only stone tablets from now on? -- Prince Kassad 05:37, 27 September 2010 (UTC)
I'm open to inscribed pottery and other ceramics as well, even shards thereof. There is an accessibility question. We conventionally accept only printed reports of such inscriptions.
It is really a question of which attributes we need. I think we should take into account the possibility that the few entities that are taking snapshots of the web may go bankrupt or suffer catastrophic hardware failure. If many well-funded institutions worldwide had copies of periodic snapshots of the web that were on EMP-protected media, we could feel comfortable relying on them. The owner-/author-alterability of specific web content on normal private sites seems problematic to me. OTOH, we can periodically check the validity of such attestation references by spider and then enjoy countless hours of amusement updating or replacing the attestations or reopening RfVs. As I think Ivan has said, we have all the time in the world. DCDuring TALK 12:59, 27 September 2010 (UTC)
@Mglovesfun: Speaking as one of the main citers of RFV'd entries, and as the main closer of RFV discussions (I think), I can say that this policy is not out of date with our actual practice. I, for one, hew very closely to that policy. I do not take it to mean anything like "reliable third party sources"; there are a number of blogs that I consider to be much more reliable than most Usenet postings, but I don't consider reliability to be directly relevant. (Reliability is indirectly relevant, however, when we rely on intermediate sources to accurately reflect ultimate sources. For example, most of our book quotes don't actually come from books, but rather from Web-sites that scan and/or transcribe books. The reliability of those sites is relevant; I don't really care about the text in some image on books.google.com, except insofar as that image is in fact a scan from the book that Google Books says it is.) —RuakhTALK 13:04, 27 September 2010 (UTC)

A phrasebook entry

I have created a phrasebook entry with my own choice of city (I live in Melbourne). The discussion is here. I have already added a bunch of translations, which I would be a bit sorry to lose, but anyway, please criticise and suggest something. --Anatoli 04:27, 28 September 2010 (UTC)

Are there languages for which "I live in (a city)" differs from "I live in (a province, a country, the mountains, the desert, the countryside, the suburbs)"? I won't ask about "I live in a (house, apartment, mansion)" where the likelihood of a different construction seems higher. DCDuring TALK 04:42, 28 September 2010 (UTC)
There would be, but not many. E.g. Russian interchanges в and на, often depending on what the object is. Interestingly, with Ukraine it has become a political issue: one camp says you should say "я живу в Украине", the other says "я живу на Украине" (this is not too important for this discussion). I know what you mean; perhaps just "I live in..." followed by grammatical notes. I didn't do it to demonstrate the grammar with the example. --Anatoli 04:53, 28 September 2010 (UTC)
My preference would be Appendix:I live in.... Mglovesfun (talk) 13:29, 28 September 2010 (UTC)
I think French makes a distinction: j'habite en France, j'habite à Cannes. Equinox 19:51, 28 September 2010 (UTC)
It's a bit more complicated than that, but yes, that's part of it. (Certain masculine countries also use à, as in « j'habite au Canada » and « j'habite aux États-Unis ».) —RuakhTALK 20:28, 28 September 2010 (UTC)
It's also quite common in French to omit the preposition altogether with masculine city names that take no article, e.g. "j'habite Paris" (Equinox put prepositions in brackets). Mglovesfun, I thought of the Appendix (like Appendix:I am (ethnicity)), but appendices are less visible and the table structure doesn't let you add translations easily. --Anatoli 23:35, 28 September 2010 (UTC)

Multiple names of families

While I was researching to provide codes for codeless families, I made this small list of families that are recognizable by more than one name.

  • alv-wat -> Atlantic, West Atlantic
  • aql -> Algic, Algonquian-Ritwan, Algonquian-Wiyot-Yurok
  • aus-nep -> Northeast Pama-Nyungan, Pama-Maric
  • auf -> Arahuan, Arauan, Arauán, Arawa, Arawan, Arawán
  • awd-gua -> Guahiban, Guahiboan, Guajiboan, Wahívoan
  • awd-mai -> Arahuacan, Arawakan, Maipuran, Maipúre, Maipurean, Maipuran Arawakan, Maipureano
  • fiu -> Finno-Ugrian, Finno-Ugric
  • grk -> Greek, Hellenic
  • hmx -> Hmong-Mien, Miao-Yao
  • ine -> Indo-European, Indo-Germanic
  • ine-toc -> Tocharian, Tokharian
  • jpx -> Japanese, Japanese-Ryukyuan, Japonic
  • khi -> Khoesaan, Khoesan, Khoisan
  • nic -> Niger-Congo, Niger-Kordofanian
  • nic-gur -> Gur, Voltaic
  • qfa-taq -> Daic, Kadai, Kra-Dai, Tai-Kadai
  • qfa-min -> Misuluan, Misumalpa, Misumalpan
  • qfa-mje -> Macro-Gê, Macro-Jê
  • qfa-pat -> Pano-Tacana, Pano-Tacanan, Pano-Takana, Páno-Takána, Pano-Takánan
  • qfa-yen -> Yeniseian, Yeniseic, Yenisei-Ostyak
  • qfa-yuk -> Jukagir, Yukaghir, Yukagir
  • roa -> Latin, Neolatin, Neo-Latin, Romance, Romanic
  • sem-osa -> Epigraphic South Arabian, Old South Arabian, Sayhadic
  • sem-sar -> Modern South Arabian, South Arabian
  • tup -> Tupi, Tupian
  • tuw -> Manchu-Tungus, Tungus, Tungusic
  • zhx -> Chinese, Sinitic

I noticed that, for at least a few families, the chosen name is not consistent across all places: for instance, {{etyl:jpx}} was displaying "Japanese", but there was no Category:Japanese languages, only Category:Japonic languages. So I edited {{etyl:jpx}} to display "Japonic".

I believe a list like this can help the organization of families, their recognition and hopefully the choice of a possible consistent and unique name for each one. So, I'd like this list to be placed elsewhere to facilitate findability; and expanded if necessary. Perhaps the best choice would be adapting it into WT:LANGNAME, together with the list of languages. --Daniel. 05:07, 28 September 2010 (UTC)

The reason certain names are preferred over others is, in some cases, because the language family shares a name with one of its member languages. So, "Greek", "Chinese", "Japanese", "Tupi", and a number of other alternatives will not ever be used to refer to a family on Wiktionary (except in a definition of the word itself), because they name a language as well as a language family. If you intend to create a list, this problem should be clarified for those language family names that have this feature. --EncycloPetey 05:57, 28 September 2010 (UTC)
WT:LANGNAME includes "Modern English", "Croato-Serbian", "Kalaallisut" and other language names that, given the current Wiktionarian conventions, will not ever be used. Similarly, "Greek" and "Latin" may be listed as family names to be always avoided. I agree that this issue should be clarified at the list.
Notably, {{etyl:grk}} was returning "Greek", not "Hellenic", thus affecting the entry Медуза. I edited this template recently too, making sure it would not conflict with the name of the language, as you described.
Even more notably, Chinese seems to be a perpetual confusion in the set of families, languages and dialects; to me, this phenomenon was being further emphasized by {{etyl:zhx}} displaying Chinese, not Sinitic, in more than twenty pages. For that matter, I have also edited "Chinese" to "Sinitic" for consistency with this issue of language/family. However, if I remember correctly, there are greater controversies that include the proposal of using always "Chinese" as family after hopefully cleaning up all the Sinitic languages and deleting Category:Chinese language in the process. --Daniel. 06:48, 28 September 2010 (UTC)
If you list the names in a /names template (eg {{yue/names}} but in the etyl: space) then they will be updated in WT:Etymology/language templates. --Bequw τ 03:38, 29 September 2010 (UTC)
It's good to know that WT:Etymology/language templates can be updated this way.
Yet, if no one objects, I'm going to list the multiple lists of families at WT:LANGNAME too, since WT:Etymology/language templates is so long it won't even load completely on my computer.
As a side note, WT:Etymology/language templates is incomplete, as it does not show the various names of {{bzs}}. --Daniel. 11:45, 29 September 2010 (UTC)
Thanks for the catch, I've fixed that bug. Maybe I should make a smaller list of just those codes currently in use? --Bequw τ 02:43, 1 October 2010 (UTC)
Thanks for updating the list of templates and thanks for your suggestion. I personally don't need a smaller list of just those codes currently in use, because I can use the language categories for these little queries.
A page like Wiktionary:Etymology/language templates (or Wiktionary:Index to templates/languages; it's tricky to tell the difference) works better for me as a full list (because, with some difficulty, I did eventually get it to load completely). --Daniel. 04:13, 1 October 2010 (UTC)
The list of families with multiple names has been successfully implemented at WT:LANGNAME, together with the guideline of avoiding certain names that would conflict with languages. --Daniel. 20:01, 2 October 2010 (UTC)

Pronunciation format

Is there any reason that many pronunciation sections use *: or ** for rhymes and/or IPA? I think it looks worse than just having a bulleted list. I've been converting these to a simple * for ages now (probably back to 2009) and nobody's complained. WT:ELE doesn't mention the practice explicitly, but does use *: in an example. Mglovesfun (talk) 13:27, 28 September 2010 (UTC)

I think a bot or template did it that way. It may still. DCDuring TALK 14:32, 28 September 2010 (UTC)
Some entries should use *:, such as when there is a separate IPA transcription given for both UK and US pronunciations and there are sound files for both regions. In that case, the sound files are indented under their respective IPA transcriptions for a bit of clarity. But, in general, I agree that the unindented bulleted list looks better for nearly all situations, and for the vast number of cases where such indentation exists in a Pronunciation section, it serves no purpose. --EncycloPetey 16:34, 30 September 2010 (UTC)

Language name with diacritics

Our policy about language names says "The language name chosen should avoid diacritics [...] when possible."

Then, should Mocoví be called Moqoyt? See Template:moc.

And, should Kadiwéu be called Caduveo? See Template:kbc.

Equally, Franco-Provençal (Template:frp) would be renamed to Arpetan.

Alternatively, maybe the piece of text from the policy should be rewritten, since Arpetan seems too obscure for me, personally. --Daniel. 16:06, 28 September 2010 (UTC)

I think it's okay to have diacritics in language names, especially since we already allow the African click sounds in language names (cf. ǂKxʼauǁʼein). We should, however, avoid the typographical apostrophe, which some language names currently have. -- Prince Kassad 16:41, 28 September 2010 (UTC)
As for your particular example, I would prefer "Koko language", instead of "ǂKxʼauǁʼein language", for simplicity's sake. --Daniel. 12:10, 29 September 2010 (UTC)
@Daniel.: Good point. How about "All else being equal, the language name chosen should avoid diacritics and parentheses"? I mean, it's always possible to avoid diacritics and parentheses, it's just sometimes undesirable. —RuakhTALK 17:15, 28 September 2010 (UTC)
Yes I'd change the text. For languages that are difficult to write (for any reason) I use {{subst:frp}} (a specific example). I'd imagine this is just one of our many, many policies that hasn't been updated in some time. Mglovesfun (talk) 08:48, 29 September 2010 (UTC)
The suggestion above "All else being equal, the language name chosen should avoid diacritics and parentheses" is, to me, a little subjective, because it is devoid of other instructions, which would define equality. I propose "When there are two or more possible names for a single language, only one name should be chosen by consensus. Common guidelines are to avoid abbreviations, the words "Modern" and "Standard", diacritics and parentheses when possible." --Daniel. 12:10, 29 September 2010 (UTC)
That sounds good. —RuakhTALK 16:56, 30 September 2010 (UTC)
Then I have implemented the text. It looks better now. --Daniel. 19:59, 2 October 2010 (UTC)

Language categories

Since we have nothing which explains the system used for language categories, I created a stub at Wiktionary:Language categories. It currently mostly covers the policy considerations, but should be expanded to contain some explanations too. -- Prince Kassad 14:26, 30 September 2010 (UTC)

Good. In my opinion, Wiktionary:Language categories looks much better than the supposedly more comprehensive policy, Wiktionary:Categorization. Thanks. --Daniel. 04:03, 1 October 2010 (UTC)

"There are new messages for you"

The text above appears at the top of my watchlist regularly.

When this happens, I do not get a link to my talk page, nor to any AI-based omniscient program that only recognizes comments directed to me. I, instead, see a boxed, conspicuous link to Special:NewMessages, effectively warning me that certain "Thread" pages have been created. Most certainly these aren't messages for me.

Well, I have already learned that "There are new messages for you" is fundamentally a lie, so I just ignore it when I need to. However, I assume that other people often see this box as well, so I recommend replacing that text with something less misleading, such as "There are new LiquidThreads messages." --Daniel. 17:15, 30 September 2010 (UTC)