Wiktionary:Beer parlour/2011/June

From Wiktionary, the free dictionary
This is an archive page that has been kept for historical purposes. The conversations on this page are no longer live.

Category:All topics - is overrun by other languages e.g. es:All topics, fr:All topics

Does anyone actually use Category:All topics? I find there are hundreds of other language entries there, with just a few English language categories. For example, here are the entries under G:

ga:All topics (27 c, 0 e)
gaa:All topics (1 c, 0 e)
gag:All topics (6 c, 0 e)
gd:All topics (24 c, 0 e)
gem:All topics (17 c, 0 e)
Geography (241 c, 114 e)
gil:All topics (3 c, 0 e)
gl:All topics (23 c, 0 e)
gmh:All topics (13 c, 0 e)
gmy:All topics (2 c, 0 e)
gn:All topics (7 c, 0 e)
gni:All topics (12 c, 0 e)
goh:All topics (12 c, 0 e)
got:All topics (7 c, 0 e)
grc:All topics (27 c, 0 e)
gsw:All topics (6 c, 0 e)
gu:All topics (10 c, 0 e)
gul:All topics (1 c, 0 e)
gv:All topics (23 c, 0 e)

To sift through all that to find Category:Geography is a test of stamina (or boredom). To try to address this, I created a new category Category:All topics (other languages) and put the first entry Category:aa:All topics there, along with a note in Category talk:All topics to see if anyone thought this was a good idea. But I see Category:All topics (other languages) has already been deleted without any mention in Category talk:All topics or on my talk page. What does everyone think? (And yes, I could have brought it up here first, but I wanted to demonstrate the concept). Cheers, Facts707 06:11, 1 June 2011 (UTC)[reply]

"I find there are hundreds of other language entries there, with just a few English language categories." can be said of virtually any topical category. --Daniel 06:22, 1 June 2011 (UTC)[reply]
There is a vote going on right now to fix this to a degree: Wiktionary:Votes/pl-2011-05/Add en: to English topical categories, part 2. —CodeCat 09:59, 1 June 2011 (UTC)[reply]

Image Captions

Do you have a policy, like Wikipedia's, that forbids periods at the end of sentence fragments in captions of images (or files)? 205.206.8.197 10:02, 1 June 2011 (UTC)[reply]

Deleting empty categories

While we have a speedy deletion summary 'empty category', there hasn't been much debate over it. Special:UnusedCategories gives a pretty up-to-date list of these, a couple of hundred of them. Clearly, clear to me anyway, we don't want categories like Category:French nouns lacking gender deleted every time it is empty. What about Category:Danish colloquialisms? It's empty right now, but it's a valid name, and if deleted, it has to be restored if used. Or recreated. And only admins can restore categories, other users can recreate them but cannot restore the original page history. Thoughts? --Mglovesfun (talk) 12:23, 1 June 2011 (UTC)[reply]

I only delete categories that have obsolete names and only if they are empty. —CodeCat 13:40, 1 June 2011 (UTC)[reply]
Links to categories attract readers who may think that there is something to see and navigate. Empty categories are blue links that lead to nothing.
Category:Kabuverdianu language has only five categories for its few contents, but one hundred empty categories can be created for that language. If we do create one hundred empty categories for that language, then finding actual contents would become far more difficult. --Daniel 18:48, 1 June 2011 (UTC)[reply]
I only create categories that actually have something in them, I don't create them just to 'fill up the tree'. —CodeCat 19:24, 1 June 2011 (UTC)[reply]
I never accused you of "filling up the tree". I just mentioned a hypothetical category filled as an example. --Daniel 20:12, 1 June 2011 (UTC)[reply]
Personally, I have nothing against empty categories. If someone just goes out to create a bunch of them for no reason, though, I'd consider it disruptive, and this contributor would receive a warning and/or a short block. -- Prince Kassad 02:04, 2 June 2011 (UTC)[reply]

Russenorsk

Russenorsk is an extinct pidgin of Norwegian and Russian, and as far as I can tell filman is the only term we currently have in it. I'm not sure what our policy is, but I noticed we don't have a language code for it, so we can't make categories for it either. Does anyone know what we could do? —CodeCat 17:29, 1 June 2011 (UTC)[reply]

I think ISO does not have a code for Russenorsk, but we are a step ahead. If you check Category:Russenorsk language, you'll notice the "crp-rsn" there. --Daniel 17:33, 1 June 2011 (UTC)[reply]
I can never quite find a list of all the codes we use on Wiktionary. Do you know if there is one? —CodeCat 21:34, 1 June 2011 (UTC)[reply]
If you want to know the code of any language in particular, just check its top-level category. Category:English language displays en and Category:Portuguese language displays pt. Wiktionary:Languages contains (or should contain) a list of all language codes for languages without ISO codes. I think Wiktionary:Index to templates/languages is a full list of language templates, but I wouldn't know that; it's too big to be opened on my computer. --Daniel 21:47, 1 June 2011 (UTC)[reply]
Not all languages have a top-level category, though. —CodeCat 22:07, 1 June 2011 (UTC)[reply]
Russenorsk does. When a language doesn't have a top-level category, I suggest simply checking ISO. If ISO doesn't help, it's time to create a new code. --Daniel 22:28, 1 June 2011 (UTC)[reply]
What languages have entries, but no top-level category? -- Prince Kassad 22:35, 1 June 2011 (UTC)[reply]
Naturally any language whose entries are created first. As far as I know, the last one was Category:Kuanua language, created yesterday, five months later than the Kuanua entry iau. --Daniel 14:04, 2 June 2011 (UTC)[reply]

A lover of what is past

I would like a word for "a lover of what is past", i.e. somebody who rejects what is modern in favour of older things. I doubt any such word exists, so I am happy to coin a neologism. I'm thinking it might be something like aoristophile for example, but nobody taught me Classics. Can someone suggest something reasonable? Equinox 00:23, 2 June 2011 (UTC)[reply]

Amish? conservative? —CodeCat 00:36, 2 June 2011 (UTC)[reply]
Luddite perhaps? — lexicógrafa | háblame 01:41, 2 June 2011 (UTC)[reply]
google books:"lover of the past", while not finding quite the word you want (antiquarian not being quite right, IMHO), provides lots of fodder for possible eponyms. Bedist, perhaps? Oddly, no one seems to have described Carlyle that way, but even so I think Carlylist could be a good one. Annoyingly, neither of those names really lends itself to an -ism, but what can you do? (By the way, if you want to stick to Classical roots, an alternative to -phile (lover) might be -later (worshiper), depending on the tone you aim to strike.) —RuakhTALK 02:01, 2 June 2011 (UTC)[reply]
How about archaist or archaeolater? DCDuring TALK 04:55, 2 June 2011 (UTC)[reply]
¶ Why is this topic here? What does this have to do with Wiktionary or its policies? --Pilcrow 12:13, 7 June 2011 (UTC)[reply]
Because Equinox (talk · contribs) put it in the wrong forum, and it wasn't a big enough deal for anyone to speak up. *shrug* —RuakhTALK 14:34, 7 June 2011 (UTC)[reply]

Reverse-mapping of language templates

Right now we have language templates to turn codes into names. But as far as I know we don't have any templates to do the reverse, to turn a name into a code. I think something like that could be useful for maintenance and such. I would like to create a template called {{langcode2name}} or something similar, with one subpage for each English name of a language. So {{langcode2name/English}} would contain 'en'. —CodeCat 13:20, 2 June 2011 (UTC)[reply]

This has been proposed quite often before, but such proposals always failed due to technical limitations. -- Prince Kassad 13:22, 2 June 2011 (UTC)[reply]
Which limitations are those? —CodeCat 13:28, 2 June 2011 (UTC)[reply]
User:MglovesfunBot/switch, but Prince Kassad is right, use with caution. --Mglovesfun (talk) 13:38, 2 June 2011 (UTC)[reply]
I started a bunch of templates that could be used for that, beyond just English names of a language, and they wound up going unused and deleted. DAVilla 18:46, 4 June 2011 (UTC)[reply]
The template Mglovesfun gave is rather slow because it contains a very large switch statement. It would be faster to use subpages. I created {{langrev}} and {{langrev/English}} which seems to work. If you call {{langrev|English}} it returns en and otherwise nothing. I would like to convert Mglovesfun's template to this, but I don't know if everyone would agree with me creating thousands of subtemplates, so I'm asking now. —CodeCat 14:44, 5 June 2011 (UTC)[reply]
Since there were no objections I've now created the remaining subtemplates, based on Mglovesfun's list. —CodeCat 14:45, 7 June 2011 (UTC)[reply]
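The subpage approach described in this thread amounts to replacing one large switch with a direct name-to-code lookup. A minimal Python sketch of that idea follows. It assumes a tiny hand-written table (the codes en, pt and crp-rsn are the ones mentioned on this page), and the names CODE_TO_NAME, NAME_TO_CODE and langrev are hypothetical illustrations, not the actual template implementation.

```python
# Hypothetical sample of a code -> name table; the real list is far larger.
CODE_TO_NAME = {
    "en": "English",
    "pt": "Portuguese",
    "crp-rsn": "Russenorsk",
}

# Build the reverse table once, so each lookup is a single dictionary access
# rather than a scan through every language (the "large switch" problem).
NAME_TO_CODE = {name: code for code, name in CODE_TO_NAME.items()}


def langrev(name: str) -> str:
    """Return the code for a language name, or an empty string if unknown."""
    return NAME_TO_CODE.get(name, "")


print(langrev("English"))      # en
print(repr(langrev("Nope")))   # '' (unknown names give nothing back)
```

Like {{langrev|English}} as described above, the function returns en for "English" and nothing for a name it does not know.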

Unattested SI unit entries

A while back we had some deletion discussions concluding that we should not have entries on names of prefixed SI units that are only found in publications listing names of prefixed SI units, and not 'in the wild' (e.g. yottakelvin and zeptocandela). I propose to redirect all such terms (excluding the ones that really are attestable such as attometer) to an appendix listing all of the possible unit/scale combinations and their respective symbols. Does this sound like a workable plan? bd2412 T 20:33, 2 June 2011 (UTC)[reply]

If by "redirect" you mean "soft-redirect using {{only in}}", then, yeah, sounds good AFAI'm concerned. —msh210 (talk) 21:04, 2 June 2011 (UTC)[reply]
Is it even necessary to list combinations at all? They are pretty much SOP... —CodeCat 00:22, 3 June 2011 (UTC)[reply]
I agree (with CodeCat). —RuakhTALK 01:47, 3 June 2011 (UTC)[reply]
I don't see how these can be SOP when they are single, unbroken, unhyphenated words. Conversely, kilometer and milligram are exactly as SOP as any of these combinations, and I doubt anyone would support deleting those as SOP. bd2412 T 14:36, 3 June 2011 (UTC)[reply]
Right, and I'm not supporting any sort of "deletion"; I'm just saying that the appendix doesn't really need to list all possible combinations. They're all SOP, so it just needs to explain the Ps. —RuakhTALK 15:18, 3 June 2011 (UTC)[reply]
My point is that we generally don't treat unbroken words as SOP. Unattestable, perhaps, but not SOP any more than any unbroken word combining a prefix and a suffix. bd2412 T 15:51, 3 June 2011 (UTC)[reply]
SOP — "sum of parts" — just means that anyone who understands the parts will understand their sum. We generally have entries for sum-of-parts words, but (IMHO) only because a reader would have no way of knowing what the parts are; but that reason doesn't apply here, because the appendix makes clear what the parts are. —RuakhTALK 16:57, 3 June 2011 (UTC)[reply]
I see your point. Of course, if the appendix lists all the prefixes, and all the suffixes, and says to the reader, you can take anything from column A and prepend it to anything from column B, there is no reason to show all the resulting combinations. If those combinations are soft-redirected to the appendix, there is similarly no concern about what searches will produce. bd2412 T 17:35, 3 June 2011 (UTC)[reply]
Also, to Msh210, yes, an {{only in}} soft redirect would be fine by me. bd2412 T 14:40, 3 June 2011 (UTC)[reply]
Who would be looking for these words? And why choose just SI units while neglecting the whole lot of possible but unattested combinations of prefixes and words? I can think of unrewind, vice-girlfriend and nephrodonation. (I checked Google Groups and Google Books to make sure they aren't attestable; uncontradict, vice-husband and hemodonation are barely attestable, though.) --Daniel 02:07, 3 June 2011 (UTC)[reply]
Regarding why just SI units, let me paraphrase one of my comments from here: It's because we know for sure what all of the SI units would mean, even if we don't know that they do mean that. Even if tomorrow morning, everyone were to start using the word zeptogram with some sort of metaphorical sense, we could still provide the unattested literal sense and know that we were "right". This is because the system is used in a consistent way by actual people, and the gaps are basically real words that just so happen not to have been used three times in durably archived media. It's like how we include full conjugations of Spanish verbs, even ones that are so rare that some individual forms might not actually meet the CFI. —RuakhTALK 02:35, 3 June 2011 (UTC)[reply]
I would allow the inclusion of all words with an official status (such as these ones, but also words recommended by official language bodies), even when no actual use (or only 1 or 2) can be found. When appropriate, the pages may explain that no use or almost no use has been found, but that they are standard unit names, or mention which organization promotes them. This would provide some useful information to possible readers. Of course, these pages will not be read much, but this is not a problem at all. It's always better to provide information. Lmaltier 20:13, 3 June 2011 (UTC)[reply]
The noninclusion of these terms as entries has already been decided by the community. The question is what, if anything, to do about them now. bd2412 T 02:30, 4 June 2011 (UTC)[reply]
Inclusion in an appendix with soft redirects using {{only in}} seems quite appropriate. DCDuring TALK 03:40, 4 June 2011 (UTC)[reply]
I will make it so tomorrow. Cheers! bd2412 T 03:57, 4 June 2011 (UTC)[reply]
I don't want to reinvent the wheel, and combining the table at SI prefix#List of SI prefixes with the explanations at SI base unit, SI derived unit, and with the tables assembled by Dcljr at User:Dcljr/Units, already represents pretty much everything I would envision in an appendix. Any thoughts on this proposition? bd2412 T 18:35, 4 June 2011 (UTC)[reply]
I have made a mock-up at User:BD2412/Appendix:SI units. Cheers! bd2412 T 19:19, 4 June 2011 (UTC)[reply]
Oops, turns out we've had an Appendix:SI units sitting there for years. I touched it up a bit. bd2412 T 22:44, 4 June 2011 (UTC)[reply]
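The "column A, column B" point made in this thread can be illustrated with a short Python sketch: once the prefixes and the units are listed, every combination and its symbol can be derived mechanically. The lists below are only a small subset chosen to cover the examples named in the discussion, and the names PREFIXES, UNITS and combinations are hypothetical.

```python
from itertools import product

# A small, illustrative subset of SI prefixes and units as (name, symbol) pairs.
PREFIXES = [("yotta", "Y"), ("kilo", "k"), ("milli", "m"), ("atto", "a"), ("zepto", "z")]
UNITS = [("meter", "m"), ("gram", "g"), ("kelvin", "K"), ("candela", "cd")]


def combinations():
    """Yield (name, symbol) for every prefix/unit pairing."""
    for (p_name, p_sym), (u_name, u_sym) in product(PREFIXES, UNITS):
        yield p_name + u_name, p_sym + u_sym


table = dict(combinations())
print(len(table))              # 20 (5 prefixes x 4 units)
print(table["yottakelvin"])    # YK
print(table["zeptocandela"])   # zcd
```

The sketch says nothing about attestation: it produces attometer and yottakelvin alike, which is why the thread treats the combinations as predictable while still checking each one against actual use.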

Gender-specific babel userboxes

I noticed that in many languages our userboxes for languages are written from a male perspective, which kind of bothered me. So I've now made it so that you can adjust the templates to display a different message depending on the gender you set in your preferences. I've made those changes to a few languages that I was comfortable fixing (French, Catalan, Spanish, Dutch, German) but there are lots more out there that I would have no idea how to fix. So this is a kind of request to please help update the templates of the languages you know. Thank you! —CodeCat 14:46, 3 June 2011 (UTC)[reply]

I've updated the Hebrew ones. :-)   —RuakhTALK 15:15, 3 June 2011 (UTC)[reply]

Language-specific inclusion

Given the current focus on discussing possible language-specific attestation rules, I thought it would be appropriate to create Wiktionary:Criteria for inclusion/Language-specific with some content. And I created it. Feel free to improve it. --Daniel 16:20, 3 June 2011 (UTC)[reply]

I think it's a bad idea to put all of it on one page. It could become very long that way. Maybe it would better go on each language's 'about' page? —CodeCat 16:36, 3 June 2011 (UTC)[reply]
I suggest using the new page as a list for ease of comparison rather than as a replacement of "about" pages. If the list is expected to become very long, then it is an additional reason to have it, because searching for individual attestation rules at every "about" page would be troublesome.
Anyway, do you have any idea how it could become very long? I'm curious. --Daniel 16:40, 3 June 2011 (UTC)[reply]
Because we have many languages? —CodeCat 16:46, 3 June 2011 (UTC)[reply]
Having many languages does not necessarily lead to having a long list of language-specific rules. For starters, we have only three listed rules, which make a very short page. When this page grows, we can choose among countless possibilities of presentation of contents. I'd probably suggest simply organizing languages by types of rules: We could create something like a "List of languages that allow otherwise unattestable romanizations" and a "List of forbidden characters by language". Splitting the list into various pages will probably be unnecessary in the foreseeable future, but it's always an alternative. --Daniel 17:07, 3 June 2011 (UTC)[reply]
Sorting by types of rules was actually what I had in mind when I started the vote. It is a good system that works even if many languages are involved. -- Prince Kassad 04:51, 4 June 2011 (UTC)[reply]

Deprecating 'plurals' categories in favour of '(POS) plural forms'

I know there has been some debate about the plurals categories. Some like them, some prefer we use 'noun forms'. This isn't really about that, it's just about renaming categories that are ambiguous with respect to their part of speech. In English, plurals can only contain nouns, but many languages like most of the Romance languages have plural nouns and adjectives. So I would like to deprecate 'plurals' and suggest that the entries in those categories be moved to '(part of speech) plural forms'. This would make languages more consistent without forcing languages without cases to adopt 'noun forms' as their category for plurals. I have already done this for Dutch. —CodeCat 11:08, 4 June 2011 (UTC)[reply]

English proper nouns can have plurals - Johns, Janes, Jacks etc. Mglovesfun (talk) 13:12, 5 June 2011 (UTC)[reply]
English pronouns, too. --Daniel 13:14, 5 June 2011 (UTC)[reply]
Then use 'English proper noun plural forms' and 'English pronoun forms'? —CodeCat 13:16, 5 June 2011 (UTC)[reply]
The fact that there are plural pronouns just serves as a good example of "Category:English plurals" being ambiguous. Probably "English pronoun forms" would be unwanted, because plural pronouns are just individual words, rather than forms of singular ones.
Yes, "English proper noun plural forms" is a good name. Another good name would be "Plurals of English proper nouns". --Daniel 13:23, 5 June 2011 (UTC)[reply]
Is it really necessary to categorise plurals of proper nouns any differently from those of regular nouns? I don't know any language where that distinction would really be meaningful. —CodeCat 13:32, 5 June 2011 (UTC)[reply]

Please help adding affix categories!

I've been working on reducing the amount of wanted categories, which is going well. However, most of the wanted categories (about half) seem to be categories of affixes. I could easily add all of those with a bot, but I'm not sure if all of them can actually be considered proper affixes in their languages. Some might have been added with {{prefix}} or {{suffix}} when {{compound}} or something else might have been more appropriate. I can't just create all of them blindly with a bot. So I would like to ask everyone here to help tackle this list and bring it down to a more manageable size. Down to nothing if possible!

The list of categories can be found here. You can create those categories in any way you like, but be sure to first check if they should actually be created. We don't want something like 'English terms prefixed with bread-'! Once you've created them and the links have become blue, or if you decided that they shouldn't be created and removed the categories from the entries, please remove them from the list if you can. Thank you! —CodeCat 22:23, 4 June 2011 (UTC)[reply]

Help us help you. It would help to know:
  1. whether the affix was a redlink
  2. what template created the category (possibly also hard categorized).
  3. how many members in the category
So far I have found in English use of prefix when suffix was probably intended and vice versa, misuse of confix, redlink for affix. In each case there was only one member of the category. DCDuring TALK 04:35, 5 June 2011 (UTC)[reply]
Most of the categories listed have 1 member, except for a few which may have two or three (but none higher). I don't think it's possible to automatically generate a list of what created the category, it would be too much work to do by hand. I have added links to the affixes themselves, though, and a number to indicate the amount of entries. —CodeCat 10:11, 5 June 2011 (UTC)[reply]
It seems that PAGESINCAT doesn't work after too many uses and just returns 0 all the time. I could subst: it, but that might show false results. —CodeCat 10:14, 5 June 2011 (UTC)[reply]
I've updated the list. It seems that the languages with the most categories that need to be created are English, Finnish, Italian, Serbo-Croatian and Spanish. Are there any editors here that are able to help with those? —CodeCat 14:54, 12 June 2011 (UTC)[reply]
Thanks.
One point is that it helps to check whether the etymologies are historically correct. I just checked caco- and found that both etymologies seemed incorrect, thus obviating the need for the category at this time. DCDuring TALK 15:15, 12 June 2011 (UTC)[reply]
I continue to find this a better source of historically erroneous etymologies than of missing categories. It is quite time-consuming to properly populate the entries (very much so for suffixes) and/or to correct the etymologies. DCDuring TALK 15:37, 13 June 2011 (UTC)[reply]

Administrator rights

I was an administrator for a while, probably about a year, and in any case I've created shitloads of new entries. I have a horrible temper but I don't think I ever abused it in terms of misusing admin commands. I would like to resume that role, mostly so that I can delete spam instead of having to put the delete template on it. Do I need to start a vote or what? Equinox 23:48, 4 June 2011 (UTC)[reply]

Do you still have the sysop bit? If not, how did you cease to have it? bd2412 T 01:20, 5 June 2011 (UTC)[reply]
See Special:UserRights/Equinox. —RuakhTALK 01:36, 5 June 2011 (UTC)[reply]
Done (can be undone again if anyone objects). Please update list of sysops. SemperBlotto 07:09, 5 June 2011 (UTC)[reply]
Thanks. Equinox 22:51, 8 June 2011 (UTC)[reply]

Removing the horizontal line between language sections

Our standard practice has always been to add a horizontal line between language sections:

----

It seems a little silly to me because we could easily reach the same effect by using CSS, as far as I know. Maybe we should deprecate this and use CSS formatting? —CodeCat 12:46, 5 June 2011 (UTC)[reply]

Can the CSS formatting be reasonably used to add the horizontal line above all language sections except the first one? That seems like additional work, but I'm ready to be proven wrong. --Daniel 13:09, 5 June 2011 (UTC)[reply]
I think there is a special way to say that a CSS property should apply to only the first or all except the first. —CodeCat 13:16, 5 June 2011 (UTC)[reply]
I think I know how this could be done now, but I don't know if it will work.
body.ns-0 h2 { border-top: 1px; }
body.ns-0 h2:first-child { border-top: 0; }
CodeCat 15:12, 5 June 2011 (UTC)[reply]
No, that won't work. h2:first-child doesn't mean "an h2 that is the first h2 within its parent"; it means "an h2 that is the first element (of any type) within its parent". —RuakhTALK 15:41, 5 June 2011 (UTC)[reply]
h2:first-of-type, but that won't work with IE8 or lower. --Yair rand 17:14, 5 June 2011 (UTC)[reply]
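The difference between the two pseudo-classes mentioned here can be shown with a toy Python model over a flat list of sibling tag names. This is only a sketch of the selector semantics, not of how a browser or MediaWiki works. The sample page structure (an introductory paragraph before the first language heading) is an assumption, and the function names are hypothetical.

```python
def first_child(siblings, tag):
    """Indexes matched by `tag:first-child`: the element must be the very first sibling."""
    return [0] if siblings and siblings[0] == tag else []


def first_of_type(siblings, tag):
    """Indexes matched by `tag:first-of-type`: the first sibling that has that tag."""
    return [siblings.index(tag)] if tag in siblings else []


# Hypothetical page body: an intro paragraph, then two language sections.
page = ["p", "h2", "p", "h2", "p"]

print(first_child(page, "h2"))    # [] (the first sibling is a p, so no h2 matches)
print(first_of_type(page, "h2"))  # [1] (the first h2 among the siblings)
```

Under this assumed structure, the first-child rule would never remove the border from the first language heading, whereas first-of-type would.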
Having '----' in the wikitext makes it very easy to separate the language sections when doing any kind of dump processing. Nadando 17:34, 6 June 2011 (UTC)[reply]
In October, 2005, Jon Harald Søby argued strongly for removing the ---- and assigning the task to CSS. However, he insisted on removing all of the ---- first and then sometime later perhaps getting someone to add it to CSS. The counterargument was that he should first make it work in CSS and then we could remove the instances of ----. Nothing ever came of it. —Stephen (Talk) 08:55, 8 June 2011 (UTC)[reply]
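As an illustration of the dump-processing point raised in this thread, here is a minimal Python sketch that splits an entry's wikitext into language sections at lines consisting of four hyphens. The sample text and the function name split_language_sections are hypothetical, and real entries may need more careful handling.

```python
import re
from typing import List

# A line holding only four hyphens (plus optional trailing spaces or tabs)
# is assumed to separate language sections.
SEPARATOR = re.compile(r"^----[ \t]*$", re.MULTILINE)


def split_language_sections(wikitext: str) -> List[str]:
    """Split an entry's wikitext into language sections at '----' lines."""
    return [part.strip() for part in SEPARATOR.split(wikitext) if part.strip()]


sample = "==English==\nfoo\n\n----\n\n==French==\nbar\n"
sections = split_language_sections(sample)
print(len(sections))   # 2
print(sections[1])     # the French section
```

Without the separator line, a script would instead have to recognise level-2 headings, which is still possible but needs slightly more parsing.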

Poll: Specific fictional characters

We have a policy for attestability of terms originating in fictional universes, which naturally includes names of specific fictional characters. Sometimes it seems to be defended as the highest and unquestionable authority on this subject, and sometimes it seems to be ignored in favor of the argument that they are "not dictionary material" anyway.

Since we have a policy for this issue, it should convey exactly what people think — even if it is just conveying the disagreement, like it happens for other subjects.

So, I think it's time to ask a simple question...

  • In your opinion, how many citeable English names of specific characters of works of fiction (proper nouns such as "Mickey Mouse", "James Bond", "Tiny Tim", "Batman", etc.) should be defined on Wiktionary?

Thank you. --Daniel 13:11, 5 June 2011 (UTC)[reply]

Poll: Specific fictional characters — All of them

Poll: Specific fictional characters — Some of them

  1. Agree --Daniel 13:11, 5 June 2011 (UTC)[reply]

Poll: Specific fictional characters — None of them

Poll: Specific fictional characters — Discussion

I think a discussion of this sort would be a waste of time. The question is not whether we should have one class of thing or another, but what are the requirements for a word or phrase to enter the lexicon. The CFI, agreed to by consent of the community, already inherently answers this poll with "some of them", subject to qualifications also determined by the community. Obviously if I post a story somewhere on the Internet tomorrow about a space knight named Nordskeeb Bemmeron, I can't have that name included in the dictionary on that basis. It is equally obvious that some fictional characters have become lexical terms: Darth Vader, Robin Hood, Captain Kirk, Sherlock Holmes, Casanova, Aphrodite, etc. The whole discussion will always be on the bounds of inclusion. bd2412 T 19:32, 6 June 2011 (UTC)[reply]

  • (By the way, don't bother telling me Casanova was a real person, I will refuse to acknowledge it). bd2412 T 19:40, 6 June 2011 (UTC)[reply]
  • I approximately agree with bd2412. —msh210 (talk) 19:54, 6 June 2011 (UTC)[reply]
  • I also approximately agree with bd2412. If he posts a story somewhere on the Internet tomorrow about a space knight named Nordskeeb Bemmeron, this name should not be included. My criterion would be if the name can be considered as a word, and belongs to the culture of the language, it should be includable (e.g. Othello should be includable as a word). The same rule should apply to fictional placenames. I know that such a criterion is not something that can be decided by a computer, and that decisions relying on such a criterion would sometimes be disputable, but it's difficult to imagine a better one. Lmaltier 20:11, 6 June 2011 (UTC)[reply]
  • I also approximately agree with bd2412. I would actually be O.K. with categorically excluding the names of all fictional characters, but obviously there's no community support for that, and this poll seems thoroughly unnecessary. —RuakhTALK 22:01, 6 June 2011 (UTC)[reply]
  • I agree it's a tricky issue, and RFD and RFV seems the best way to deal with them right now, though both of those are (or can be) long processes. I'm not sure what sort of 'policy change' could simultaneously reflect the opinions of the community and make the issue clearer. Mglovesfun (talk) 22:03, 6 June 2011 (UTC)[reply]
Thanks for your insights; I think I'll be able to use them in the future. I like this poll, by the way, even if no one else formally votes on it. I fetched some opinions about all the reasonable options ("Some of them" and "None of them"), which is more-or-less the purpose of any poll (or, at least, any of my polls).
I tend to agree with the apparent consensus of this discussion — of having nonzero entries for specific individual characters, while following aggressively strict, but yet vaguely shaped, rules for their inclusion — though I still fear I'll have to discuss against visceral opinions about this matter in the future. --Daniel 13:11, 8 June 2011 (UTC)[reply]

MglovesfunBot request

I propose to empty the categories Category:Translations to be checked (Serbian), Category:Translations to be checked (Bosnian) and Category:Translations to be checked (Croatian) into Category:Translations to be checked (Serbo-Croatian) by changing the language parameter used inside {{ttbc}} to sh. Does anyone oppose this? --Mglovesfun (talk) 16:52, 6 June 2011 (UTC)[reply]

I have no problems with it. —CodeCat 17:05, 6 June 2011 (UTC)[reply]
Possibly tonight if I'm home early enough. I will leave at least 24 hours from yesterday for objections. --Mglovesfun (talk) 09:27, 7 June 2011 (UTC)[reply]


The 469 entries that your bot changed on June 7, 2011 between 21:49:49 and 23:00:15 should be returned to the three categories that you deleted. Even if those translations were to be checked, they should have been checked in those three categories. -- Bugoslav 13:55, 8 June 2011 (UTC)[reply]
Too late. --Mglovesfun (talk) 14:08, 8 June 2011 (UTC)[reply]
It is not too late, please read this. Thanks. -- Bugoslav 14:22, 8 June 2011 (UTC)[reply]
I don't see any issues around lateness. But see this. --Mglovesfun (talk) 17:09, 8 June 2011 (UTC)[reply]

Categories for inflected forms

Such as præserving (which I will modify as præserve + -ing in a moment), is this how we want our entries to look? I've heard it said that we don't categorize regular inflections such as -s, -ed and -ing, but what about Category:English words suffixed with -eth and Category:English words suffixed with -est? Aren't these purely inflectional suffixes? Especially Category:English words suffixed with -eth, where they seem to be all inflection. --Mglovesfun (talk) 09:24, 7 June 2011 (UTC)[reply]

Indeed, we don't categorize mere inflectional suffixes. See the discussion at Category talk:English words suffixed with -s. These erroneous usages should be removed. -- Prince Kassad 09:39, 7 June 2011 (UTC)[reply]
So what would Category:English words suffixed with -eth contain, nothing? --Mglovesfun (talk) 10:01, 7 June 2011 (UTC)[reply]
Not just nothing, it simply should not exist. -- Prince Kassad 10:02, 7 June 2011 (UTC)[reply]
That's kinda what I meant, yes. --Mglovesfun (talk) 10:58, 7 June 2011 (UTC)[reply]
So, should {{suffix}} only categorize if the category actually exists? Or how should this work? I assume we don't want to start using From {{term|præserve|lang=en}} + {{term|-ing|lang=en}}. everywhere? —RuakhTALK 20:27, 7 June 2011 (UTC)[reply]
¶ I should have read this topic beforehand. I suppose it is actually redundant to include the etymology sections since it is essentially duplicating auto‐categorization such as Template:past of. Still, I hope it is acceptable to include links to the original forms, I just made an example here: keying. --Pilcrow 20:34, 7 June 2011 (UTC)[reply]
I think there's quite a good counter-argument to be made; keying is key suffixed with -ing. Our entry says "{{linguistics}} one or more letters or sounds added at the end of a word to modify the word's meaning". Seems to fit the bill. Mglovesfun (talk) 20:42, 7 June 2011 (UTC)[reply]

I see no point in adding etymologies to inflected forms, so there would be nothing like From {{term|præserve|lang=en}} + {{term|-ing|lang=en}}.. The current practice is to mostly avoid categories for inflected forms per inflectional suffix that they contain, a practice which I support. --Dan Polansky 07:34, 8 June 2011 (UTC)[reply]
I'm more or less indifferent to categorising inflexions by derivation; however, I think that giving them etymology sections is a good thing, and in the case of homographs like needles, they are indispensable. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 11:36, 8 June 2011 (UTC)[reply]

-ies forms in English words not following the I before E except after C rule

IMHO -ies plurals such as "currencies" should not be placed in Category:English words not following the I before E except after C rule, as they are non-lemma forms. The following forms have been recently added to the category: obstinancies, obeisancies, magistracies, lunacies, infrequencies, lieutenancies, frequencies, inaccuracies, latencies, idiosyncracies, accuracies, idiocies, kakistocracies, intimacies, supremacies, ecstacies, fancies, fallacies, extravagancies, exigencies, inconsistencies, constituencies, conspiracies, excellencies, Excellencies, consistencies, conservancies, concurrencies, competencies, delinquencies, emergencies, deficiencies, choccies, efficiencies, currencies, biccies, bibliomancies, belligerencies, bankruptcies, bureaucracies, æquivalencies, agencies, adhocracies, accountancies, aristocracies, aberrancies, abbacies, urgencies, mercies, inefficiencies, delicacies, contingencies, democracies, pharmacies.

I am not really sure what Category:English words not following the I before E except after C rule is worth, but that is another consideration.

Thoughts? --Dan Polansky 09:39, 7 June 2011 (UTC)[reply]

Afterthought: If I understand correctly, the forms would not belong to the category even if they were lemma forms. The rule of thumb for which the category was created is that "ei" rarely occurs in English words except when in "cei", that is, after "c". This rule of thumb helps correct wrong spellings. An example of an exception to the rule is "Fahrenheit", as there "ei" occurs outside of a "cei" sequence, in a "hei" sequence. --Dan Polansky 09:51, 7 June 2011 (UTC)[reply]

¶ That does not make sense. The trigraph c‐i‐e is clearly inconsistent with the ‘…except after C’ part. There are many terms besides plurals which contain c‐i‐e included in that category: ancient, efficient, science, society—need I continue? Those are also mentioned as exceptions in the Wikipedia article. ¶ The word policies remained tagged for months without concern and I did not categorize it. --Pilcrow 10:16, 7 June 2011 (UTC)[reply]
It's a bad rule as it has too many exceptions! It's not really a rule at all, for this reason I could go for an RFD - an appendix seems ok as you have more scope to discuss the issue in an appendix than you do in a category. --Mglovesfun (talk) 10:00, 7 June 2011 (UTC)[reply]
¶ The category title is clearly consistent with the forms it contains. It is quite misleading to remove the categorization even if the purpose and title are consistent with the word included. --Pilcrow 10:16, 7 June 2011 (UTC)[reply]
@Pilcrow, Re: "The trigraph c‐i‐e is clearly inconsistent with the ‘…except after C’ part": That is not clear. The name "English words not following the I before E except after C rule" refers to the rule as "I before E except after C", and this title alone does not make it clear what the rule says. Can you state what it is that you think the rule states? Does the rule also state that "cie" is rare? --Dan Polansky 10:21, 7 June 2011 (UTC)[reply]
Related discussion WT:FEED#Wiktionary:Requested entries. I agree with Pilcrow, these are English words (no argument there is there?) that don't follow the rule. So, they're categorized correctly. We allow plurals and whatnot in Category:English palindromes. --Mglovesfun (talk) 10:40, 7 June 2011 (UTC)[reply]
@MG: What does the rule say, then? Does the rule also say that "cie" sequence is rare? --Dan Polansky 10:42, 7 June 2011 (UTC)[reply]
Yes the rule in its totality is "I before E, except after C". These words show I before E precisely after C. QED, Pilcrow is right. This is a silly category anyway. Ƿidsiþ 10:49, 7 June 2011 (UTC)[reply]
"I before E, except after C" is not a complete specification of a rule, but rather a shortcut with multiple interpretations. The interpretation of this shortcut that I find most straightforward is this: "ei" is rare except when after "c" in "cei". Your interpretation seems to be this: "ei" is rare except when after "c" in "cei", and "cie" is rare. I find the statement '"cie" is rare' implausible anyway. --Dan Polansky 10:56, 7 June 2011 (UTC)[reply]
I admit that your reading is consistent with W:I before E except after C, and that my reading is probably not the intended reading of the rule. --Dan Polansky 11:09, 7 June 2011 (UTC)[reply]
Isn't there another part about sounding like "a" as in neighbor and weigh? bd2412 T 15:09, 7 June 2011 (UTC)[reply]
Isn't the full rule “I before E, except after C, for an ē sound”? — Raifʻhār Doremítzwr ~ (U · T · C) ~ 15:13, 7 June 2011 (UTC)[reply]
I learned it as BD has it. I've always found it useful, though riddled with further exceptions, as for borrowings from other languages (notably ancient Greek). DCDuring TALK 15:36, 7 June 2011 (UTC)[reply]
I don't understand "I before E, except after C, for an ē sound". Could someone provide a fuller formulation of the rule modified for the thing with the sound of "a" as in "weigh"? Does that mean that "weigh" is not really an exception to the modified rule, or does that mean that "mercies" is not really an exception to the modified rule? --Dan Polansky 16:03, 7 June 2011 (UTC)[reply]
I don't think any native speaker would apply the rule to words ending in ies, being confronted with such words on every other page. This is just an orthographic mnemonic, not to be taken too seriously except by someone hoping to get an academic publication out of it. DCDuring TALK 16:14, 7 June 2011 (UTC)[reply]
I was asking these specific questions about "weigh" and "mercies" to understand what the rule says when modified for the pronunciation. If the rule is useful, it is probably also useful for non-natives, right? Anyway, if the category should stay, we should clarify at some point whether it is driven by a rule that refers to pronunciation or a rule that does not. In the former case, it would be good to know in clear unambiguous terms what the rule actually says. --Dan Polansky 16:19, 7 June 2011 (UTC)[reply]
¶ Yes it is considerably excessive to mark normal affixed terms as exceptions; I highly doubt this mnemonic is treated as infallible. That said, I think it is acceptable to nominate this category for deletion. I personally have no interest in seeïng an appendix made instead, I do not recall beïng taught this when I was younger in the first place, so I do not have much else to comment about on this ‘rule’. ¶ In frequency: I desire to add supplementary efforts in my edits, so I did clean‐up some entries. My categorization was simply supplementary. --Pilcrow 16:59, 7 June 2011 (UTC)[reply]
@DanP. I think that it is an empirical question as the various forms of the rule have little real authority behind them. The clearly inadequate "i before e except after c" gets 3800 bgc hits. The more adequate "i before e except after c and when it sounds like a" gets 5. This "rule" seems better suited for WP than for us and our categories. DCDuring TALK 18:12, 7 June 2011 (UTC)[reply]
  • You should all watch Stephen Fry and Harry Potter (no, really!) talking about why this rule is not very useful here. Ƿidsiþ 08:28, 8 June 2011 (UTC)[reply]
    :). I think I now have a formulation that refers to the sound of "a": '"ei" is rare except when after "c" in "cei" and except when pronounced like "a" as in "weigh"'. I am not saying that this is what was intended but rather that this seems a fairly accurate statement about English spellings. Of course, the "is rare" predicate allows for a host of exceptions, but they should really be rare for the rule to hold. The part of the rule criticized in the program is '"cie" is rare', the part of which I said it was implausible :P. --Dan Polansky 09:01, 8 June 2011 (UTC)[reply]
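For reference, the two spelling-only readings discussed in this thread can be compared with a small sketch: the narrow reading, where only an "ei" outside "cei" counts as an exception, and the broad reading, which also counts "cie". The function names are made up for illustration and are not part of any Wiktionary tooling. The pronunciation clause (the "a" sound as in "weigh") is deliberately not modelled, since it cannot be decided from spelling alone.

```python
import re


def ei_not_after_c(word: str) -> bool:
    """True if the word contains "ei" that is not directly preceded by "c"."""
    return re.search(r"(?<!c)ei", word.lower()) is not None


def has_cie(word: str) -> bool:
    """True if the word contains the sequence "cie"."""
    return "cie" in word.lower()


def breaks_rule(word: str, count_cie: bool = False) -> bool:
    """Spelling-only check against the mnemonic (hypothetical helper).

    count_cie=False: narrow reading ("ei" is rare except in "cei").
    count_cie=True:  broad reading (additionally, "cie" is rare).
    The "sounds like a" clause is NOT modelled here.
    """
    if ei_not_after_c(word):
        return True
    return count_cie and has_cie(word)


if __name__ == "__main__":
    for w in ["Fahrenheit", "receive", "currencies", "science", "weigh"]:
        print(w, breaks_rule(w), breaks_rule(w, count_cie=True))
```

Under the narrow reading, "currencies" and "science" are not exceptions; under the broad reading they are, which is the point on which the thread disagreed. "weigh" is flagged under both readings only because pronunciation is ignored in this sketch.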

Uncategorized definitionless Chinese entries

For an entry such as , how would contributors feel about Category:Cantonese hanzi and Category:Mandarin hanzi? Or should it be Cantonese han characters? --Mglovesfun (talk) 10:14, 8 June 2011 (UTC)[reply]

The current system of treating Chinese characters is unreasonable. Chinese characters were originally invented to record the Chinese language, and applying them in non-Old-Chinese-derived languages was a much later event. The way the characters were designed was tightly associated with their pronunciations at the time of invention in (Early Old) Chinese, as the majority of Chinese characters have a phonetic component. The "etymology" of a character thus comprises two parts: one phonetic and one graphic. In the "definitions" section the original sense of the character in Old Chinese needs to be listed first, and then the rest in a roughly chronological order. The current arrangement of a "translingual" section at the top and putting usually non-Chinese languages next is illogical. The usage of characters in non-Chinese languages is 99.5% of the time determined by their original meanings in Chinese. There is no need to state what the character means in non-Chinese languages again if the meanings in Chinese had already been explained; only their non-borrowed meanings (as determined from the fossilised sinoxenic vocabularies) need to be specified.
There are regular sound correspondences between Middle Chinese pronunciation of characters and their modern readings in non-Chinese languages. The development of characters' (literary) pronunciations in varieties of Chinese is also reasonably regular. Provided a Chinese character has its Middle Chinese pronunciation recorded in a rhyme book, one can predict from the initial and finals the expected pronunciations in modern varieties and languages, and that's the way the pronunciations of rarely used characters are usually determined. The categories "Cantonese hanzi", "Mandarin hanzi" are obviously inappropriate. 60.240.101.246 12:10, 8 June 2011 (UTC)[reply]
Why? Apparently, it's so obvious you haven't bothered to mention it in the two paragraphs above. --Mglovesfun (talk) 12:28, 8 June 2011 (UTC)[reply]
I do kind of understand the reasoning. The Romans, when they created their alphabet, assigned sounds to their letters, just like the Chinese, when they created their script, assigned meanings and sounds to their characters. The problem, though, is that the sound and the meaning can change, and it has in both cases. The Romans created V to represent /w/ and /u/, neither of which are now common pronunciations of the letter. And in the same way, Chinese characters have changed over time and may no longer have the meaning they had when they were created. And because of semantic drift, they can differ depending on where they are used, so that some languages preserve more archaic meanings that others lost. This is just as some Latin letters are pronounced closer to the original Roman pronunciation in some languages (Irish always pronounces C as /k/ like the Romans did, but Slavic languages pronounce an affricate /ts/). Granted, most of the Latin letters haven't changed much in sound, and the Chinese characters haven't changed in meaning, but some still mean slightly different things in different areas or languages. A character may signify a meaning that is common in one language but rare or archaic in another. So to call meanings 'translingual' is a bit strange. —CodeCat 12:51, 8 June 2011 (UTC)[reply]
Sure but, how does this relate to my question? --Mglovesfun (talk) 12:53, 8 June 2011 (UTC)[reply]
What I mean is that I would prefer to reduce the 'translingual' header, and instead allow each language to deal with the characters individually just as we do now with Latin letters. The translingual section itself could stay, but it should only deal with the character itself and its origin and original meaning, and not with contemporary meanings and pronunciations. So I think that just as we allow Category:English letters, we should also allow Category:Mandarin Han characters or some variation. —CodeCat 13:00, 8 June 2011 (UTC)[reply]
And if the pronunciation of Middle Chinese is important for understanding the development of non-Chinese languages, then we should simply have a separate ==Middle Chinese== section with the appropriate pronunciation. —CodeCat 13:02, 8 June 2011 (UTC)[reply]
This is another thread regarding my drive to empty User:Yair rand/uncategorized language sections/Not English. We already have Category:Japanese kanji, which nobody has disputed, I don't see why Category:Mandarin Han characters would be 'clearly unacceptable'. --Mglovesfun (talk) 13:05, 8 June 2011 (UTC)[reply]
I don't disagree with it either, that's my point. —CodeCat 13:06, 8 June 2011 (UTC)[reply]
Seriously, how can a rude, mentally disabled person be given administrator rights when he has explicitly stated in his own user page that he cannot be held accountable even for his own actions? 60.240.101.246 11:41, 9 June 2011 (UTC)[reply]
Do you have an answer to my question? --Mglovesfun (talk) 11:48, 9 June 2011 (UTC)[reply]
Any Chinese character, is by default a "Mandarin Hanzi", "Cantonese Hanzi", ... And if the Middle Chinese pronunciation of that character is known, its readings in Korean, Japanese, Vietnamese can also be deduced (see for example this incomplete page for predicting Sino-Vietnamese readings), and in that way it is also a "Japanese Kanji", "Korean Hanja" and a "Vietnamese hán tự". 60.240.101.246 12:18, 9 June 2011 (UTC)[reply]
But what's your objection? --Mglovesfun (talk) 12:53, 9 June 2011 (UTC)[reply]
  • We don't do that because they only have one script. But I wouldn't disagree with doing something like that for Serbo-Croatian, for example, which uses Latin and Cyrillic. —CodeCat 13:06, 9 June 2011 (UTC)[reply]

Related terms that are part of the etymology of a word

This question is regarding the addition of an etymon of a said word to the list of related terms. What is the current rule on this, are words already mentioned in the etymology allowed to be added to the related terms list? I am only asking this as I have seen a few users remove these from the list of related terms; therefore it would be helpful to have this matter cleared up. Caladon 11:25, 8 June 2011 (UTC)[reply]

I think that there is absolutely no reason to disallow words listed in the etymology to be also listed in related terms.Matthias Buchmeier 11:39, 8 June 2011 (UTC)[reply]
I generally remove these when the related terms section and etymology are close together as 'duplication of links', but in long entries, they can be useful as the separation between the etymology and related/derived terms is significant. --Mglovesfun (talk) 11:45, 8 June 2011 (UTC)[reply]
I know of no specific rule. I generally remove these. (But I don't remove from the =Related terms= section same-language cognates that happen to be listed in the =Etymology= section.)​—msh210 (talk) 17:28, 10 June 2011 (UTC)[reply]
If it's best to repeat such related terms on a long page, then it's best to include these related terms on any page, short or long, since ultimately any page could develop into a long page with pronunciation from several regions, quotations and example sentences, etc. It's not necessary to duplicate them, but if the goal is to develop pages, then removing them is actually a bad idea. DAVilla 19:02, 5 July 2011 (UTC)[reply]

ICAO phonetic alphabet

Would this be considered translingual or English? The pronunciation of the words is based on English, but the alphabet is used around the world even by people who don't speak English. As far as I know the standard itself prescribes the pronunciation as well, which isn't specific to English. —CodeCat 14:14, 9 June 2011 (UTC)[reply]

I would venture to say translingual. DAVilla 19:04, 5 July 2011 (UTC)[reply]

Inflected German participles

As in English, verbs in German have a present and a past participle form (e.g. spielen (play) -- present participle spielend (playing), past participle gespielt (played)). Those participles can be used in such a way that they are syntactically adjectives -- they are also inflected like adjectives then, e.g. die spielenden Kinder ("the children playing"), das gespielte Spiel ("the game played"). How can these verb (or adjective?) forms best be added to Wiktionary (I know of no such entry)? One option would be to add two adjectives for each verb reflecting the present and the past participle -- i.e. spielend would get an adjective section and so would gespielt, the inflected forms would then link to their "base form" (i.e. spielenden would link to spielend). But firstly, that would make a lot of redundant sections, and secondly, most of those participles are only used as adjectives -- i.e., they are adjectives syntactically, but not lexically. (Actually there are many lexicalized participle forms which are proper adjectives now, such as gefragt (in demand), those would of course get adjective sections either way.) Does anyone have a better idea or is there even some policy already that I missed? Longtrend 17:15, 9 June 2011 (UTC)[reply]

I have often wondered this myself as well. Participles behave as adjectives but they sometimes behave as verbs too. There are some cases where participles can be replaced by adjectives, and some where they can't. —CodeCat 17:19, 9 June 2011 (UTC)[reply]
I don't speak German, but from your explanation, it seems to me that these are verb forms — "Strong masculine singular nominative past participle of ____" and so on. I think gespielt would then be something like "Uninflected past participle of spielen", rather than simply "Past participle of spielen". —RuakhTALK 20:21, 9 June 2011 (UTC)[reply]
Participles are verb forms that are adjectives. They're both, but their inflection is adjective-like. In a way, a participle in German is like an adjective derived from the verb, except that it retains some verbal properties because you can use it with an auxiliary verb. —CodeCat 21:52, 9 June 2011 (UTC)[reply]
Yeah, I think that's true in all languages. That's basically the definition of participle. :-)   —RuakhTALK 22:27, 9 June 2011 (UTC)[reply]
Then I don't understand why you think they are verb forms and not adjectives. —CodeCat 22:43, 9 June 2011 (UTC)[reply]
I think you may be misunderstanding my original comment. I didn't say "they are verb forms and not adjectives". I said they are verb forms; and you obviously agree with that. But perhaps I said the wrong thing, anyway; what I really should have said was that they are {{infl|de|verb form}}s. But regardless, to answer your question: It's because of what you and Longtrend say above — "most of those participles are only used as adjectives -- i.e., they are adjectives syntactically, but not lexically"; "There are some cases where participles can be replaced by adjectives, and some where they can't" — which matches my (vague) background knowledge, as well as what I've found today by Googling. Past participles in German, as in English, seem to be verb forms that behave in many respects like adjectives, with most not being quite the same as normal adjectives. Present participles I'm less clear on — they don't seem to have any purely verb-y uses (though I'd welcome correction on that) — but it seems natural to treat them the same way (especially since, according to some web-sites at least, most can't be used predicatively, only attributively, whereas obviously normal adjectives have both uses). —RuakhTALK 23:52, 9 June 2011 (UTC)[reply]
German participles can be used anywhere regular adjectives can. They can be used attributively and predicatively, and also adverbially (German adverbs of manner are identical to the base adjective). Although it depends on the transitivity of the verb whether the combination makes sense. In theory, based on semantic grounds, some participles such as (present) bedeutend or (past) geeignet even have degrees of comparison. —CodeCat 00:14, 10 June 2011 (UTC)[reply]
@CodeCat: Actually, examples such as bedeutend and geeignet are what I meant by lexicalized participles. They are undoubtedly true lexical adjectives now. gespielt, for example, is not lexicalized in this way, that's why I'm asking how to treat it. @Ruakh: Your assumption is right: Present participles don't ever have any verbal uses anymore, yet a present participle can be made out of any verb. It's also correct that they usually can't be used predicatively, only attributively. But it's not correct that normal adjectives always have both uses; actually there are many that can be used only predicatively. Longtrend 08:41, 10 June 2011 (UTC)[reply]

Have you looked at how Latin participles work in Wiktionary? I've been thinking the situation must be similar. -- Prince Kassad 23:08, 9 June 2011 (UTC)[reply]

They are treated as adjectives but with a ==Participle== header instead, right? —CodeCat 23:12, 9 June 2011 (UTC)[reply]
I think so. I like this solution. If we can't decide whether they are verbs or adjectives, why put them in either of these categories? A Participle header probably makes more sense. So if we go with this solution, gespielt would have (only) a Participle section with an inflection table, whereas the lexicalized gefragt would have an additional Adjective section. The inflection tables for the participle and for the adjective wouldn't differ, except that the adjective has comparative and superlative forms. Longtrend 08:54, 10 June 2011 (UTC)[reply]
A fairly straightforward treatment seems to be this. When a participle form can behave both like a verb form and an adjective, let it have both sections, as in this revision of "gefragt". When a participle form is only used as an adjective, let it only have an adjective section, and let its origination as a participle be mentioned in its etymology. This treatment seems to be simple and clear, tightly following the notion of part of speech as something that is revealed in positions in sentences that a word form takes, and in inflection. Its only disadvantage, AFAICS, is that it often mandates two sections of one word form, but it is this disadvantage that offers a lot of clarity to the reader: it tells that the word form is used both as a verb--as in "wurde gefragt"--and an adjective. What I admit not to understand is what is meant by an adjective's being lexicalized or not lexicalized: that is, what observable properties a word form has that leads to its being ranked as "lexicalized adjective", and what observable properties a word form has that leads to its being ranked as "non-lexicalized adjective". Whether something originates from a verb should not be a key consideration for its being classified as a verb, or it definitely should not be the only consideration; there are plentiful classes of counterexamples, among which I pick agent nouns such as "doer" or "writer". If "bedeutend" is never used as a verb, all its positions in sentences are adjectival ones, and it is inflected as an adjective, then it is IMHO best classed as an adjective, no matter its origin. This was the treatment of "bedeutend" in this revision; the later addition of a verb form section to "bedeutend" seems wrong. --Dan Polansky 10:48, 10 June 2011 (UTC)[reply]
"When a participle form is only used as an adjective, let it only have an adjective section": this would apply to every present participle form then. But the present participle appears in each verb inflection table, so wouldn't it be a contradiction to have only an adjective entry for it? I think participle as a header makes more sense, with an inflection table because each present participle can be inflected. There can then be an additional adjective section for words that originated as participles but have now a meaning on their own. I admit it's sometimes difficult to draw the line between lexicalized and non-lexicalized forms (as it's always the case with lexicalization), but firstly that problem is already there (it would not be introduced with my suggestion) and secondly there are some reliable indicators. For example, if present or past participle forms have comparative and superlative forms, they are certainly lexicalized. "When a participle form can behave both like a verb form and an adjective, let it have both sections": this would apply to virtually every past participle form, apart from some intransitive-verb-based ones. This means that (almost) every single verb would have two sections for its past participle -- quite redundant, don't you think? Why not make one participle section for each verb, and then an additional adjective section for those participles that have developed a new meaning independent from their origin as participles? Maybe you're skeptical because we would need to introduce a new participle header for Geman. I wouldn't want to do that either, but it's already there for Latin verb forms (and perhaps in other languages as well) and I don't see a reason to give it up there. So why shouldn't we use it for German, too? Longtrend 11:59, 10 June 2011 (UTC)[reply]
I still don't quite understand the difference between a 'lexicalised' participle and a regular participle. To me, a participle simply is an adjective. In Dutch (which is the same as German in this respect) if I say het gebouwde huis ("the built house") then I don't see any difference between a verb form and an adjective because the participle clearly inflects as an adjective. In het huis is gebouwd, the same thing applies but there are two interpretations: the house has been built, or the house is now in a state of having been built. This split applies to every participle I can think of, not just those that seem to have a different meaning from the verb. And with present participles it's the same... I can't be sure whether doorlatender is simply something that does more doorlaten than normal, or whether it stands apart from the verb. —CodeCat 12:15, 10 June 2011 (UTC)[reply]
Okay, back to my initial example for a lexicalized participle, gefragt. It originated as the past participle of the verb fragen (to ask), which can be translated as "(be) asked", and is still used as such (e.g. in er wurde gefragt ("he was asked")). As such it is quite clearly a verb form. BUT another form has developed out of it, that differs both semantically and syntactically. gefragt as an adjective means more than its verbal origin suggests -- it's not just "asked", but also "asked for", "in demand", "requested". It is lexicalized just like toaster is not just any person that toasts but rather a special device. And syntactically the lexicalized gefragt is an adjective, inflecting just like other adjectives, and it has comparative (gefragter (more asked for, more requested)) and superlative (gefragtesten (most asked for, most requested)) forms. Whether the same applies to Dutch doorlatender, I can't tell. But do you know what I mean? Longtrend 12:32, 10 June 2011 (UTC)[reply]
@Longtrend 11:59: You have a good point with present participles being in the inflection tables of verbs. The question is whether they really belong there. Many nouns can fairly regularly form diminutives, and many verbs can form agent nouns, but that does not mean we want to treat diminutives and agent nouns as inflected forms. If present participles are always used as adjectives, then they are adjectives, IMHO.
As regards "lexicalized", you have stated only one indicator: whether the adjective has a comparative and a superlative form. But there are adjectives that do not have these forms (see Category:English uncomparable adjectives), so I do not see why this is a good necessary condition for a word form's being a true adjective, or "lexicalized adjective". I still do not know what is meant by "lexicalized", as you have stated only one observable property, such one that is not true of unquestionable adjectives.
Above all, I do not think that having both a verb section and an adjective section is redundant: the verb section contains other information than the adjective section. Yes, they could be merged into one "Participle" section, but who or what stands to gain? The savings in the entry length would be minimal, I think, and the reader has to figure out what part of speech "participle" is. IMHO "participle" is not really a part of speech at all; it is a tag from which a bundle of parts of speech has to be derived. The benefit of having two sections is clarity for the reader, and alignment with a neat notion of part of speech, as I have detailed in my previous response.
Latin: There is now in Wiktionary the part of speech of "participle" for Latin, but it would be good first to understand why this was introduced before you introduce it also for German. "Participle" for Latin was introduced by EncycloPetey, I think, and I vaguely remember he was hesitant to introduce it, but it solved some problem that seemed worth it. --Dan Polansky 12:41, 10 June 2011 (UTC)[reply]
What about Esperanto? In Esperanto the part of speech can be determined from the last letter(s) of the word. Participles end in -a, like adjectives do. So in Esperanto there is no doubt that they are adjectives at all. But nonetheless, Esperanto does have compound tenses that use auxiliary verbs with participles even though the participles are clearly adjectives. —CodeCat 12:57, 10 June 2011 (UTC)[reply]
You are engaging in the fallacy of equivocation. (See [[w:Equivocation]].) The term "adjective" is frequently applied to all Esperanto words ending in "-a", and the term "part of speech" is frequently applied to groups of Esperanto words ending with the same last letter(s), but such use of said terms is not commensurate with any use of "adjective" or "part of speech" as applied to other languages. There's a close connection, obviously, in that Zamenhof intended his adjectives[Esperantists] to be adjectives[linguists] and his parts-of-speech[Esperantists] to be parts-of-speech[linguists], but he was a fallible human being — and not even a trained linguist — and anyway the study of syntax has progressed a great deal since his death, so there's certainly no reason to take his analysis as definitive for any language besides Esperanto. (Honestly, I'm not sure we should even take it as definitive for Esperanto itself; but on that point I have no axe to grind.) —RuakhTALK 17:20, 10 June 2011 (UTC)[reply]
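The ending-based classification that CodeCat describes for Esperanto can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the labels are the traditional Esperanto ones (which, per Ruakh's objection, need not coincide with the linguistic categories used for other languages), and the check is only meaningful for content words built from a root plus a grammatical ending. Function words such as kaj or la are not handled and would be mislabelled.

```python
def esperanto_pos(word: str) -> str:
    """Guess the traditional Esperanto word class from the ending.

    Naive sketch: assumes a content word (root + grammatical ending).
    """
    w = word.lower()

    # Finite verb endings and the infinitive/imperative endings.
    if w.endswith(("as", "is", "os", "us")) or w.endswith(("i", "u")):
        return "verb"

    # Strip the accusative -n, then the plural -j, if present.
    if w.endswith("n"):
        w = w[:-1]
    if w.endswith("j"):
        w = w[:-1]

    if w.endswith("o"):
        return "noun"
    if w.endswith("a"):
        return "adjective"  # includes participles such as "parolanta"
    if w.endswith("e"):
        return "adverb"
    return "unknown"


if __name__ == "__main__":
    for w in ["domo", "domojn", "granda", "parolanta", "rapide", "parolas"]:
        print(w, esperanto_pos(w))
```

By this ending-only test a participle like parolanta comes out as an adjective, which is CodeCat's observation; whether that label means the same thing as "adjective" in German or English is exactly what the reply above disputes.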
@Dan Polansky: As for Latin participle headers: Agreed, we should be sure how and why they were introduced. Does anyone remember? Until then I can just say that I like the solution and that a Participle header reflects the fact that those words are something in between verbs and adjectives. Merging verbal and adjectival uses under such a header is the lesser evil, IMO, compared to having two separate sections for one past participle, and perhaps an additional one for "lexicalized" participles (see below), but I guess that's a matter of taste and should be decided democratically, if necessary.
As for lexicalized participles: Have you read my last reply to CodeCat? I think I'm quite clear there. Comparability is only one criterion. The more important one is semantics, see my gefragt example.
Why are present participles included in verb inflection tables? I don't know, but I know that this is common practice in every grammar I know. I wouldn't oppose removing them from verb inflection tables (the arguments are too good :)), but we should be aware that we would be departing from common practice then. Longtrend 13:03, 10 June 2011 (UTC)[reply]
Re: "... the fact that those words are something in between verbs and adjectives": That is not really a fact AFAICS. Neither do I think that "paper" is something between a noun, a verb and an adjective. One word form can have several part-of-speech classes of occurrences; the part-of-speech classification applies to occurrences rather than word forms.
Re: "Merging verbal and adjectival uses under such a header is the lesser evil": Having several part-of-speech sections is not evil, if you ask me. As regards the requirement of economy, it is imperfect. As regards the requirement of clarity, it seems excellent. As regards accuracy, once you put each present participle under "Participle" heading, it will no longer be true that each German term classed as "participle" can behave both like a verb and an adjective.
Lexicalization: Per your explanation, in order for a word form to be a "lexicalized adjective", it must have a semantics that is more specific than or otherwise deviating from the semantics that directly follows from the generating word and the suffix. I do not see why we should accept this notion of "lexicalized adjective" as driving our classification: it would be similar to classing some "-ness" word forms in English as adjectives and some as "lexicalized" nouns. Even if the semantics of a "-ness" form is perfectly predictable from the adjective that generates it, the "-ness" form is still a noun, "lexicalized" or not. A German analogue is "Schönheit", which is a noun even if its semantics can be directly derived from "schön" and "-heit". (I was responding to "@Longtrend 11:59", where "11:59" is the time of your post, after an edit conflict, so my previous response indeed did not accommodate your later post.) --Dan Polansky 18:07, 10 June 2011 (UTC)[reply]
We really shouldn't worry now about what I call "lexicalized participles". Look again at the entry for gefragt as it is now, we have two sections there, one for the participle and one for the lexicalized meaning (however you want to call it). Neither your nor my proposal would affect that second section, right? (I just brought those lexicalized forms up to make clear that I don't attempt to put everything that looks like a participle or was one originally under a "Participle" header. Let's forget about that now.)
I just did a quick search in the archives and you seem to remember correctly about the introduction of Latin participles. This doesn't mean, however, that it's a bad solution. A lot depends on whether the community thinks that this solution is a good one for Latin -- if yes, then I really don't see any reason to treat German participles differently; if no, then obviously it's a bad idea for German as well. Do you agree with me here?
Perhaps we should consider present and past participles separately (only if we don't go for the Participle header solution, of course). Then treating present participles is easy: since they occur only as adjectives, they would each get one adjective header (no verb one). The only problem would be our current practice of listing present participles as verb forms in verb inflection tables (which, as I said, is common practice in many grammars). What about past participles? According to your suggestion, (almost) each of them would get at least two sections, since they can be used in two ways: as verbs (ich habe das Buch gekauft) and as adjectives (das gekaufte Buch). What makes me worry about this solution, apart from the fact that we'd need two POS headers for each and every verb, at least each transitive and many intransitives, is that there actually are cases where it's not clear at all whether the participle is used as an adjective or as a verb. This is exactly the split CodeCat mentioned above for Dutch: Consider his example, translated to German: das Haus ist gebaut. Is gebaut a verb here ("the house has been built")? Or is it a predicatively used adjective ("the house is built, i.e. in a state of having been built")? I'd say it's something in between, and if that's correct, it's a strong argument in favor of a Participle header. Longtrend 20:37, 10 June 2011 (UTC)[reply]
In French, past participles and present participles are verb forms. But, very often, adjectives are derived from these forms. The verb form and the adjective must be considered as distinct words. An example: sucré is a past participle, (e.g. in j'ai sucré mon café), but also an adjective (e.g. in il aime tout ce qui est sucré). In some cases, determining the status of the word in a sentence is not obvious, but you may apply the following principle: when you think of the action expressed by the verb, it's a verb form; when you think of a characteristic of the thing, it's an adjective. In some cases, it may be both depending on what you mean (e.g. the second sucré in Il a sucré son café, puis a bu le café sucré : if you mean the coffee which has just been sugared, sucré is a participle, if you mean the coffee with a sweet taste, it's an adjective). The existence of the adjective is not systematic for all participles. I think that this analysis is also applicable to English and to many other languages. Lmaltier 16:31, 11 June 2011 (UTC)[reply]
Yup, so in French, too, there are cases where participles are used ambiguously between verbs and adjectives. Perhaps this discussion should be extended to participles in general, cross-linguistically? By doing this, we could also question whether the Latin way is the right way to go. Longtrend 12:59, 12 June 2011 (UTC)[reply]
I think a cross-linguistic discussion might be productive (with the understanding that editors for individual languages will have to reach their own conclusions about what to take from the cross-linguistic discussion). We've discussed this a few times for Hebrew, and haven't yet reached any clear conclusions. I'd welcome input from editors of languages that have similar issues. —RuakhTALK 15:16, 12 June 2011 (UTC)[reply]
Maybe a separate discussion page should be created instead, though. This discussion is getting rather long... —CodeCat 15:30, 12 June 2011 (UTC)[reply]

Diacritical marks

I would like to add the POS header "Diacritical mark" to ¨, ` and other entries. --Daniel 19:43, 10 June 2011 (UTC)[reply]

Why is Symbol not good enough? (If you do want a new header, please also update WT:POS.) Conrad.Irwin 07:34, 11 June 2011 (UTC)[reply]
They're not "symbols", are they? A "symbol" generally represents an idea or a thing. What does ` represent in the French word où (where)? —RuakhTALK 13:24, 11 June 2011 (UTC)[reply]

I see no objections. The proposal passed. When I have the time, I will feel free to add the aforementioned POS headers to the applicable entries. --Daniel 16:37, 17 June 2011 (UTC)[reply]

煽动仇恨者 is not sum-of-parts. It is a word because 者 is a suffix. So, it should not be deleted. [1] 2.25.193.35 13:36, 11 June 2011 (UTC)[reply]

煽动仇恨者 is not a "word". It's not even attestable. It literally reads: "someone who incites hatred". Go away User:123abc. I'm just gonna keep blocking your multiple user accounts until you do. ---> Tooironic 23:12, 11 June 2011 (UTC)[reply]
煽动仇恨者 is a word and attestable, please see Google hits: "煽动仇恨者" 2.25.191.243 23:51, 11 June 2011 (UTC)[reply]
"People who incite hatred" also has thousands of hits on Google, that doesn't make it attestable. In that word, 者 is not a suffix. It is only a legitimate suffix in two-character words such as 学者, 记者, etc. Obviously you know nothing about Mandarin word boundaries. ---> Tooironic 07:14, 12 June 2011 (UTC)[reply]
So, you base on your "knowledge" to delete 志愿者 too? 2.25.191.247 10:03, 12 June 2011 (UTC)[reply]
I agree with the banned user that -者 (zhe3) is a suffix, and that by extension, e.g. 志願者 (and perhaps 煽動仇恨者) are words. It may come from Classical Chinese originally, where it serves a grammatical function, but I'd say it has already become a suffix. Some non-suffix constructions remain, such as "XX者YY也", or like in the example below, but I'd say it's pretty rare. "在學術上,溫泉的學術定義中把湧出地表的泉水溫度高於當地的地下水溫者,即可稱為溫泉。" (from Chinese Wikipedia). (Note: I'm not a native speaker.) Vaste 02:39, 13 June 2011 (UTC)[reply]
No, it's not a word. Not all suffixations lead to words worthy of inclusion, for example "湧出地表的泉水溫度高於當地的地下水溫者" ("springs which produce water with temperatures higher than those of (local) underground water") from your example above. 60.240.101.246 03:08, 13 June 2011 (UTC)[reply]
Yes, that was meant to be an example of where XXX+者 is clearly not a word. I.e. 者 is both a suffix (like in 志願者 or 學者), and a grammatical particle (or something). Or am I misunderstanding the meaning of suffix?
Regarding 煽動仇恨者, I think its use mainly seems to be an artifact of translating "hatemonger" to Chinese, though I honestly don't know. The question is how it is used in Chinese. Is it seen as a word? Is it used as a word? Would it be used outside translation? A "bad translation" that's popular enough is still a word. Btw, how do we treat words that only appear in translation? Vaste 05:19, 14 June 2011 (UTC)[reply]
There are two translations of hatemonger from Google for your reference:
"煽动仇恨者" (煽动仇恨+者)
"仇恨煽动者" (仇恨+煽动者) 2.27.72.254 06:09, 14 June 2011 (UTC)[reply]
Doesn't seem awfully common, now does it? Come to think of it, why is there no entry for 我爸是李刚? It has 1000+ times more hits on google. :) Vaste 08:31, 14 June 2011 (UTC)[reply]

Why is it deleted? 2.25.191.81 03:49, 12 June 2011 (UTC)[reply]

Because you created it when you shouldn't be on here at all. You've been blocked time and time again after abusing multiple accounts and anonymous IPs. When will you understand you are not welcome here? ---> Tooironic 00:00, 13 June 2011 (UTC)[reply]
志愿者 is not created by me, but is deleted by you. You block me because you hate Pinyin entries, but Pinyin entries are allowed. 2.25.211.239 01:16, 13 June 2011 (UTC)[reply]
I don't "hate" Pinyin entries. Understand this: Wiktionary only keeps words which can be attested. Have you ever actually read Wiktionary:Criteria for inclusion? ---> Tooironic 13:30, 13 June 2011 (UTC)[reply]
志愿者 is a word and attested. 2.27.72.254 13:37, 13 June 2011 (UTC)[reply]
Google hits: "志愿者"
Google Books: "志愿者" 2.27.72.254 21:52, 13 June 2011 (UTC)[reply]
Entry restored. But please remember just because a word has 者 on the end of it doesn't mean it is a word in its own right. 煽动仇恨者 will never be recreated because it's not a word in Chinese - the idea that you can put two multiple-character words together PLUS a suffix and call it a legitimate word is absurd. ---> Tooironic 23:51, 13 June 2011 (UTC)[reply]
One point that I would like to stress is that Google searches can be helpful, but are not without problems when it comes to establishing a word or phrase as a valid entry. Common phrases or sentences that have a lot of Google hits would never be accepted here at Wiktionary. One silly example is Google hits: 我喜欢唱歌 and 我喜欢唱歌. In this case, something like Baidu would be much more convincing. -- A-cai 23:25, 1 July 2011 (UTC)[reply]

Twitter

Who runs the @Wiktionary Twitter account? It tweets the word of the day every day, which is a Good Thing, but it could be a lot more visible if all the tweets were tagged #wotd (which is quite an active hashtag). Is it automatic or is someone actually doing this every day? Ƿidsiþ 09:11, 12 June 2011 (UTC)[reply]

Fairly sure it's automatic (it didn't update on days there wasn't a 2011 word), but I don't know how it's being done or who owns the account. — lexicógrafa | háblame 12:13, 12 June 2011 (UTC)[reply]

It is currently 10:41, 12 June 2011 (UTC), voting closes at 23:59 UTC, so just hours after this post. If you want to vote, please do so. Additionally, there's been an on-going discussion about a recommendation for voting to be extended to give people more time, opinions welcome. --Alecmconroy 10:41, 12 June 2011 (UTC)[reply]

You might want to say what the vote is. --Mglovesfun (talk) 16:31, 13 June 2011 (UTC)[reply]
It is the Board of Trustees elections. Our own User:GerardM is in the running, so you should think about voting. —Stephen (Talk) 08:56, 14 June 2011 (UTC)[reply]
Except that it's too late now. The polls closed thirty-six hours ago. (The results will be announced "tomorrow", though I'm not sure what time tomorrow. It could even be twelve hours from now, 8:00 PM EDT, for all I know.) —RuakhTALK 12:06, 14 June 2011 (UTC)[reply]

Tabbed languages, definition editing, again

Previous discussions: Wiktionary:Beer parlour archive/2011/March#Tabbed Languages, Definition side boxes, and Sense IDs, Wiktionary:Beer parlour archive/2011/April#Edit definition gadget

For those who missed the discussions last time around, tabbed languages is a script that displays language sections as "tabs" rather than having languages stacked on top of each other (along with some additional features such as category editing and a tool for adding new language sections via WT:EDIT), and the definition editing script allows simple editing of definitions and adding example sentences and such. Since the last discussion, I reworked tabbed languages based on a new design by Brandon Harris, a designer employed by the WMF, and I changed some elements of the definition editing tool, doing away with the expandable boxes. Both scripts can be tested by clicking the button below:

(Please purge your cache so that this button will work.)

During the last discussion it was suggested by Ivan Štambuk that the definition editing tool be enabled "for some representative period of time (e.g. 1 week) so that one can gather statistics (from edit summaries) how many new users (IPs) edited with it, so that we can have hard numbers quantifying its potential benefits". This sounds like a good idea to me, except that since it can take as long as 30 days until all users' browsers no longer cache the previous revision of common.js, the trial period would probably have to be longer than one week in order to get good statistics. --Yair rand 22:02, 12 June 2011 (UTC)[reply]

How much work would it be to make it into a Gadget? That way it could be turned on-by-default and off-by-default on a dime. —RuakhTALK 23:13, 12 June 2011 (UTC)[reply]
Definition editing options is now available as a gadget, and it seems to work. If I remember correctly, last time I tried to make tabbed languages into a gadget something broke. --Yair rand 03:12, 13 June 2011 (UTC)[reply]
(Note: The option to make gadgets on by default is not available in the current version of Mediawiki. --Yair rand 08:20, 11 July 2011 (UTC))[reply]
I like it, but there would be some bugs to work out. For instance, if you go to cheese with tabbed languages enabled there's a massive white space next to where the picture is. ---> Tooironic 23:55, 12 June 2011 (UTC)[reply]
The picture shouldn't really be placed where it is, though. Ideally nothing should be before the first header, except 'see also' links. —CodeCat 00:18, 13 June 2011 (UTC)[reply]
Really? What if the image applies to more than one language? We can't really have them repeating throughout. ---> Tooironic 01:35, 13 June 2011 (UTC)[reply]
I'm pretty sure this has been discussed and the consensus was that images should be placed only in individual sections. (Not completely sure about that.) --Yair rand 03:12, 13 June 2011 (UTC)[reply]
On second thought it's probably not a big deal. See, for example, tank. Users can see it upon first viewing, regardless of the language. ---> Tooironic 01:36, 13 June 2011 (UTC)[reply]
When you select a non-English section, the language codes get dropped from the categories: "English derivations" instead of "oc:English derivations". This seems non-intuitive and wrong to me. Nadando 00:50, 13 June 2011 (UTC)[reply]
I added that function during the discussion on topic category naming format above. The prefixes really don't mean anything to the reader, and it really doesn't make sense to display them to users if categories are being sorted into specific sections. If there isn't consensus for removing topic category prefixes from the display, I can revert the change. --Yair rand 03:12, 13 June 2011 (UTC)[reply]
I don't think those prefixes should be hidden, unless the full-word prefixes are hidden as well. Aside from that, I think tabbed languages is pretty sweet! —RuakhTALK 22:35, 13 June 2011 (UTC)[reply]
The problem with hiding full-word prefixes is that then we have conflicting visible category names ([[gerund]] would appear to be in "Nouns" twice.) --Yair rand 23:09, 13 June 2011 (UTC)[reply]
All the more reason to retain the distinction, I think. By the way, I find another current asymmetry confusing as well: categorizing [[gerund#Dutch]] under "Flowers" adds the page to [[Category:Flowers]] rather than to [[Category:nl:Flowers]], which is not what I would expect a user to expect. —RuakhTALK 23:21, 13 June 2011 (UTC)[reply]
Good point. I've turned off category prefix removal for now. There's no way that I know of for the script to tell if the newly added category is a topic category and requires a prefix. Hopefully the category structure can be reworked at some point to allow for a simple display of categories. --Yair rand 23:52, 13 June 2011 (UTC)[reply]
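The naming conflict discussed in the comments above can be illustrated with a minimal Python sketch. This is not the tabbed languages script itself; the function name `display_name`, the prefix pattern, and the sample category list are hypothetical and chosen only to mirror the examples mentioned in this thread.

```python
import re

# Assumption: a topic-category prefix is a 2-3 letter lowercase language
# code followed by a colon, e.g. "oc:" or "nl:".
CODE_PREFIX = re.compile(r"^[a-z]{2,3}:")

def display_name(category, languages):
    """Return the name a reader would see if language prefixes were hidden."""
    stripped = CODE_PREFIX.sub("", category)
    if stripped != category:
        # A code prefix was removed: "oc:English derivations" -> "English derivations".
        return stripped
    # Otherwise hide a full-word prefix: "Dutch nouns" -> "Nouns".
    for lang in languages:
        if category.startswith(lang + " "):
            rest = category[len(lang) + 1:]
            return rest[:1].upper() + rest[1:]
    return category

# Hypothetical categories for a page with English and Dutch sections.
categories = ["English nouns", "Dutch nouns", "nl:Flowers"]
shown = [display_name(c, ["English", "Dutch"]) for c in categories]
print(shown)  # ['Nouns', 'Nouns', 'Flowers']
```

With both kinds of prefix hidden, the page appears to be in "Nouns" twice, which is the conflict raised above. The sketch also shows the asymmetry: nothing in a displayed name like "Flowers" tells a script whether a newly added category needs a code prefix.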
Both look good to me, although I'm a tech n00b so my feedback is kinda vague. There is something with the "add language" button though: once you put in a language name and press Add, it brings up something for about a millisecond, then something refreshes and you lose your progress. — lexicógrafa | háblame 00:37, 14 June 2011 (UTC)[reply]
It doesn't bring up the new section with input boxes for filling in content? What browser do you use? --Yair rand 00:43, 14 June 2011 (UTC)[reply]
Well, in that millisecond I see a green box in the upper corner (same type that appears for other things) and a box in the entry that says "[Language] categories:", but that's it. I'm in Chrome. — lexicógrafa | háblame 01:24, 14 June 2011 (UTC)[reply]
Hm, it works in Chrome for me. Do you have any gadgets enabled? Does it happen in all entries? Also, does the URL change after clicking "Add" (maybe adding a question mark at the end?)? And does the "Add part of speech" button in the bar on the left also break? Does the [Language] categories box have a (+) icon in it? (And is that too many questions? :) ) --Yair rand 01:43, 14 June 2011 (UTC)[reply]
Gadgets as in Special:Preferences; yes. All entries I've checked; yes. Yes, it adds a '?' to the end of the URL. The "Add part of speech" and "Add definition" buttons have never worked for me, and they still don't. No, it doesn't. And no. :p — lexicógrafa | háblame 02:23, 14 June 2011 (UTC)[reply]
Does it work now? --Yair rand 03:50, 14 June 2011 (UTC)[reply]
Yes, it works now. :) — lexicógrafa | háblame 11:51, 14 June 2011 (UTC)[reply]
Does tabbed languages remove the ability to see hidden categories? —RuakhTALK 02:37, 14 June 2011 (UTC)[reply]
Not anymore, thanks for pointing out the bug. --Yair rand 03:50, 14 June 2011 (UTC)[reply]
In the entry a, most of the categories appear under the Filipino tab. —Internoob (DiscCont) 17:18, 14 June 2011 (UTC)[reply]
That is because the recently added Finnish section following the Filipino section is entirely uncategorized, breaking the category sorting. See User:Yair rand/uncategorized language sections/Not English for a list of similar entries, which will need to be fixed before tabbed languages is implemented. (Or, alternatively, we could modify {{attentioncat}} so that "Category:X terms needing attention" aren't hidden categories, and then bot-add {{attention}} to uncategorized entries...) --Yair rand 17:52, 14 June 2011 (UTC)[reply]
Is there a way to edit a language section when tabbed languages is turned on? —RuakhTALK 01:54, 16 June 2011 (UTC)[reply]
The former version of tabbed view had the option to edit individual sections, but this one, sadly, apparently doesn't. --Ivan Štambuk 06:26, 18 June 2011 (UTC)[reply]
There is now a button for editing individual language sections. --Yair rand 11:00, 27 June 2011 (UTC)[reply]
I've started discussions at User talk:Yair rand/TabbedLanguages2.js for various open issues. Some of them are bigger issues than others. —RuakhTALK 02:16, 19 June 2011 (UTC)[reply]

Awesome! So now can we list English in alphabetical order with all the other languages, and just default to opening that tab? DAVilla 19:25, 4 July 2011 (UTC)[reply]

If this does get enabled, then we could, in theory, but I don't really see what the advantages of moving down English would be. --Yair rand 15:44, 5 July 2011 (UTC)[reply]

Multiple context labels and parentheses

The definition of the entry goodness gracious starts this way:

  1. (idiomatic, euphemistic, dated)

While I think this way would be more appropriate:

  1. (idiomatic, euphemistic, dated)

That is, I prefer having only one set of parentheses per sense, and all context labels inside it (regardless of which context labels are appropriate). I often "correct" entries that way. Thoughts? --Daniel 08:46, 13 June 2011 (UTC)[reply]

This is I think a straightforward change that doesn't need any kind of discussion. -- Prince Kassad 09:11, 13 June 2011 (UTC)[reply]
I've considered how to do this by bot, but it would be quite complicated for a relatively small number of entries; merging the labels by hand seems therefore superior. --Mglovesfun (talk) 10:08, 13 June 2011 (UTC)[reply]
The bot could, at least, list all the entries that need this cleanup work. WT:Todo/Separated context labels, perhaps. --Daniel 11:07, 13 June 2011 (UTC)[reply]
Mine can't! Bequw, Nadando and Ruakh are best at generating these sorts of lists. --Mglovesfun (talk) 16:26, 13 June 2011 (UTC)[reply]
I'll make a list, but it will probably have some false positives: I can't very easily tell what's a context label and what's not. Nadando 18:22, 13 June 2011 (UTC)[reply]
That's why I didn't want to try and do it by bot! Mglovesfun (talk) 18:25, 13 June 2011 (UTC)[reply]
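The list-generation idea in this thread, and the false-positive problem it runs into, can be sketched in Python. This is only an illustration working on rendered definition text, not on the wikitext templates a real bot would have to parse; the function names and the sample definitions are hypothetical.

```python
import re

# Two or more parenthesised groups at the very start of a definition line.
SEPARATED = re.compile(r"^\s*(?:\([^()]*\)\s*){2,}")

def has_separated_labels(definition):
    """True if the line starts with more than one parenthesised group."""
    return SEPARATED.match(definition) is not None

def merge_labels(definition):
    """Fold leading parenthesised groups into one comma-separated set."""
    m = SEPARATED.match(definition)
    if m is None:
        return definition
    groups = re.findall(r"\(([^()]*)\)", m.group(0))
    labels = [label.strip() for g in groups for label in g.split(",")]
    rest = definition[m.end():]
    return ("(" + ", ".join(labels) + ") " + rest).rstrip()

# Hypothetical definition lines, used only for illustration.
print(merge_labels("(idiomatic, euphemistic) (dated) An expression of surprise."))
# (idiomatic, euphemistic, dated) An expression of surprise.
print(has_separated_labels("(botany) (of a plant) Having leaves."))
# True, although "(of a plant)" is a qualifier rather than a context label
```

The second call shows why such a list would contain false positives: a pattern match alone cannot tell a context label from any other parenthesised text, so the output is better treated as a worklist for manual review than as input for automatic merging.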

Numbers and numerals

Yes, I know this has been discussed before, and I know the discussion didn't reach a consensus. If I remember right, the discussion centred mainly around whether number or numeral was a better term to refer to the part of speech. I have a slight preference towards number, because it's a little simpler and more people will be familiar with the word. But I don't think there is really going to be a way out of this debate on semantic grounds alone, so maybe we could try a different approach and just look at current practice. Category:Numerals by language contains 197 languages, while Category:Numbers by language contains 50. There are also a few wanted categories for both, but the majority of wanted 'numbers' categories have an equivalent 'numerals' category that already exists. So overall, the vast majority of languages already uses numerals, and this can be considered the more usual practice. Even if we don't all agree on which term to use, it would be easier to use numerals because it means less work. —CodeCat 12:24, 13 June 2011 (UTC)[reply]

However, consider the following facts:
  1. Many subcategories of the numeral category are in fact empty.
  2. Some entries are in both the number and numeral categories (like altmyş).
  3. The numerals category doesn't really distinguish between language and script. See Category:Arabic numerals, it contains symbols, and not words in the Arabic language, which makes creating a category for Arabic (as in the language) numerals impossible. -- Prince Kassad 12:55, 13 June 2011 (UTC)[reply]
That's a good point, but I remember seeing 'Hindu-Arabic numerals' somewhere as well. The category is also a subcategory of Category:Arabic alphabet, which is kind of strange because the numerals aren't even used in Arabic in that form. —CodeCat 12:58, 13 June 2011 (UTC)[reply]
I seem to think EncycloPetey and Daniel Carrero (or Daniel. at the time) strongly opposed numbers. I'm hoping now that EncycloPetey's gone, Daniel will just recognize he's in the minority and say 'ah well'. --Mglovesfun (talk) 16:29, 13 June 2011 (UTC)[reply]
That's not very nice... I agree that consensus is a good thing but to actually pick on people for disagreeing and blocking consensus is a bit much, don't you think? —CodeCat 16:31, 13 June 2011 (UTC)[reply]
Depends if it's disagreeing or blocking. Mglovesfun (talk) 18:56, 13 June 2011 (UTC)[reply]

Single lowercase entry in situations where both cases are present (English Wiktionary)

From a standpoint of Web navigation and complete comprehension by readers, my proposal is that in cases where both upper and lower cases are used, for example Brook (the name) and brook (the water feature), that we redirect the uppercase entry to the lowercase entry and co-locate the text of both entries on the same page. Case distinctions can be provided on one page. In cases where *only* an uppercase entry exists, the redirection (if any) can be from the lowercase entry.

Current policy states "For languages with two cases of script, the entry name will usually begin with a lowercase letter. Exceptions include proper nouns, German nouns, and many abbreviations." It seems needlessly pedantic to have two separate entries where the spelling is the same.

For some reason, several editors have interpreted this wording to mean that Wiktionary *requires* a separate entry for Proper nouns, rather than a word that *only* has a proper noun meaning being in uppercase. For words with multiple meanings that include non-proper-noun types (verbs, regular nouns, pronouns, adverbs, adjectives, prepositions, conjunctions, and interjections), we don't create a separate entry, so why in the world should a second entry happen just because it happens to be a proper noun?

The practical effect of this change would make it easier to find word variants as well as making maintenance of entries easier.

Thanks. -- Avanu 13:31, 13 June 2011 (UTC)[reply]

I'd more or less dispute all of this. First of all uppercase and lowercase isn't all about proper nouns - consider American. Also in this, you'd be redirecting American to american, which seems counter-intuitive for an English speaker. Secondly, I don't see how it would make anything easier, all information on all pages should be correct and well formatted. If anything, merging pages (and lots of them, tens of thousands) would make the entries larger, thus harder to navigate - consider merging Malta with malta! Thirdly, how about acronyms and initialisms? What do you do with MAN in reference to man and Man? I hate the idea, I suppose that capitalization isn't a spelling issue, but it is an orthography issue, and it seems reasonable that two non-identical forms should not be treated as though they are identical. --Mglovesfun (talk) 16:14, 13 June 2011 (UTC)[reply]
Dubiously relevant, but I feel roughly the same about French with and without accents - siecle isn't spelt different from siècle, the spelling is the same, but we keep both forms as they exist. --Mglovesfun (talk) 16:16, 13 June 2011 (UTC)[reply]
Just to be clear, I'm essentially talking about the URL, not the appearance of the entry once you get there. Words like "run" have over 100 meanings, and that is just the lowercase version. In situations where this has been done with separate entries, it should be relatively trivial from a programming standpoint to merge such articles. Why should there be two separate entries just because a word is one part of speech and not another? A run in the stockings and we compile at run time and We run to the grocery. The idea that these should all be on separate pages also seems a bit silly, and as you say, I'm only talking about the English dictionary, not the French or any other. -- Avanu 16:22, 13 June 2011 (UTC)[reply]
As Mg pointed out, "part of speech" is only a little part of the issue. In German, every noun begins with a capital. Plenty of English capitalised words are not proper nouns, too. Equinox 16:29, 13 June 2011 (UTC)[reply]
Best example I can find is Austrian, no proper noun meaning. --Mglovesfun (talk) 16:40, 13 June 2011 (UTC)[reply]
Just making sure, you do realize that English Wiktionary covers all languages, right? --Yair rand 19:10, 13 June 2011 (UTC)[reply]
Another thing: you're pushing this as a usability win, but it depends on the user. A guy who is an expert user of Wiktionary knows he can visit the URL /ira and see the French verb, /IRA and see the political abbreviation, /Ira and see the girl's name. By mashing everything into one page you are making it harder for him. Equinox 16:45, 13 June 2011 (UTC)[reply]
We have 'expert' users of Wiktionary? Shouldn't a dictionary be simple enough for almost anyone to use? -- Avanu 18:31, 15 June 2011 (UTC)[reply]
It should be, and it is; but likewise we should not hamper and punish skilled users, i.e. "dumbing-down". Computer software companies have to make the same decisions: making Word easier to use must not remove the keyboard shortcuts that save so much time for experts. Equinox 18:36, 15 June 2011 (UTC)[reply]
Well, in that vein, it would be easy to make a redirect that points to a specific anchor in a destination page like:
  1. http://en.wiktionary.org/wiki/IRA - the organization, would redirect to
  2. http://en.wiktionary.org/wiki/ira#IRA
or
  1. http://en.wiktionary.org/wiki/Ira - the name, might redirect to
  2. http://en.wiktionary.org/wiki/ira#Ira
-- Avanu 20:11, 15 June 2011 (UTC)[reply]
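The redirect scheme proposed above amounts to a simple mapping from a title to a lowercase page plus an anchor. A minimal Python sketch, assuming the anchor is just the original title (the function name `merged_target` is hypothetical, and real MediaWiki anchor encoding for non-ASCII titles may differ):

```python
from urllib.parse import quote

BASE = "http://en.wiktionary.org/wiki/"

def merged_target(title):
    """Map a title to the lowercase page, with an anchor for its own casing."""
    page = title.lower()
    url = BASE + quote(page)
    if page != title:
        # Keep the original capitalization as the section anchor.
        url += "#" + quote(title)
    return url

print(merged_target("IRA"))  # http://en.wiktionary.org/wiki/ira#IRA
print(merged_target("Ira"))  # http://en.wiktionary.org/wiki/ira#Ira
print(merged_target("ira"))  # http://en.wiktionary.org/wiki/ira
```

Under this scheme the three URLs mentioned earlier in the thread (/ira, /IRA, /Ira) would still resolve to distinct places, but all on one page.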
Isn't this why we have see also at the top of the page? -- ALGRIF talk 16:57, 13 June 2011 (UTC)[reply]
I support having a single page [[man]] to cover man, Man, and MAN. Not only is it not always obvious to users (and editors) that we make this distinction in general, but also, it's not always obvious to people reading a book whether a given word is capitalized because it's a capitalized word (of the sort that we would cover at the capitalized entry-name), or just for some other reason. (If I see a word at the beginning of a sentence, how am I supposed to know whether it would have been capitalized otherwise? If I'm reading the U.S. Declaration of Independence, how am I supposed to know that Providence is capitalized because it refers to G-d, and therefore is covered at [[Providence]], when most of the other capitalized words are covered at their lowercase spellings?) —RuakhTALK 18:45, 13 June 2011 (UTC)[reply]
Didn't Wiktionary used to make no distinctions between capitalization due to problems in Mediawiki? --Yair rand 19:10, 13 June 2011 (UTC)[reply]
The Wiktionaries used to be like the Wikipedias, with the first letter of a page-name automatically being capitalized. I don't know if "problems in MediaWiki" is the right phrase; I think it was an intentional feature. When they changed the Wiktionaries to allow lowercase first-letters, they didn't make such a change on the Wikipedias. —RuakhTALK 19:19, 13 June 2011 (UTC)[reply]
Some of the discussions leading up to the original decision to split entries by capitalization: Wiktionary:Capitals again, Wiktionary:Beer parlour/case-sensitivity vote, Wiktionary:Beer parlour/First letter capitalization. --Yair rand 19:30, 13 June 2011 (UTC)[reply]

I think that both solutions are acceptable: keeping the Wikipedia solution would have been acceptable (and might have been wiser, despite its drawbacks): yes, it would have been possible to address man and Man (or the adjective parisien and the noun Parisien) in the same page (but MAN must be a different page anyway, unless all letters are capitalized in page titles). Lmaltier 20:55, 13 June 2011 (UTC)[reply]

The French Wiktionary is wholly case sensitive, AFAICT other Wiktionaries where I have a handful of edits (Italian, Portuguese, Occitan) are also case sensitive. I see Ruakh's point, though. Other online dictionaries tend to have them on the same page with the headword showing whether it's the capitalized form or not. On WT:FEED you do see readers with this problem; they type in a German word and find an English one, because they forgot to enter the initial capital letter. But on the other hand, you get readers who ask why troika doesn't have a Russian section or para- doesn't have an Ancient Greek section. Mglovesfun (talk) 21:04, 13 June 2011 (UTC)[reply]

Support the proposal. Solar System and solar system should be on the same page. See also links are hard to notice and are unintuitive. --Vahag 06:40, 14 June 2011 (UTC)[reply]

The reason that several editors have interpreted this wording to mean that Wiktionary *requires* a separate entry for proper nouns is that, until June, 2005, all entries were capitalized the same as on Wikipedia. We had a long, involved discussion here about it and decided to have this feature changed on Wiktionary so that most words would be on pages spelled with lowercase, like brook, but proper nouns and German nouns and other words of that sort would be moved to separate pages with a capital letter, like Brook. After this feature was adjusted for us, we used a conversion script to move every article to the lowercase spelling, and then we manually moved proper nouns and German nouns back to the uppercase spelling, and over time we gradually separated all of the pages that had common and proper nouns together into separate pages. In a nutshell, the reason several editors have interpreted this wording thus is because it was our original intent.
If we’re going to merge proper and common nouns on the same page, we should simply have this feature reversed so that all pages are automatically capitalized the way it is on Wikipedia, and go back to the confusing and complicated way it used to be. —Stephen (Talk) 17:02, 17 June 2011 (UTC)[reply]
Two-ish questions. Why would you need the title of the page to be capitalized (like you say Wikipedia has it)? Can't they just be lowercase? And, I'm missing something... how is it more complicated/confusing to have all versions of a particular spelling on the same page? -- Avanu 03:42, 18 June 2011 (UTC)[reply]
Hmm, this would work by moving all contents to one, lower case page. I guess it would work for Turkish too, where entries with the dotless i would be on a separate page and any word with dotted i:s would be on the same page as any identical English word, if any exist.
How about other languages though? There are a lot of languages to be considered. What about German? Also, showing it would show the title as all-lower case, even in cases where no such word exists. E.g. armenian in English is always capitalized. Similar for German nouns.
Also, is this a suggestion to move all entries for one word to a single page, regardless of script used? E.g. should nippon, nihon, 日本 and rìběn all be on one page? What about hana, はな, 花, 鼻? misu, ミス, 御簾?
Very controversial, a lot of work, with lots of issues. Perhaps useful in the end though. But is it really worth it? Vaste 06:37, 18 June 2011 (UTC)[reply]
My proposal is primarily for the English entries (although I realize English has many loan words). Also, something like Armenian, being a proper noun would have an uppercase first letter unless there is a non-proper-noun form. Then both entries would inhabit the same entry, with the uppercase first letter spelling being indicated in that section of the entry. -- Avanu 06:52, 19 June 2011 (UTC)[reply]

I think that using capitalized titles (same rule as Wikipedia) would have several advantages:

  • in many languages, all words are capitalized in some cases, especially at the beginning of sentences. It's not always obvious that you have to enter the word as uncapitalized
  • it would be much easier when using automated tools allowing a simple double click on any word of any Internet page (or something of the kind) to get its definition (cf. Wikilook)
  • it might save many pages (maybe 100 000 or so on fr.wiktionary, especially cases such as parisienne/Parisienne)
  • it would make it easier to compare the senses of a capitalized word and of the (usually) uncapitalized word with the same spelling.

The main drawback is that it makes pages longer (this drawback is important, because some pages may become very, very long with time).

To Avanu: would it really be reasonable to propose incorrect spellings such as churchill as page titles? And the rule must be the same for all words, not only English words. To Vaste: I think nobody proposed to study nippon, nihon, etc. on the same page. It would be inconsistent with the most basic principle of the project: the access key is the spelling (whatever the language of the word). Lmaltier 19:55, 20 June 2011 (UTC)[reply]

What about Vahag's example above, [[solar system]] vs. [[Solar System]]? Would we then keep both entries, one at [[Solar system]] and one at [[Solar System]]? —RuakhTALK 20:12, 20 June 2011 (UTC)[reply]
I think so (because the senses are different). But not in cases such as red fox/Red Fox, because the sense is the same, the only difference is that the capitals provide a clue about the general character of the use of the word (not an individual), which is one of the standard and systematic uses of capitals. Lmaltier 16:41, 21 June 2011 (UTC)[reply]
Lmaltier, how is churchill an incorrect spelling? (unless you're trying to spell zebra) Also, why does this rule have to be the same for *all* words? If so, that seems like a silly rule. I'm just suggesting a way to make this dictionary work in a common sense and easy fashion. The idea that Brook and brook are on separate pages goes against Web usability and common sense. Why add an unnecessary click into the process? -- Avanu 21:56, 20 June 2011 (UTC)[reply]
The only correct spelling is Churchill, with a capital. The rule must be the same for all languages for simplicity and consistency, but also because the software parameter for automatic management of page title capitalization is at the project level. Lmaltier 16:41, 21 June 2011 (UTC)[reply]
So I assume that CHURCHILL is wrong then? Or what if I wrote a novel and "the Nazi spy quietly whispered the code word to me ... 'churchill', which I knew was chosen because of the prime minister's special interest in this mission." Your point is not well taken since the spelling is perfectly fine, but as our very own rule on Wiktionary says "For languages with two cases of script". In other words... 'C' is a different case from 'c' but is the exact same spelling. Except for uber-nerdom, "we like Linux because it knows its cases!", I can't see the reason there is so much resistance to a common sense approach to dictionary entries. The esoteric idea that people need to have things perfectly cased in order to get to the information they're seeking is just silly. We're trying to help people find information, not making it harder? Right? -- Avanu 01:35, 22 June 2011 (UTC)[reply]
No, CHURCHILL is used in some circumstances, and it's not a different spelling, just like using a capital at the beginning of sentences does not make a different spelling. But churchill is never used (when it's used, it's not considered as correct). Have you ever heard of a dictionary removing the capitals, or choosing an uncapitalized entry when the normal spelling is capitalized? (Webster's, maybe: it assumes that, in many words, capitalization is not quite systematic, but it still uses capitals for some words). Do you really think that using non-standard spellings as entries helps people? Lmaltier 05:19, 22 June 2011 (UTC)[reply]
I already argued against your usability/"fewer clicks" argument. Losing the "unnecessary click" adds more unnecessary scrolling and scanning to find one item on a page of many — not to mention that you can already avoid the clicks if you look up the word with the desired case in the first place (e.g. Ira, IRA). Equinox 22:00, 20 June 2011 (UTC)[reply]
That scrolling argument is empty though. You can add anchor tags into a web page that move you directly to a section. So why the problem? And what if the version you found isn't even the one you wanted? Oops, I accidentally typed Equinox or eQuinox instead of equinox... guess I'll just have to get my spelling right until I find it. (oh, but wait Equinox already redirects to equinox, what luck! Why don't all entries just work this way?) -- Avanu 22:09, 20 June 2011 (UTC)[reply]
Just browsing around Wikipedia and saw the article Capitonym. Very interesting. -- Avanu 04:25, 22 June 2011 (UTC)[reply]
Equinox is right. The argument of "save one click = add unnecessary scrolling and scanning" is not empty just because of possible anchor tags. We would have to click on a link to reach the anchor in the first place; ergo, one anchor does not save one click. --Daniel 05:40, 22 June 2011 (UTC)[reply]
You obviously don't understand how a redirect works then. You can easily redirect to an anchor. But for some reason you want more clicks instead of less, and more pages instead of fewer. What's the logic for keeping multiple definitions on the same page then? I do not understand the mentality of those who think separate pages make any sense at all when you have the same spelling. -- Avanu 10:54, 27 June 2011 (UTC)[reply]
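The "redirect to an anchor" idea raised in this thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how MediaWiki resolves titles: the `MERGED_PAGES` table and the `resolve` function are hypothetical names, and the sketch assumes one merged page per case-folded spelling, with a section for each cased form.

```python
# Hypothetical data: one merged page per case-folded spelling,
# listing the cased forms that have their own section on that page.
MERGED_PAGES = {
    "brook": ["brook", "Brook"],
    "ira": ["Ira", "IRA"],
}


def resolve(term):
    """Return 'page#section' for a typed term, or None if no page exists."""
    key = term.casefold()
    sections = MERGED_PAGES.get(key)
    if sections is None:
        return None
    # Jump straight to the section whose casing matches what was typed;
    # otherwise land at the top of the merged page.
    if term in sections:
        return f"{key}#{term}"
    return key
```

Under these assumptions, typing the exact casing lands on the matching section with no extra click, while any other casing still reaches the merged page, where the reader has to scan for the right section.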

123abc

User:123abc seems to be able to use a massive range of IP addresses, whenever we block him, he moves on. These blocks could potentially stop valid contributors contributing. What can we do, if anything? --Mglovesfun (talk) 16:35, 13 June 2011 (UTC)[reply]

Assuming that he really is just a pure vandal — which I kind of wonder about, because he keeps trying to engage us in conversation, but I'm not qualified to judge — so anyway, given that assumption: If we use shorter-term blocks, they'll still force him to "move on" pretty often, presumably lowering his throughput, while lessening the risk of valid contributors being affected. Aside from that — users who are capable of recognizing good and bad Chinese edits, or who are capable of distinguishing his edits from other people's, should spend some time patrolling. About the only thing that I can think of that anyone else can do is maybe help develop tools to identify unpatrolled Chinese-related edits? And maybe to make it quicker/easier to block someone while patrolling? —RuakhTALK 19:15, 13 June 2011 (UTC)[reply]
I agree that there are two issues here, the initial block, which I feel uneasy about, and the constant circumnavigation of blocks. FWIW 123abc under his latest alias seemed to give up creating toneless pinyin entries, which is why he was blocked in the first place. I don't really know what his current block was for. Mglovesfun (talk) 19:21, 13 June 2011 (UTC)[reply]
I am entering high quality Mandarin toned Pinyin entries for Wiktionary (please see here and a recent example: zhōngnián). I don't understand why you blocked me. Ddpy

Account admits editing through IP to avoid block, see diff. Just an FYI. -- Cirt (talk) 20:40, 13 June 2011 (UTC)[reply]

Your second blocking was made because you continuously create pinyin entries which are totally unattestable. And before you go do a Google search and post the results here let me remind you that a Google search only tells you if words are used together in a whole variety of non-archived texts. Please, for the hundredth time, I beg you to actually read our WT:CFI so you can learn what criteria we use to keep entries in Wiktionary. Let me give you a simple example - a common pinyin word like nǐhǎo has 28 results on Google Books (note: that's NOT Google Web Search), whilst words like "zhìyuànzhě" and the hundreds of other pinyin entries that you created have ZERO hits. This violates Wiktionary's criteria for inclusion - yet you keep creating them, getting blocked, changing your IP, creating them again, getting blocked again, etc, etc... then you innocently wonder why you are getting this treatment! Give me a break! ---> Tooironic 00:07, 14 June 2011 (UTC)[reply]
zhìyuànzhě (volunteer) and shāndòng (to incite) are words and attested.
Google hits: zhìyuànzhě
Google hits: shāndòng 2.27.72.254 00:22, 14 June 2011 (UTC)[reply]
How many times do I have to tell you - just because something is on Google does not mean it meets Wiktionary's Criteria for Inclusion. Are you actually reading the messages I am typing to you? ---> Tooironic 00:55, 14 June 2011 (UTC)[reply]
Creating strings of unattestable entries, in my opinion, could justify a block per WT:BLOCK "The block tool should only be used to prevent edits that will, directly or indirectly, hinder or harm the progress of the English Wiktionary." --Mglovesfun (talk) 10:48, 14 June 2011 (UTC)[reply]
  • If Tooironic believed an entry was unattestable, he should put the entry to Request For Deletion such as here. However, Tooironic just directly deleted the entry and blocked me. Eventually, Tooironic admitted having done wrong and restored the entry (please see here). 2.25.213.208 12:34, 14 June 2011 (UTC)[reply]
    • There's no need to RfD entries which are clearly unattestable by doing a search on Google Books, a durable archive. You keep on creating unattestable entries over and over again and then wonder why I occasionally delete an attestable one - you should just not create these entries in the first place! If you're going to keep evading our blocks then the least you can do is do a search on Google Books BEFORE you add an entry, pinyin, hanzi or otherwise. Your behaviour really is a joke. ---> Tooironic 12:59, 14 June 2011 (UTC)[reply]

In light of this problem, I propose that we implement a simple and straightforward rule for pinyin entries. All such entries must include at least one citation to a durably archived reference. Any such entry lacking a citation will be speedily deleted on the spot. bd2412 T 15:46, 14 June 2011 (UTC)[reply]

I disagree. Are there editors adding valid pinyin entries? If so, why should they have to do extra work just because of one jerk who doesn't care whether we want the garbage he's adding? And if not, then it's academic. —RuakhTALK 16:41, 14 June 2011 (UTC)[reply]
I think it would be reasonable to require that the corresponding hanzi entries exist first. If there is already an acceptable hanzi entry, then there should be no objection to making a pinyin entry that links to it. If the hanzi entry is found to be unacceptable, then of course the pinyin would have to go as well. —Stephen (Talk) 16:49, 14 June 2011 (UTC)[reply]
That sounds reasonable to me. As far as I know pinyin is a 'secondary' representation, sort of like how umlauts are written as following e in German, or circumflexes are written as following x in Esperanto, if the proper characters can't be written normally. —CodeCat 17:43, 14 June 2011 (UTC)[reply]
That also sounds reasonable to me. All pinyin representations are of words originating in a Chinese script. If the original doesn't exist, then the pinyin doesn't actually represent anything. Doubtless some pinyin writings can be found that errantly run together syllables that would be understood in script to be separate words, so we should have the Hanzi term before a pinyin form can be said to exist. I think the "ˈblu.ˌbɛ.ri" point made by Vaste below is instructive on this point; it only exists because blueberry exists in the first place. bd2412 T 17:52, 15 June 2011 (UTC)[reply]
That's not true — not for Chinese, and not for English. The pronunciation /ˈblu.ˌbɛ.ri/ came first, and the notation "ˈblu.ˌbɛ.ri" is a representation of that pronunciation. The spelling <blueberry> is another way of representing that word, but the word would exist even if there were no way to spell it. Likewise, pinyin is one way to represent a Chinese word, irrespective of whether there exist Hanzi for it. Obviously it's quite rare, both in English and in Chinese, for a word not to have any spelling at all; but it's not unheard-of. —RuakhTALK 18:12, 15 June 2011 (UTC)[reply]
It may be that the word blueberry was part of an oral tradition before anyone wrote it down, but no one would have bothered to create the notation "ˈblu.ˌbɛ.ri" unless and until the written word existed for the notation to explain. The average person would likely represent a word with no spelling by the closest approximation to existing words, or by onomatopoeia, before representing it by a phonetic notation likely to be understood only by dictionary writers. Similarly, there are sounds in Chinese not directly represented by any symbol, and therefore having no pinyin transliteration of the same. There simply won't be any pinyin versions of words that do not exist first in Hanzi. bd2412 T 21:44, 15 June 2011 (UTC)[reply]
I think if sometimes (or, often) Hanzi terms are attested when Pinyin terms aren't, it's because Hanzi is used significantly more to write Mandarin, and the fact that our entries reflect that doesn't seem like a bad thing at all. Mglovesfun (talk) 21:31, 14 June 2011 (UTC)[reply]
Specific example of what I've said directly above: 香皂 (xiāngzào) gets 14 600 Google Book hits, while the pinyin xiāngzào gets just one; and it's not in Mandarin, though it's a reference to the Mandarin word in running English text. Mglovesfun (talk) 21:35, 14 June 2011 (UTC)[reply]
I'd say "xiāngzào" (for 香皂) is as attestable in Chinese as "ˈblu.ˌbɛ.ri" is in English. It is not a form actually used in Chinese. Surely including them serves no other purpose than helping learners (or for indexing/searching etc)? Now, words like LZ, PK, TMD etc. are actually used, though mostly colloquially as internet slang. Vaste 03:05, 15 June 2011 (UTC)[reply]
Guys, my point of view might be different from the majority, but what I think is toned pinyin entries should be included - attestable or unattestable, as a way of conveniencing Mandarin learners. What I have a problem with is ddpy/123abc's effort to duplicate the definitions - now by duplication, he doesn't actually copy them from the existing character entries, he creates his own.. often creating inconsistencies between his pinyin entries and character entries. I want to see pinyin entries as points of reference to their character counterparts (or pointers / shortcuts if you will), much like the alternative forms we have (pinyin entries are strictly speaking alternative forms), without making them hard redirects. This makes it much easier to create homophonous pinyin entries. I also have issues with him creating suffix/prefix/affix categories, as to me, there is no true affix in Mandarin (or Chinese in general) as it's not an agglutinative language like most Germanic languages. Almost everything comes in compounds and should be labelled as such. JamesjiaoTC 03:45, 15 June 2011 (UTC)[reply]
Most Germanic languages are not agglutinative. 60.240.101.246 03:57, 15 June 2011 (UTC)[reply]
You know very well what he meant, you’re just trying to lead the discussion astray. Jamesjiao, I agree with you and that was what I had in mind in my comment above. Pinyin entries should link to existing hanzi entries, and all the definitions and examples should be restricted to the hanzi entries. —Stephen (Talk) 08:24, 15 June 2011 (UTC)[reply]
Something like {{ja-def}} (hence {{cmn-def}}), or perhaps {{pinyin form of}}. My other problem is that when 123abc posts on my talk page, and I ask him a question, he doesn't reply. All we need to do is get 123abc to read WT:CFI, understand it and abide by it. I don't think it's his less than perfect English that's the problem, I think he's just refusing to play ball; you can lead a horse to water but you can't make it drink!
I also agree with Jamesjiao that we should get rid of all this "prefix/suffix" nonsense - in 90 percent of cases, Chinese does not create words like that. About pinyin entries - users can easily find pinyin via the search box, so I don't agree with adding pinyin entries based on the idea that doing so will somehow make it easier for users to find entries. The only reason I think we should keep them is to allow users to look up pinyin combinations which have multiple readings (that would be really useful!), but we should only define the individual readings at their hanzi entries. So, in short, let's make toned pinyin entries link to hanzi entries without giving any more information. Most pinyin entries, after all, can't be attested anyway. Now who can help create an appropriate template? ---> Tooironic 10:40, 15 June 2011 (UTC)[reply]
I've gone with {{cmn-def}}, per {{ja-def}} which I rather like myself. --Mglovesfun (talk) 16:57, 16 June 2011 (UTC)[reply]
Pretty simple and straight forward. I'd use the template. I reserve my opinion. Toned pinyin entries yes, but separate definitions, no; linking to character entries with this template, yes. JamesjiaoTC 04:45, 20 June 2011 (UTC)[reply]
On User talk:Engirst, 123abc is essentially saying WT:CFI should not apply to pinyin entries, and for the moment, is refusing to budge. Unless he either agrees to WT:CFI, or we agree to not apply CFI to Mandarin pinyin, then there is no middle ground. --Mglovesfun (talk) 12:44, 17 June 2011 (UTC)[reply]
Or he can be perma blocked. That's another option? Here is an example of what I want to see on all Pinyin entries: qín - single chars or compounds. Essentially it points to all character entries that have this pronunciation and an optional (very brief) main (very subjective here) definition - no parts of speech headings allowed except for the one for Pinyin. We will need to get the ball rolling here. JamesjiaoTC 04:43, 22 June 2011 (UTC)[reply]
Jamesjiao, do you mean like this or like this? (the current revision) Vaste 04:22, 1 July 2011 (UTC)[reply]

Wikimania submission on dictionaries

[[wm2011:Submissions/A Smart Dictionary. From Information to Knowledge]] may be of interest.​—msh210 (talk) 18:47, 14 June 2011 (UTC)[reply]

Ligatures in English headwords

I wonder whether there has been a discussion about whether we want to have ligatures such as "areolæ" in headwords, alongside "areolae". I cannot easily find the discussion. From what I recall, there was either a tentative decision to avoid English entries with long s (ſ), or people argued in favor of avoiding such entries.

Judging from the long existence of "fœtus" (created on 2 May 2005) alongside "foetus", it seems that headwords with ligatures have so far been tolerated as alternative spellings. Category:English terms spelled with Œ now has 277 entries, while Category:English terms spelled with Æ has 708 entries; both categories were created on 25 December 2009.

I wonder whether it is a good idea to have these entries with ligatures.

What are your thoughts on this?

Do you know of any past discussions to recommend for reading?

--Dan Polansky 10:16, 15 June 2011 (UTC)[reply]

I have no problem with alternative forms/alternative spellings where it's perfectly obvious what the word means. This could be ligatures like fœtus and fœtal, but also things like French siecle, the spelling of siècle before roughly 1780. I say keep the lot, and add any more that are attestable. --Mglovesfun (talk) 11:52, 16 June 2011 (UTC)[reply]
I agree (w/Mglovesfun). —RuakhTALK 12:13, 16 June 2011 (UTC)[reply]

Arabic Afrikaans

I think this would come as a surprise to many people, but Afrikaans was actually written in Arabic script in some Muslim communities. I can't read any Arabic, but I am familiar with Afrikaans, and I think it would be nice if there was at least some basic coverage of this. I'm not sure how this should be done, but I imagine that we could create subcategories for script in the same way we do with Chinese. Category:Afrikaans nouns in Arabic script maybe? Or we could just put the Arabic script entries in with the Latin script entries. I don't think there is a need for a category for Latin script, though. And how should the definitions themselves be formatted? A transliteration and then {{alternative form of}} to redirect to the Latin script entry? I don't really have the knowledge to contribute to this in any way (beyond being able to understand Afrikaans) but I thought it would be good to table (UK) the idea in case anyone would like to work on this. —CodeCat 13:51, 16 June 2011 (UTC)[reply]

I like the idea; by analogy Category:Serbo-Croatian nouns in Latin script, Category:Serbo-Croatian nouns in Cyrillic script also sounds good to me; could be added via {{sh-noun}}. Regarding specifically Afrikaans, it might be just as good to tag them by hand, as I suspect there won't be too many of them. --Mglovesfun (talk) 10:52, 17 June 2011 (UTC)[reply]
Right now there are no Arabic entries for Afrikaans at all. I was hoping someone would be willing to create some. :) I could copy the words from the Afrikaans wiktionary, but I would rather not do that without some way of verifying it. And their entries don't have transliterations for the Arabic writing either. —CodeCat 12:51, 17 June 2011 (UTC)[reply]
I was able to find some more sources and I've now added some entries: كُوْنِڠْ, ڨَارْلِكْ, دي, ان, بَاس, فِـَرْ, اِتْسْ, اِسْ. —CodeCat 14:09, 17 June 2011 (UTC)[reply]

Alternative forms inside inflexions

¶ Could I please have permission to include alternative forms inside the varying inflexions of terms? --Pilcrow 01:05, 17 June 2011 (UTC)[reply]

Like sayeth on the headword line for say? We've discussed that and the conclusion AFAIR was that we don't want it.​—msh210 (talk) 05:13, 17 June 2011 (UTC)[reply]
If we did, we would have {{en-verb}} do it automatically (I hope, anyway). FWIW the counterargument is that if we have terms like sayest, liketh and whatnot we should link to them. --Mglovesfun (talk) 10:57, 17 June 2011 (UTC)[reply]
I like that counterargument. Yes, I think we should link to them. --Daniel 12:51, 17 June 2011 (UTC)[reply]
Maybe we could turn this into a preference setting, or maybe a collapsible extension? Something like 'show archaic forms of terms', which could then be applied to other languages as well. —CodeCat 12:55, 17 June 2011 (UTC)[reply]
  • We should think twice before we place archaic inflected forms on many headword lines of verbs. For "acquire", this would involve adding at least four forms: "acquirest", "acquireth", "acquir'd", "acquiredst"; an expert on archaic forms could possibly provide more. --Dan Polansky 13:52, 17 June 2011 (UTC)[reply]
See travel#Verb for an example of how to link to more than one past tense. The problem with uniformly linking to -est and -eth forms is that more recent verbs like upload won't have attestable -est/-eth forms. Another problem is we'd have to manually add a lot of them, a few thousand, no doubt. We could use some sort of #ifexist: syntax, but that might be 'expensive' with respect to server workload. --Mglovesfun (talk) 18:50, 17 June 2011 (UTC)[reply]

¶ I was wanting to include ‘Alternative forms’ sections inside verb forms, plurals, and the like. Here is an example I made. --Pilcrow 00:21, 18 June 2011 (UTC)[reply]

By all means, no. It reduces clarity (makes it harder to find the actual definition). Besides, it informs the reader who is interested in learning the common form about the existence of an uncommon form, information of little use. That is, the link goes in the wrong direction. -- Gauss 12:10, 18 June 2011 (UTC)[reply]

Completely uppercase titles

If you guys are considering making the titles of all entries be case-insensitive, I suggest displaying them completely in capital letters, rather than completely in lowercase letters.

The results would include:

That way, the case-insensitivity of the titles would be much more clear, to newbies and experienced users alike.

Completely lowercase titles would be easily misleading, because they would give the impression that there are English words written like "australian" and "wilson". On the other hand, the titles "AUSTRALIAN" and "WILSON" are not misleading, because all English words can be written with capital letters, regardless of their original spellings, anyway. --Daniel 16:12, 17 June 2011 (UTC)[reply]

Yes, I prefer all uppercase to all lowercase, if I had to choose (which right now, I don't). --Mglovesfun (talk) 16:21, 17 June 2011 (UTC)[reply]
Yes. If we are merging entries like this, then the all-caps location seems to make the most sense (with redirects, natch).​—msh210 (talk) 16:28, 17 June 2011 (UTC)[reply]
Ugh! NASTY! SemperBlotto 16:30, 17 June 2011 (UTC)[reply]
This is going to break a lot of templates that depend on {{PAGENAME}}. —CodeCat 16:36, 17 June 2011 (UTC)[reply]
True. If the proposal passes, most likely the parameter {{{head}}} will effectively become mandatory everywhere. --Daniel 16:38, 17 June 2011 (UTC)[reply]
Not necessarily. If we add body.ns-0 h1#firstHeading { text-transform: uppercase; } to [[MediaWiki:Common.css]], then we can use lowercase entry-names, while still displaying the heading in uppercase. —RuakhTALK 17:11, 17 June 2011 (UTC)[reply]
German spellings with ß would have to be changed to SS (Spaß > SPASS), since ß has no uppercase counterpart. This is going to impact every other wiki, since they have copied and linked many of our common nouns and proper nouns and other words the way we have them now. Recently Mglovesfun complained about the capitalized English common nouns on Arabic Wiktionary (see Information desk)...these are artefacts copied from English Wiktionary prior to June 2005 when all of our entries were capitalized. The other wikis are not going to delete or fix all of those thousands of entries copied from English Wiktionary, they will be left as they are. There will be problems with other languages as well. In Turkish, words with lowercase i will see it changed to I (ibibik > IBIBIK), which is a misspelling. —Stephen (Talk) 17:39, 17 June 2011 (UTC)[reply]
Your comment is indented like a reply to mine, but most of its contents don't apply to my suggestion. Yes, ß becomes SS: ß — but no, that wouldn't affect other wikis. As for Turkish — well, the heading at the top of the page isn't in Turkish, or in any other language. It's a generic heading, for all entries on the page. But yeah, you're right that it's not ideal, since case-transformations are language-dependent, so, by definition, no language-independent approach can be perfect. Personally, I think we should stick to all-lowercase. That's how other online dictionaries do it, and I think most people are used to it. —RuakhTALK 17:59, 17 June 2011 (UTC)[reply]
Hm, I didn't think of dotted vs. dotless I (and similar problems). I take back my above 2c. (With interest, please.)​—msh210 (talk) 18:07, 17 June 2011 (UTC)[reply]
Why would ibibik become IBIBIK, especially if the latter is a Turkish misspelling?
Naturally, ibibik would become İBİBİK, wouldn't it? --Daniel 18:18, 17 June 2011 (UTC)[reply]
How does the program know it’s a Turkish i? When i is capitalized using English software, it becomes I, not İ. I don’t see how the program could know to capitalize Turkish words in a special way. But I am not a programmer, so maybe it is not such a problem as it seems. —Stephen (Talk) 18:43, 17 June 2011 (UTC)[reply]
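To illustrate the casing question above, here is a minimal Python sketch. Python's built-in `str.upper()` is language-independent, so software has to be told which language a word is in before it can apply the Turkish rule. The function name `upper_for_language` and the `lang` parameter are assumptions made for this example; nothing here describes what the wiki software actually does.

```python
def upper_for_language(word, lang):
    """Uppercase `word`, applying the Turkish dotted-i rule when lang is 'tr'."""
    if lang == "tr":
        # Turkish: dotted i uppercases to İ (U+0130). Map it first, because
        # the language-independent str.upper() would turn it into a plain I.
        # Dotless ı (U+0131) already uppercases to I under str.upper().
        word = word.replace("i", "\u0130")
    return word.upper()


# The same spelling gives different results depending on the language tag.
turkish = upper_for_language("ibibik", "tr")   # dotted capitals
generic = upper_for_language("ibibik", "en")   # plain capitals
german = upper_for_language("Spaß", "de")      # ß expands to SS
```

The sketch shows why a page heading shared by several languages cannot be uppercased correctly for all of them at once: the result depends on a language tag that a bare page title does not carry.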
Almost all the wisdom about type legibility has it that lowercase, with its descenders and ascenders, is much more legible than the single-height appearance characteristic of uppercase. Isn't this obvious? Or do I have to dig out some references? DCDuring TALK 18:13, 17 June 2011 (UTC)[reply]
The small advantage of legibility of using only lowercase letters would be immediately suppressed by the need to discern whether or not "wilson" is an English word, as explained by me at the first message above. --Daniel 19:36, 17 June 2011 (UTC)[reply]
How could you possibly know the advantage is small?
If it were small then I would expect to have seen:
  1. much more publication using only uppercase, especially in the early days of printing
  2. much less hostility to the use of uppercase in online forums
  3. some use of uppercase for headwords in other dictionaries, especially online.
Even if it were small, I would argue that even a small benefit for users trumps internal considerations and even linguistic considerations not relevant to users.
This seems to be yet another instance of proposing something for purported reasons of internal technical logic. For whom is this place being run? DCDuring TALK 20:02, 17 June 2011 (UTC)[reply]
Easy. Wiktionary is my private playground. Now that we have settled this, please answer:
How is the proposal of "all uppercase" more technically logical than the proposal of "all lowercase"? --Daniel 20:15, 17 June 2011 (UTC)[reply]
Your opinions and proposals are therefore not to be trusted.
I see no good reason for having all headwords by only uppercase or only lowercase. Orthographic distinctions are often what users seek. This whole line of discussion would be silly, were it not potentially destructive of such meaningful distinctions. DCDuring TALK 20:50, 17 June 2011 (UTC)[reply]
Good thing I didn't ask for your trust, then. The proposal of making titles case-insensitive, however, is not mine. --Daniel 20:59, 17 June 2011 (UTC)[reply]
No, Ruakh, my comment was not meant as a reply to yours. But how can the other wiki NOT be affected? Right now we do not permit common nouns and adjectives copied to other wikis before 2005 to link to our entries, since the capitalization is different. We require the same spelling and capitalization. Therefore, the other wikis had to recopy the articles, which leaves them with duplicates, only one of which matches ours and is permitted an interwiki. If we make this change, the other wikis will have the capitalized entries from earlier years, and the currently uncapitalized entries, and they have both capitalized and uncapitalized varieties for words that can be spelled both ways. Are we going to allow all of their capitalized and uncapitalized entries to link to our allcap entries? And are we going to have links from our allcap entries to both the capitalized and uncapitalized forms on the other wikis? Or are we going to continue to require the spelling and capitalization matching, so that all the other wikis will have to copy all of our entries for the third time. Most of the disk space on Wikimedia's hard drives is going to be taken up with tens or hundreds of millions of duplicated entries as a result of our switching this capitalization rule again. —Stephen (Talk) 18:36, 17 June 2011 (UTC)[reply]
If you had been replying to my comment — my suggestion wouldn't have renamed lowercase entries; we'd still have [[man]], it would just display "MAN" at the top of the page. So interwikis would still be fine. (Interwikis would have been affected in cases like [[California]], when we moved it to [[california]], but that's not specific to the uppercase-display suggestion.) But since you weren't replying to my comment — Turkish isn't a problem, because the renaming to İBİBİK could be handled manually, or with smarter software. (That wouldn't have been possible with my suggestion, but with actual renames it's quite possible to do, as long as we're willing to put in the effort to do it.) —RuakhTALK 18:58, 17 June 2011 (UTC)[reply]
I haven't used every other Wiktionary (obviously) but as far as I know we'd be the first Wiktionary to make such a change (all combinations of uppercase/lowercase on the same page) and we'd lose consistency; other Wiktionaries will have two different entries for earth and Earth and we will have just one. It's not a primary concern, sure, but it shouldn't be wholly ignored either. --Mglovesfun (talk) 18:47, 17 June 2011 (UTC)[reply]
and if such a thing existed, eArth, the Medieval historical Web portal primarily focused on the exploits of King Arthur. -- Avanu 19:25, 17 June 2011 (UTC)[reply]
Another alternative, which may require software changes, is to have the title of the page simply be of the same case as the link you followed to get there (or the term typed in the search box). This way it always appears the way that the reader expects, although some of the senses appearing therein may seem irrelevant. Links will not be broken as long as the software is configured to make page titles case-insensitive (so any casing of the title would link to the same page), which again is possible but may require software changes. Dcoetzee (talk) 23:35, 17 June 2011 (UTC)[reply]
Yes, Wikipedia already has that, but only for the first letter, so it (the software) considers earth and Earth identical. Mglovesfun (talk) 23:41, 17 June 2011 (UTC)[reply]
I still don't see this (this proposal, if it were enacted) as a victory for usability. If there were a vote on the issue, and right now that looks unlikely, I think I'd oppose it unless there were clear long term benefits that outweighed a few years of having to fix entries, and a few years of regular Wiktionary users being very confused. Mglovesfun (talk) 23:45, 17 June 2011 (UTC)[reply]
  • I can't believe this is actually being discussed. In order to accommodate for 0.1% of same-language words that differ merely in capitalization, are we going to break the proper spelling of 99.9% of others? Have you all gone mad? If you're so bothered with duplication existing on entries such as [[earth]] and [[Earth]], just merge them normally, with some soft-redirect from capitalized to the uncapitalized form. --Ivan Štambuk 05:57, 18 June 2011 (UTC)[reply]
Why did all of you forget that Wiktionary (or actually, the SQL software that hosts this wiki) has no post-Unicode 3.2 case pairs? Stuff like ɫ -> Ɫ will never work, period. -- Prince Kassad 09:48, 18 June 2011 (UTC)[reply]
I agree with Ivan Štambuk, it’s folly to think of changing this back to the way it was in the early days (I’m sure there are some here besides me who remember how it was in the old days of case-insensitivity). These issues were discussed three ways from Sunday back in 2005 and our current treatment seems much simpler and more logical than what we had to deal with in those days. If the {{also}} solution seems clumsy and unintuitive (as someone said above), just work on that narrow issue and leave the case-sensitivity as is. Perhaps instead of {{also}} we could put a big graphic button that leads to a dab page. —Stephen (Talk) 10:09, 18 June 2011 (UTC)[reply]
I agree with Stephen and Ivan Štambuk. Furthermore, if we were going to merge entries, it would seem to me more useful to merge euenhede and evenhood (the same word) than to merge Ira (a name), IRA (a group), and ira (a word). I wouldn't merge any of it, though, in the way being discussed — I would continue to use advisories like {{also}}, or the graphics Stephen suggests, to point at other capitalisations. - -sche (discuss) 18:38, 18 June 2011 (UTC)[reply]
I think Ivan's argument is the best, simplest common-sense reason not to implement this. --Mglovesfun (talk) 11:29, 20 June 2011 (UTC)[reply]
Yes, I'm afraid I mentioned this idea, but I also disagree. ira and Ira could be merged, not IRA. Lmaltier 18:29, 20 June 2011 (UTC)[reply]
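
As a rough illustration of the trade-off discussed in this thread, the following Python sketch groups titles that differ only in case, and shows why Turkish needs special handling. The function names and the sample title list are hypothetical and for illustration only; this is not how MediaWiki resolves titles.

```python
from collections import defaultdict

def group_by_case_insensitive_title(titles):
    """Group page titles that differ only in letter case (hypothetical helper)."""
    groups = defaultdict(list)
    for title in titles:
        # casefold() is Python's locale-independent case-insensitive key
        groups[title.casefold()].append(title)
    # keep only the keys where a merge would actually combine several pages
    return {key: sorted(found) for key, found in groups.items() if len(found) > 1}

def turkish_upper(word):
    """Uppercase a word with the Turkish rule i -> İ applied first."""
    return word.replace("i", "İ").upper()

titles = ["earth", "Earth", "Ira", "IRA", "ira", "man"]
print(group_by_case_insensitive_title(titles))

# Generic uppercasing loses the dotted capital that Turkish expects.
print("ibibik".upper(), turkish_upper("ibibik"))
```

Under these assumptions, earth/Earth and Ira/IRA/ira each collapse into one group, which is exactly the merge some participants object to, and the Turkish word only comes out as İBİBİK when the language-specific rule is applied by hand.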

Ben Zimmer at The Word (Boston Globe)

"The Word" is usually written by Jan Freeman, but she's recently been sharing duties with Erin McKean, former dictionary editor. Now Ben Zimmer, late of the NYT Magazine's "On Language" column, seems to have joined the team. He's got a nice piece in Sunday's paper (yes, I know it's not Sunday yet, but that's the date on the article). Check it out.--Brett 02:17, 18 June 2011 (UTC)[reply]

Thanks. Way more interesting and important than categories. DCDuring TALK 02:30, 18 June 2011 (UTC)[reply]
"For the purposes of this huge new dictionary, Gove had set down a rule that all definitions be written as one-phrase statements, carefully organized according to principles of analytic logic." - And that's how it should be done. Having usage notes, examples of usage in collocations, and any other additional material that facilitates learning/understanding of a word is not mutually exclusive with this basic principle of defining the word. I'd rather have a convoluted but fairly comprehensive definition, with as many sub-senses as can be discriminated, than a vague dumbed-down one lacking precision and missing some specific but important details. --Ivan Štambuk 05:46, 18 June 2011 (UTC)[reply]

Diacritical marks of various languages

I created Catalan, French, Portuguese and Spanish sections for the entry "´". --Daniel 11:42, 18 June 2011 (UTC)[reply]

Looks good. Mglovesfun (talk) 11:43, 18 June 2011 (UTC)[reply]
Thanks. --Daniel 11:45, 18 June 2011 (UTC)[reply]
Is there a way to explain what the mark actually means in that language? It might be useful to indicate that the acute accent denotes a stressed close or mid-close vowel in Catalan, or that it makes a consonant palatal in Polish. —CodeCat 11:48, 18 June 2011 (UTC)[reply]
I wouldn't mind doing that, but perhaps this information would be better displayed at the entries of "complete" characters, such as á or ú. For starters, they are supposed to contain audio and IPA eventually, anyway. --Daniel 11:54, 18 June 2011 (UTC)[reply]
Maybe, but something like palatalisation is a useful hint. It wouldn't really be very obvious if Ć just said 'palatal affricate' (and right now it doesn't) without mentioning that the acute accent palatalises many other consonants beside Ć. The same applies to other diacritical marks that indicate a specific pronunciation feature, as well. —CodeCat 12:01, 18 June 2011 (UTC)[reply]
Ideally, the entries Ć and ć (both, presumably) are supposed to contain an etymology, a pronunciation section, a few written examples, audio examples as well, and something like "the acute accent (´) palatalizes consonants like this one". By placing that information only there and not at ´, we are not losing information, just organizing it.
That said, if you would like to mention things like these at ´ as well, I suggest creating a "Usage notes" section for that. --Daniel 12:13, 18 June 2011 (UTC)[reply]
I've seen the new "usage notes" sections for diacritical marks. Very good. --Daniel 15:38, 18 June 2011 (UTC)[reply]
Oh, and as an aside... it seems that Źź is missing from the Latin/Roman list of characters at the bottom. —CodeCat 12:02, 18 June 2011 (UTC)[reply]
Erm, no. Or at least I don't think so. AFAIK (not knowing Catalan, French, Portuguese, or Spanish), these languages use not ´, the spacing acute accent, but rather the combining acute accent.​—msh210 (talk) 05:09, 19 June 2011 (UTC)[reply]
I believe that, from the point of view of a computer, the Portuguese word cáspite, despite being written with an acute accent, includes neither the "spacing acute accent" nor the "combining acute accent". The character "á" is a standalone character; it encompasses only 1 byte.
Moreover, I believe that ´ is the perfect place to contain its current contents. Its name in Unicode (according to the entry) is "ACUTE ACCENT", after all. --Daniel 08:15, 19 June 2011 (UTC)[reply]
Yes. Wiktionary is written by human beings for human beings after all. If the computer doesn't recognize that "acute accent" and "combining acute accent" are the same character and that á is a combination of that character with the letter a, that's its problem, not ours. Human beings know perfectly well that when you put a ´ on top of an a, you get á. —Angr 16:57, 19 June 2011 (UTC)[reply]
msh210 is certainly right that these languages are using the combining acute accent (@Daniel: Unicode, which is what MediaWiki uses, defines "á" as equivalent to "a" + the combining acute), but I think we should simply redirect from the combining-acute-accent to the spacing-acute-accent entry, which can explain both, since the former causes weird browser behavior when it's used out of context. (And also for some of the reasons that Angr gives.) —RuakhTALK 18:55, 19 June 2011 (UTC)[reply]
This idea seems big, uncontroversial and very good. These are the typical ingredients that make good votings, so I created this one. --Daniel 00:11, 20 June 2011 (UTC)[reply]
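
For readers who want to check the encoding question raised above, here is a small Python sketch using the standard `unicodedata` module. It only inspects the characters mentioned in the thread; the variable and function names are made up for illustration.

```python
import unicodedata

precomposed = "\u00e1"    # á stored as a single code point
decomposed = "a\u0301"    # a followed by the combining acute accent
spacing_acute = "\u00b4"  # ´ the spacing acute accent

def describe(text):
    """Return the Unicode character names of every code point in text."""
    return [unicodedata.name(ch) for ch in text]

# The single code point decomposes canonically into letter + combining mark.
print(describe(precomposed))
print(describe(unicodedata.normalize("NFD", precomposed)))

# Recomposing gives back the single code point, so the two forms are equivalent.
print(unicodedata.normalize("NFC", decomposed) == precomposed)

# In UTF-8 the precomposed letter occupies two bytes, not one.
print(len(precomposed.encode("utf-8")))

# The spacing accent has a compatibility mapping to space + combining acute.
print(describe(unicodedata.normalize("NFKD", spacing_acute)))
```

This is consistent with Ruakh's remark that Unicode treats "á" as equivalent to "a" plus the combining acute, while the spacing accent is linked to the combining one only through a compatibility mapping.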

In hero call and crying call I've deliberately linked to call#poker. It would be possible to make {{context}} provide anchors for such links; call has a lot of meanings, and while to someone who plays poker the meaning of call is unambiguous, someone who doesn't will want to find the poker meaning of call directly in one click. Thoughts? --Mglovesfun (talk) 18:14, 18 June 2011 (UTC)[reply]

Splitting Serbo-Croatian categories by script

This proposal was mentioned before, so I thought maybe we should see if there is support for this. Should we split the categories for the Serbo-Croatian language(s) into two each, one that has "in Latin script" and one that has "in Cyrillic script" added at the end? —CodeCat 10:35, 19 June 2011 (UTC)[reply]

There are other languages written in multiple scripts, why focus on Serbo-Croatian? If this should be decided by vote, it should be done generally.
Another thing: Serbo-Croatian has been historically also written in Glagolitic and Arabic script. We don't have any words in them yet, but they will be added in the future once certain issues are settled that have been discussed in the past (e.g. whether the distinction between uppercase and lowercase Glagolitic that exists in Unicode but not in the real world should be made, how to standardize spellings because there were no official rules of orthography back then etc.)
Regarding this and similar categorizations: I'm against them all, because they are utterly pointless. Cyrillic and Latin scripts do not overlap in category listings, and it's trivial to skip unnecessary letters using the TOC template. What this will do is simply duplicate existing categories without providing any real benefits. We already have countless irrelevant categories that nobody actually uses for browsing that need to be deleted (such as "nouns by gender" in some languages, and all the subcategories in "XXX verb forms"). --Ivan Štambuk 10:53, 19 June 2011 (UTC)[reply]
Instead of splitting them, how about as additional categories? Keep all verbs in Category:Serbo-Croatian verbs but also allow Category:Serbo-Croatian verbs in Latin script. Mglovesfun (talk) 10:57, 19 June 2011 (UTC)[reply]
But, why split existing or add these additional categories at all? Because it feels good? If it serves no purpose, it shouldn't exist.
The only way to do this is to manually add the respective category, because there is no way to detect the script of the {{PAGENAME}} programmatically. If this should be applied to topical categories as well, special-handling code for SC and sc= parameter should be added to every categorizing template and its appropriate call in every instance of their invocation in SC entries - which is just too much work for nothing.
I agree that some of these categories are quite pointless. The categories for Category:Bulgarian noun forms and Category:Bulgarian adjective forms are probably good examples (and I know I created them, but that was because I was deleting older categories that were even worse). But I don't think script categories are pointless. Something like a TOC is a lot less obvious to new users than a category called "in Latin script". —CodeCat 11:00, 19 June 2011 (UTC)[reply]
TOC is the first thing the users see, at the top of the category page. Categories themselves are inconspicuous links at the bottom of the page. I can't fathom how one could see the latter and not the former. --Ivan Štambuk 11:13, 19 June 2011 (UTC)[reply]
I agree with Ivan. Splitting by script is useful when there's overlap; if we don't split Mandarin by Simplified vs. Traditional, then a user interested in just one will have to sift through the other. But when there's no overlap, I don't see the need. (I'm not actively opposed, mind. I just don't see the need.) —RuakhTALK 13:23, 19 June 2011 (UTC)[reply]
As I said at Wiktionary:Grease pit#Distinguishing categorised and not-categorised scripts, I don't see the point either. The category's TOC already sorts the entries into Latin and Cyrillic script. This proposal seems like increased complication for no added benefit. I also don't see the point of voting on the issue. Decisions at Wikimedia Projects are made by discussion and consensus-building, not by voting. —Angr 16:53, 19 June 2011 (UTC)[reply]
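
The remark above about script detection concerns what templates can do with {{PAGENAME}}. Outside the template language, for example in a bot, telling Latin from Cyrillic titles is straightforward. A minimal Python sketch, assuming a hypothetical helper `script_of` and relying on Unicode character names:

```python
import unicodedata

def script_of(title):
    """Classify a title as Latin, Cyrillic, mixed, or unknown (illustrative only)."""
    scripts = set()
    for ch in title:
        if not ch.isalpha():
            continue  # ignore spaces, hyphens, digits and punctuation
        name = unicodedata.name(ch, "")
        if name.startswith("LATIN"):
            scripts.add("Latin")
        elif name.startswith("CYRILLIC"):
            scripts.add("Cyrillic")
        else:
            scripts.add("other")
    if len(scripts) == 1:
        return scripts.pop()
    return "mixed" if scripts else "unknown"

for title in ["voda", "вода", "čovjek"]:
    print(title, script_of(title))
```

This says nothing about whether the split categories are worth having; it only shows that, if they were wanted, the categorization would not have to be added by hand.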

Splitting Serbo-Croatian categories by script — Support, split the categories

  1. Support --Daniel 10:46, 19 June 2011 (UTC)[reply]
    I don't speak Serbo-Croatian, so I'm happy either way. If Serbo-Croatian speakers support this proposal, I go along with them. If they don't support it, I don't either. --Daniel 11:17, 19 June 2011 (UTC)[reply]

Splitting Serbo-Croatian categories by script — Oppose, keep the scripts together

  1. Oppose Chinese can use its own system if so desired, but this doesn't really work and shouldn't be applied to other languages. -- Prince Kassad 16:56, 19 June 2011 (UTC)[reply]
  2. But, why split existing or at these additional categories at all? Because it feels good? If it serves no purpose, it shouldn't exi[s]t. = good argument. It is more convenient for the reader to have both Cyrillic and Latin scripts available to choose from on one page, instead of having to jump around from one to the other. The current category splits the scripts very conveniently. Time making the proposed alterations would be better spent expanding Serbo-Croat content. Tempodivalse [talk] 17:07, 19 June 2011 (UTC)[reply]

Japanese acute accent

The Japanese translation of the word water is romanized as "mizú".

What is the purpose of the acute accent in Japanese? --Daniel 21:39, 19 June 2011 (UTC)[reply]

See Stephen G. Brown's first comment at #How to mark Japanese Pitch Accent, revisited (for batch import). —RuakhTALK 22:33, 19 June 2011 (UTC)[reply]
It is a pitch accent and it can affect the meaning. For example, hana desu ka, haná desu ka means "is it a nose or is it a flower" (鼻 = hana = nose; 花 = haná = flower). —Stephen (Talk) 12:33, 20 June 2011 (UTC)[reply]

That sounds clever. Thank you. --Daniel 16:35, 21 June 2011 (UTC)[reply]

Wow, there seem to be ~400-500 translations with such accents. They're not terribly consistent and could probably use some clean-up. Also, isn't it a weird place to put the information? (Why not in the entries themselves?) Still nice though. Vaste 04:01, 1 July 2011 (UTC)[reply]

Renaming proto-language codes

So far, the practice on Wiktionary has been to use the code for language families for their common ancestor language as well. And this has worked, mostly. But now I've run into a problem because I'm creating a template that has a parameter that can be either a language or a family. And it has to be able to tell them apart. This means that the practice of using {{proto:gem}} for Proto-Germanic and {{etyl:gem}} for the Germanic languages no longer works. I've proposed changing this before, but only now I realise how much of a problem it really is. So I would like to rename proto-language templates from {{proto:gem}} to {{proto:gem-pro}} and likewise for others. —CodeCat 12:52, 20 June 2011 (UTC)[reply]

Re: "I'm creating a template that has a parameter that can be either a language or a family": That seems like a mistake. What is the template for? —RuakhTALK 15:58, 20 June 2011 (UTC)[reply]
Derivation categories, which need categories both for terms derived from individual languages and from families. It works well so far, the only problem is that it gets categories for terms derived from proto-languages wrong, because it assumes the code is a family rather than that family's proto-language. I realise there is always the possibility for overlap, but it seems that the ISO 639 standard was designed so that a code always uniquely identifies only a language or a family, never both. —CodeCat 16:15, 20 June 2011 (UTC)[reply]
Is it desirable for a single template to handle both language-family derivation-categories and language derivation-categories? Those two cases seem very different to me. —RuakhTALK 20:20, 20 June 2011 (UTC)[reply]
{{etyl}} does that, too, and has never caused any problems. I don't really see a difference between Category:nl:English derivations and Category:nl:West Germanic derivations. And the new categories (determined by vote) with the new template are Category:Dutch terms derived from English and Category:Dutch terms derived from West Germanic languages, which are the same as well. I see no reason for having two templates to create a single unified category tree. —CodeCat 20:27, 20 June 2011 (UTC)[reply]
  • Please provide more details:
    1. What existing templates are you trying to replace.
    2. What effect would the proposed change have on unnamed parameters for language families and proto-languages in {etyl}, {proto}, {lx} and others. Which would change and which not.
    3. Since derivations from language families are much rarer than from proto-languages, why not simply default to proto-languages and provide special syntax for families instead? --Ivan Štambuk 19:00, 21 June 2011 (UTC)[reply]
    • The template is {{derivcatboiler}} and is needed as a result of this vote. It will replace {{topic cat}} for such categories, so there will still be a lot of family categories. The change would not really have an immediate effect, because it would just mean redirecting {{proto:gem}} to something else (and likewise for other proto-codes). Once that move has been made, all instances of the original code will need to be changed, but that should be fairly easy because most uses of these codes are either in category boilerplate templates or in calls to {{lx}} and {{termx}} (there are about 1500 transclusions of {{proto:gem}}, the most-used code). Once all uses of the old code have been replaced, the redirect will be deleted so that the code gem refers only to the Germanic language family, and not to Proto-Germanic. —CodeCat 19:13, 21 June 2011 (UTC)[reply]
      I support the proposal of splitting the codes. Germanic and Proto-Germanic are different things. --Daniel 19:25, 21 June 2011 (UTC)[reply]
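
To make the ambiguity concrete, here is a minimal Python sketch of a lookup that can tell languages from families once the proto-language has its own code. The two tables and the function name are hypothetical stand-ins, not the contents of {{derivcatboiler}}; the category names follow the pattern quoted earlier in this thread.

```python
# Hypothetical code tables; with the proposed rename, "gem" is only a family.
FAMILY_CODES = {"gem": "Germanic", "gmw": "West Germanic"}
LANGUAGE_CODES = {"en": "English", "nl": "Dutch", "gem-pro": "Proto-Germanic"}

def derivation_category(target_code, source_code):
    """Build the derivation category name for a target language and a source code."""
    target = LANGUAGE_CODES[target_code]
    if source_code in LANGUAGE_CODES:
        return f"Category:{target} terms derived from {LANGUAGE_CODES[source_code]}"
    if source_code in FAMILY_CODES:
        return f"Category:{target} terms derived from {FAMILY_CODES[source_code]} languages"
    raise KeyError(f"unknown language or family code: {source_code}")

for source in ["en", "gmw", "gem-pro", "gem"]:
    print(derivation_category("nl", source))
```

If "gem" had to stand for both Proto-Germanic and the Germanic family, the two lookups would collide, which is the problem the rename is meant to avoid.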

Esperanto x-system spellings allowed?

While looking through Index:Esperanto I saw that some words are listed with the "x" and "h" system spelling variants (e.g., look at Esperanto translations for shampoo). Since these are inherently nonstandard, should we disallow them? A look through WT:About Esperanto and other EO-related pages doesn't reveal any past precedent. Tempodivalse [talk] 22:03, 20 June 2011 (UTC)[reply]

I think if they are attestable in the same way as other spellings, they can be included. —CodeCat 22:05, 20 June 2011 (UTC)[reply]
What he said. --Mglovesfun (talk) 22:07, 20 June 2011 (UTC)[reply]
Ummm...? —CodeCat 22:08, 20 June 2011 (UTC)[reply]
We don't usually do transliterations, and that's all this is. They're useless; everyone knows how to transliterate out of these systems for the Wiktionary entry, and how to transliterate into these spelling systems when you're dealing with a system that can't handle the accents. It's universal; there's no point in only having entries for spellings we can attest, since for every word, ĉ becomes cx.--Prosfilaes 02:17, 21 June 2011 (UTC)[reply]
Yes! That's what I was thinking, but you explained it better. So can I start going around and removing those pesky "transliterations" from the index? Tempodivalse [talk] 15:33, 21 June 2011 (UTC)[reply]
If we decide not to add them, we should at least allow the search function to find them that way. Currently, if you search for "cxiu" it doesn't find anything, but if you look for "ciu" it works. —CodeCat 15:52, 21 June 2011 (UTC)[reply]
The Esperanto Wikimedia projects have such a system implemented. I'd like to have it here, but am concerned that it will conflict with other languages (for instance, "aux" would need to show both the French aux and the Esperanto aŭ, instead of defaulting to the latter). Tempodivalse [talk] 16:07, 21 June 2011 (UTC)[reply]
Well, if the search function already considers "c" to be a variant of "ĉ" and "ae" to be a variant of "æ" and "ä", couldn't the same be done to treat "cx" as a possible variant of "ĉ" too? —CodeCat 16:46, 21 June 2011 (UTC)[reply]
Yes, I would say, go ahead and remove those "transliterations": they are not part of official Esperanto AFAIK. —AugPi 18:23, 21 June 2011 (UTC)[reply]
AugPi, if that's your argument - that they are not part of official Esperanto - I want you to go nominate every slang term on wiktionary for deletion.
I don't necessarily think we *really* need entries for these, although I do believe they should be mentioned in the articles - under ===Alternative forms===, maybe unlinked (though I suppose that would be somewhat against "all words all languages"). These forms aren't uncommon, as the letters they represent in Esperanto aren't common on most keyboards. If you don't have them on your keyboard, which you probably don't, you use the x forms. I've only seen two sites that switch what you type for the "proper" forms automatically - the Esperanto Wikipedia and one Esperanto forum. These forms are not negligible, and they're one of the most important things for beginners to learn about Esperanto orthography. — [ R·I·C ] Laurent 23:37, 22 June 2011 (UTC)[reply]
I don't think entries are really needed, because they are essentially predictable orthographic varieties. In German, ä, ö and ü can be replaced with ae, oe and ue in the same way. But I do think that the search should support these varieties, so that searching for cx finds words with ĉ, just as searching for "gruen" finds the German grün. —CodeCat 23:46, 22 June 2011 (UTC)[reply]
I've wanted this for a long time. How would we go about implementing it? Tempodivalse [talk] 00:58, 23 June 2011 (UTC)[reply]
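
One possible shape for such a search aid, sketched in Python. The x-system mapping itself is standard Esperanto practice; the function names and the idea of returning both spellings as search candidates are assumptions for illustration, not a description of how the site search works.

```python
# The six x-system digraphs and the accented letters they stand for.
X_SYSTEM = {"cx": "ĉ", "gx": "ĝ", "hx": "ĥ", "jx": "ĵ", "sx": "ŝ", "ux": "ŭ"}

def from_x_system(text):
    """Convert x-system spelling to accented Esperanto letters."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        letter = X_SYSTEM.get(pair.lower())
        if letter is not None:
            # keep the capitalisation of the first letter of the digraph
            out.append(letter.upper() if pair[0].isupper() else letter)
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

def search_candidates(query):
    """Return the query as typed plus its converted form, if they differ."""
    converted = from_x_system(query)
    return [query] if converted == query else [query, converted]

print(search_candidates("cxiu"))
print(search_candidates("aux"))
```

Returning both forms, rather than silently rewriting the query, is one way to keep the French aux reachable while still surfacing the Esperanto aŭ.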

This is a list of the 500 longest lines in Wiktionary starting with '#:*'. Any quotation starting with #:* is wrong, but it's impossible to distinguish quotations from other things that might conceivably use this format, which is why I sorted by length. Nadando 01:07, 21 June 2011 (UTC)[reply]
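
How the list was actually produced is not stated here. A minimal Python sketch of the idea, assuming the wikitext is available as a list of lines (the function name and sample lines are hypothetical):

```python
def longest_suspect_lines(wikitext_lines, limit=500, prefix="#:*"):
    """Return the longest lines starting with the given prefix, longest first."""
    suspects = [line for line in wikitext_lines if line.startswith(prefix)]
    return sorted(suspects, key=len, reverse=True)[:limit]

sample = [
    "# definition",
    "#:* short",
    "#:* a much longer line that is probably a misformatted quotation",
    "#* correctly formatted quotation",
]
print(longest_suspect_lines(sample, limit=1))
```

Sorting by length is only a heuristic: it pushes likely quotations to the top without claiming to distinguish them from other uses of the same markup.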

Details of saints in etymology.

I've taken to removing mentions of famous saints from name entries, (Monica, Ciaran etc) and have had my edits reverted with the following left on my talk page. (I've no problem discussing it, but feel it's best for a wider audience).

  • There is no data of the name Monica before St. Augustine's Confessions. The mention of the saint is quite essential to the etymology. Ideally all etymologies should say when the name was first attested - compare Per. That's why saints are put into etymologies. Do you have written proof that the name Ciarán existed before St. Ciarán of Saigir? I will put back that saint too unless you provide evidence. I've the impression that you edit on basis of your personal experience - and that's quite valuable in discussion rooms about English usage - but questions of etymology and frequency must be based on dictionaries and statistics. --Makaokalani 09:09, 21 June 2011 (UTC)[reply]

My reply - Mentioning that Saint Monica was the mother of Saint Augustine is encyclopedic and does not belong in a dictionary. It does not pertain to the etymology which is to discuss how the word originated. It may well be the first mention of the name, but that is still not etymology. Unless there is a clear consensus to include this, (and I can't for the life of me see how it could be justified), then this information should be removed.--Dmol 10:19, 21 June 2011 (UTC)[reply]

Popularized by 'Saint X' perhaps? Don't people only become saints after their death? So the names must exist before that. But 'popularized by'... doesn't seem 'encyclopedic' to me in an etymology. --Mglovesfun (talk) 10:21, 21 June 2011 (UTC)[reply]
The only reason the obscure Punic word Monica entered the English language is that it was borne by St. Augustine's mother. This information belongs very well in the etymology section. Do not remove it. --Vahag 11:57, 21 June 2011 (UTC)[reply]
+1 —RuakhTALK 15:18, 21 June 2011 (UTC)[reply]
I agree with Vahag. This does not apply to all given names, not even to all those with an associated saint. Names mentioned in Old Testament are similar, though one wonders how many Hebrew names were not mentioned in the Old Testament. DCDuring TALK 16:16, 21 June 2011 (UTC)[reply]
Many. (And I, too, agree with Vahag.)​—msh210 (talk) 18:43, 21 June 2011 (UTC)[reply]

"Variations of..." namespace

I suggest shortening and simplifying appendices of variations this way:

--Daniel 18:43, 21 June 2011 (UTC)[reply]

Could something like this be turned into a dropdown menu somehow? That would be even nicer I think. :) —CodeCat 18:51, 21 June 2011 (UTC)[reply]
If you're thinking what I'm thinking, that seems like a good idea. However, drop-down menus involve JavaScript, so I would advise asking for the help of any of our resident JavaScripters (I'm a humble templatizer.) --Daniel 22:31, 21 June 2011 (UTC)[reply]
I'm thinking of something like the move/delete buttons for administrators, but with text rather than just an arrow, and placed to the left. —CodeCat 22:34, 21 June 2011 (UTC)[reply]
Maybe placing a button only there would be a bad idea, because the area above the title of the entry is basically reserved for editors, and ignored by other readers. However, we can repeat the button somewhere more readable for readers if necessary. I don't know yet; there are many possibilities of design. I would, at least, have to see the new button(s), if they are created, to give more accurate opinions. --Daniel 05:43, 22 June 2011 (UTC)[reply]
The Serbian Wikipedia and Wiktionary have a dropdown box there to switch between scripts. I think we could do it in a similar way. —CodeCat 16:50, 22 June 2011 (UTC)[reply]
I didn't know that! OK, I checked one page (http://sr.wiktionary.org/sr-ec/%D0%B0%D0%B1%D0%B0) and my fear has been confirmed. In my opinion, the idea of adding a drop-down list up there is good for anyone who knows where to look for it. Otherwise, the menu would be effectively hidden, much like the "Citations" button. Good thing we have {{seeCites}} in the middle of the applicable entries, as an additional clue. --Daniel 17:45, 22 June 2011 (UTC)[reply]

I created Wiktionary:Votes/2011-06/Disambiguation: namespace. --Daniel 17:55, 22 June 2011 (UTC)[reply]

Anagrams from Dutch compound words?

See note at: http://en.wiktionary.org/wiki/Talk:achterblijven regarding Anagram at: http://en.wiktionary.org/wiki/achterblijven#Dutch

This is like listing an anagram of "mail box" at entry "mailbox" except the Dutch language is very heavily loaded with compound words. For example, the separable verbs: http://en.wiktionary.org/wiki/Category:Dutch_separable_verbs -- would we want to break all of these into the two words comprising them and tag them as anagrams? (I am guessing not, but if so, it could probably be done quickly to all of them with a small script of some sort.) Neededandwanted 23:51, 21 June 2011 (UTC)[reply]

This is not an issue: mail box is not an anagram of mailbox. Anagrams are words with the same letters in a different order. In French, prisa, pairs, ripas and prias are anagrams of Paris, but paris is not an anagram. Lmaltier 16:48, 23 June 2011 (UTC)[reply]
I think the issue is a little different in this case. In Dutch, there are verbs that are composed of a separable part (usually an adverb) and a main verb. In the infinitive, the separable part comes first and is attached to the verb: achterblijven (achter-blijven). But in some inflected forms, the parts swap places and are separated with a space: blijven achter. There are many of these verbs, and according to the current practice those two verb forms would always be anagrams of each other. —CodeCat 17:10, 23 June 2011 (UTC)[reply]
So, it's an issue, but not really a problem: there is not much harm if the anagram bot includes these "anagrams". Lmaltier 19:16, 23 June 2011 (UTC)[reply]
Another example of countless true anagrams formed in a systematic way: in French, verb forms such as tomberai/retombai, tomberons/retombons, etc. Lmaltier 05:15, 24 June 2011 (UTC)[reply]
Agreed on 'issue... not really a problem' but Disagreed on retombai/tomberai -- these are two different words with different meanings. They have taken advantage of a prefix 're-' and suffix '-er' and are nearly as different as the English 'return' and 'turner.'

The compound words I am referring to certainly include all separable verbs. Two separate words are combined unaltered otherwise to form another 'word' but the Dutch speakers don't even think of them as being a new word, just the non-separated form of the verb. It is just another form of the same verb. There is no direct comparison in English. http://en.wikipedia.org/wiki/Separable_verb So, the compound words are a bit of an extreme case of non-anagrams: they are not a word or phrase created from ANOTHER word. There is no different meaning. It is entirely a grammatical construction. Neededandwanted 04:36, 28 June 2011 (UTC)[reply]
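
The distinction drawn in this thread can be put in a few lines of Python. This is only a sketch of the definition Lmaltier gives (same letters, different order); the function names are made up, and it is not the code of any anagram bot.

```python
def letter_sequence(term):
    """Lowercased letters of a term in their original order, ignoring spaces etc."""
    return "".join(ch for ch in term.lower() if ch.isalpha())

def is_anagram(a, b):
    """True if both terms use the same letters but in a different order."""
    seq_a, seq_b = letter_sequence(a), letter_sequence(b)
    return sorted(seq_a) == sorted(seq_b) and seq_a != seq_b

pairs = [
    ("mailbox", "mail box"),             # same order, so not an anagram
    ("Paris", "pairs"),                  # a true anagram
    ("achterblijven", "blijven achter"), # the parts swap places, so it counts
    ("tomberai", "retombai"),
]
for a, b in pairs:
    print(a, b, is_anagram(a, b))
```

Under this definition the Dutch separable-verb forms do register as anagrams, so excluding them would need an extra rule, for instance skipping pairs where one term is an inflected form of the other.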

toned pinyin entries - hanzi redirects

Based on the discussion we recently had about User:123abc, and the fact that most toned pinyin entries can't be attested according to Wiktionary's criteria for inclusion, I'd like to propose we make toned pinyin entries merely list and link to hanzi entries, rather than give definitions. For example, right now yánlì only lists "沿例... follow precedents" as one reading, but we should change it to a list of the different hanzi readings without giving any definitions to avoid duplication, e.g., something like:

  1. (pinyin reading of) 嚴厲 (trad.), 严厉 (simp.)
  2. (pinyin reading of) 妍麗 (trad.), 妍丽 (simp.)
  3. (pinyin reading of) 沿例
  4. (pinyin reading of) 岩櫟 (trad.), 岩栎 (simp.)
  5. (pinyin reading of) 沿歷 (trad.), 沿历 (simp.)

Now who can make a pretty template for us? ---> Tooironic 01:18, 22 June 2011 (UTC)[reply]

  • "Google Books" is not the only means of attestation. We shouldn't use "Google Books" as an excuse to ban Pinyin entries. "Mandarin pinyin" is like "Min Nan pinyin", and "Min Nan pinyin" sometimes also doesn't pass the "Google Books check" (Please see here). Anyhow, Pinyin entries are allowed by the rules of Wiktionary. If someone wanted to ban Pinyin entries, Wiktionary should first have a new rule. However, a rule shouldn't be abolished rashly, otherwise Wiktionary will be harmed. Engirst 21:30, 22 June 2011 (UTC)[reply]
    • It's not about whether or not pinyin entries are allowed, it's about whether they are attestable per our CFI, which I'm sure you haven't read. Most pinyin entries cannot be attested, and the same rule is applicable to every word in every language. ---> Tooironic 01:21, 23 June 2011 (UTC)[reply]
{{pinyin reading of}}: when the second parameter is not present it displays # 3 above- is this the correct behavior? Nadando 01:44, 22 June 2011 (UTC)[reply]
I'm not sure what you mean. By the way is it possible to incorporate (trad.) and (simp.) into the template? This would make it a lot easier to edit. ---> Tooironic 01:53, 22 June 2011 (UTC)[reply]
I wouldn't use brackets for the text. I would use something like what {{form of}} uses instead. —CodeCat 09:43, 22 June 2011 (UTC)[reply]
He means the first parameter is traditional, the second parameter is simplified, and if the simplified (second parameter) is left empty, then you get #3, meaning that it is both traditional and simplified. —Stephen (Talk) 11:00, 22 June 2011 (UTC)[reply]
Can someone make the changes at yánlì? I'm not really sure how to do it. ---> Tooironic 10:18, 24 June 2011 (UTC)[reply]
Does it look good now? Do you like this approach? (The current version of {{pinyin reading of}} is not set in stone; it's just one way to do it.) —RuakhTALK 12:44, 24 June 2011 (UTC)[reply]
I changed the heading to Pinyin. For some reason, the category shows up as Mandarin pinyins instead of Mandarin pinyin. How do I change that (Mandarin pinyin is an existing category). We will also need to make this into a rule just for 123abc's peace of mind. So that he no longer has any excuse for creating pinyin entries his way. Oh, btw, this looks exactly how I wanted it!JamesjiaoTC 22:04, 28 June 2011 (UTC)[reply]
I've modified {{pinyin reading of}} to handle the categorization, so for the headword line you can just do {{infl|cmn|sc=Latn}}. (It is possible to override the categorization in {{infl}}, by doing something like {{infl|cmn|cat=pinyin|sc=Latn}}, but that doesn't seem necessary here.) —RuakhTALK 23:52, 28 June 2011 (UTC)[reply]
I think it looks really good. Shall we start a vote on this? Make this a standard practice for Mandarin entries on Wikt. JamesjiaoTC 23:25, 29 June 2011 (UTC)[reply]
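
The parameter behaviour described in this thread (first parameter traditional, second simplified, and a single bare form when the second is left empty) can be sketched in Python. This is only an illustration: the function name pinyin_reading_of and its output strings are assumptions modelled on the example list for yánlì above, not the actual code of {{pinyin reading of}}.

```python
def pinyin_reading_of(trad, simp=None):
    """Render one definition line for a toned-pinyin entry.

    Hypothetical helper that imitates the behaviour discussed in the
    thread; it is not the real template implementation.
    """
    # No second parameter: the form serves as both traditional and simplified.
    if not simp:
        return f"(pinyin reading of) {trad}"
    return f"(pinyin reading of) {trad} (trad.), {simp} (simp.)"


# Three of the readings listed for yánlì in the proposal.
readings = [("嚴厲", "严厉"), ("妍麗", "妍丽"), ("沿例", None)]
lines = [f"{i}. {pinyin_reading_of(t, s)}" for i, (t, s) in enumerate(readings, 1)]
print("\n".join(lines))
```

Keeping the labels inside the helper means an editor only supplies the hanzi, which is the convenience asked for earlier in the thread.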

No categories

A dictionary doesn't need categories, and the current development is running amok, so I'll propose that we limit the use of the category name space to maintenance issues.--Leo Laursen – (talk · contribs) 08:12, 23 June 2011 (UTC)[reply]

What about making more of them hidden so that at least users don't see them until there is more defensible logic behind each individual? Wiktionary might also benefit from some kind of criteria and review process for category membership.
Some visible elements of the category structure are helpful for making things like specialized glossary indexes, eg, the context-tag-produced categories that reflect specialized usage contexts for technical terms. The categories function just about right for that purpose. DCDuring TALK 15:10, 23 June 2011 (UTC)[reply]

We need categories. Two basic examples:

  • Without categories, how would you search a Japanese word if you don't know how to enter Japanese characters with your keyboard?
  • If you know the name of a fish, but cannot remember it, how would you find it without a category dedicated to fish names in the language?

Of course, there are many other uses of categories. Lmaltier 16:42, 23 June 2011 (UTC)[reply]

Just a related note: Wiktionary:Beer parlour archive/2011/February#Poll: Deprecation of topical categories failed. People want some categories. --Daniel 16:50, 23 June 2011 (UTC)[reply]
If a category contains more than 200 entries, it shouldn't really exist. Overwhelming majority (my free estimate >90 %) of the many thousands of existing categories are used by basically no one. IMHO they should be replaced by bot-generated specific indexes (i.e. lexicons) similar to those already used to generate Index:All languages. These indexes could then be fine-tuned for a specific purpose. E.g. in case of topical indexes for foreign languages they could contain a definition gloss. In case of etymological derivations the respective etymon(s) could be added and used for grouping. And we could also have reverse indexes for translations generated from English translation tables. --Ivan Štambuk 18:27, 23 June 2011 (UTC)[reply]
A search for "wiktionary category" [without quotation marks], in Google Groups, returns 39,300 results. --Daniel 18:36, 23 June 2011 (UTC)[reply]
Most of these are false positives. Google search results are also not indicative of actual usage (usage as in "learning words by browsing the category". Randomly clicking a single entry is not using the category.). Note also that I do not advocate abolishing categories, simply superseding them by a superior presentation format customizable through wiki markup. They are useful, but this could be done so much better. Even the simple format used for our meager glossaries beats them. --Ivan Štambuk 18:48, 23 June 2011 (UTC)[reply]
Simply excluding "wikipedia" cuts the raw count to 10,400. Most of the usage has nothing to do with Wiktionary categories. Of the portion that does, a great deal has to do with grammatical and register categories. There is little evidence there or elsewhere to support user enthusiasm for topical categories. It doesn't come up as a user complaint on Feedback or on Wiktionary discussion pages. DCDuring TALK 19:01, 23 June 2011 (UTC)[reply]
@DCDuring: Our topical categories typically are barely usable and justifiable, so I wouldn't expect much positive recognition for them. I see some usage of topical categories in the initial results, such as "food" and "theology", nonetheless. --Daniel 19:15, 23 June 2011 (UTC)[reply]
Note that categories are not just intended for users - they're also there to facilitate bot operation and indexing for creating dictionary databases based on our data. Therefore, no categories should be abolished without careful examination of the consequences for automated scripts. -- Prince Kassad 18:55, 23 June 2011 (UTC)[reply]
I'd be fascinated to find out about such bots. Where have the bot owners been communicating their needs? DCDuring TALK 19:05, 23 June 2011 (UTC)[reply]
I personally use categories, even very large ones. Yes, even categories with 1 000 000 entries are useful, when there are links at the beginning of the category to make access easier. I have never used them in my bots, but they may be very useful to bots. Lmaltier 19:12, 23 June 2011 (UTC)[reply]
There's a weird predicament where small categories and large categories aren't very useful. I have used categories for non Latin script languages like Russian where I can't easily type the word, and typing in Category:Russian adverbs and looking at the category is much quicker. As for bots, yeah I've done that with MglovesfunBot, but only really to update categories rather than fix non-category problems. --Mglovesfun (talk) 17:50, 24 June 2011 (UTC)[reply]
I use categories a lot for maintenance work, and also to see if there are any words missing or to see if entries aren't formatted the way they should be. —CodeCat 11:09, 25 June 2011 (UTC)[reply]

I was worried about the tendency to regard a dictionary as an extended encyclopedia of words, and the inherent misconception that words in the daily language can be defined rigidly like a scientific term. The current fixation on categories seems to emphasize that. I do mean that categories are superfluous, but naturally it was mostly an expression of my exasperation. Anyway I've left the project. It was fun, while there was a chance it would develop in the right direction. Thank you.--Leolaursen 08:07, 25 June 2011 (UTC)[reply]

Bye. --Daniel 15:40, 1 July 2011 (UTC)[reply]

Poll: Chinese script or Han script

The poll is here:

WT:RFM#Categories ending in "in traditional script" to "in Traditional Han script".

--Daniel 16:57, 23 June 2011 (UTC)[reply]

Transferring interwikis of categories

I had a conversation with Malafaya (in Portuguese, here) about interwikis of categories.

According to the conversation, in order to keep the interwikis, we should do any of these things, or both of them:

  1. Either copy the interwikis manually.
  2. Or turn the old categories into redirects.

For example, "Category:fr:Spanish derivations" could contain this:

#REDIRECT [[Category:French terms derived from Spanish]]

[[es:Categoría:FR:Palabras de origen español]]
[[fr:Catégorie:Mots français issus d’un mot espagnol]]
[[pt:Categoria:Vocábulo de étimo espanhol (Francês)]]
[[ru:Категория:Слова испанского происхождения/fr]]

That way, all the interwikis should eventually be transferred by MalafayaBot. After that, the old empty categories can be deleted. --Daniel 18:58, 23 June 2011 (UTC)[reply]

Would it be possible to use a bot to add 'soft' redirects to all of the old categories instead? —CodeCat 19:08, 23 June 2011 (UTC)[reply]
Soft redirects would not transfer the interwikis... Perhaps the hypothetical bot could, at least, do the transfer or create the hard redirects that result in the transfer. --Daniel 19:58, 23 June 2011 (UTC)[reply]

One thing to add: if the original category doesn't have any interwikis, you don't have to bother creating/keeping a hard-redirect. I believe most categories will fall in this case. Malafaya 22:56, 23 June 2011 (UTC)[reply]

Let's keep this simple- I can run movepages.py on the remaining categories with redirects turned off. Everything gets moved including the history and we don't have to worry about deletion. Nadando 15:03, 24 June 2011 (UTC)[reply]
The example above would actually categorize the redirect - use {{movecat}} instead. --Mglovesfun (talk) 17:46, 24 June 2011 (UTC)[reply]
If you put a colon before Category (i.e., #REDIRECT [[:Category:...), it won't be categorized. Keeping both categories (without any of them being a redirect to the other) will prevent update of interwikis with bots in auto mode as they will eventually find 2 cats here for the same foreign cat. Malafaya 19:44, 24 June 2011 (UTC)[reply]
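
As a rough sketch of the approach discussed here, the wikitext for an old category page could be generated as below. The helper name category_redirect is hypothetical; the leading colon follows Malafaya's remark that it keeps the redirect page from being categorized, and the interwiki lines are taken from the example earlier in this section.

```python
def category_redirect(target, interwikis):
    """Build wikitext for an old category page turned into a hard redirect.

    Hypothetical helper for illustration only.
    """
    # The leading colon links to the category instead of
    # placing the redirect page itself inside that category.
    lines = [f"#REDIRECT [[:Category:{target}]]", ""]
    # Interwikis stay on the old page so a bot can later transfer them.
    lines += [f"[[{link}]]" for link in interwikis]
    return "\n".join(lines)


page = category_redirect(
    "French terms derived from Spanish",
    [
        "es:Categoría:FR:Palabras de origen español",
        "fr:Catégorie:Mots français issus d’un mot espagnol",
    ],
)
print(page)
```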

I have been "fishing" some of the new categories and updating interwikis accordingly via bot. I believe it got most of the cat's created until now. Malafaya 13:14, 27 June 2011 (UTC)[reply]

Do you think you could keep doing that? It would be a lot easier than if we had to do it manually... —CodeCat 13:54, 27 June 2011 (UTC)[reply]

Scripts of Punjabi

What are the writing systems of Punjabi? --Daniel 23:57, 23 June 2011 (UTC)[reply]

See [[w:Punjabi language#Writing system]]. It uses three: Gurmukhi (the most commonly used; related to other Indic scripts); Shahmukhi (a variant of the Arabic script, via Persian and Urdu; used in Pakistan); and Devanagari (least commonly used; borrowed from Hindi, for which it's the main script). —RuakhTALK 00:12, 24 June 2011 (UTC)[reply]
Thanks. Should we have a code for Shahmukhi? --Daniel 00:29, 24 June 2011 (UTC)[reply]
My opinion is that we should use {{pnb}} to designate Punjabi as written in Arabic script. This is what Wikipedia does. -- Prince Kassad 00:36, 24 June 2011 (UTC)[reply]

I edited the category of Punjabi to make it show Gurmukhi and Devanagari. Feel free to add Arabic there, if necessary as well. --Daniel 16:25, 24 June 2011 (UTC)[reply]

Form of templates

Would it be possible for all form of templates to work in the same way regarding capitalization and final full stop (that is, period)? This seems to be a matter of disagreement; can a vote on the subject be avoided? Probably not. FWIW, the French Wiktionary banned such templates having a final full stop a few years ago, as the nodot parameter seems quite confusing - instead, if you want a full stop, erm, write one. PS apologies for my lack of participation in this thread, as I am having a bit of a break. --Mglovesfun (talk) 15:59, 24 June 2011 (UTC)[reply]

I support this proposal to standardise the templates that are used as definition lines, either with or without a dot. - -sche (discuss) 03:56, 26 June 2011 (UTC)[reply]
The dot is freakin' easy. Just take it out. If you want a dot at the end of the line, add a period after the template. None of this nodot= bologna.
Capitalization I don't care how you handle, but here's another idea to consider: {{plural of}} vs. {{Plural of}} with obvious meaning. The real question is, how would the lowercase form be used? In other words, do we need some magic where walked is easily defined as follows?
  1. {{form of|Past|past participle|walk}}.
DAVilla 19:04, 4 July 2011 (UTC)[reply]
Capitalization and punctuation can't be made standard unless we forbid the inclusion of other information on the definition line in combination with the template. As it stands, sometimes a form-of entry requires a definition at the outset, or an explanation after the form information. Standard capitalization and punctuation will not be possible as long as the other information is to be included. --EncycloPetey 19:11, 4 July 2011 (UTC)[reply]

Standardizing a few codes

I propose changing the codes for these 11 things, to standardize them.

The proposal is organized this way: "name", "old code" then "proposed code". (And updates are shown either with an underline or with a line-through.)

  • American English - AE. - en-usa
  • Austrian German - AG. - de-aus
  • Ecclesiastical Latin - EL. - la-ecl
  • Late Latin - Late Latin - la-lat
  • Mediaeval Latin - ML. - la-med
  • New Latin - NL. - la-new
  • Old Latin - OL. - la-old
  • Old Northern French - ONF. - fro-onf - fro-nrn
  • Provençal - prv - oc-prv
  • Shanghainese - Sha. - wuu-sha
  • Viennese German - VG. - de-vie - bar-vie
  • Vulgar Latin - VL. - la-vul

--Daniel 16:20, 24 June 2011 (UTC)[reply]

So we should take a bunch of tags that are clearly unstandardized, and move them into a controlled namespace? No. la-new is not a valid language tag, but looks like one. NL. is at least clearly not a valid language tag.--Prosfilaes 17:04, 24 June 2011 (UTC)[reply]
Why would it not be a valid language tag? —CodeCat 17:06, 24 June 2011 (UTC)[reply]
It's not that it wouldn't be a valid language tag, it's that it isn't a valid language tag. Language tags are assigned by designated authorities. la-new could be created, but it hasn't been. (Actually, I'm not even sure it could be. Ext-lang subtags are always alternatives to regular language subtags — for example, zh-yue is equivalent to yue — and new is already a language subtag, with a different meaning. I'm not sure whether that means la-new can't ever be created, or merely that it won't be. But it amounts to the same thing.) —RuakhTALK 17:36, 24 June 2011 (UTC)[reply]
We've already created many other codes in this way, though. We use {{roa-jer}} for Jerriais, and {{gmq-osw}} for Old Swedish. Maybe the only thing we need to change is that the first part should be a family? —CodeCat 17:57, 24 June 2011 (UTC)[reply]
Well, I think we should reevaluate those codes. I'm not sure who exactly "we" is; I don't remember those discussions. Maybe some of them are worth keeping, despite their nonstandard nature; but even if so, that doesn't automatically mean that we should make up codes for everything that pops into our head. —RuakhTALK 19:13, 24 June 2011 (UTC)[reply]
Please reply to CodeCat's question, as I'm curious too. In addition, what's the difference? We have the code "itc", which is not for a language. --Daniel 17:10, 24 June 2011 (UTC)[reply]
I agree with this proposal but there are some things to work out. Does American English include only what is spoken in the US, or also in Canada? And does it really need a separate code? For Old Northern French, I think {{fro-nor}} would be better because the 'old french' part is already contained in the 'fro' part. For Viennese German, maybe {{bar-vie}} or {{bar-wie}} would be better, because it's a variety of Austro-Bavarian. There is also Ecclesiastical Latin, which could be {{la-ecl}}? —CodeCat 17:06, 24 June 2011 (UTC)[reply]
I don't mind using "American English" to refer to English spoken only in USA. I don't mind the possibility of alternative proposals, either. --Daniel 17:21, 24 June 2011 (UTC)[reply]
OK. I updated the proposal like you said. Except I added "fro-nrn" somewhere, because I erroneously more-or-less associated your alternative "fro-nor" with Norman. --Daniel 17:21, 24 June 2011 (UTC)[reply]
That's not erroneous. Old Northern French is another name for Old Norman. —RuakhTALK 17:46, 24 June 2011 (UTC)[reply]
But what about the ancestor of Walloon? —CodeCat 19:18, 24 June 2011 (UTC)[reply]
. . . interesting. Thanks for the correction. I've dug a bit further, and found that while some sources don't seem to distinguish "Old Northern French" from "Old Norman", other sources use "Old Northern French" as a slightly broader term, covering all the Oïl dialects that didn't palatalize the "c" in words like "castle". I have no idea how we're using the term; did our ONFs come from the OED, or from Webster 1913? How does that source use the term? —RuakhTALK 18:10, 25 June 2011 (UTC)[reply]
I'm not sure I agree with the general principle here, but leaving that aside for a moment . . . why would we use the nonstandard tag en-usa ("American English") when the standard tag en-US ("English as spoken in the US") is widely recognized and understood? Why de-aus instead of de-AT? Similarly, why use our own nonstandard tag oc-prv instead of at least the semistandard oci-prv? (And some of those should simply be eliminated, I think. Why distinguish Shanghainese from Shanghaiese/Wu? Why distinguish Viennese German from Austrian German?) —RuakhTALK 17:46, 24 June 2011 (UTC)[reply]
I don't really know why they are distinguished, but these are all existing etymology templates, which implies that there will be derivations categories for them at some point. —CodeCat 17:59, 24 June 2011 (UTC)[reply]
Fair enough. I guess the point is that as long as these are just etyl: templates, they don't need to be real language codes, and they certainly don't need to be fake ones. I'm not sure whether it's necessary to distinguish Viennese German from Austrian German in etymologies, but even if it is, it's certainly not necessary to distinguish them anywhere else, so we don't need "codes" for them. —RuakhTALK 19:13, 24 June 2011 (UTC)[reply]
Note: for Latin sound files, we have been using la-ecc, not la-ecl. --EncycloPetey 19:14, 4 July 2011 (UTC)[reply]

A code for Crimean Gothic

It is generally agreed that this language isn't really 'Gothic' at all, and that it didn't descend from the Gothic language of the 4th century. The only thing that's known is that it is East Germanic in origin, and descends from one of the languages of the Germanic people that migrated to eastern Europe in Roman times. But despite that, we call this language Gothic on Wiktionary, as if it were the same language as its 1000 year older sister. That's why I'd like to propose that we use a different, separate name for this language, with its own code, such as gme-crg, and its own set of categories. —CodeCat 21:23, 24 June 2011 (UTC)[reply]

Linguist List uses got-cri. If we're going to use a non-standard code, I think it should be that. —RuakhTALK 21:35, 24 June 2011 (UTC)[reply]
But that code implies that it is a variety of Gothic, and it isn't, really. Crimean Gothic is to Gothic what Middle Dutch is to Old Saxon, more or less. —CodeCat 21:39, 24 June 2011 (UTC)[reply]
I understand that, but it's still better than making up our own code . . . —RuakhTALK 21:45, 24 June 2011 (UTC)[reply]
What's wrong with making up codes? Apparently that's what Linguist List did. --Daniel 21:58, 24 June 2011 (UTC)[reply]
Linguist List is the official code standard for extinct/historical languages. I believe that only covers primary language subtags, not extended-language subtags, so got-cri is still not exactly "standard", but at least the code is well-documented by a recognized authority. —RuakhTALK 22:19, 24 June 2011 (UTC)[reply]
OK. You just didn't convince me. I'm going to believe CodeCat, and vote for the precision of "gme-crg", over LL's authority. --Daniel 22:30, 24 June 2011 (UTC)[reply]
I'm not surprised. I'll add one other fact, for anyone else reading this — I don't expect you two to be convinced — which is that technically, CodeCat is mistaken. "Gothic" (got) is defined to include both (1) what CodeCat refers to as "Gothic" and (2) what CodeCat refers to as "Crimean Gothic". Arguably, that coding is ill-founded: CodeCat believes that "Gothic" and "Crimean Gothic" are no more related to each other than either is to other East Germanic languages. (In biological terms, CodeCat believes that got is not a clade. I'm not sure what terminology linguists use.) Therefore, while the specific code got-cri is not standard, the use of got for Crimean Gothic is standard. (Note: I'm not using the phrases "CodeCat refers" and "CodeCat believes" to imply that she is wrong; it's just a convenient shorthand for the views and terminology that she is advocating. She says that her views and terminology are "generally agreed" upon, and that may well be the case. I do not dispute it.) RuakhTALK 18:06, 25 June 2011 (UTC)[reply]
I know that ISO defines codes in such ways, but ISO codes aren't really designed with the needs of Wiktionary in mind. Just because external bodies define codes in a certain way doesn't mean linguistic consensus is the same, and doesn't mean we have to follow. An example would be our own code {{cel-gau}} for Gaulish, because the ISO codes for Gaulish apply only to two varieties of Gaulish (Cisalpine and Transalpine), not to the language as a whole. —CodeCat 18:24, 25 June 2011 (UTC)[reply]
Right. And I'm on board with such adjustments; for example, I advocated treating all forms of Hebrew under a single language header (==Hebrew==, coded he), rather than trying to follow the not-quite-coherent ISO-language-code breakdown into exactly two languages (he/heb "Hebrew" vs. hbo "Ancient Hebrew"), and I supported the B/C/S editors' decision to treat all of B/C/S/ under a single header (==Serbo-Croatian==, coded sh). In the case of Crimean Gothic, I'm on board with treating it as a separate language from "Gothic", and with using bare got to refer specifically to the latter. I just think we should use got-cri, rather than making up our own code. got-cri is genuinely meaningful to the outside world, whereas gme-crg would be our own affectation. —RuakhTALK 18:53, 25 June 2011 (UTC)[reply]
To clarify: I wouldn't be so bothered about the actual name of the template, but the language code that we include in our HTML should be a genuinely meaningful language code. None of our mirrors should feel compelled to include documentation for the nonstandard language codes that we've decided to put in their HTML as some sort of ego-booster. The only reason I care about the template-name is that our infrastructure more or less requires that our template-names match our language-codes; the exception being the etyl: templates, which never get used as codes (unless you or Daniel has started using them that way, which I suppose wouldn't surprise me). —RuakhTALK 18:59, 25 June 2011 (UTC)[reply]
  • Crimean Gothic "language" (those few dozens glosses recorded in haste by a non-native speaker) hardly deserves its own code. I suggest that we handle it as it is now - as a subproject of Gothic, formatted as ==Gothic== but in Latin script, with context labels sorting it into the appropriate category. Or simply dump it altogether into an appendix page. --Ivan Štambuk 16:14, 2 July 2011 (UTC)[reply]
    • We also consider Phrygian to be a separate language, even though we know only a little more about that than we do about Crimean Gothic. And we have only one entry in Frankish, out of only one known inscription in the language. —CodeCat 16:20, 2 July 2011 (UTC)[reply]
For Phrygian we have abundant original attestations, as opposed to CG which is attested second-hand by a non-native speaker. If there is attestation - it can be added, in the original script. CG is problematic because of 1) the fact that it's mostly not an attestation (a list of words supposedly used), and 2) the quality of that list (the non-speaking author that compiled it as well as his informants of dubious competence). The extent of our knowledge of the language itself is not a relevant factor for inclusion. --Ivan Štambuk 16:46, 2 July 2011 (UTC)[reply]
  • I suggest we handle it as a subproject of Chinese, since it's no more Gothic than it is Chinese, plus I'm sure China would be thrilled to take ownership. DAVilla 18:52, 4 July 2011 (UTC)[reply]

Codes for families consisting of a single language

Families that contain only one language are usually called isolates, and within our new category structure we don't put them into families of their own. But it is different when that single language has ancestors; in theory the ancestor language belongs with its descendant in a small family of two (or more) languages. An example would be Albanian, which is a separate branch of Indo-European, but we also have Proto-Albanian on Wiktionary. Another example is Armenian, which is also a separate branch, but we also consider Old Armenian and Proto-Armenian to be separate languages on Wiktionary. For this reason, I think it would be useful to consider these small families, the "Albanian languages" and "Armenian languages", or maybe other names such as "Albanic" and "Armenic"? As codes, I would suggest {{qfa-sq}} and {{qfa-hy}}, and {{sq-pro}} and {{hy-pro}} for their proto-languages. —CodeCat 19:41, 25 June 2011 (UTC)[reply]

We already have templates for the language families: {{etyl:sqj}} for Albanian, {{etyl:hyx}} for Armenian; both from standard ISO 639-5 codes. As for the proto-languages — I suppose {{proto:sqj-pro}} and {{proto:hyx-pro}}? Isn't that how we're naming proto-language templates these days? —RuakhTALK 20:02, 25 June 2011 (UTC)[reply]
I didn't realise there already were codes for those families, but if they exist then we could use them. I still wonder what to do with any other similar cases, though. Cases that don't have codes yet. —CodeCat 20:04, 25 June 2011 (UTC)[reply]
Though the families should preferably not have the same name as their (only) member, as this has a potential for confusion. -- Prince Kassad 15:58, 26 June 2011 (UTC)[reply]

English terms with obsolete senses, etc.

Category:English rare terms was moved to Category:English terms with rare senses per WT:RFM#Category:English rare terms.

I think some categories should follow suit:

--Daniel 21:26, 25 June 2011 (UTC)[reply]

For consistency, yes, they should all be one or the other. I don't really care which one though. —Internoob (DiscCont) 00:44, 26 June 2011 (UTC)[reply]
I oppose until I see a convincing reasoning. The discussion at WT:RFM#Category:English rare terms, 19 June 2011, does not show anything like a consensual support for renaming, so I don't understand what made you create Category:English terms with rare senses. To the contrary, several people pointed out that "English nouns" also contains non-noun senses. I find your "was moved to" phrasing in your first sentence misleading; it was you who has done the move, and who has wrongly decided there was a consensus. I am far from convinced this increase of verbosity of category names is a good thing: we may soon arrive at "English terms with noun senses", a thing that I hope people can learn to infer from "English nouns". --Dan Polansky 11:29, 27 June 2011 (UTC) Later: On a more careful reading and thought, the RFM discussion showed something approaching at least lack of opposition for renaming, and had voices sympathetic to renaming. Furthermore, the motivation for renaming is that to call a term a "rare term" when it only has a rare sense seems wrong. I don't know any more. --Dan Polansky 11:45, 27 June 2011 (UTC)[reply]
Please note that, in that discussion, everybody ultimately acknowledged "Category:English rare terms" as an inaccurate name, even the people who first compared it to "Category:English nouns" as argument contrary to renaming the category. I don't want the creation of "English terms with noun senses", and I think nobody else wants it either. When I said above I had moved a category according to a RFM discussion, I was being sincere. --Daniel 07:18, 3 July 2011 (UTC)[reply]

Next votes

These votes are scheduled to start in two days:

Feel free to double-check and edit their proposals before they start. --Daniel 04:01, 27 June 2011 (UTC)[reply]

English terms spelt with .

¶ Should any English words be suppressed from Category:English terms spelled with ., or is this better off without unique restrictions? --Pilcrow 06:19, 27 June 2011 (UTC)[reply]

Note: His question stems from this small conversation. --Daniel 06:27, 27 June 2011 (UTC)[reply]

So: no objections for allowing any English terms containing . in that category? --Pilcrow 05:03, 1 July 2011 (UTC)[reply]

I think so. Lack of response is lack of objections. --Daniel 05:14, 1 July 2011 (UTC)[reply]
¶ Well, I would freely go ahead and categorize such words, but I thought rushing into (this) action without discussion was bad faith. --Pilcrow 05:16, 1 July 2011 (UTC)[reply]
You gave a place for people to discuss your idea, especially an idea that is arguably uncontroversial: populating a category with exactly the members mentioned in its title and description. That's what I call good faith. --Daniel 05:41, 1 July 2011 (UTC)[reply]

Pinyin entries are allowed and shíyóu is attested (google books:shíyóu), but why was it deleted by Anatoli? Engirst 04:56, 28 June 2011 (UTC)[reply]

Because there are no hits for it in running text in Mandarin. Mentions are NOT uses. ---> Tooironic 05:17, 28 June 2011 (UTC)[reply]
Actually, it is used in "running text in Mandarin". One of the hits is a "reader for hanyu pinyin", i.e. a book written completely in pinyin. I guess it's probably to promote pinyin, or as part of pinyin education in school. Not a very typical book anyway. The quote: "Yï jiu liù sl nián kaishï, woguó shíyóu wánquan zljï, yong 'yángyóu' de niándài [...] Xîn yóutián de buduàn faxiàn, zhèngmíng woguó yöuzhe fëngfù de shíyóu" (with OCR-errors). With characters (I think): 1964年開始,我國石油完全自給,用「洋油」的年代 [...] 新油田的不斷發現,證明我國有著豐富的石油... Vaste 02:23, 30 June 2011 (UTC)[reply]
I've added a citation (which I personally cannot read!) to shíyóu - can someone translate it? Is it just a mention? Looking at Google Books it (shíyóu) does seem to appear in running text on three occasions, but all the rest seem to be in dictionaries and text books. Mglovesfun (talk) 10:12, 3 July 2011 (UTC)[reply]
Also, it has OCR-errors (should be yǒu but is yöu, etc). Vaste 00:23, 4 July 2011 (UTC)[reply]
Engirst. You don't care about anybody, why should we care about you? --Anatoli 06:38, 28 June 2011 (UTC)[reply]

We should follow rules.

'“Attested” means verified through

  1. Clearly widespread use,
  2. Usage in a well-known work, or
  3. Usage in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year.'

Such as shíyóu. Engirst 13:46, 28 June 2011 (UTC)[reply]
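
The countable parts of the third criterion quoted above (at least three instances, spanning at least a year) can be sketched as a simple check. The function and the sample records are hypothetical, and the check treats distinct source names as a stand-in for independence. Whether a quotation is a use rather than a mention, and whether sources are truly independent, still needs human judgment and is not modelled here.

```python
from datetime import date


def meets_count_and_span(citations):
    """Check only the mechanical parts of criterion 3.

    Hypothetical sketch: each citation is a dict with a "source" name
    and a "date". Use-versus-mention is NOT checked.
    """
    sources = {c["source"] for c in citations}
    if len(sources) < 3:
        return False
    dates = sorted(c["date"] for c in citations)
    # "Spanning at least a year" is approximated as 365 days or more.
    return (dates[-1] - dates[0]).days >= 365


# Placeholder records, not real citations.
sample = [
    {"source": "A", "date": date(2001, 1, 1)},
    {"source": "B", "date": date(2001, 6, 1)},
    {"source": "C", "date": date(2002, 3, 1)},
]
print(meets_count_and_span(sample))
```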

I agree with Anatoli. Why do you expect us to listen to you when you won't cooperate with us? If you are going to ignore everyone, I don't think anyone will want to work with you, no matter if you are following the rules or not. You can follow the rules and still be a troll. —CodeCat 14:07, 28 June 2011 (UTC)[reply]
Then I'll third what Anatoli says. You've pretty much said on your talk page that you're not prepared to follow CFI. I don't see how we can negotiate from that position. If you reject our most fundamental document, there is no way forward from there. Mglovesfun (talk) 19:41, 29 June 2011 (UTC)[reply]

Wymysorys or Vilamovian?

For info on the Vilamovian language, look at this article. I'm thinking that the name Wymysorys is outdated, so either Vilamovian or Wilamowicean is okay. The language code is {{wym}}, and I just got started on importing articles from the Polish Wiktionary (a few articles were imported from the Korean Wiktionary by me). --Lo Ximiendo 06:06, 28 June 2011 (UTC)[reply]

P.s. I mean the language name that is entered on an entry between two sets of two equal signs, along with other things. --Lo Ximiendo 11:49, 28 June 2011 (UTC)[reply]
Now how do I move the Wymysorys language categories? --Lo Ximiendo 05:03, 29 June 2011 (UTC)[reply]
Three steps:
  1. Edit Template:wym to display "Vilamovian". You already did it today.
  2. Check Special:WhatLinksHere/Template:wym, and rename all categories that use "Wymysorys". This includes renaming Category:Wymysorys nouns to Category:Vilamovian nouns.
  3. Use Special:Search to search for all entries that include "Wymysorys", and edit their L2 headers when necessary. This includes editing the entry edikys, for example. Special:WhatLinksHere/Template:wym can list the right entries too, but "What Links Here" is better for finding categories to be updated.
--Daniel 05:19, 29 June 2011 (UTC)[reply]
By the way, you don't literally move or rename categories, because there is not a button to do that. Instead, the old ones have to be deleted, and the new ones have to be created. --Daniel 05:23, 29 June 2011 (UTC)[reply]
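
Step 3 above (editing the L2 headers) can be sketched in code. This is only a minimal illustration, not an existing Wiktionary tool: the function name rename_l2_header and the sample wikitext are hypothetical, and it assumes the entry text is already available as a plain string. It rewrites only level-2 headers that consist of the old language name, leaving deeper headers and running text alone.

```python
import re

def rename_l2_header(wikitext: str, old: str, new: str) -> str:
    """Replace level-2 headers '==old==' with '==new=='.

    Hypothetical helper for illustration; it does not touch level-3
    headers or mentions of the old name in running text.
    """
    pattern = re.compile(
        r"^==[ \t]*" + re.escape(old) + r"[ \t]*==[ \t]*$",
        re.MULTILINE,
    )
    # A function replacement avoids backslash-escape surprises in `new`.
    return pattern.sub(lambda m: "==" + new + "==", wikitext)

# Placeholder entry text (assumed, not copied from any real entry).
sample = (
    "==Wymysorys==\n"
    "\n"
    "===Noun===\n"
    "# (definition here; mentions Wymysorys in running text)\n"
)
updated = rename_l2_header(sample, "Wymysorys", "Vilamovian")
print(updated)
```

The categories in step 2 would still have to be recreated by hand, as noted above, since there is no button to move them.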

Entries in non-standard scripts

The problems we have with some users highlight a problem in our current policy. We don't distinguish between attestations in a language's native or most-used script, and attestations in nonstandard scripts. Pinyin entries are not standard, but it is possible that a few pinyin entries meet CFI nonetheless even if the vast majority doesn't. And in any case, they are nothing more than alternative representations of other entries. So I think it would be a good idea to amend current policy somewhat so that it takes into account entries in scripts that are simply alternative representations of the same word in another script, and not a standard way of writing that language. Perhaps we could limit these entries only to redirects, and require that the main entry be attestable as well? —CodeCat 14:20, 28 June 2011 (UTC)[reply]

Pinyin actually is standard, and I think we should allow properly-diacriticked-pinyin (henceforth: PDP) entries whenever the PDP (or other tone-marked pinyin, e.g. the style with appended numbers) is attested. For simplicity, such entries might as well simply point users at Hanzi entries, since currently Hanzi is much more widely used. (Hopefully that will change someday, but Wiktionary's neutral-point-of-view policy means that we cannot be an agent of such change.)
When a language has two scripts with a one-to-one mapping between them, I think attestations in one script should automatically count toward the other. In the case of Mandarin, we should probably count Traditional attestations toward both Simplified and Traditional, though the reverse is probably not workable. We probably cannot count Hanzi attestations toward PDP or vice versa, because some distinctions are made in Hanzi that are not made in PDP (Hanzi distinguishes many homophones, whereas PDP does not), and some distinctions are made in PDP that are not made in Hanzi (some characters, e.g. , have multiple non-interchangeable pronunciations; also, PDP indicates word breaks, whereas Hanzi does not).
In the case of actual nonstandard scripts, such as toneless Pinyin, transliterated Greek, etc., I think we have to take them on a case-by-case basis. There's not a bright line between a standard script and a nonstandard one.
In all cases, the editors who actually work on affected-language entries need to be the ones really making decisions. The rest of us can (and should!) offer opinions and advice, but in general, we should defer to them.
RuakhTALK 15:59, 28 June 2011 (UTC)[reply]
I'm also thinking of languages like Gothic, which were written in Gothic script but are almost always written in Latin script in modern reprints. —CodeCat 16:43, 28 June 2011 (UTC)[reply]
As previously discussed here, you mean? I still agree with what I said then: "if Gothic works are primarily or exclusively published in the Latin script, then we should definitely have entries for the Latin spellings, either as main entries or as alternatives." But I don't think it's a very similar case to Pinyin; on the one hand, speakers of Gothic never used the Latin script, whereas speakers of Mandarin do, but on the other hand, Gothic is primarily or exclusively published in the Latin script, whereas Mandarin is not. (Personally I support Latin entries in both cases, but the reasons are different.) —RuakhTALK 17:10, 28 June 2011 (UTC)[reply]
It's non-standard as in it's not used to write Chinese, period. No sane person would use pinyin to write a text in Chinese (Mandarin, really), unless it was to prove a point, or for foreigners, language learning purposes or in a dictionary. Not that it couldn't be done, it just isn't. Of course, the same argument could be made about "ㄅㄆㄇㄈ" (bopomofo, 注音符號). It could be used as well, but it just isn't used. Thus, it's very unlikely that any pinyin word would be attestable.
"We probably cannot count Hanzi attestations toward PDP or vice versa" Why not? E.g. "干" in 你干啥? (ni3 gan4 sha2?) Here 干 could be an attestation for gan4 (but not for gan1). "衣服干了。" would work for gan1 (but not for gan4). In the same way 幹 (你干啥) and 乾 (衣服干了) could be attested via 干 respectively. Or am I missing something?
The question is if we want to include them even though they are non-attestable, as some kind of metadata (kinda like a category really, "Chinese words pronounced shi4shi4"). Vaste 10:51, 29 June 2011 (UTC)[reply]
I'm not referring so much to individual pinyin syllables, as to polysyllabic words. We know (well, I don't, but you guys do) which syllable is meant by a given character in a given word, but it doesn't seem verifiable to me, unless we accept secondary sources such as dictionaries. (Actually, come to think of it, since there are books and periodicals published in Taiwan that have bopomofo ruby for every single character, a dedicated attester could probably demonstrate the bopomofo for many polysyllabic terms, and that would verify the pinyin as well.) —RuakhTALK 11:17, 29 June 2011 (UTC)[reply]
How are IPA and pronunciation handled for English? Isn't it the same issue? After all pinyin is simply used to describe (and prescribe(?)) pronunciation (of Mandarin). Pinyin entries have problems with attestability simply because they aren't used that way (i.e. as a script for Chinese). Vaste 01:58, 30 June 2011 (UTC)[reply]
We list pronunciations for English words, but we don't create entries for the IPA transcriptions. (I actually think that would be awesome if it were feasible, but it's not, for lots of reasons besides just attestability.) Much of the content in our entries is justified, or could be justified, by reference to other dictionaries and such, but we don't create entire entries that rely on secondary sources for justification. But anyway, I'm out of my depth here, seeing as I don't speak any Chinese at all; if you say that pinyin entries can be adequately verified without depending on secondary sources, then I defer to you. —RuakhTALK 02:09, 30 June 2011 (UTC)[reply]
Okay, they exist. Just to be clear though, are you saying pinyin used that way is not extremely rare? Vaste 03:35, 1 July 2011 (UTC)[reply]
The point is that it is in use. Here is a Pinyin Bible for your reference. Engirst 03:47, 1 July 2011 (UTC)[reply]
And contained in those links you gave us was a grand total of.... one self-published book entirely in pinyin! [3] What compelling evidence that pinyin is actually used in durably archived sources with running Mandarin script! ---> Tooironic 05:59, 1 July 2011 (UTC)[reply]
More Pinyin books/atlas/map/fiction for your references:
<<Huanlede Hai>> (Zhongguo Wenzi Gaige Chubanshe)
<<Zhonghua Renmin Gongheguo Ditu Hanyu Pinyinban>> (Zhongguo Ditu Chubanshe)
<<Zhonghua Renmin Gongheguo Dituce Hanyu Pinyinban>> (Zhongguo Ditu Chubanshe)
A personal question Engirst, if you don't mind. You're Chinese, right? Have you yourself ever read a book all in pinyin? Why do you care so much for pinyin?
Actually, I think Chinese would benefit from starting to use pinyin as its main script. It'd be great! So, I don't mind having pinyin entries. (They're especially useful due to the large number of homonyms in Chinese.) However, I can't honestly say that I think they are attestable. I'd say you're fighting a losing battle. It'd be wiser to change strategy, and argue that pinyin entries are simply useful, even though they are not (directly) attestable. Just my 2 fēn. Vaste 14:31, 1 July 2011 (UTC)[reply]
"if you say that pinyin entries can be adequately verified without depending on secondary sources": I'm saying they can't be, in general. I think they could be either 1. included for other purposes (indexing, help users find the correct entries etc) or 2. removed. Vaste 02:31, 30 June 2011 (UTC)[reply]

Separate entries for reflexive verbs?

The most common practice on Wiktionary has been to give separate entries for reflexive verbs, treating them as phrasal verbs. But I noticed in many Catalan dictionaries that their practice seems to be to simply list reflexivity as a kind of context. They list adormir and adormir-se on the same page, but in one dictionary the second one has a separate header that says adormir-se on that page, and another one lists them as pron to indicate they take a reflexive pronoun. To find a reflexive verb, you'd look up the non-reflexive verb and look for the reflexive sense. So I'm wondering what would be the best practice on Wiktionary. In many cases, the sense of the reflexive verb isn't idiomatic enough to really warrant a separate entry, but still good to list as a separate sense of the base verb. "adormir" is a good example; it means 'to cause to fall asleep', and its reflexive meaning 'to fall asleep' is more or less predictable from this. Not all reflexive verbs are this way, but in general, languages that use reflexivity to create a mediopassive voice as Catalan and many other Romance languages do tend to have more regular meanings for those verbs. —CodeCat 23:02, 29 June 2011 (UTC)[reply]

I like the system of having a separate page for reflexive verbs. At least you can find the conjugation of a reflexive verb in Catalan, Spanish, or Italian. But if I wanted to learn how to conjugate se marier (or any other French reflexive verb), I just get redirected to marier. It says what the reflexive meaning is, but it doesn't conjugate it. Ultimateria 00:02, 30 June 2011 (UTC)[reply]
Reflexives aren't conjugated any differently from other verbs, though. The reflexive pronouns behave like normal object pronouns. There is no real grammatical difference between m'adormo meaning 'I put myself to sleep' and meaning 'I fall asleep'. They are both a combination of a pronoun and a verb. I don't really think we should treat the pronoun as part of the conjugation. —CodeCat 00:06, 30 June 2011 (UTC)[reply]
That's true, but I think the conjugation tables are still helpful. Of course reflexive verbs are logical to you since you've learned how to conjugate them already, but I've seen a lot of people (including me) struggle with reflexive verbs in Spanish class. We knew next to nothing about object pronouns before learning reflexive verbs, so they were a matter of memorization, not logic. I'm sure tons of Romance language learners are taught the same way. Ultimateria 00:25, 30 June 2011 (UTC)[reply]
We handle this differently for different languages. So far I haven't seen any approach that I like very much! Personally I find it confusing when a sense at the non-reflexive entry is tagged {{reflexive}}; it always takes me a few seconds to realize that it means (when reflexive), or rather (when used with a reflexive pronoun, the two taken together sometimes mean this:). One possible approach . . . for Hebrew we've recently started using a sort of pseudo-context template, {{he-wv}}, for senses that have a different vocalization from the headword. (See [[נסגר]] for an example.) And many dictionaries do something similar for idioms; for example, one sense at the OED's entry for washen is actually defining the term washen leather, indicated in bold at the start of the sense-line. I think reflexive verbs might benefit from this approach:
  1. (adormir-se) To fall asleep.
(As for conjugation — there's no reason that the non-reflexive entry can't give both conjugation tables. I don't know if that's really desirable, but it's an option.)
RuakhTALK 00:31, 30 June 2011 (UTC)[reply]

In French, providing both conjugations is very useful. Some verbs are used only as reflexive verbs, some verbs may be used as reflexive verbs or not (sometimes with predictable senses, sometimes with different senses). The fr.wiktionary practice is to allow all useful entries. There is no reason to forbid useful entries. Lmaltier 18:51, 30 June 2011 (UTC)[reply]

WINW

I created WT:WINW. We need it badly. It's short, but it says everything it needs. (Feel free to expand it, if the size bothers you.)

I'm going to create a vote to make it a policy. --Daniel 23:16, 29 June 2011 (UTC)[reply]

I think we already have a page like that, but I forgot what it's called. —CodeCat 23:18, 29 June 2011 (UTC)[reply]
WT:NOT. --Yair rand 23:26, 29 June 2011 (UTC)[reply]
Ruakh added[4] a section "Wiktionary is not Wikipedia" to WT:NOT, while keeping the shortcut WT:WINW to that section. I like it. --Daniel 02:25, 30 June 2011 (UTC)[reply]
The shortcut is silly. Should we also have WT:WINE, WT:WINP, WT:WINCB? But whatever. DAVilla 18:39, 4 July 2011 (UTC)[reply]
I added some points that explain the most important differences between Wikipedia and Wiktionary. —CodeCat 19:23, 4 July 2011 (UTC)[reply]
I like the shortcut. I don't think it is silly. It is an initialism, just like CFI. --Daniel 19:30, 5 July 2011 (UTC)[reply]
Not really necessary but doesn't hurt. I think we should keep it. (Yes, evidently my feelings about the restricted WT: namespace are laxer than those about the dictionary itself.) Equinox 19:33, 5 July 2011 (UTC)[reply]

Mathematical symbol

I'd like to add the header "Mathematical symbol" to these entries: , , , etc. --Daniel 09:28, 30 June 2011 (UTC)[reply]

I agree. It's not a part of speech. Would we want to change ellipse to Mathematical noun? Equinox 11:29, 1 July 2011 (UTC)[reply]
I agree that "Symbol" is a good part-of-speech header. I'm just saying it's not perfect for all instances. In my point of view, as a reader of Wiktionary, that's sometimes more-or-less like having a POS header "Word"; it's too generic. That's one reason why I introduced the headers "Punctuation mark" and "Diacritical mark", and I think they look really good where they are.
In my opinion, we should have "Mathematical symbol" and not "Mathematical noun", because all nouns share many characteristics of syntax, morphology, grammar, etc. regardless of their meanings: the POS header "Noun" is perfect for the word ellipse.
On the other hand, mathematical symbols have characteristics that all other symbols don't have. In the equation , you can't replace the + by a musical symbol such as 𝅘𝅥𝅯 .
If, hypothetically, "Symbol" was a perfect POS header for all symbols, and context templates (instead of other POS headers) such as {{context|maths|lang=und}} should always be used to give more details about the definitions, then perhaps we would have to deprecate the header "Letter" and start using {{context|typography|lang=und}} for letters instead, as well. --Daniel 15:17, 1 July 2011 (UTC)[reply]
You also can't replace + in with < or . I fail to see the relevance of replaceability.​—msh210 (talk) 16:36, 1 July 2011 (UTC)[reply]
I agree with Dan.​—msh210 (talk) 16:36, 1 July 2011 (UTC)[reply]
Actually, you can replace = by <, and you can replace other mathematical symbols by |x|, if you want to represent different values. Among the characteristics that mathematical symbols share, there is the fact that they change the value of numbers: −3 and 3 are different things. They are also expected to be used with other mathematical symbols and/or with numbers: "<dog" does not make sense if "<" is a less-than sign and "dog" is the word meaning domestic canine. --Daniel 16:49, 1 July 2011 (UTC)[reply]
They don't all change the value of numbers; 1 ∈ (1, dog) doesn't change numbers, and is as valid a statement as dog ∈ (1, dog) or dog ∈ (cat, dog).--Prosfilaes 17:06, 1 July 2011 (UTC)[reply]
OK, thanks. I should have mentioned that somehow before. Let's see: While "<" as an algebraic symbol determines a relationship of numeric values, ∈ as a symbol of set theory determines a relationship of sets and their elements. Different fields of mathematics, somewhat different rules. --Daniel 18:00, 1 July 2011 (UTC)[reply]
So would you suggest we have POS headers "Mathematical binary operator symbol" (, ), "Mathematical binary relation symbol" (, <, ), "Mathematical n-ary operator symbol" (, ), "Mathematical delimiter" (, ), "Mathematical constant symbol" (, e), "Mathematical variable symbol" (x, f), and perhaps more?​—msh210 (talk) 19:27, 1 July 2011 (UTC)[reply]
No, I wouldn't. POS headers typically should be as general as possible, without being too generic like "Word". When I proposed the creation of "Diacritical mark" (examples of implementation: acute accent, trema, dakuten and macron), I did not propose these other possible POS headers: "Pitch accent", "Long vowel mark" and "Nasalization mark". Nor do I want "Countable noun" or "Transitive verb". --Daniel 19:32, 1 July 2011 (UTC)[reply]