Wiktionary:Beer parlour

From Wiktionary, the free dictionary
Archived revision by Ruakh (talk | contribs) as of 03:21, 19 July 2011.



June 2011

Category:All topics - is overrun by other languages e.g. es:All topics, fr:All topics

Does anyone actually use Category:All topics? I find there are hundreds of other language entries there, with just a few English language categories. For example, here are the entries under G:

  • ga:All topics (27 c, 0 e)
  • gaa:All topics (1 c, 0 e)
  • gag:All topics (6 c, 0 e)
  • gd:All topics (24 c, 0 e)
  • gem:All topics (17 c, 0 e)
  • Geography (241 c, 114 e)
  • gil:All topics (3 c, 0 e)
  • gl:All topics (23 c, 0 e)
  • gmh:All topics (13 c, 0 e)
  • gmy:All topics (2 c, 0 e)
  • gn:All topics (7 c, 0 e)
  • gni:All topics (12 c, 0 e)
  • goh:All topics (12 c, 0 e)
  • got:All topics (7 c, 0 e)
  • grc:All topics (27 c, 0 e)
  • gsw:All topics (6 c, 0 e)
  • gu:All topics (10 c, 0 e)
  • gul:All topics (1 c, 0 e)
  • gv:All topics (23 c, 0 e)

To sift through all that to find Category:Geography is a test of stamina (or boredom). To try to address this, I created a new category Category:All topics (other languages) and put the first entry Category:aa:All topics there, along with a note in Category talk:All topics to see if anyone thought this was a good idea. But I see Category:All topics (other languages) has already been deleted without any mention in Category talk:All topics or on my talk page. What does everyone think? (And yes, I could have brought it up here first, but I wanted to demonstrate the concept). Cheers, Facts707 06:11, 1 June 2011 (UTC)Reply

"I find there are hundreds of other language entries there, with just a few English language categories." can be said of virtually any topical category. --Daniel 06:22, 1 June 2011 (UTC)Reply
There is a vote going on right now to fix this to a degree: Wiktionary:Votes/pl-2011-05/Add en: to English topical categories, part 2. —CodeCat 09:59, 1 June 2011 (UTC)Reply

Image Captions

Do you have a policy, like Wikipedia's, against periods at the end of sentence fragments in image (or file) captions? 205.206.8.197 10:02, 1 June 2011 (UTC)Reply

Deleting empty categories

While we have a speedy deletion summary 'empty category', there hasn't been much debate over it. Special:UnusedCategories gives a pretty up-to-date list of these, a couple of hundred of them. Clearly, clear to me anyway, we don't want categories like Category:French nouns lacking gender deleted every time it is empty. What about Category:Danish colloquialisms? It's empty right now, but it's a valid name, and if deleted, it has to be restored if used. Or recreated. And only admins can restore categories, other users can recreate them but cannot restore the original page history. Thoughts? --Mglovesfun (talk) 12:23, 1 June 2011 (UTC)Reply

I only delete categories that have obsolete names and only if they are empty. —CodeCat 13:40, 1 June 2011 (UTC)Reply
Links to categories attract readers who may think that there is something to see and navigate. Empty categories are blue links that lead to nothing.
Category:Kabuverdianu language has only five categories for its few contents, but one hundred empty categories can be created for that language. If we do create one hundred empty categories for that language, then finding actual contents would become far more difficult. --Daniel 18:48, 1 June 2011 (UTC)Reply
I only create categories that actually have something in them, I don't create them just to 'fill up the tree'. —CodeCat 19:24, 1 June 2011 (UTC)Reply
I never accused you of "filling up the tree". I just mentioned a hypothetical category filled as an example. --Daniel 20:12, 1 June 2011 (UTC)Reply
Personally, I have nothing against empty categories. If someone just goes out to create a bunch of them for no reason, though, I'd consider it disruptive, and this contributor would receive a warning and/or a short block. -- Prince Kassad 02:04, 2 June 2011 (UTC)Reply

Russenorsk

Russenorsk is an extinct pidgin of Norwegian and Russian, and as far as I can tell filman is the only term we currently have in it. I'm not sure what our policy is, but I noticed we don't have a language code for it, so we can't make categories for it either. Does anyone know what we could do? —CodeCat 17:29, 1 June 2011 (UTC)Reply

I think ISO does not have a code for Russenorsk, but we are a step ahead. If you check Category:Russenorsk language, you'll notice the "crp-rsn" there. --Daniel 17:33, 1 June 2011 (UTC)Reply
I can never quite find a list of all the codes we use on Wiktionary. Do you know if there is one? —CodeCat 21:34, 1 June 2011 (UTC)Reply
If you want to know the code of any language in particular, just check its top-level category. Category:English language displays en and Category:Portuguese language displays pt. Wiktionary:Languages contains (or should contain) a list of all language codes for languages without ISO codes. I think Wiktionary:Index to templates/languages is a full list of language templates, but I wouldn't know that; it's too big to be opened on my computer. --Daniel 21:47, 1 June 2011 (UTC)Reply
Not all languages have a top-level category, though. —CodeCat 22:07, 1 June 2011 (UTC)Reply
Russenorsk does. When a language doesn't have a top-level category, I suggest simply checking ISO. If ISO doesn't help, it's time to create a new code. --Daniel 22:28, 1 June 2011 (UTC)Reply
What languages have entries, but no top-level category? -- Prince Kassad 22:35, 1 June 2011 (UTC)Reply
Naturally any language whose entries are created first. As far as I know, the last one was Category:Kuanua language, created yesterday, five months later than the Kuanua entry iau. --Daniel 14:04, 2 June 2011 (UTC)Reply

A lover of what is past

I would like a word for "a lover of what is past", i.e. somebody who rejects what is modern in favour of older things. I doubt any such word exists, so I am happy to coin a neologism. I'm thinking it might be something like aoristophile for example, but nobody taught me Classics. Can someone suggest something reasonable? Equinox 00:23, 2 June 2011 (UTC)Reply

Amish? conservative? —CodeCat 00:36, 2 June 2011 (UTC)Reply
Luddite perhaps? — lexicógrafa | háblame 01:41, 2 June 2011 (UTC)Reply
google books:"lover of the past", while not finding quite the word you want (antiquarian not being quite right, IMHO), provides lots of fodder for possible eponyms. Bedist, perhaps? Oddly, no one seems to have described Carlyle that way, but even so I think Carlylist could be a good one. Annoyingly, neither of those names really lends itself to an -ism, but what can you do? (By the way, if you want to stick to Classical roots, an alternative to -phile might be -later, depending on the tone you aim to strike.) —RuakhTALK 02:01, 2 June 2011 (UTC)Reply
How about archaist or archaeolater? DCDuring TALK 04:55, 2 June 2011 (UTC)Reply
¶ Why is this topic here? What does this have to do with Wiktionary or its policies? --Pilcrow 12:13, 7 June 2011 (UTC)Reply
Because Equinox (talk | contribs) put it in the wrong forum, and it wasn't a big enough deal for anyone to speak up. *shrug* —RuakhTALK 14:34, 7 June 2011 (UTC)Reply

Reverse-mapping of language templates

Right now we have language templates to turn codes into words. But as far as we know we don't have any templates to do the reverse, to turn a name into a code. I think something like that could be useful for maintenance and such. I would like to create a template called {{langcode2name}} or something similar, with one subpage for each English name of a language. So {{langcode2name/English}} would contain 'en'. —CodeCat 13:20, 2 June 2011 (UTC)Reply

This has been proposed quite often before, but such proposals always failed due to technical limitations. -- Prince Kassad 13:22, 2 June 2011 (UTC)Reply
Which limitations are those? —CodeCat 13:28, 2 June 2011 (UTC)Reply
User:MglovesfunBot/switch, but Prince Kassad is right, use with caution. --Mglovesfun (talk) 13:38, 2 June 2011 (UTC)Reply
I started a bunch of templates that could be used for that, beyond just English names of a language, and they wound up going unused and deleted. DAVilla 18:46, 4 June 2011 (UTC)Reply
The template Mglovesfun gave is rather slow because it contains a very large switch statement. It would be faster to use subpages. I created {{langrev}} and {{langrev/English}} which seems to work. If you call {{langrev|English}} it returns en and otherwise nothing. I would like to convert Mglovesfun's template to this, but I don't know if everyone would agree with me creating thousands of subtemplates, so I'm asking now. —CodeCat 14:44, 5 June 2011 (UTC)Reply
Since there were no objections I've now created the remaining subtemplates, based on Mglovesfun's list. —CodeCat 14:45, 7 June 2011 (UTC)Reply
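
For readers following the thread, the reverse lookup discussed above can be sketched outside the template system. This is only an illustration in Python, not how {{langrev}} is implemented: the three code/name pairs are the ones mentioned on this page, while the names CODE_TO_NAME, build_reverse_map and langrev (as a Python function) are hypothetical. Like the template as described, the lookup returns nothing (an empty string) for an unknown name.

```python
# Hypothetical sample of code -> name pairs taken from this discussion;
# the real set of language templates is far larger.
CODE_TO_NAME = {
    "en": "English",
    "pt": "Portuguese",
    "crp-rsn": "Russenorsk",
}


def build_reverse_map(code_to_name):
    """Invert code -> name into name -> code, refusing ambiguous names."""
    name_to_code = {}
    for code, name in code_to_name.items():
        if name in name_to_code:
            # Two codes sharing one name would make the reverse lookup ambiguous.
            raise ValueError(
                f"name {name!r} maps to both {name_to_code[name]!r} and {code!r}"
            )
        name_to_code[name] = code
    return name_to_code


NAME_TO_CODE = build_reverse_map(CODE_TO_NAME)


def langrev(name):
    """Return the code for a language name, or an empty string if unknown."""
    return NAME_TO_CODE.get(name, "")
```

Building the reverse table once and then doing a single keyed lookup is roughly analogous to the one-subpage-per-name approach, as opposed to scanning one very large switch statement on every call.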

Unattested SI unit entries

A while back we had some deletion discussions concluding that we should not have entries on names of prefixed SI units that are only found in publications listing names of prefixed SI units, and not 'in the wild' (e.g. yottakelvin and zeptocandela). I propose to redirect all such terms (excluding the ones that really are attestable such as attometer) to an appendix listing all of the possible unit/scale combinations and their respective symbols. Does this sound like a workable plan? bd2412 T 20:33, 2 June 2011 (UTC)Reply

If by "redirect" you mean "soft-redirect using {{only in}}", then, yeah, sounds good AFAI'm concerned. —msh210 (talk) 21:04, 2 June 2011 (UTC)Reply
Is it even necessary to list combinations at all? They are pretty much SOP... —CodeCat 00:22, 3 June 2011 (UTC)Reply
I agree (with CodeCat). —RuakhTALK 01:47, 3 June 2011 (UTC)Reply
I don't see how these can be SOP when they are single, unbroken, unhyphenated words. Conversely, kilometer and milligram are exactly as SOP as any of these combinations, and I doubt anyone would support deleting those as SOP. bd2412 T 14:36, 3 June 2011 (UTC)Reply
Right, and I'm not supporting any sort of "deletion"; I'm just saying that the appendix doesn't really need to list all possible combinations. They're all SOP, so it just needs to explain the Ps. —RuakhTALK 15:18, 3 June 2011 (UTC)Reply
My point is that we generally don't treat unbroken words as SOP. Unattestable, perhaps, but not SOP any more than any unbroken word combining a prefix and a suffix. bd2412 T 15:51, 3 June 2011 (UTC)Reply
SOP — "sum of parts" — just means that anyone who understands the parts will understand their sum. We generally have entries for sum-of-parts words, but (IMHO) only because a reader would have no way of knowing what the parts are; but that reason doesn't apply here, because the appendix makes clear what the parts are. —RuakhTALK 16:57, 3 June 2011 (UTC)Reply
I see your point. Of course, if the appendix lists all the prefixes, and all the suffixes, and says to the reader, you can take anything from column A and prepend it to anything from column B, there is no reason to show all the resulting combinations. If those combinations are soft-redirected to the appendix, there is similarly no concern about what searches will produce. bd2412 T 17:35, 3 June 2011 (UTC)Reply
Also, to Msh210, yes, an {{only in}} soft redirect would be fine by me. bd2412 T 14:40, 3 June 2011 (UTC)Reply
Who would be looking for these words? And why choose just SI units while neglecting the whole lot of possible but unattested combinations of prefixes and words? I can think of unrewind, vice-girlfriend and nephrodonation. (I checked Google Groups and Google Books to make sure they aren't attestable; uncontradict, vice-husband and hemodonation are barely attestable, though.) --Daniel 02:07, 3 June 2011 (UTC)Reply
Regarding why just SI units, let me paraphrase one of my comments from here: It's because we know for sure what all of the SI units would mean, even if we don't know that they do mean that. Even if tomorrow morning, everyone were to start using the word zeptogram with some sort of metaphorical sense, we could still provide the unattested literal sense and know that we were "right". This is because the system is used in a consistent way by actual people, and the gaps are basically real words that just so happen not to have been used three times in durably archived media. It's like how we include full conjugations of Spanish verbs, even ones that are so rare that some individual forms might not actually meet the CFI. —RuakhTALK 02:35, 3 June 2011 (UTC)Reply
I would allow the inclusion of all words with an official status (such as these ones, but also words recommended by official language bodies), even when no actual use (or only 1 or 2) can be found. When appropriate, the pages may explain that no use or almost no use has been found, but that they are standard unit names, or mention which organization promotes them. This would provide some useful information to possible readers. Of course, these pages will not be read much, but this is not a problem at all. It's always better to provide information. Lmaltier 20:13, 3 June 2011 (UTC)Reply
The noninclusion of these terms as entries has already been decided by the community. The question is what, if anything, to do about them now. bd2412 T 02:30, 4 June 2011 (UTC)Reply
Inclusion in an appendix with soft redirects using {{only in}} seems quite appropriate. DCDuring TALK 03:40, 4 June 2011 (UTC)Reply
I will make it so tomorrow. Cheers! bd2412 T 03:57, 4 June 2011 (UTC)Reply
I don't want to reinvent the wheel, and combining the table at SI prefix#List of SI prefixes with the explanations at SI base unit, SI derived unit, and with the tables assembled by Dcljr at User:Dcljr/Units, already represents pretty much everything I would envision in an appendix. Any thoughts on this proposition? bd2412 T 18:35, 4 June 2011 (UTC)Reply
I have made a mock-up at User:BD2412/Appendix:SI units. Cheers! bd2412 T 19:19, 4 June 2011 (UTC)Reply
Oops, turns out we've had an Appendix:SI units sitting there for years. I touched it up a bit. bd2412 T 22:44, 4 June 2011 (UTC)Reply
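
The "column A plus column B" point made above can be illustrated with a short Python sketch. It is not part of the appendix; the prefix and unit lists are a small subset chosen from the words mentioned in this thread, and the names PREFIXES, UNITS and unit_names are hypothetical.

```python
from itertools import product

# Illustrative subsets only; the full SI lists are longer.
PREFIXES = ["yotta", "kilo", "milli", "atto", "zepto"]
UNITS = ["meter", "gram", "kelvin", "candela"]


def unit_names(prefixes, units):
    """Prepend every prefix to every unit name, in prefix-major order."""
    return [prefix + unit for prefix, unit in product(prefixes, units)]


names = unit_names(PREFIXES, UNITS)
```

Five prefixes and four units already give twenty names, which suggests why explaining the parts scales better than listing every combination.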

Gender-specific babel userboxes

I noticed that in many languages our userboxes for languages are written from a male perspective, which kind of bothered me. So I've now made it so that you can adjust the templates to display a different message depending on the gender you set in your preferences. I've made those changes to a few languages that I was comfortable fixing (French, Catalan, Spanish, Dutch, German) but there are lots more out there that I would have no idea how to fix. So this is a kind of request to please help update the templates of the languages you know. Thank you! —CodeCat 14:46, 3 June 2011 (UTC)Reply

I've updated the Hebrew ones. :-)   —RuakhTALK 15:15, 3 June 2011 (UTC)Reply

Language-specific inclusion

Given the current focus on discussing possible language-specific attestation rules, I thought it would be appropriate to create Wiktionary:Criteria for inclusion/Language-specific with some content. And I created it. Feel free to improve it. --Daniel 16:20, 3 June 2011 (UTC)Reply

I think it's a bad idea to put all of it on one page. It could become very long that way. Maybe it would better go on each language's 'about' page? —CodeCat 16:36, 3 June 2011 (UTC)Reply
I suggest using the new page as a list for ease of comparison rather than as a replacement of "about" pages. If the list is expected to become very long, then it is an additional reason to have it, because searching for individual attestation rules at every "about" page would be troublesome.
Anyway, do you have any idea how it could become very long? I'm curious. --Daniel 16:40, 3 June 2011 (UTC)Reply
Because we have many languages? —CodeCat 16:46, 3 June 2011 (UTC)Reply
Having many languages does not necessarily lead to having a long list of language-specific rules. For starters, we have only three listed rules, which make a very short page. When this page grows, we can choose among countless possibilities of presentation of contents. I'd probably suggest simply organizing languages by types of rules: We could create something like a "List of languages that allow otherwise unattestable romanizations" and a "List of forbidden characters by language". Splitting the list into various pages will probably be unnecessary in the foreseeable future, but it's always an alternative. --Daniel 17:07, 3 June 2011 (UTC)Reply
Sorting by types of rules was actually what I had in mind when I started the vote. It is a good system that works even if many languages are involved. -- Prince Kassad 04:51, 4 June 2011 (UTC)Reply

Deprecating 'plurals' categories in favour of '(POS) plural forms'

I know there has been some debate about the plurals categories. Some like them, some prefer we use 'noun forms'. This isn't really about that, it's just about renaming categories that are ambiguous with respect to their part of speech. In English, plurals can only contain nouns, but many languages like most of the Romance languages have plural nouns and adjectives. So I would like to deprecate 'plurals' and suggest that the entries in those categories be moved to '(part of speech) plural forms'. This would make languages more consistent without forcing languages without cases to adopt 'noun forms' as their category for plurals. I have already done this for Dutch. —CodeCat 11:08, 4 June 2011 (UTC)Reply

English proper nouns can have plurals - Johns, Janes, Jacks etc. Mglovesfun (talk) 13:12, 5 June 2011 (UTC)Reply
English pronouns, too. --Daniel 13:14, 5 June 2011 (UTC)Reply
Then use 'English proper noun plural forms' and 'English pronoun forms'? —CodeCat 13:16, 5 June 2011 (UTC)Reply
The fact that there are plural pronouns just serves as a good example of "Category:English plurals" being ambiguous. Probably "English pronoun forms" would be unwanted, because plural pronouns are just individual words, rather than forms of singular ones.
Yes, "English proper noun plural forms" is a good name. Another good name would be "Plurals of English proper nouns". --Daniel 13:23, 5 June 2011 (UTC)Reply
Is it really necessary to categorise plurals of proper nouns any differently from those of regular nouns? I don't know any language where that distinction would really be meaningful. —CodeCat 13:32, 5 June 2011 (UTC)Reply

Please help adding affix categories!

I've been working on reducing the amount of wanted categories, which is going well. However, most of the wanted categories (about half) seem to be categories of affixes. I could easily add all of those with a bot, but I'm not sure if all of them can actually be considered proper affixes in their languages. Some might have been added with {{prefix}} or {{suffix}} when {{compound}} or something else might have been more appropriate. I can't just create all of them blindly with a bot. So I would like to ask everyone here to help tackle this list and bring it down to a more manageable size. Down to nothing if possible!

The list of categories can be found here. You can create those categories in any way you like, but be sure to first check if they should actually be created. We don't want something like 'English terms prefixed with bread-'! Once you've created them and the links have become blue, or if you decided that they shouldn't be created and removed the categories from the entries, please remove them from the list if you can. Thank you! —CodeCat 22:23, 4 June 2011 (UTC)Reply

Help us help you. It would help to know:
  1. whether the affix was a redlink
  2. what template created the category (possibly also hard categorized).
  3. how many members in the category
So far I have found in English use of prefix when suffix was probably intended and vice versa, misuse of confix, redlink for affix. In each case there was only one member of the category. DCDuring TALK 04:35, 5 June 2011 (UTC)Reply
Most of the categories listed have 1 member, except for a few which may have two or three (but none higher). I don't think it's possible to automatically generate a list of what created the category, it would be too much work to do by hand. I have added links to the affixes themselves, though, and a number to indicate the amount of entries. —CodeCat 10:11, 5 June 2011 (UTC)Reply
It seems that PAGESINCAT doesn't work after too many uses and just returns 0 all the time. I could subst: it, but that might show false results. —CodeCat 10:14, 5 June 2011 (UTC)Reply
I've updated the list. It seems that the languages with the most categories that need to be created are English, Finnish, Italian, Serbo-Croatian and Spanish. Are there any editors here that are able to help with those? —CodeCat 14:54, 12 June 2011 (UTC)Reply
Thanks.
One point is that it helps to check whether the etymologies are historically correct. I just checked caco- and found that both etymologies seemed incorrect, thus obviating the need for the category at this time. DCDuring TALK 15:15, 12 June 2011 (UTC)Reply
I continue to find this a better source of historically erroneous etymologies than of missing categories. It is quite time-consuming to properly populate the entries (very much so for suffixes) and/or to correct the etymologies. DCDuring TALK 15:37, 13 June 2011 (UTC)Reply

Administrator rights

I was an administrator for a while, probably about a year, and in any case I've created shitloads of new entries. I have a horrible temper but I don't think I ever abused it in terms of misusing admin commands. I would like to resume that role, mostly so that I can delete spam instead of having to put the delete template on it. Do I need to start a vote or what? Equinox 23:48, 4 June 2011 (UTC)Reply

Do you still have the sysop bit? If not, how did you cease to have it? bd2412 T 01:20, 5 June 2011 (UTC)Reply
See Special:UserRights/Equinox. —RuakhTALK 01:36, 5 June 2011 (UTC)Reply
Done (can be undone again if anyone objects). Please update list of sysops. SemperBlotto 07:09, 5 June 2011 (UTC)Reply
Thanks. Equinox 22:51, 8 June 2011 (UTC)Reply

Removing the horizontal line between language sections

Our standard practice has always been to add a horizontal line between language sections:

----

It seems a little silly to me because we could easily reach the same effect by using CSS, as far as I know. Maybe we should deprecate this and use CSS formatting? —CodeCat 12:46, 5 June 2011 (UTC)Reply

Can the CSS formatting be reasonably used to add the horizontal line above all language sections except the first one? That seems like additional work, but I'm ready to be proven wrong. --Daniel 13:09, 5 June 2011 (UTC)Reply
I think there is a special way to say that a CSS property should apply to only the first or all except the first. —CodeCat 13:16, 5 June 2011 (UTC)Reply
I think I know how this could be done now, but I don't know if it will work.
body.ns-0 h2 { border-top: 1px; }
body.ns-0 h2:first-child { border-top: 0; }
CodeCat 15:12, 5 June 2011 (UTC)Reply
No, that won't work. h2:first-child doesn't mean "an h2 that is the first h2 within its parent"; it means "an h2 that is the first element (of any type) within its parent". —RuakhTALK 15:41, 5 June 2011 (UTC)Reply
h2:first-of-type, but that won't work with IE8 or lower. --Yair rand 17:14, 5 June 2011 (UTC)Reply
Having '----' in the wikitext makes it very easy to separate the language sections when doing any kind of dump processing. Nadando 17:34, 6 June 2011 (UTC)Reply
In October, 2005, Jon Harald Søby argued strongly for removing the ---- and assigning the task to CSS. However, he insisted on removing all of the ---- first and then sometime later perhaps getting someone to add it to CSS. The counterargument was that he should first make it work in CSS and then we could remove the instances of ----. Nothing ever came of it. —Stephen (Talk) 08:55, 8 June 2011 (UTC)Reply
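
To make the dump-processing remark above concrete, here is a minimal Python sketch of splitting an entry's wikitext on the ---- separator. It assumes the separator always sits on a line of its own and consists of four or more hyphens; the function name and the sample entry text are invented for illustration.

```python
import re

# Assumption: a separator is a line containing only four or more hyphens,
# optionally followed by spaces or tabs.
SEPARATOR = re.compile(r"^-{4,}[ \t]*$", re.MULTILINE)


def split_language_sections(wikitext):
    """Split entry wikitext into language sections at ---- separator lines."""
    parts = SEPARATOR.split(wikitext)
    # Drop surrounding blank lines and any empty fragments.
    return [part.strip() for part in parts if part.strip()]


sample = "==Dutch==\nfoo\n\n----\n\n==English==\nbar\n"
sections = split_language_sections(sample)
```

Without the separator, a dump processor would have to recognise level-2 headings instead, which is still possible but needs a little more care.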

Poll: Specific fictional characters

We have a policy for attestability of terms originating in fictional universes, which naturally includes names of specific fictional characters. Sometimes it seems to be defended as the highest and unquestionable authority on this subject, and sometimes it seems to be ignored in favor of the argument that they are "not dictionary material" anyway.

Since we have a policy for this issue, it should convey exactly what people think — even if it is just conveying the disagreement, like it happens for other subjects.

So, I think it's time to ask a simple question...

  • In your opinion, how many citeable English names of specific characters of works of fiction (proper nouns such as "Mickey Mouse", "James Bond", "Tiny Tim", "Batman", etc.) should be defined on Wiktionary?

Thank you. --Daniel 13:11, 5 June 2011 (UTC)Reply

Poll: Specific fictional characters — All of them

Poll: Specific fictional characters — Some of them

  1. Agree --Daniel 13:11, 5 June 2011 (UTC)Reply

Poll: Specific fictional characters — None of them

Poll: Specific fictional characters — Discussion

I think a discussion of this sort would be a waste of time. The question is not whether we should have one class of thing or another, but what are the requirements for a word or phrase to enter the lexicon. The CFI, agreed to by consent of the community, already inherently answers this poll with "some of them", subject to qualifications also determined by the community. Obviously if I post a story somewhere on the Internet tomorrow about a space knight named Nordskeeb Bemmeron, I can't have that name included in the dictionary on that basis. It is equally obvious that some fictional characters have become lexical terms: Darth Vader, Robin Hood, Captain Kirk, Sherlock Holmes, Casanova, Aphrodite, etc. The whole discussion will always be on the bounds of inclusion. bd2412 T 19:32, 6 June 2011 (UTC)Reply

  • (By the way, don't bother telling me Casanova was a real person, I will refuse to acknowledge it). bd2412 T 19:40, 6 June 2011 (UTC)Reply
  • I approximately agree with bd2412. —msh210 (talk) 19:54, 6 June 2011 (UTC)Reply
  • I also approximately agree with bd2412. If he posts a story somewhere on the Internet tomorrow about a space knight named Nordskeeb Bemmeron, this name should not be included. My criterion would be if the name can be considered as a word, and belongs to the culture of the language, it should be includable (e.g. Othello should be includable as a word). The same rule should apply to fictional placenames. I know that such a criterion is not something that can be decided by a computer, and that decisions relying on such a criterion would sometimes be disputable, but it's difficult to imagine a better one. Lmaltier 20:11, 6 June 2011 (UTC)Reply
  • I also approximately agree with bd2412. I would actually be O.K. with categorically excluding the names of all fictional characters, but obviously there's no community support for that, and this poll seems thoroughly unnecessary. —RuakhTALK 22:01, 6 June 2011 (UTC)Reply
  • I agree it's a tricky issue, and RFD and RFV seems the best way to deal with them right now, though both of those are (or can be) long processes. I'm not sure what sort of 'policy change' could simultaneously reflect the opinions of the community and make the issue clearer. Mglovesfun (talk) 22:03, 6 June 2011 (UTC)Reply
Thanks for your insights; I think I'll be able to use them in the future. I like this poll, by the way, even if no one else formally votes on it. I fetched some opinions about all the reasonable options ("Some of them" and "None of them"), which is more-or-less the purpose of any poll (or, at least, any of my polls).
I tend to agree with the apparent consensus of this discussion — of having nonzero entries for specific individual characters, while following aggressively strict, but yet vaguely shaped, rules for their inclusion — though I still fear I'll have to discuss against visceral opinions about this matter in the future. --Daniel 13:11, 8 June 2011 (UTC)Reply

MglovesfunBot request

I propose to empty the categories Category:Translations to be checked (Serbian), Category:Translations to be checked (Bosnian) and Category:Translations to be checked (Croatian) into Category:Translations to be checked (Serbo-Croatian) by changing the language parameter used inside {{ttbc}} to sh. Does anyone oppose this? --Mglovesfun (talk) 16:52, 6 June 2011 (UTC)Reply

I have no problems with it. —CodeCat 17:05, 6 June 2011 (UTC)Reply
Possibly tonight if I'm home early enough. I will leave at least 24 hours from yesterday for objections. --Mglovesfun (talk) 09:27, 7 June 2011 (UTC)Reply


469 entries that your bot changed on June 7, 2011 between 21:49:49 and 23:00:15 should be returned in those three categories that you deleted. Even if those translations were to be checked, they should have been checked in those three categories. -- Bugoslav 13:55, 8 June 2011 (UTC)Reply
Too late. --Mglovesfun (talk) 14:08, 8 June 2011 (UTC)Reply
It is not too late, please read this. Thanks. -- Bugoslav 14:22, 8 June 2011 (UTC)Reply
I don't see any issues around lateness. But see this. --Mglovesfun (talk) 17:09, 8 June 2011 (UTC)Reply

Categories for inflected forms

Such as præserving (which I will modify in a moment), is this how we want our entries to look? I've heard it said that we don't categorize regular inflections such as -s, -ed and -ing, but what about Category:English words suffixed with -eth and Category:English words suffixed with -est? Aren't these purely inflectional suffixes? Especially Category:English words suffixed with -eth where they seem to be all inflection. --Mglovesfun (talk) 09:24, 7 June 2011 (UTC)Reply

Indeed, we don't categorize mere inflectional suffixes. See the discussion at Category talk:English words suffixed with -s. These erroneous usages should be removed. -- Prince Kassad 09:39, 7 June 2011 (UTC)Reply
So what would Category:English words suffixed with -eth contain, nothing? --Mglovesfun (talk) 10:01, 7 June 2011 (UTC)Reply
Not just nothing, it simply should not exist. -- Prince Kassad 10:02, 7 June 2011 (UTC)Reply
That's kinda what I meant, yes. --Mglovesfun (talk) 10:58, 7 June 2011 (UTC)Reply
So, should {{suffix}} only categorize if the category actually exists? Or how should this work? I assume we don't want to start using From {{term|præserve|lang=en}} + {{term|-ing|lang=en}}. everywhere? —RuakhTALK 20:27, 7 June 2011 (UTC)Reply
¶ I should have read this topic beforehand. I suppose it is actually redundant to include the etymology sections since it is essentially duplicating auto‐categorization such as Template:past of. Still, I hope it is acceptable to include links to the original forms, I just made an example here: keying. --Pilcrow 20:34, 7 June 2011 (UTC)Reply
I think there's quite a good counter-argument to be made; keying is key suffixed with -ing. Our entry says "(linguistics) one or more letters or sounds added at the end of a word to modify the word's meaning". Seems to fit the bill. Mglovesfun (talk) 20:42, 7 June 2011 (UTC)Reply

I see no point in adding etymologies to inflected forms, so there would be nothing like From {{term|præserve|lang=en}} + {{term|-ing|lang=en}}.. The current practice is to mostly avoid categories for inflected forms per inflectional suffix that they contain, a practice which I support. --Dan Polansky 07:34, 8 June 2011 (UTC)Reply
I'm more or less indifferent to categorising inflexions by derivation; however, I think that giving them etymology sections is a good thing, and in the case of homographs like needles, they are indispensable. — Raifʻhār Doremítzwr ~ (U · T · C) ~ 11:36, 8 June 2011 (UTC)Reply

-ies forms in English words not following the I before E except after C rule

IMHO -ies plurals such as "currencies" should not be placed in Category:English words not following the I before E except after C rule, as they are non-lemma forms. The following forms have been recently added to the category: obstinancies, obeisancies, magistracies, lunacies, infrequencies, lieutenancies, frequencies, inaccuracies, latencies, idiosyncracies, accuracies, idiocies, kakistocracies, intimacies, supremacies, ecstacies, fancies, fallacies, extravagancies, exigencies, inconsistencies, constituencies, conspiracies, excellencies, Excellencies, consistencies, conservancies, concurrencies, competencies, delinquencies, emergencies, deficiencies, choccies, efficiencies, currencies, biccies, bibliomancies, belligerencies, bankruptcies, bureaucracies, æquivalencies, agencies, adhocracies, accountancies, aristocracies, aberrancies, abbacies, urgencies, mercies, inefficiencies, delicacies, contingencies, democracies, pharmacies.

I am not really sure what Category:English words not following the I before E except after C rule is worth, but that is another consideration.

Thoughts? --Dan Polansky 09:39, 7 June 2011 (UTC)Reply

Afterthought: If I understand correctly, the forms would not belong to the category even if they were lemma forms. The rule of thumb for which the category was created is that "ei" rarely occurs in English words except when in "cei", that is, after "c". This rule of thumb helps correct wrong spellings. An example of an exception to the rule is "Fahrenheit", since there "ei" occurs outside a "cei" sequence, in a "hei" sequence. --Dan Polansky 09:51, 7 June 2011 (UTC)Reply

¶ That does not make sense. The trigraph c‐i‐e is clearly inconsistent with the ‘…except after C’ part. There are many terms besides plurals which contain c‐i‐e included in that category: ancient, efficient, science, society—need I continue? Those are also mentioned as exceptions in the Wikipedia article. ¶ The word policies remained tagged for months without concern and I did not categorize it. --Pilcrow 10:16, 7 June 2011 (UTC)Reply
It's a bad rule as it has too many exceptions! It's not really a rule at all, for this reason I could go for an RFD - an appendix seems ok as you have more scope to discuss the issue in an appendix than you do in a category. --Mglovesfun (talk) 10:00, 7 June 2011 (UTC)Reply
¶ The category title is clearly consistent with the forms it contains. It is quite misleading to remove the categorization even if the purpose and title are consistent with the word included. --Pilcrow 10:16, 7 June 2011 (UTC)Reply
@Pilcrow, Re: "The trigraph c‐i‐e is clearly inconsistent with the ‘…except after C’ part": That is not clear. The name "English words not following the I before E except after C rule" refers to the rule as "I before E except after C", and this title alone does not make it clear what the rule says. Can you state what it is that you think the rule states? Does the rule also state that "cie" is rare? --Dan Polansky 10:21, 7 June 2011 (UTC)Reply
Related discussion WT:FEED#Wiktionary:Requested entries. I agree with Pilcrow, these are English words (no argument there is there?) that don't follow the rule. So, they're categorized correctly. We allow plurals and whatnot in Category:English palindromes. --Mglovesfun (talk) 10:40, 7 June 2011 (UTC)Reply
@MG: What does the rule say, then? Does the rule also say that "cie" sequence is rare? --Dan Polansky 10:42, 7 June 2011 (UTC)Reply
Yes the rule in its totality is "I before E, except after C". These words show I before E precisely after C. QED, Pilcrow is right. This is a silly category anyway. Ƿidsiþ 10:49, 7 June 2011 (UTC)Reply
"I before E, except after C" is not a complete specification of a rule, but rather a shortcut with multiple interpretations. The interpretation of this shortcut that I find most straightforward is this: "ei" is rare except when after "c" in "cei". Your interpretation seems to be this: "ei" is rare except when after "c" in "cei", and "cie" is rare. I find the statement '"cie" is rare' implausible anyway. --Dan Polansky 10:56, 7 June 2011 (UTC)Reply
I admit that your reading is consistent with W:I before E except after C, and that my reading is probably not the intended reading of the rule. --Dan Polansky 11:09, 7 June 2011 (UTC)Reply
Isn't there another part about sounding like "a" as in neighbor and weigh? bd2412 T 15:09, 7 June 2011 (UTC)Reply
Isn't the full rule “I before E, except after C, for an ē sound”? — Raifʻhār Doremítzwr ~ (U · T · C) ~ 15:13, 7 June 2011 (UTC)Reply
I learned it as BD has it. I've always found it useful, though riddled with further exceptions, as for borrowings from other languages (notably ancient Greek). DCDuring TALK 15:36, 7 June 2011 (UTC)Reply
I don't understand "I before E, except after C, for an ē sound". Could someone provide a fuller formulation of the rule modified for the thing with the sound of "a" as in "weigh"? Does that mean that "weigh" is not really an exception to the modified rule, or does that mean that "mercies" is not really an exception to the modified rule? --Dan Polansky 16:03, 7 June 2011 (UTC)Reply
I don't think any native speaker would apply the rule to words ending in ies, being confronted with such words on every other page. This is just an orthographic mnemonic, not to be taken too seriously except by someone hoping to get an academic publication out of it. DCDuring TALK 16:14, 7 June 2011 (UTC)Reply
I was asking these specific questions about "weigh" and "mercies" to understand what the rule says when modified for the pronunciation. If the rule is useful, it is probably also useful for non-natives, right? Anyway, if the category should stay, we should clarify at some point whether it is driven by a rule that refers to pronunciation or a rule that does not. In the former case, it would be good to know in clear unambiguous terms what the rule actually says. --Dan Polansky 16:19, 7 June 2011 (UTC)Reply
¶ Yes it is considerably excessive to mark normal affixed terms as exceptions; I highly doubt this mnemonic is treated as infallible. That said, I think it is acceptable to nominate this category for deletion. I personally have no interest in seeïng an appendix made instead, I do not recall beïng taught this when I was younger in the first place, so I do not have much else to comment about on this ‘rule’. ¶ In frequency: I desire to add supplementary efforts in my edits, so I did clean‐up some entries. My categorization was simply supplementary. --Pilcrow 16:59, 7 June 2011 (UTC)Reply
@DanP. I think that it is an empirical question as the various forms of the rule have little real authority behind them. The clearly inadequate "i before e except after c" gets 3800 bgc hits. The more adequate "i before e except after c and when it sounds like a" gets 5. This "rule" seems better suited for WP than for us and our categories. DCDuring TALK 18:12, 7 June 2011 (UTC)Reply
  • You should all watch Stephen Fry and Harry Potter (no, really!) talking about why this rule is not very useful here. Ƿidsiþ 08:28, 8 June 2011 (UTC)Reply
    :). I think I now have a formulation that refers to the sound of "a": '"ei" is rare except when after "c" in "cei" and except when pronounced like "a" as in "weigh"'. I am not saying that this is what was intended but rather that this seems a fairly accurate statement about English spellings. Of course, the "is rare" predicate allows for a host of exceptions, but they should really be rare for the rule to hold. The part of the rule criticized in the program is '"cie" is rare', the part of which I said it was implausible :P. --Dan Polansky 09:01, 8 June 2011 (UTC)Reply
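The two readings of the mnemonic discussed in this thread can be compared with a small spelling check. This is only a sketch under stated assumptions: the function names are made up for illustration, the check looks at letters only, and the pronunciation clause ("sounds like a") is not modelled, so "weigh" still shows up as an exception under the narrow reading.

```python
import re


def violations(word: str) -> dict:
    """Report which parts of the mnemonic a spelling breaks (letters only)."""
    w = word.lower()
    return {
        # "ei" that is NOT directly preceded by "c" (e.g. "Fahrenheit").
        "ei_not_after_c": bool(re.search(r"(?<!c)ei", w)),
        # "ie" directly after "c" (e.g. "mercies", "science").
        "cie": "cie" in w,
    }


def breaks_narrow_reading(word: str) -> bool:
    """Narrow reading: only 'ei' outside 'cei' counts as an exception."""
    return violations(word)["ei_not_after_c"]


def breaks_broad_reading(word: str) -> bool:
    """Broad reading: 'ei' outside 'cei' OR any 'cie' counts as an exception."""
    v = violations(word)
    return v["ei_not_after_c"] or v["cie"]


if __name__ == "__main__":
    for w in ["Fahrenheit", "mercies", "receive", "science", "weigh"]:
        print(w, breaks_narrow_reading(w), breaks_broad_reading(w))
```

Under the narrow reading the -ies plurals listed at the top of this topic would not belong in the category; under the broad reading they would, which is the disagreement in a nutshell.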

Uncategorized definitionless Chinese entries

For an entry such as , how would contributors feel about Category:Cantonese hanzi and Category:Mandarin hanzi? Or should it be Cantonese han characters? --Mglovesfun (talk) 10:14, 8 June 2011 (UTC)Reply

The current system of treating Chinese characters is unreasonable. Chinese characters were originally invented to record the Chinese language, and applying them in non-Old-Chinese-derived languages was a much later event. The way the characters were designed was tightly associated with their pronunciations at the time of invention in (Early Old) Chinese, as the majority of Chinese characters have a phonetic component. The "etymology" of a character thus comprises two parts: one phonetic and one graphic. In the "definitions" section the original sense of the character in Old Chinese needs to be listed first, and then the rest in a roughly chronological order. The current arrangement of a "translingual" section at the top and putting usually non-Chinese languages next is illogical. The usage of characters in non-Chinese languages is 99.5% of the time determined by their original meanings in Chinese. There is no need to state what the character means in non-Chinese languages again if the meanings in Chinese had already been explained; only their non-borrowed meanings (as determined from the fossilised sinoxenic vocabularies) need to be specified.
There are regular sound correspondences between Middle Chinese pronunciation of characters and their modern readings in non-Chinese languages. The development of characters' (literary) pronunciations in varieties of Chinese is also reasonably regular. Provided a Chinese character has its Middle Chinese pronunciation recorded in a rhyme book, one can predict from the initial and finals the expected pronunciations in modern varieties and languages, and that's the way the pronunciations of rarely used characters are usually determined. The categories "Cantonese hanzi", "Mandarin hanzi" are obviously inappropriate. 60.240.101.246 12:10, 8 June 2011 (UTC)Reply
Why? Apparently, it's so obvious you haven't bothered to mention it in the two paragraphs above. --Mglovesfun (talk) 12:28, 8 June 2011 (UTC)Reply
I do kind of understand the reasoning. The Romans, when they created their alphabet, assigned sounds to their letters, just like the Chinese, when they created their script, assigned meanings and sounds to their characters. The problem, though, is that the sound and the meaning can change, and it has in both cases. The Romans created V to represent /w/ and /u/, neither of which are now common pronunciations of the letter. And in the same way, Chinese characters have changed over time and may no longer have the meaning they had when they were created. And because of semantic drift, they can differ depending on where they are used, so that some languages preserve more archaic meanings that others lost. This is just as some Latin letters are pronounced closer to the original Roman pronunciation in some languages (Irish always pronounces C as /k/ like the Romans did, but Slavic languages pronounce an affricate /ts/). Granted, most of the Latin letters haven't changed much in sound, and the Chinese characters haven't changed in meaning, but some still mean slightly different things in different areas or languages. A character may signify a meaning that is common in one language but rare or archaic in another. So to call meanings 'translingual' is a bit strange. —CodeCat 12:51, 8 June 2011 (UTC)Reply
Sure but, how does this relate to my question? --Mglovesfun (talk) 12:53, 8 June 2011 (UTC)Reply
What I mean is that I would prefer to reduce the 'translingual' header, and instead allow each language to deal with the characters individually just as we do now with Latin letters. The translingual section itself could stay, but it should only deal with the character itself and its origin and original meaning, and not with contemporary meanings and pronunciations. So I think that just as we allow Category:English letters, we should also allow Category:Mandarin Han characters or some variation. —CodeCat 13:00, 8 June 2011 (UTC)Reply
And if the pronunciation of Middle Chinese is important for understanding the development of non-Chinese languages, then we should simply have a separate ==Middle Chinese== section with the appropriate pronunciation. —CodeCat 13:02, 8 June 2011 (UTC)Reply
This is another thread regarding my drive to empty User:Yair rand/uncategorized language sections/Not English‎. We already have Category:Japanese kanji, which nobody has disputed, I don't see why Category:Mandarin Han characters would be 'clearly unacceptable'. --Mglovesfun (talk) 13:05, 8 June 2011 (UTC)Reply
I don't disagree with it either, that's my point. —CodeCat 13:06, 8 June 2011 (UTC)Reply
Seriously, how can a rude, mentally disabled person be given administrator rights when he has explicitly stated in his own user page that he cannot be held accountable even for his own actions? 60.240.101.246 11:41, 9 June 2011 (UTC)Reply
Do you have an answer to my question? --Mglovesfun (talk) 11:48, 9 June 2011 (UTC)Reply
Any Chinese character, is by default a "Mandarin Hanzi", "Cantonese Hanzi", ... And if the Middle Chinese pronunciation of that character is known, its readings in Korean, Japanese, Vietnamese can also be deduced (see for example this incomplete page for predicting Sino-Vietnamese readings), and in that way it is also a "Japanese Kanji", "Korean Hanja" and a "Vietnamese hán tự". 60.240.101.246 12:18, 9 June 2011 (UTC)Reply
But what's your objection? --Mglovesfun (talk) 12:53, 9 June 2011 (UTC)Reply
  • We don't do that because they only have one script. But I wouldn't disagree with doing something like that for Serbo-Croatian, for example, which uses Latin and Cyrillic. —CodeCat 13:06, 9 June 2011 (UTC)Reply

This question is regarding the addition of an etymon of a said word to the list of related terms. What is the current rule on this, are words already mentioned in the etymology allowed to be added to the related terms list? I am only asking this as I have seen a few users remove these from the list of related terms; therefore it would be helpful to have this matter cleared up. Caladon 11:25, 8 June 2011 (UTC)Reply

I think that there is absolutely no reason to disallow words listed in the etymology to be also listed in related terms.Matthias Buchmeier 11:39, 8 June 2011 (UTC)Reply
I generally remove these when the related terms section and etymology are close together as 'duplication of links', but in long entries, they can be useful as the separation between the etymology and related/derived terms is significant. --Mglovesfun (talk) 11:45, 8 June 2011 (UTC)Reply
I know of no specific rule. I generally remove these. (But I don't remove from the =Related terms= section same-language cognates that happen to be listed in the =Etymology= section.)​—msh210 (talk) 17:28, 10 June 2011 (UTC)Reply
If it's best to repeat such related terms on a long page, then it's best to include these related terms on any page, short or long, since ultimately any page could develop into a long page with pronunciation from several regions, quotations and example sentences, etc. It's not necessary to duplicate them, but if the goal is to develop pages, then removing them is actually a bad idea. DAVilla 19:02, 5 July 2011 (UTC)Reply

ICAO phonetic alphabet

Would this be considered translingual or English? The pronunciation of the words is based on English, but the alphabet is used around the world even by people who don't speak English. As far as I know the standard itself prescribes the pronunciation as well, which isn't specific to English. —CodeCat 14:14, 9 June 2011 (UTC)Reply

I would venture to say translingual. DAVilla 19:04, 5 July 2011 (UTC)Reply

Inflected German participles

As in English, verbs in German have a present and a past participle form (e.g. spielen -- present participle spielend, past participle gespielt). Those participles can be used in such a way that they are syntactically adjectives -- they are also inflected like adjectives then, e.g. die spielenden Kinder ("the children playing"), das gespielte Spiel ("the game played"). How can these verb (or adjective?) forms best be added to Wiktionary (I know of no such entry)? One option would be to add two adjectives for each verb reflecting the present and the past participle -- i.e. spielend would get an adjective section and so would gespielt, the inflected forms would then link to their "base form" (i.e. spielenden would link to spielend). But firstly, that would make a lot of redundant sections, and secondly, most of those participles are only used as adjectives -- i.e., they are adjectives syntactically, but not lexically. (Actually there are many lexicalized participle forms which are proper adjectives now, such as gefragt, those would of course get adjective sections either way.) Does anyone have a better idea or is there even some policy already that I missed? Longtrend 17:15, 9 June 2011 (UTC)Reply

I have often wondered this myself as well. Participles behave as adjectives but they sometimes behave as verbs too. There are some cases where participles can be replaced by adjectives, and some where they can't. —CodeCat 17:19, 9 June 2011 (UTC)Reply
I don't speak German, but from your explanation, it seems to me that these are verb forms — "Strong masculine singular nominative past participle of ____" and so on. I think gespielt would then be something like "Uninflected past participle of spielen", rather than simply "Past participle of spielen". —RuakhTALK 20:21, 9 June 2011 (UTC)Reply
Participles are verb forms that are adjectives. They're both, but their inflection is adjective-like. In a way, a participle in German is like an adjective derived from the verb, except that it retains some verbal properties because you can use it with an auxiliary verb. —CodeCat 21:52, 9 June 2011 (UTC)Reply
Yeah, I think that's true in all languages. That's basically the definition of participle. :-)   —RuakhTALK 22:27, 9 June 2011 (UTC)Reply
Then I don't understand why you think they are verb forms and not adjectives. —CodeCat 22:43, 9 June 2011 (UTC)Reply
I think you may be misunderstanding my original comment. I didn't say "they are verb forms and not adjectives". I said they are verb forms; and you obviously agree with that. But perhaps I said the wrong thing, anyway; what I really should have said was that they are {{infl|de|verb form}}s. But regardless, to answer your question: It's because of what you and Longtrend say above — "most of those participles are only used as adjectives -- i.e., they are adjectives syntactically, but not lexically"; "There are some cases where participles can be replaced by adjectives, and some where they can't" — which matches my (vague) background knowledge, as well as what I've found today by Googling. Past participles in German, as in English, seem to be verb forms that behave in many respects like adjectives, with most not being quite the same as normal adjectives. Present participles I'm less clear on — they don't seem to have any purely verb-y uses (though I'd welcome correction on that) — but it seems natural to treat them the same way (especially since, according to some web-sites at least, most can't be used predicatively, only attributively, whereas obviously normal adjectives have both uses). —RuakhTALK 23:52, 9 June 2011 (UTC)Reply
German participles can be used anywhere regular adjectives can. They can be used attributively and predicatively, and also adverbially (German adverbs of manner are identical to the base adjective). Although it depends on the transitivity of the verb whether the combination makes sense. In theory, based on semantic grounds, some participles such as (present) bedeutend or (past) geeignet even have degrees of comparison. —CodeCat 00:14, 10 June 2011 (UTC)Reply
@CodeCat: Actually, examples such as bedeutend and geeignet are what I meant by lexicalized participles. They are undoubtedly true lexical adjectives now. gespielt, for example, is not lexicalized in this way, that's why I'm asking how to treat it. @Ruakh: Your assumption is right: Present participles don't ever have any verbal uses anymore, yet a present participle can be made out of any verb. It's also correct that they usually can't be used predicatively, only attributively. But it's not correct that normal adjectives always have both uses; actually there are many that can be used only predicatively. Longtrend 08:41, 10 June 2011 (UTC)Reply

Have you looked at how Latin participles work in Wiktionary? I've been thinking the situation must be similar. -- Prince Kassad 23:08, 9 June 2011 (UTC)Reply

They are treated as adjectives but with a ==Participle== header instead, right? —CodeCat 23:12, 9 June 2011 (UTC)Reply
I think so. I like this solution. If we can't decide whether they are verbs or adjectives, why put them in either of these categories? A Participle header probably makes more sense. So if we go with this solution, gespielt would have (only) a Participle section with an inflection table, whereas the lexicalized gefragt would have an additional Adjective section. The inflection tables for the participle and for the adjective wouldn't differ, except that the adjective has comparative and superlative forms. Longtrend 08:54, 10 June 2011 (UTC)Reply
A fairly straightforward treatment seems to be this. When a participle form can behave both like a verb form and an adjective, let it have both sections, as in this revision of "gefragt". When a participle form is only used as an adjective, let it only have an adjective section, and let its origination as a participle be mentioned in its etymology. This treatment seems to be simple and clear, tightly following the notion of part of speech as something that is revealed in positions in sentences that a word form takes, and in inflection. Its only disadvantage, AFAICS, is that it often mandates two sections of one word form, but it is this disadvantage that offers a lot of clarity to the reader: it tells that the word form is used both as a verb--as in "wurde gefragt"--and an adjective. What I admit not to understand is what is meant by an adjective's being lexicalized or not lexicalized: that is, what observable properties a word form has that leads to its being ranked as "lexicalized adjective", and what observable properties a word form has that leads to its being ranked as "non-lexicalized adjective". Whether something originates from a verb should not be a key consideration for its being classified as a verb, or it definitely should not be the only consideration; there are plentiful classes of counterexamples, among which I pick agent nouns such as "doer" or "writer". If "bedeutend" is never used as a verb, all its positions in sentences are adjectival ones, and it is inflected as an adjective, then it is IMHO best classed as an adjective, no matter its origin. This was the treatment of "bedeutend" in this revision; the later addition of a verb form section to "bedeutend" seems wrong. --Dan Polansky 10:48, 10 June 2011 (UTC)Reply
"When a participle form is only used as an adjective, let it only have an adjective section": this would apply to every present participle form then. But the present participle appears in each verb inflection table, so wouldn't it be a contradiction to have only an adjective entry for it? I think participle as a header makes more sense, with an inflection table because each present participle can be inflected. There can then be an additional adjective section for words that originated as participles but have now a meaning on their own. I admit it's sometimes difficult to draw the line between lexicalized and non-lexicalized forms (as is always the case with lexicalization), but firstly that problem is already there (it would not be introduced with my suggestion) and secondly there are some reliable indicators. For example, if present or past participle forms have comparative and superlative forms, they are certainly lexicalized. "When a participle form can behave both like a verb form and an adjective, let it have both sections": this would apply to virtually every past participle form, apart from some intransitive-verb-based ones. This means that (almost) every single verb would have two sections for its past participle -- quite redundant, don't you think? Why not make one participle section for each verb, and then an additional adjective section for those participles that have developed a new meaning independent from their origin as participles? Maybe you're skeptical because we would need to introduce a new participle header for German. I wouldn't want to do that either, but it's already there for Latin verb forms (and perhaps in other languages as well) and I don't see a reason to give it up there. So why shouldn't we use it for German, too? Longtrend 11:59, 10 June 2011 (UTC)Reply
I still don't quite understand the difference between a 'lexicalised' participle and a regular participle. To me, a participle simply is an adjective. In Dutch (which is the same as German in this respect) if I say het gebouwde huis ("the built house") then I don't see any difference between a verb form and an adjective because the participle clearly inflects as an adjective. In het huis is gebouwd, the same thing applies but there are two interpretations: the house has been built, or the house is now in a state of having been built. This split applies to every participle I can think of, not just those that seem to have a different meaning from the verb. And with present participles it's the same... I can't be sure whether doorlatender is simply something that does more doorlaten than normal, or whether it stands apart from the verb. —CodeCat 12:15, 10 June 2011 (UTC)Reply
Okay, back to my initial example for a lexicalized participle, gefragt. It originated as the past participle of the verb fragen, which can be translated as "(be) asked", and is still used as such (e.g. in er wurde gefragt ("he was asked")). As such it is quite clearly a verb form. BUT another form has developed out of it, that differs both semantically and syntactically. gefragt as an adjective means more than its verbal origin suggests -- it's not just "asked", but also "asked for", "in demand", "requested". It is lexicalized just like toaster is not just any person that toasts but rather a special device. And syntactically the lexicalized gefragt is an adjective, inflecting just like other adjectives, and it has comparative (gefragter) and superlative (gefragtesten) forms. Whether the same applies to Dutch doorlatender, I can't tell. But do you know what I mean? Longtrend 12:32, 10 June 2011 (UTC)Reply
@Longtrend 11:59: You have a good point with present participles being in the inflection tables of verbs. The question is whether they really belong there. Many nouns can fairly regularly form diminutives, and many verbs can form agent nouns, but that does not mean we want to treat diminutives and agent nouns as inflected forms. If present participles are always used as adjectives, then they are adjectives, IMHO.
As regards "lexicalized", you have stated only one indicator: whether the adjective has a comparative and a superlative form. But there are adjectives that do not have these forms (see Category:English uncomparable adjectives), so I do not see why this is a good necessary condition for a word form's being a true adjective, or "lexicalized adjective". I still do not know what is meant by "lexicalized", as you have stated only one observable property, such one that is not true of unquestionable adjectives.
Above all, I do not think that having both a verb section and an adjective section is redundant: the verb section contains other information than the adjective section. Yes, they could be merged into one "Participle" section, but who or what stands to gain? The savings in the entry length would be minimal, I think, and the reader has to figure out what part of speech "participle" is. IMHO "participle" is not really a part of speech at all; it is a tag from which a bundle of parts of speech has to be derived. The benefit of having two sections is clarity for the reader, and alignment with a neat notion of part of speech, as I have detailed in my previous response.
Latin: There is now in Wiktionary the part of speech of "participle" for Latin, but it would be good first to understand why this was introduced before you introduce it also for German. "Participle" for Latin was introduced by EncycloPetey, I think, and I vaguely remember he was hesitant to introduce it, but it solved some problem that seemed worth it. --Dan Polansky 12:41, 10 June 2011 (UTC)Reply
What about Esperanto? In Esperanto the part of speech can be determined from the last letter(s) of the word. Participles end in -a, like adjectives do. So in Esperanto there is no doubt that they are adjectives at all. But nonetheless, Esperanto does have compound tenses that use auxiliary verbs with participles even though the participles are clearly adjectives. —CodeCat 12:57, 10 June 2011 (UTC)Reply
You are engaging in the fallacy of equivocation. (See [[w:Equivocation]].) The term "adjective" is frequently applied to all Esperanto words ending in "-a", and the term "part of speech" is frequently applied to groups of Esperanto words ending with the same last letter(s), but such use of said terms is not commensurate with any of use of "adjective" or "part of speech" as applied to other languages. There's a close connection, obviously, in that Zamenhof intended his adjectives[Esperantists] to be adjectives[linguists] and his parts-of-speech[Esperantists] to be parts-of-speech[linguists], but he was a fallible human being — and not even a trained linguist — and anyway the study of syntax has progressed a great deal since his death, so there's certainly no reason to take his analysis as definitive for any language besides Esperanto. (Honestly, I'm not sure we should even take it as definitive for Esperanto itself; but on that point I have no axe to grind.) —RuakhTALK 17:20, 10 June 2011 (UTC)Reply
Inflected German participles — AEL
@Dan Polansky: As for Latin participle headers: Agreed, we should be sure how and why they were introduced. Does anyone remember? Until then I can just say that I like the solution and that a Participle header reflects the fact that those words are something in between verbs and adjectives. Merging verbal and adjectival uses under such a header is the lesser evil, IMO, compared to having two separate sections for one past participle, and perhaps an additional one for "lexicalized" participles (see below), but I guess that's a matter of taste and should be decided democratically, if necessary.
As for lexicalized participles: Have you read my last reply to CodeCat? I think I'm quite clear there. Comparability is only one criterion. The more important one is semantics, see my gefragt example.
Why are present participles included in verb inflection tables? I don't know, but I know that this is common practice in every grammar I know. I wouldn't oppose removing them from verb inflection tables (the arguments are too good :)), but we should be aware that we set apart from common practice then. Longtrend 13:03, 10 June 2011 (UTC)Reply
Re: "... the fact that those words are something in between verbs and adjectives": That is not really a fact AFAICS. Neither do I think that "paper" is something between a noun, a verb and an adjective. One word form can have several part-of-speech classes of occurrences; the part-of-speech classification applies to occurrences rather than word forms.
Re: "Merging verbal and adjectival uses under such a header is the lesser evil": Having several part-of-speech sections is not evil, if you ask me. As regards the requirement of economy, it is imperfect. As regards the requirement of clarity, it seems excellent. As regards accuracy, once you put each present participle under "Participle" heading, it will no longer be true that each German term classed as "participle" can behave both like a verb and an adjective.
Lexicalization: Per your explanation, in order for a word form to be a "lexicalized adjective", it must have a semantics that is more specific than or otherwise deviating from the semantics that directly follows from the generating word and the suffix. I do not see why we should accept this notion of "lexicalized adjective" as driving our classification: it would be similar to classing some "-ness" word forms in English as adjectives and some as "lexicalized" nouns. Even if the semantics of a "-ness" form is perfectly predictable from the adjective that generates it, the "-ness" form is still a noun, "lexicalized" or not. A German analogue is "Schönheit", which is a noun even if its semantics can be directly derived from "schön" and "-heit". (I was responding to "@Longtrend 11:59", where "11:59" is the time of your post, after an edit conflict, so my previous response indeed did not accommodate your later post.) --Dan Polansky 18:07, 10 June 2011 (UTC)Reply
We really shouldn't worry now about what I call "lexicalized participles". Look again at the entry for gefragt as it is now, we have two sections there, one for the participle and one for the lexicalized meaning (however you want to call it). Neither your nor my proposal would affect that second section, right? (I just brought those lexicalized forms up to make clear that I don't attempt to put everything that looks like a participle or was one originally under a "Participle" header. Let's forget about that now.)
I just did a quick search in the archives and you seem to remember correctly about the introduction of Latin participles. This doesn't mean, however, that it's a bad solution. A lot depends on whether the community thinks that this solution is a good one for Latin -- if yes, then I really don't see any reason to treat German participles differently; if no, then obviously it's a bad idea for German as well. Do you agree with me here?
Perhaps we should consider present and past participles separately (only if we don't go for the Participle header solution, of course). Then treating present participles is easy: since they occur only as adjectives, they would each get one adjective header (no verb one). The only problem would be our current practice of listing present participles as verb forms in verb inflection tables (which, as I said, is common practice in many grammars). What about past participles? According to your suggestion, (almost) each of them would get at least two sections, since they can be used in two ways: as verbs (ich habe das Buch gekauft) and as adjectives (das gekaufte Buch). What makes me worry about this solution, apart from the fact that we'd need two POS headers for each and every verb, at least each transitive and many intransitives, is that there actually are cases where it's not clear at all whether the participle is used as an adjective or as a verb. This is exactly the split CodeCat mentioned above for Dutch: Consider his example, translated to German: das Haus ist gebaut. Is gebaut a verb here ("the house has been built")? Or is it a predicatively used adjective ("the house is built, i.e. in a state of having been built")? I'd say it's something in between, and if that's correct, it's a strong argument in favor of a Participle header. Longtrend 20:37, 10 June 2011 (UTC)Reply
In French, past participles and present participles are verb forms. But, very often, adjectives are derived from these forms. The verb form and the adjective must be considered as distinct words. An example: sucré is a past participle (e.g. in j'ai sucré mon café), but also an adjective (e.g. in il aime tout ce qui est sucré). In some cases, determining the status of the word in a sentence is not obvious, but you may apply the following principle: when you think of the action expressed by the verb, it's a verb form; when you think of a characteristic of the thing, it's an adjective. In some cases, it may be both depending on what you mean (e.g. the second sucré in Il a sucré son café, puis a bu le café sucré: if you mean the coffee which has just been sugared, sucré is a participle; if you mean the coffee with a sweet taste, it's an adjective). The existence of the adjective is not systematic for all participles. I think that this analysis is also applicable to English and to many other languages. Lmaltier 16:31, 11 June 2011 (UTC)Reply
Yup, so in French, too, there are cases where participles are used ambiguously between verbs and adjectives. Perhaps this discussion should be extended to participles in general, cross-linguistically? By doing this, we could also question whether the Latin way is the right way to go. Longtrend 12:59, 12 June 2011 (UTC)Reply
I think a cross-linguistic discussion might be productive (with the understanding that editors for individual languages will have to reach their own conclusions about what to take from the cross-linguistic discussion). We've discussed this a few times for Hebrew, and haven't yet reached any clear conclusions. I'd welcome input from editors of languages that have similar issues. —RuakhTALK 15:16, 12 June 2011 (UTC)Reply
Maybe a separate discussion page should be created instead, though. This discussion is getting rather long... —CodeCat 15:30, 12 June 2011 (UTC)Reply

Diacritical marks

I would like to add the POS header "Diacritical mark" to ¨, ` and other entries. --Daniel 19:43, 10 June 2011 (UTC)Reply

Why is Symbol not good enough? (If you do want a new header, please also update WT:POS.) Conrad.Irwin 07:34, 11 June 2011 (UTC)Reply
They're not "symbols", are they? A "symbol" generally represents an idea or a thing. What does ` represent in the French word ? —RuakhTALK 13:24, 11 June 2011 (UTC)Reply

I see no objections. The proposal passed. When I have the time, I will feel free to add the aforementioned POS headers to the applicable entries. --Daniel 16:37, 17 June 2011 (UTC)Reply

煽动仇恨者 (hatemonger)

煽动仇恨者 is not sum-of-parts. It is a word because 者 is a suffix. So, it should not be deleted. [1] 2.25.193.35 13:36, 11 June 2011 (UTC)Reply

煽动仇恨者 is not a "word". It's not even attestable. It literally reads: "someone who incites hatred". Go away User:123abc. I'm just gonna keep blocking your multiple user accounts until you do. ---> Tooironic 23:12, 11 June 2011 (UTC)Reply
煽动仇恨者 is a word and attestable, please see Google hits: "煽动仇恨者" 2.25.191.243 23:51, 11 June 2011 (UTC)Reply
"People who incite hatred" also has thousands of hits on Google; that doesn't make it attestable. In that word, 者 is not a suffix. It is only a legitimate suffix in two-character words such as 学者, 记者, etc. Obviously you know nothing about Mandarin word boundaries. ---> Tooironic 07:14, 12 June 2011 (UTC)Reply
So, you base on your "knowledge" to delete 志愿者 too? 2.25.191.247 10:03, 12 June 2011 (UTC)Reply
I agree with the banned user that -者 (zhe3) is a suffix, and that by extension, e.g. 志願者 (and perhaps 煽動仇恨者) are words. It may come from Classical Chinese originally, where it serves a grammatical function, but I'd say it has already become a suffix. Some non-suffix constructions remain, such as "XX者YY也", or like in the example below, but I'd say it's pretty rare. "在學術上,溫泉的學術定義中把湧出地表的泉水溫度高於當地的地下水溫,即可稱為溫泉。" (from Chinese Wikipedia). (Note: I'm not a native speaker.) Vaste 02:39, 13 June 2011 (UTC)Reply
No, it's not a word. Not all suffixations lead to words worthy of inclusion, for example "湧出地表的泉水溫度高於當地的地下水溫者" ("springs which produce water with temperatures higher than those of (local) underground water") from your example above. 60.240.101.246 03:08, 13 June 2011 (UTC)Reply
Yes, that was meant to be an example of where XXX+者 is clearly not a word. I.e. 者 is both a suffix (like in 志願者 or 學者), and a grammatical particle (or something). Or am I misunderstanding the meaning of suffix?
Regarding 煽動仇恨者, I think its use mainly seems to be an artifact of translating "hatemonger" to Chinese, though I honestly don't know. The question is how it is used in Chinese. Is it seen as a word? Is it used as a word? Would it be used outside translation? A "bad translation" that's popular enough is still a word. Btw, how do we treat words that only appear in translation? Vaste 05:19, 14 June 2011 (UTC)Reply
There are two translations of hatemonger from Google for your reference:
"煽动仇恨者" (煽动仇恨+者)
"仇恨煽动者" (仇恨+煽动者) 2.27.72.254 06:09, 14 June 2011 (UTC)Reply
Doesn't seem awfully common, now does it? Come to think of it, why is there no entry for 我爸是李刚? It has 1000+ times more hits on google. :) Vaste 08:31, 14 June 2011 (UTC)Reply

志愿者 (volunteer)

Why is it deleted? 2.25.191.81 03:49, 12 June 2011 (UTC)Reply

Because you created it when you shouldn't be on here at all. You've been blocked time and time again after abusing multiple accounts and anonymous IPs. When will you understand you are not welcome here? ---> Tooironic 00:00, 13 June 2011 (UTC)Reply
志愿者 is not created by me, but is deleted by you. You block me because you hate Pinyin entries, but Pinyin entries are allowed. 2.25.211.239 01:16, 13 June 2011 (UTC)Reply
I don't "hate" Pinyin entries. Understand this: Wiktionary only keeps words which can be attested. Have you ever actually read Wiktionary:Criteria for inclusion? ---> Tooironic 13:30, 13 June 2011 (UTC)Reply
志愿者 is a word and attested. 2.27.72.254 13:37, 13 June 2011 (UTC)Reply
Google hits: "志愿者"
Google Books: "志愿者" 2.27.72.254 21:52, 13 June 2011 (UTC)Reply
Entry restored. But please remember just because a word has 者 on the end of it doesn't mean it is a word in its own right. 煽动仇恨者 will never be recreated because it's not a word in Chinese - the idea that you can put two multiple-character words together PLUS a prefix and call it a legitimate word is absurd. ---> Tooironic 23:51, 13 June 2011 (UTC)Reply
One point that I would like to stress is that Google searches can be helpful, but are not without problems when it comes to establishing a word or phrase as a valid entry. Common phrases or sentences that have a lot of Google hits would never be accepted here at Wiktionary. One silly example is Google hits: 我喜欢唱歌 and 我喜欢唱歌. In this case, something like Baidu would be much more convincing. -- A-cai 23:25, 1 July 2011 (UTC)Reply

Twitter

Who runs the @Wiktionary Twitter account? It tweets the word of the day every day, which is a Good Thing, but it could be a lot more visible if all the tweets were tagged #wotd (which is quite an active hashtag). Is it automatic or is someone actually doing this every day? Ƿidsiþ 09:11, 12 June 2011 (UTC)Reply

Fairly sure it's automatic (it didn't update on days there wasn't a 2011 word), but I don't know how it's being done or who owns the account. — lexicógrafa | háblame 12:13, 12 June 2011 (UTC)Reply

Voting is about to close

It is currently 10:41, 12 June 2011 (UTC); voting closes at 23:59 UTC, so just hours after this post. If you want to vote, please do so. Additionally, there's been an ongoing discussion about a recommendation for voting to be extended to give people more time; opinions welcome. --Alecmconroy 10:41, 12 June 2011 (UTC)Reply

You might want to say what the vote is. --Mglovesfun (talk) 16:31, 13 June 2011 (UTC)Reply
It is the Board of Trustees elections. Our own User:GerardM is in the running, so you should think about voting. —Stephen (Talk) 08:56, 14 June 2011 (UTC)Reply
Except that it's too late now. The polls closed thirty-six hours ago. (The results will be announced "tomorrow", though I'm not sure what time tomorrow. It could even be twelve hours from now, 8:00 PM EDT, for all I know.) —RuakhTALK 12:06, 14 June 2011 (UTC)Reply

Tabbed languages, definition editing, again

Previous discussions: Wiktionary:Beer parlour archive/2011/March#Tabbed Languages, Definition side boxes, and Sense IDs, Wiktionary:Beer parlour archive/2011/April#Edit definition gadget

For those who missed the discussions last time around, tabbed languages is a script that displays language sections as "tabs" rather than having languages stacked on top of each other (along with some additional features such as category editing and a tool for adding new language sections via WT:EDIT), and the definition editing script allows simple editing of definitions and adding example sentences and such. Since the last discussion, I reworked tabbed languages based on a new design by Brandon Harris, a designer employed by the WMF, and I changed some elements of the definition editing tool, doing away with the expandable boxes. Both scripts can be tested by clicking the button below:

(Please purge your cache so that this button will work.)

During the last discussion it was suggested by Ivan Štambuk that the definition editing tool be enabled "for some representative period of time (e.g. 1 week) so that one can gather statistics (from edit summaries) how many new users (IPs) edited with it, so that we can have hard numbers quantifying its potential benefits". This sounds like a good idea to me, except that since it can take as long as 30 days until all users' browsers no longer cache the previous revision of common.js, the trial period would probably have to be longer than one week in order to get good statistics. --Yair rand 22:02, 12 June 2011 (UTC)Reply

How much work would it be to make it into a Gadget? That way it could be turned on-by-default and off-by-default on a dime. —RuakhTALK 23:13, 12 June 2011 (UTC)Reply
Definition editing options is now available as a gadget, and it seems to work. If I remember correctly, last time I tried to make tabbed languages into a gadget something broke. --Yair rand 03:12, 13 June 2011 (UTC)Reply
(Note: The option to make gadgets on by default is not available in the current version of Mediawiki. --Yair rand 08:20, 11 July 2011 (UTC))Reply
I like it, but there would be some bugs to work out. For instance, if you go to cheese with tabbed languages enabled there's a massive white space next to where the picture is. ---> Tooironic 23:55, 12 June 2011 (UTC)Reply
The picture shouldn't really be placed where it is, though. Ideally nothing should be before the first header, except 'see also' links. —CodeCat 00:18, 13 June 2011 (UTC)Reply
Really? What if the image applies to more than one language? We can't really have them repeating throughout. ---> Tooironic 01:35, 13 June 2011 (UTC)Reply
I'm pretty sure this has been discussed and the consensus was that images should be placed only in individual sections. (Not completely sure about that.) --Yair rand 03:12, 13 June 2011 (UTC)Reply
On second thought it's probably not a big deal. See, for example, tank. Users can see it upon first viewing, regardless of the language. ---> Tooironic 01:36, 13 June 2011 (UTC)Reply
When you select a non-English section, the language codes get dropped from the categories: "English derivations" instead of "oc:English derivations". This seems non-intuitive and wrong to me. Nadando 00:50, 13 June 2011 (UTC)Reply
I added that function during the discussion on topic category naming format above. The prefixes really don't mean anything to the reader, and it really doesn't make sense to display them to users if categories are being sorted into specific sections. If there isn't consensus for removing topic category prefixes from the display, I can revert the change. --Yair rand 03:12, 13 June 2011 (UTC)Reply
I don't think those prefixes should be hidden, unless the full-word prefixes are hidden as well. Aside from that, I think tabbed languages is pretty sweet! —RuakhTALK 22:35, 13 June 2011 (UTC)Reply
The problem with hiding full-word prefixes is that then we have conflicting visible category names ([[gerund]] would appear to be in "Nouns" twice.) --Yair rand 23:09, 13 June 2011 (UTC)Reply
All the more reason to retain the distinction, I think. By the way, I find another current asymmetry confusing as well: categorizing [[gerund#Dutch]] under "Flowers" adds the page to [[Category:Flowers]] rather than to [[Category:nl:Flowers]], which is not what I would expect a user to expect. —RuakhTALK 23:21, 13 June 2011 (UTC)Reply
Good point. I've turned off category prefix removal for now. There's no way that I know of for the script to tell if the newly added category is a topic category and requires a prefix. Hopefully the category structure can be reworked at some point to allow for a simple display of categories. --Yair rand 23:52, 13 June 2011 (UTC)Reply
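For anyone following the prefix discussion above, here is a minimal sketch of the two operations being debated: stripping a language-code prefix for display, and restoring it when a category is saved from a non-English section. It is not the tabbed languages script itself (which is JavaScript); the function names, the prefix pattern, and the `is_topic` flag are all hypothetical. The flag in particular stands in for exactly the information the script is said not to have.

```python
import re

# Assumption: a topic-category prefix is a short lowercase language code
# followed by a colon, e.g. "oc:" or "nl:". Real codes may be more varied.
PREFIX_RE = re.compile(r"^([a-z]{2,3}(?:-[a-z]+)*):(.+)$")


def display_name(category: str) -> str:
    """Return the category name with any language-code prefix removed."""
    match = PREFIX_RE.match(category)
    return match.group(2) if match else category


def stored_name(label: str, lang_code: str, is_topic: bool) -> str:
    """Return the category to save for a label typed into a language section.

    is_topic is a hypothetical input: the caller must already know whether
    the label names a topic category, which is the hard part in practice.
    """
    if is_topic and lang_code != "en":
        return f"{lang_code}:{label}"
    return label
```

With these definitions, `display_name("oc:English derivations")` gives "English derivations", and `stored_name("Flowers", "nl", True)` gives "nl:Flowers". Full-word names such as "English nouns" pass through `display_name` unchanged, so hiding only the code prefixes would leave the asymmetry noted above.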
Both look good to me, although I'm a tech n00b so my feedback is kinda vague. There is something with the "add language" button though: once you put in a language name and press Add, it brings up something for about a millisecond, then something refreshes and you lose your progress. — lexicógrafa | háblame 00:37, 14 June 2011 (UTC)Reply
It doesn't bring up the new section with input boxes for filling in content? What browser do you use? --Yair rand 00:43, 14 June 2011 (UTC)Reply
Well, in that millisecond I see a green box in the upper corner (same type that appears for other things) and a box in the entry that says "[Language] categories:", but that's it. I'm in Chrome. — lexicógrafa | háblame 01:24, 14 June 2011 (UTC)Reply
Hm, it works in Chrome for me. Do you have any gadgets enabled? Does it happen in all entries? Also, does the URL change after clicking "Add" (maybe adding a question mark at the end?)? And does the "Add part of speech" button in the bar on the left also break? Does the [Language] categories box have a (+) icon in it? (And is that too many questions? :) ) --Yair rand 01:43, 14 June 2011 (UTC)Reply
Gadgets as in Special:Preferences; yes. All entries I've checked; yes. Yes, it adds a '?' to the end of the URL. The "Add part of speech" and "Add definition" buttons have never worked for me, and they still don't. No, it doesn't. And no. :p — lexicógrafa | háblame 02:23, 14 June 2011 (UTC)Reply
Does it work now? --Yair rand 03:50, 14 June 2011 (UTC)Reply
Yes, it works now. :) — lexicógrafa | háblame 11:51, 14 June 2011 (UTC)Reply
Does tabbed languages remove the ability to see hidden categories? —RuakhTALK 02:37, 14 June 2011 (UTC)Reply
Not anymore, thanks for pointing out the bug. --Yair rand 03:50, 14 June 2011 (UTC)Reply
In the entry a, most of the categories appear under the Filipino tab. —Internoob (DiscCont) 17:18, 14 June 2011 (UTC)Reply
That is because the recently added Finnish section following the Filipino section is entirely uncategorized, breaking the category sorting. See User:Yair rand/uncategorized language sections/Not English for a list of similar entries, which will need to be fixed before tabbed languages is implemented. (Or, alternatively, we could modify {{attentioncat}} so that "Category:X terms needing attention" aren't hidden categories, and then bot-add {{attention}} to uncategorized entries...) --Yair rand 17:52, 14 June 2011 (UTC)Reply
Is there a way to edit a language section when tabbed languages is turned on? —RuakhTALK 01:54, 16 June 2011 (UTC)Reply
The former version of tabbed view had the option to edit individual sections, but this one, sadly, apparently doesn't. --Ivan Štambuk 06:26, 18 June 2011 (UTC)Reply
There is now a button for editing individual language sections. --Yair rand 11:00, 27 June 2011 (UTC)Reply
I've started discussions at User talk:Yair rand/TabbedLanguages2.js for various open issues. Some of them are bigger issues than others. —RuakhTALK 02:16, 19 June 2011 (UTC)Reply

Awesome! So now can we list English in alphabetical order with all the other languages, and just default to opening that tab? DAVilla 19:25, 4 July 2011 (UTC)Reply

If this does get enabled, then we could, in theory, but I don't really see what the advantages of moving down English would be. --Yair rand 15:44, 5 July 2011 (UTC)Reply

Multiple context labels and parentheses

The definition of the entry goodness gracious starts this way:

  1. (idiomatic) (euphemism) (dated)

While I think this way would be more appropriate:

  1. (idiomatic, euphemism, dated)

That is, I prefer having only one set of parentheses per sense, and all context labels inside it (regardless of which context labels are appropriate). I often "correct" entries that way. Thoughts? --Daniel 08:46, 13 June 2011 (UTC)Reply

This is I think a straightforward change that doesn't need any kind of discussion. -- Prince Kassad 09:11, 13 June 2011 (UTC)Reply
I've considered how to do this by bot, but it would be quite complicated for a relatively small number of entries; merging the labels by hand seems therefore superior. --Mglovesfun (talk) 10:08, 13 June 2011 (UTC)Reply
The bot could, at least, list all the entries that need this cleanup work. WT:Todo/Separated context labels, perhaps. --Daniel 11:07, 13 June 2011 (UTC)Reply
Mine can't! Bequw, Nadando and Ruakh are best at generating these sorts of lists. --Mglovesfun (talk) 16:26, 13 June 2011 (UTC)Reply
I'll make a list, but it will probably have some false positives; I can't very easily tell what's a context label and what's not. Nadando 18:22, 13 June 2011 (UTC)Reply
That's why I didn't want to try and do it by bot! Mglovesfun (talk) 18:25, 13 June 2011 (UTC)Reply
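To make the listing idea above concrete, here is a rough sketch of how a script might flag definition lines that open with more than one separate context label. It assumes each label is written in wikitext as its own template, such as `{{idiomatic}}`, and it relies on a hand-maintained set of label names (`KNOWN_LABELS`, hypothetical and deliberately incomplete). That set is the weak point already raised: anything missing from it is a false negative, and any non-label template added to it by mistake produces false positives.

```python
import re

# Hypothetical, incomplete set of context-label template names.
KNOWN_LABELS = {"idiomatic", "euphemism", "dated"}

# Matches a simple parameterless template such as {{dated}}.
TEMPLATE_RE = re.compile(r"\{\{([^{}|]+)\}\}")


def leading_labels(line: str) -> list[str]:
    """Return the run of known label templates that opens a definition line."""
    if not line.startswith("#"):
        return []
    rest = line.lstrip("#").lstrip()
    labels = []
    while True:
        match = TEMPLATE_RE.match(rest)
        if not match or match.group(1).strip() not in KNOWN_LABELS:
            break
        labels.append(match.group(1).strip())
        rest = rest[match.end():].lstrip()
    return labels


def needs_merge(line: str) -> bool:
    """True when a definition line starts with two or more separate labels."""
    return len(leading_labels(line)) > 1
```

Running `needs_merge` over every line of a page's wikitext and collecting the page titles where it returns True would yield a to-do list; merging the labels themselves could then still be done by hand.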

Numbers and numerals

Yes, I know this has been discussed before, and I know the discussion didn't reach a consensus. If I remember right, the discussion centred mainly around whether number or numeral was a better term to refer to the part of speech. I have a slight preference towards number, because it's a little simpler and more people will be familiar with the word. But I don't think there is really going to be a way out of this debate on semantic grounds alone, so maybe we could try a different approach and just look at current practice. Category:Numerals by language contains 197 languages, while Category:Numbers by language contains 50. There are also a few wanted categories for both, but the majority of wanted 'numbers' categories have an equivalent 'numerals' category that already exists. So overall, the vast majority of languages already uses numerals, and this can be considered the more usual practice. Even if we don't all agree on which term to use, it would be easier to use numerals because it means less work. —CodeCat 12:24, 13 June 2011 (UTC)Reply

However, consider the following facts:
  1. Many subcategories of the numeral category are in fact empty.
  2. Some entries are in both the number and numeral categories (like altmyş).
  3. The numerals category doesn't really distinguish between language and script. See Category:Arabic numerals, it contains symbols, and not words in the Arabic language, which makes creating a category for Arabic (as in the language) numerals impossible. -- Prince Kassad 12:55, 13 June 2011 (UTC)Reply
That's a good point, but I remember seeing 'Hindu-Arabic numerals' somewhere as well. The category is also a subcategory of Category:Arabic alphabet, which is kind of strange because the numerals aren't even used in Arabic in that form. —CodeCat 12:58, 13 June 2011 (UTC)Reply
I seem to think EncycloPetey and Daniel Carrero (or Daniel. at the time) strongly opposed numbers. I'm hoping now that EncycloPetey's gone, Daniel will just recognize he's in the minority and say 'ah well'. --Mglovesfun (talk) 16:29, 13 June 2011 (UTC)Reply
That's not very nice... I agree that consensus is a good thing but to actually pick on people for disagreeing and blocking consensus is a bit much, don't you think? —CodeCat 16:31, 13 June 2011 (UTC)Reply
Depends if it's disagreeing or blocking. Mglovesfun (talk) 18:56, 13 June 2011 (UTC)Reply

Single lowercase entry in situations where both cases are present (English Wiktionary)

From a standpoint of Web navigation and complete comprehension by readers, my proposal is that in cases where both upper and lower cases are used, for example Brook (the name) and brook (the water feature), that we redirect the uppercase entry to the lowercase entry and co-locate the text of both entries on the same page. Case distinctions can be provided on one page. In cases where *only* an uppercase entry exists, the redirection (if any) can be from the lowercase entry.

Current policy states "For languages with two cases of script, the entry name will usually begin with a lowercase letter. Exceptions include proper nouns, German nouns, and many abbreviations." It seems needlessly pedantic to have two separate entries where the spelling is the same.

For some reason, several editors have interpreted this wording to mean that Wiktionary *requires* a separate entry for Proper nouns, rather than a word that *only* has a proper noun meaning being in uppercase. For words with multiple meanings that include non-proper-noun types (verbs, regular nouns, pronouns, adverbs, adjectives, prepositions, conjunctions, and interjections), we don't create a separate entry, so why in the world should a second entry happen just because it happens to be a proper noun?

The practical effect of this change would make it easier to find word variants as well as making maintenance of entries easier.

Thanks. -- Avanu 13:31, 13 June 2011 (UTC)Reply

I'd more or less dispute all of this. First of all, uppercase and lowercase isn't all about proper nouns - consider American. Also in this, you'd be redirecting American to american, which seems counter-intuitive for an English speaker. Secondly, I don't see how it would make anything easier; all information on all pages should be correct and well formatted. If anything, merging pages (and lots of them, tens of thousands) would make the entries larger, thus harder to navigate - consider merging Malta with malta! Thirdly, how about acronyms and initialisms? What do you do with MAN in reference to man and Man? I hate the idea. I suppose that capitalization isn't a spelling issue, but it is an orthography issue, and it seems reasonable that two non-identical forms should not be treated as though they are identical. --Mglovesfun (talk) 16:14, 13 June 2011 (UTC)Reply
Dubiously relevant, but I feel roughly the same about French with and without accents - siecle isn't spelt different from siècle, the spelling is the same, but we keep both forms as they exist. --Mglovesfun (talk) 16:16, 13 June 2011 (UTC)Reply
Just to be clear, I'm essentially talking about the URL, not the appearance of the entry once you get there. Words like "run" have over 100 meanings, and that is just the lowercase version. In situations where this has been done with separate entries, it should be relatively trivial from a programming standpoint to merge such articles. Why should there be two separate entries just because a word is one part of speech and not another? A run in the stockings and we compile at run time and We run to the grocery. The idea that these should all be on separate pages also seems a bit silly, and as you say, I'm only talking about the English dictionary, not the French or any other. -- Avanu 16:22, 13 June 2011 (UTC)Reply
As Mg pointed out, "part of speech" is only a little part of the issue. In German, every noun begins with a capital. Plenty of English capitalised words are not proper nouns, too. Equinox 16:29, 13 June 2011 (UTC)Reply
Best example I can find is Austrian, no proper noun meaning. --Mglovesfun (talk) 16:40, 13 June 2011 (UTC)Reply
Just making sure, you do realize that English Wiktionary covers all languages, right? --Yair rand 19:10, 13 June 2011 (UTC)Reply
Another thing: you're pushing this as a usability win, but it depends on the user. A guy who is an expert user of Wiktionary knows he can visit the URL /ira and see the French verb, /IRA and see the political abbreviation, /Ira and see the girl's name. By mashing everything into one page you are making it harder for him. Equinox 16:45, 13 June 2011 (UTC)Reply
We have 'expert' users of Wiktionary? Shouldn't a dictionary be simple enough for almost anyone to use? -- Avanu 18:31, 15 June 2011 (UTC)Reply
It should be, and it is; but likewise we should not hamper and punish skilled users, i.e. "dumbing-down". Computer software companies have to make the same decisions: making Word easier to use must not remove the keyboard shortcuts that save so much time for experts. Equinox 18:36, 15 June 2011 (UTC)Reply
Well, in that vein, it would be easy to make a redirect that points to a specific anchor in a destination page like:
  1. http://en.wiktionary.org/wiki/IRA - the organization, would redirect to
  2. http://en.wiktionary.org/wiki/ira#IRA
or
  1. http://en.wiktionary.org/wiki/Ira - the name, might redirect to
  2. http://en.wiktionary.org/wiki/ira#Ira
-- Avanu 20:11, 15 June 2011 (UTC)Reply
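The mapping in the two examples above can be sketched in a few lines. This is only an illustration of the proposed rule (lowercase the title for the page, keep the original spelling as the anchor). The function name is hypothetical, and whether a merged page would actually expose anchors such as `#IRA` is an assumption taken from the proposal, not something the software is known to provide.

```python
from urllib.parse import quote

BASE = "http://en.wiktionary.org/wiki/"


def redirect_target(title: str) -> str:
    """Map a capitalized title to the lowercase page plus an anchor.

    Titles that are already lowercase need no redirect, so the plain
    page URL is returned for them.
    """
    page = title.lower()
    if page == title:
        return BASE + quote(page)
    # Assumption: the merged page carries an anchor named after the
    # original capitalization, as in the examples above.
    return BASE + quote(page) + "#" + quote(title)
```

Here `redirect_target("IRA")` yields http://en.wiktionary.org/wiki/ira#IRA and `redirect_target("Ira")` yields http://en.wiktionary.org/wiki/ira#Ira, matching the two cases listed.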
Isn't this why we have see also at the top of the page? -- ALGRIF talk 16:57, 13 June 2011 (UTC)Reply
I support having a single page [[man]] to cover man, Man, and MAN. Not only is it not always obvious to users (and editors) that we make this distinction in general, but also, it's not always obvious to people reading a book whether a given word is capitalized because it's a capitalized word (of the sort that we would cover at the capitalized entry-name), or just for some other reason. (If I see a word at the beginning of a sentence, how am I supposed to know whether it would have been capitalized otherwise? If I'm reading the U.S. Declaration of Independence, how am I supposed to know that Providence is capitalized because it refers to G-d, and therefore is covered at [[Providence]], when most of the other capitalized words are covered at their lowercase spellings?) —RuakhTALK 18:45, 13 June 2011 (UTC)Reply
Didn't Wiktionary used to make no distinctions between capitalization due to problems in Mediawiki? --Yair rand 19:10, 13 June 2011 (UTC)Reply
The Wiktionaries used to be like the Wikipedias, with the first letter of a page-name automatically being capitalized. I don't know if "problems in MediaWiki" is the right phrase; I think it was an intentional feature. When they changed the Wiktionaries to allow lowercase first-letters, they didn't make such a change on the Wikipedias. —RuakhTALK 19:19, 13 June 2011 (UTC)Reply
Some of the discussions leading up to the original decision to split entries by capitalization: Wiktionary:Capitals again, Wiktionary:Beer parlour/case-sensitivity vote, Wiktionary:Beer parlour/First letter capitalization. --Yair rand 19:30, 13 June 2011 (UTC)Reply

I think that both solutions are acceptable: keeping the Wikipedia solution would have been acceptable (and might have been wiser, despite its drawbacks): yes, it would have been possible to address man and Man (or the adjective parisien and the noun Parisien) in the same page (but MAN must be a different page anyway, unless all letters are capitalized in page titles). Lmaltier 20:55, 13 June 2011 (UTC)Reply

The French Wiktionary is wholly case sensitive, AFAICT other Wiktionaries where I have a handful of edits (Italian, Portuguese, Occitan) are also case sensitive. I see Ruakh's point, though. Other online dictionaries tend to have them on the same page with the headword showing whether it's the capitalized form or not. On WT:FEED you do see readers with this problem; they type in a German word and find an English one, because they forgot to enter the initial capital letter. But on the other hand, you get readers who ask why troika doesn't have a Russian section or para- doesn't have an Ancient Greek section. Mglovesfun (talk) 21:04, 13 June 2011 (UTC)Reply

Support the proposal. Solar System and solar system should be on the same page. See also links are hard to notice and are unintuitive. --Vahag 06:40, 14 June 2011 (UTC)Reply

The reason that several editors have interpreted this wording to mean that Wiktionary *requires* a separate entry for proper nouns is that, until June 2005, all entries were capitalized the same as on Wikipedia. We had a long, involved discussion here about it and decided to have this feature changed on Wiktionary so that most words would be on pages spelled with lowercase, like brook, but proper nouns and German nouns and other words of that sort would be moved to separate pages with a capital letter, like Brook. After this feature was adjusted for us, we used a conversion script to move every article to the lowercase spelling, and then we manually moved proper nouns and German nouns back to the uppercase spelling, and over time we gradually separated all of the pages that had common and proper nouns together into separate pages. In a nutshell, several editors have interpreted this wording thus because that was our original intent.
If we’re going to merge proper and common nouns on the same page, we should simply have this feature reversed so that all pages are automatically capitalized the way it is on Wikipedia, and go back to the confusing and complicated way it used to be. —Stephen (Talk) 17:02, 17 June 2011 (UTC)Reply
Two-ish questions. Why would you need the title of the page to be capitalized (like you say Wikipedia has it)? Can't they just be lowercase? And, I'm missing something... how is it more complicated/confusing to have all versions of a particular spelling on the same page? -- Avanu 03:42, 18 June 2011 (UTC)Reply
Hmm, this would work by moving all contents to one, lower case page. I guess it would work for Turkish too, where entries with the dotless i would be on a separate page and any word with dotted i:s would be on the same page as any identical English word, if any exist.
How about other languages though? There are a lot of languages to be considered. What about German? Also, it would show the title in all lowercase, even in cases where no such word exists. E.g. armenian in English is always capitalized. Similar for German nouns.
Also, is this a suggestion to move all entries for one word to a single page, regardless of script used? E.g. should nippon, nihon, 日本 and rìběn all be on one page? What about hana, はな, 花, 鼻? misu, ミス, 御簾?
Very controversial, a lot of work, with lots of issues. Perhaps useful in the end though. But is it really worth it? Vaste 06:37, 18 June 2011 (UTC)Reply
My proposal is primarily for the English entries (although I realize English has many loan words). Also, something like Armenian, being a proper noun, would have an uppercase first letter unless there is a non-proper-noun form. Then both entries would inhabit the same page, with the uppercase first-letter spelling being indicated in that section of the entry. -- Avanu 06:52, 19 June 2011 (UTC)

I think that using capitalized titles (same rule as Wikipedia) would have several advantages:

  • in many languages, all words are capitalized in some cases, especially at the beginning of sentences. It's not always obvious that you have to enter the word as uncapitalized
  • it would be much easier when using automated tools allowing a simple double click on any word of any Internet page (or something of the kind) to get its definition (cf. Wikilook)
  • it might save many pages (maybe 100 000 or so on fr.wiktionary, especially cases such as parisienne/Parisienne)
  • it would make it easier to compare the senses of a capitalized word and of the (usually) uncapitalized word with the same spelling.

The main drawback is that it makes pages longer (this drawback is important, because some pages may become very, very long with time).

To Avanu: would it really be reasonable to propose incorrect spellings such as churchill as page titles? And the rule must be the same for all words, not only English words. To Vaste: I think nobody proposed to study nippon, nihon, etc. on the same page. It would be inconsistent with the most basic principle of the project: the access key is the spelling (whatever the language of the word). Lmaltier 19:55, 20 June 2011 (UTC)Reply

What about Vahag's example above, [[solar system]] vs. [[Solar System]]? Would we then keep both entries, one at [[Solar system]] and one at [[Solar System]]? —RuakhTALK 20:12, 20 June 2011 (UTC)Reply
I think so (because the senses are different). But not in cases such as red fox/Red Fox, because the sense is the same; the only difference is that the capitals provide a clue about the general character of the use of the word (not an individual), which is one of the standard and systematic uses of capitals. Lmaltier 16:41, 21 June 2011 (UTC)
Lmaltier, how is churchill an incorrect spelling? (unless you're trying to spell zebra) Also, why does this rule have to be the same for *all* words? If so, that seems like a silly rule. I'm just suggesting a way to make this dictionary work in a common sense and easy fashion. The idea that Brook and brook are on separate pages goes against Web usability and common sense. Why add an unnecessary click into the process? -- Avanu 21:56, 20 June 2011 (UTC)Reply
The only correct spelling is Churchill, with a capital. The rule must be the same for all languages for simplicity and consistency, but also because the software parameter for automatic management of page title capitalization is at the project level. Lmaltier 16:41, 21 June 2011 (UTC)
So I assume that CHURCHILL is wrong then? Or what if I wrote a novel and "the Nazi spy quietly whispered the code word to me ... 'churchill', which I knew was chosen because of the prime minister's special interest in this mission." Your point is not well taken since the spelling is perfectly fine, but as our very own rule on Wiktionary says "For languages with two cases of script". In other words... 'C' is a different case from 'c' but is the exact same spelling. Except for uber-nerdom, "we like Linux because it knows its cases!", I can't see the reason there is so much resistance to a common sense approach to dictionary entries. The esoteric idea that people need to have things perfectly cased in order to get to the information they're seeking is just silly. We're trying to help people find information, not making it harder? Right? -- Avanu 01:35, 22 June 2011 (UTC)Reply
No, CHURCHILL is used in some circumstances, and it's not a different spelling, just like using a capital at the beginning of sentences does not make a different spelling. But churchill is never used (when it's used, it's not considered correct). Have you ever heard of a dictionary removing the capitals, or choosing an uncapitalized entry when the normal spelling is capitalized? (Webster's, maybe: it assumes that, in many words, capitalization is not quite systematic, but it still uses capitals for some words). Do you really think that using non-standard spellings as entries helps people? Lmaltier 05:19, 22 June 2011 (UTC)
I already argued against your usability/"fewer clicks" argument. Losing the "unnecessary click" adds more unnecessary scrolling and scanning to find one item on a page of many — not to mention that you can already avoid the clicks if you look up the word with the desired case in the first place (e.g. Ira, IRA). Equinox 22:00, 20 June 2011 (UTC)Reply
That scrolling argument is empty though. You can add anchor tags into a web page that move you directly to a section. So why the problem? And what if the version you found isn't even the one you wanted? Oops, I accidentally typed Equinox or eQuinox instead of equinox... guess I'll just have to get my spelling right until I find it. (oh, but wait Equinox already redirects to equinox, what luck! Why don't all entries just work this way?) -- Avanu 22:09, 20 June 2011 (UTC)Reply
Just browsing around Wikipedia and saw the article Capitonym. Very interesting. -- Avanu 04:25, 22 June 2011 (UTC)Reply
Equinox is right. The argument of "save one click = add unnecessary scrolling and scanning" is not empty just because of possible anchor tags. We would have to click on a link to reach the anchor in the first place; ergo, one anchor does not save one click. --Daniel 05:40, 22 June 2011 (UTC)Reply
You obviously don't understand how a redirect works then. You can easily redirect to an anchor. But for some reason you want more clicks instead of less, and more pages instead of fewer. What's the logic for keeping multiple definitions on the same page then? I do not understand the mentality of those who think separate pages make any sense at all when you have the same spelling. -- Avanu 10:54, 27 June 2011 (UTC)Reply
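For readers following the merge argument, here is a minimal Python sketch of what "one page per spelling, regardless of case" could look like. It is an illustration only, not how MediaWiki resolves titles: the function name and the sample titles are hypothetical, and Python's `str.casefold()` stands in for whatever case-insensitive key the software would actually use.

```python
from collections import defaultdict


def group_by_case_insensitive_key(titles):
    """Group page titles that differ only in letter case.

    Hypothetical helper: the key is the case-folded title, so
    'man', 'Man' and 'MAN' all land in the same group.
    """
    groups = defaultdict(list)
    for title in titles:
        groups[title.casefold()].append(title)
    return dict(groups)


# Sample titles taken from examples mentioned in this discussion.
titles = ["man", "Man", "MAN", "brook", "Brook", "Ira", "IRA"]
merged = group_by_case_insensitive_key(titles)

for key, variants in merged.items():
    # Each variant could become a section (and redirect target anchor)
    # on the single merged page named by the key.
    print(key, "->", variants)
```

Under this sketch each case variant would be a section on the merged page, which is the arrangement the redirect-to-anchor suggestion above relies on. It says nothing about whether the longer pages that result are easier or harder to scan.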

123abc

User:123abc seems to be able to use a massive range of IP addresses, whenever we block him, he moves on. These blocks could potentially stop valid contributors contributing. What can we do, if anything? --Mglovesfun (talk) 16:35, 13 June 2011 (UTC)Reply

Assuming that he really is just a pure vandal — which I kind of wonder about, because he keeps trying to engage us in conversation, but I'm not qualified to judge — so anyway, given that assumption: If we use shorter-term blocks, they'll still force him to "move on" pretty often, presumably lowering his throughput, while lessening the risk of valid contributors being affected. Aside from that — users who are capable of recognizing good and bad Chinese edits, or who are capable of distinguishing his edits from other people's, should spend some time patrolling. About the only thing that I can think of that anyone else can do is maybe help develop tools to identify unpatrolled Chinese-related edits? And maybe to make it quicker/easier to block someone while patrolling? —RuakhTALK 19:15, 13 June 2011 (UTC)Reply
I agree that there are two issues here, the initial block, which I feel uneasy about, and the constant circumnavigation of blocks. FWIW 123abc under his latest alias seemed to give up creating toneless pinyin entries, which is why he was blocked in the first place. I don't really know what his current block was for. Mglovesfun (talk) 19:21, 13 June 2011 (UTC)Reply
I am entering high quality Mandarin toned Pinyin entries for Wiktionary (please see here and a recent example: zhōngnián). I don't understand why did you block me? Ddpy

Account admits editing through IP to avoid block, see diff. Just an FYI. -- Cirt (talk) 20:40, 13 June 2011 (UTC)Reply

Your second blocking was made because you continuously create pinyin entries which are totally unattestable. And before you go do a Google search and post the results here let me remind you that a Google search only tells you if words are used together in a whole variety of non-archived texts. Please, for the hundredth time, I beg you to actually read our WT:CFI so you can learn what criteria we use to keep entries in Wiktionary. Let me give you a simple example - a common pinyin word like nǐhǎo has 28 results on Google Books (note: that's NOT Google Web Search), whilst words like "zhìyuànzhě" and the hundreds of other pinyin entries that you created have ZERO hits. This violates Wiktionary's criteria for inclusion - yet you keep creating them, getting blocked, changing your IP, creating them again, getting blocked again, etc, etc... then you innocently wonder why you are getting this treatment! Give me a break! ---> Tooironic 00:07, 14 June 2011 (UTC)Reply
zhìyuànzhě (volunteer) and shāndòng (to incite) are words and attested.
Google hits: zhìyuànzhě
Google hits: shāndòng 2.27.72.254 00:22, 14 June 2011 (UTC)Reply
How many times do I have to tell you - just because something is on Google does not mean it meets Wiktionary's Criteria for Inclusion. Are you actually reading the messages I am typing to you? ---> Tooironic 00:55, 14 June 2011 (UTC)Reply
Creating strings of unattestable entries, in my opinion, could justify a block per WT:BLOCK "The block tool should only be used to prevent edits that will, directly or indirectly, hinder or harm the progress of the English Wiktionary." --Mglovesfun (talk) 10:48, 14 June 2011 (UTC)Reply
  • If Tooironic believed an entry was unattestable, he should put the entry to Request For Deletion such as here. However, Tooironic just directly deleted the entry and blocked me. Eventually, Tooironic admitted having done wrong and restored the entry (please see here). 2.25.213.208 12:34, 14 June 2011 (UTC)Reply
    • There's no need to RfD entries which are clearly unattestable by doing a search on Google Books, a durable archive. You keep on creating unattestable entries over and over again and then wonder why I occasionally delete an attestable one - you should just not create these entries in the first place! If you're going to keep evading our blocks then the least you can do is do a search on Google Books BEFORE you add an entry, pinyin, hanzi or otherwise. Your behaviour really is a joke. ---> Tooironic 12:59, 14 June 2011 (UTC)Reply

In light of this problem, I propose that we implement a simple and straightforward rule for pinyin entries. All such entries must include at least one citation to a durably archived reference. Any such entry lacking a citation will be speedily deleted on the spot. bd2412 T 15:46, 14 June 2011 (UTC)Reply

I disagree. Are there editors adding valid pinyin entries? If so, why should they have to do extra work just because of one jerk who doesn't care whether we want the garbage he's adding? And if not, then it's academic. —RuakhTALK 16:41, 14 June 2011 (UTC)Reply
I think it would be reasonable to require that the corresponding hanzi entries exist first. If there is already an acceptable hanzi entry, then there should be no objection to making a pinyin entry that links to it. If the hanzi entry is found to be unacceptable, then of course the pinyin would have to go as well. —Stephen (Talk) 16:49, 14 June 2011 (UTC)
That sounds reasonable to me. As far as I know pinyin is a 'secondary' representation, sort of like how umlauts are written as following e in German, or circumflexes are written as following x in Esperanto, if the proper characters can't be written normally. —CodeCat 17:43, 14 June 2011 (UTC)Reply
That also sounds reasonable to me. All pinyin representations are of words originating in a Chinese script. If the original doesn't exist, then the pinyin doesn't actually represent anything. Doubtless some pinyin writings can be found that errantly run together syllables that would be understood in script to be separate words, so we should have the Hanzi term before a pinyin form can be said to exist. I think the "ˈblu.ˌbɛ.ri" point made by Vaste below is instructive on this point; it only exists because blueberry exists in the first place. bd2412 T 17:52, 15 June 2011 (UTC)Reply
That's not true — not for Chinese, and not for English. The pronunciation /ˈblu.ˌbɛ.ri/ came first, and the notation "ˈblu.ˌbɛ.ri" is a representation of that pronunciation. The spelling <blueberry> is another way of representing that word, but the word would exist even if there were no way to spell it. Likewise, pinyin is one way to represent a Chinese word, irrespective of whether there exist Hanzi for it. Obviously it's quite rare, both in English and in Chinese, for a word not to have any spelling at all; but it's not unheard-of. —RuakhTALK 18:12, 15 June 2011 (UTC)Reply
It may be that the word blueberry was part of an oral tradition before anyone wrote it down, but no one would have bothered to create the notation "ˈblu.ˌbɛ.ri" unless and until the written word existed for the notation to explain. The average person would likely represent a word with no spelling by the closest approximation to existing words, or by onomatopoeia, before representing it by a phonetic notation likely to be understood only by dictionary writers. Similarly, there are sounds in Chinese not directly represented by any symbol, and therefore having no pinyin transliteration of the same. There simply won't be any pinyin versions of words that do not exist first in Hanzi. bd2412 T 21:44, 15 June 2011 (UTC)Reply
I think if sometimes (or, often) Hanzi terms are attested when Pinyin terms aren't, it's because Hanzi is used significantly more to write Mandarin, and the fact that our entries reflect that doesn't seem like a bad thing at all. Mglovesfun (talk) 21:31, 14 June 2011 (UTC)Reply
Specific example of what I've said directly above: 香皂 (xiāngzào) gets 14 600 Google Book hits, while the pinyin xiāngzào gets just one; and it's not in Mandarin, though it's a reference to the Mandarin word in running English text. Mglovesfun (talk) 21:35, 14 June 2011 (UTC)Reply
I'd say "xiāngzào" (for 香皂) is as attestable in Chinese as "ˈblu.ˌbɛ.ri" is in English. It is not a form actually used in Chinese. Surely including them serves no other purpose than helping learners (or for indexing/searching etc)? Now, words like LZ, PK, TMD etc. are actually used, though mostly colloquially as internet slang. Vaste 03:05, 15 June 2011 (UTC)Reply
Guys, my point of view might be different from the majority, but what I think is toned pinyin entries should be included - attestable or unattestable, as a way of conveniencing Mandarin learners. What I have a problem with is ddpy/123abc's effort to duplicate the definitions - now by duplication, he doesn't actually copy them from the existing character entries, he creates his own, often creating inconsistencies between his pinyin entries and character entries. I want to see pinyin entries as points of reference to their character counterparts (or pointers / shortcuts if you will), much like the alternative forms we have (pinyin entries are strictly speaking alternative forms), without making them hard redirects. This makes it much easier to create homophonous pinyin entries. I also have issues with him creating suffix/prefix/affix categories, as to me, there is no true affix in Mandarin (or Chinese in general) as it's not an agglutinative language like most Germanic languages. Almost everything comes in compounds and should be labelled as such. JamesjiaoTC 03:45, 15 June 2011 (UTC)
Most Germanic languages are not agglutinative. 60.240.101.246 03:57, 15 June 2011 (UTC)Reply
You know very well what he meant, you’re just trying to lead the discussion astray. Jamesjiao, I agree with you and that was what I had in mind in my comment above. Pinyin entries should link to existing hanzi entries, and all the definitions and examples should be restricted to the hanzi entries. —Stephen (Talk) 08:24, 15 June 2011 (UTC)Reply
Something like {{ja-def}} (hence {{cmn-def}}), or perhaps {{pinyin form of}}. My other problem is that when 123abc posts on my talk page, and I ask him a question, he doesn't reply. All we need to do is get 123abc to read WT:CFI, understand it and abide by it. I don't think it's his less than perfect English that's the problem, I think he's just refusing to play ball; you can lead a horse to water but you can't make it drink!
I also agree with Jamesjiao that we should get rid of all this "prefix/suffix" nonsense - in 90 percent of cases, Chinese does not create words like that. About pinyin entries - users can easily find pinyin via the search box, so I don't agree with adding pinyin entries based on the idea that doing so will somehow make it easier for users to find entries. The only reason I think we should keep them is to allow users to look up pinyin combinations which have multiple readings (that would be really useful!), but we should only define the individual readings at their hanzi entries. So, in short, let's make toned pinyin entries link to hanzi entries without giving any more information. Most pinyin entries, after all, can't be attested anyway. Now who can help create an appropriate template? ---> Tooironic 10:40, 15 June 2011 (UTC)Reply
I've gone with {{cmn-def}}, per {{ja-def}} which I rather like myself. --Mglovesfun (talk) 16:57, 16 June 2011 (UTC)Reply
Pretty simple and straightforward. I'd use the template. I reserve my opinion. Toned pinyin entries yes, but separate definitions, no; linking to character entries with this template, yes. JamesjiaoTC 04:45, 20 June 2011 (UTC)
On User talk:Engirst, 123abc is essentially saying WT:CFI should not apply to pinyin entries, and for the moment, is refusing to budge. Unless he either agrees to WT:CFI, or we agree to not apply CFI to Mandarin pinyin, then there is no middle ground. --Mglovesfun (talk) 12:44, 17 June 2011 (UTC)Reply
Or he can be perma blocked. That's another option? Here is an example of what I want to see on all Pinyin entries: qín - single chars or compounds. Essentially it points to all character entries that have this pronunciation and an optional (very brief) main (very subjective here) definition - no parts of speech headings allowed except for the one for Pinyin. We will need to get the ball rolling here. JamesjiaoTC 04:43, 22 June 2011 (UTC)Reply
Jamesjiao, do you mean like this or like this? (the current revision) Vaste 04:22, 1 July 2011 (UTC)Reply

Wikimania submission on dictionaries

[[wm2011:Submissions/A Smart Dictionary. From Information to Knowledge]] may be of interest.​—msh210 (talk) 18:47, 14 June 2011 (UTC)Reply

Ligatures in English headwords

I wonder whether there has been a discussion about whether we want to have ligatures such as "areolæ" in headwords, alongside "areolae". I cannot easily find the discussion. From what I recall, there was either a tentative decision to avoid English entries with long s (ſ), or people argued in favor of avoiding such entries.

Judging from the long existence of "fœtus" (created on 2 May 2005) alongside "foetus", it seems that headwords with ligatures have so far been tolerated as alternative spellings. Category:English terms spelled with Œ now has 277 entries, while Category:English terms spelled with Æ has 708 entries; both categories were created on 25 December 2009.

I wonder whether it is a good idea to have these entries with ligatures.

What are your thoughts on this?

Do you know of any past discussions to recommend for reading?

--Dan Polansky 10:16, 15 June 2011 (UTC)Reply

I have no problem with alternative forms/alternative spellings where it's perfectly obvious what the word means. This could be ligatures like fœtus and fœtal, but also things like French siecle, the spelling of siècle before roughly 1780. I say keep the lot, and add any more that are attestable. --Mglovesfun (talk) 11:52, 16 June 2011 (UTC)Reply
I agree (w/Mglovesfun). —RuakhTALK 12:13, 16 June 2011 (UTC)Reply

Arabic Afrikaans

I think this would come as a surprise to many people, but Afrikaans was actually written in Arabic script in some Muslim communities. I can't read any Arabic, but I am familiar with Afrikaans, and I think it would be nice if there was at least some basic coverage of this. I'm not sure how this should be done, but I imagine that we could create subcategories for script in the same way we do with Chinese. Category:Afrikaans nouns in Arabic script maybe? Or we could just put the Arabic script entries in with the Latin script entries. I don't think there is a need for a category for Latin script, though. And how should the definitions themselves be formatted? A transliteration and then {{alternative form of}} to redirect to the Latin script entry? I don't really have the knowledge to contribute to this in any way (beyond being able to understand Afrikaans) but I thought it would be good to table (UK) the idea in case anyone would like to work on this. —CodeCat 13:51, 16 June 2011 (UTC)

I like the idea; by analogy Category:Serbo-Croatian nouns in Latin script, Category:Serbo-Croatian nouns in Cyrillic script also sounds good to me; could be added via {{sh-noun}}. Regarding specifically Afrikaans, it might be just as good to tag them by hand, as I suspect there won't be too many of them. --Mglovesfun (talk) 10:52, 17 June 2011 (UTC)Reply
Right now there are no Arabic entries for Afrikaans at all. I was hoping someone would be willing to create some. :) I could copy the words from the Afrikaans wiktionary, but I would rather not do that without some way of verifying it. And their entries don't have transliterations for the Arabic writing either. —CodeCat 12:51, 17 June 2011 (UTC)Reply
I was able to find some more sources and I've now added some entries: كُوْنِڠْ, ڨَارْلِكْ, دي, ان, بَاس, فِـَرْ, اِتْسْ, اِسْ. —CodeCat 14:09, 17 June 2011 (UTC)

Alternative forms inside inflexions

¶ Could I please have permission to include alternative forms inside the varying inflexions of terms? --Pilcrow 01:05, 17 June 2011 (UTC)Reply

Like sayeth on the headword line for say? We've discussed that and the conclusion AFAIR was that we don't want it.​—msh210 (talk) 05:13, 17 June 2011 (UTC)Reply
If we did, we would have {{en-verb}} do it automatically (I hope, anyway). FWIW the counterargument is that if we have terms like sayest, liketh and whatnot we should link to them. --Mglovesfun (talk) 10:57, 17 June 2011 (UTC)Reply
I like that counterargument. Yes, I think we should link to them. --Daniel 12:51, 17 June 2011 (UTC)Reply
Maybe we could turn this into a preference setting, or maybe a collapsible extension? Something like 'show archaic forms of terms', which could then be applied to other languages as well. —CodeCat 12:55, 17 June 2011 (UTC)Reply
  • We should think twice before we place archaic inflected forms on many headword lines of verbs. For "acquire", this would involve adding at least four forms: "acquirest", "acquireth", "acquir'd", "acquiredst"; an expert on archaic forms could possibly provide more. --Dan Polansky 13:52, 17 June 2011 (UTC)Reply
See travel#Verb for an example of how to link to more than one past tense. The problem with uniformly linking to -est and -eth forms is that more recent verbs like upload won't have attestable -est/-eth forms. Another problem is we'd have to manually add a lot of them, a few thousand, no doubt. We could use some sort of #ifexist: syntax, but that might be 'expensive' with respect to server workload. --Mglovesfun (talk) 18:50, 17 June 2011 (UTC)

¶ I was wanting to include ‘Alternative forms’ sections inside verb forms, plurals, and the like. Here is an example I made. --Pilcrow 00:21, 18 June 2011 (UTC)Reply

By all means, no. It reduces clarity (makes it harder to find the actual definition). Besides, it informs the reader who is interested in learning the common form about the existence of an uncommon form, an information of little use. That is, the link goes into the wrong direction. -- Gauss 12:10, 18 June 2011 (UTC)Reply

Completely uppercase titles

If you guys are considering making the titles of all entries be case-insensitive, I suggest displaying them completely in capital letters, rather than completely in lowercase letters.

The results would include:

That way, the case-insensitivity of the titles would be much more clear, to newbies and experienced users alike.

Completely lowercase titles would be easily misleading, because they would give the impression that there are English words written like "australian" and "wilson". On the other hand, the titles "AUSTRALIAN" and "WILSON" are not misleading, because all English words can be written with capital letters, regardless of their original spellings, anyway. --Daniel 16:12, 17 June 2011 (UTC)Reply

Yes, I prefer all uppercase to all lowercase, if I had to choose (which right now, I don't). --Mglovesfun (talk) 16:21, 17 June 2011 (UTC)Reply
Yes. If we are merging entries like this, then the all-caps location seems to make the most sense (with redirects, natch).​—msh210 (talk) 16:28, 17 June 2011 (UTC)Reply
Ugh! NASTY! SemperBlotto 16:30, 17 June 2011 (UTC)Reply
This is going to break a lot of templates that depend on {{PAGENAME}}. —CodeCat 16:36, 17 June 2011 (UTC)Reply
True. If the proposal passes, most likely the parameter {{{head}}} will effectively become mandatory everywhere. --Daniel 16:38, 17 June 2011 (UTC)Reply
Not necessarily. If we add body.ns-0 h1#firstHeading { text-transform: uppercase; } to [[MediaWiki:Common.css]], then we can use lowercase entry-names, while still displaying the heading in uppercase. —RuakhTALK 17:11, 17 June 2011 (UTC)Reply
German spellings with ß would have to be changed to SS (Spaß > SPASS), since ß has no uppercase counterpart. This is going to impact every other wiki, since they have copied and linked many of our common nouns and proper nouns and other words the way we have them now. Recently Mglovesfun complained about the capitalized English common nouns on Arabic Wiktionary (see Information desk)...these are artefacts copied from English Wiktionary prior to June 2005 when all of our entries were capitalized. The other wikis are not going to delete or fix all of those thousands of entries copied from English Wiktionary, they will be left as they are. There will be problems with other languages as well. In Turkish, words with lowercase i will see it changed to I (ibibik > IBIBIK), which is a misspelling. —Stephen (Talk) 17:39, 17 June 2011 (UTC)Reply
Your comment is indented like a reply to mine, but most of its contents don't apply to my suggestion. Yes, ß becomes SS: ß — but no, that wouldn't affect other wikis. As for Turkish — well, the heading at the top of the page isn't in Turkish, or in any other language. It's a generic heading, for all entries on the page. But yeah, you're right that it's not ideal, since case-transformations are language-dependent, so, by definition, no language-independent approach can be perfect. Personally, I think we should stick to all-lowercase. That's how other online dictionaries do it, and I think most people are used to it. —RuakhTALK 17:59, 17 June 2011 (UTC)Reply
Hm, I didn't think of dotted vs. dotless I (and similar problems). I take back my above 2c. (With interest, please.)​—msh210 (talk) 18:07, 17 June 2011 (UTC)Reply
Why would ibibik become IBIBIK, especially if the latter is a Turkish misspelling?
Naturally, ibibik would become İBİBİK, wouldn't it? --Daniel 18:18, 17 June 2011 (UTC)Reply
How does the program know it’s a Turkish i? When i is capitalized using English software, it becomes I, not İ. I don’t see how the program could know to capitalize Turkish words in a special way. But I am not a programmer, so maybe it is not such a problem as it seems. —Stephen (Talk) 18:43, 17 June 2011 (UTC)
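The casing problem raised in the last few comments can be reproduced in a few lines of Python. This is a sketch under stated assumptions: `generic_upper` uses the language-independent default mapping that Python's `str.upper()` applies, and `turkish_upper` is a hypothetical helper that only works if the caller already knows the word is Turkish, which a page title shared by many languages cannot tell the software.

```python
def generic_upper(word):
    """Uppercase with the default, language-independent Unicode rules."""
    return word.upper()


# Turkish pairs dotted i with dotted İ, and dotless ı with dotless I.
# Hypothetical mapping applied before the default rules take over.
_TURKISH_I = {ord("i"): "İ", ord("ı"): "I"}


def turkish_upper(word):
    """Uppercase a word that is known in advance to be Turkish."""
    return word.translate(_TURKISH_I).upper()


print(generic_upper("Spaß"))    # ß has no single-letter default uppercase
print(generic_upper("ibibik"))  # default rules drop the dot
print(turkish_upper("ibibik"))  # dot kept, but only because we said "Turkish"
```

The default rules turn Spaß into SPASS and ibibik into IBIBIK, while the Turkish-aware helper gives İBİBİK. Since a generic page heading has no language attached, only the first behaviour is available to it, which is the limitation described above.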
Almost all the wisdom about type legibility has it that lowercase, with its descenders and ascenders, is much more legible than the single-height appearance characteristic of uppercase. Isn't this obvious? Or do I have to dig out some references? DCDuring TALK 18:13, 17 June 2011 (UTC)Reply
The small advantage of legibility of using only lowercase letters would be immediately suppressed by the need to discern whether or not "wilson" is an English word, as explained by me at the first message above. --Daniel 19:36, 17 June 2011 (UTC)Reply
How could you possibly know the advantage is small?
If it were small then I would expect to have seen:
  1. much more publication using only uppercase, especially in the early days of printing
  2. much less hostility to the use of uppercase in online forums
  3. some use of uppercase for headwords in other dictionaries, especially online.
Even if it were small, I would argue that even a small benefit for users trumps internal considerations and even linguistic considerations not relevant to users.
This seems to be yet another instance of proposing something for purported reasons of internal technical logic. For whom is this place being run? DCDuring TALK 20:02, 17 June 2011 (UTC)Reply
Easy. Wiktionary is my private playground. Now that we have settled this, please answer:
How is the proposal of "all uppercase" more technically logical than the proposal of "all lowercase"? --Daniel 20:15, 17 June 2011 (UTC)Reply
Your opinions and proposals are therefore not to be trusted.
I see no good reason for having all headwords by only uppercase or only lowercase. Orthographic distinctions are often what users seek. This whole line of discussion would be silly, were it not potentially destructive of such meaningful distinctions. DCDuring TALK 20:50, 17 June 2011 (UTC)Reply
Good thing I didn't ask for your trust, then. The proposal of making titles case-insensitive, however, is not mine. --Daniel 20:59, 17 June 2011 (UTC)Reply
No, Ruakh, my comment was not meant as a reply to yours. But how can the other wikis NOT be affected? Right now we do not permit common nouns and adjectives copied to other wikis before 2005 to link to our entries, since the capitalization is different. We require the same spelling and capitalization. Therefore, the other wikis had to recopy the articles, which leaves them with duplicates, only one of which matches ours and is permitted an interwiki. If we make this change, the other wikis will have the capitalized entries from earlier years, and the currently uncapitalized entries, and they have both capitalized and uncapitalized varieties for words that can be spelled both ways. Are we going to allow all of their capitalized and uncapitalized entries to link to our allcap entries? And are we going to have links from our allcap entries to both the capitalized and uncapitalized forms on the other wikis? Or are we going to continue to require the spelling and capitalization matching, so that all the other wikis will have to copy all of our entries for the third time? Most of the disk space on Wikimedia's hard drives is going to be taken up with tens or hundreds of millions of duplicated entries as a result of our switching this capitalization rule again. —Stephen (Talk) 18:36, 17 June 2011 (UTC)Reply
If you had been replying to my comment — my suggestion wouldn't have renamed lowercase entries; we'd still have [[man]], it would just display "MAN" at the top of the page. So interwikis would still be fine. (Interwikis would have been affected in cases like [[California]], when we moved it to [[california]], but that's not specific to the uppercase-display suggestion.) But since you weren't replying to my comment — Turkish isn't a problem, because the renaming to İBİBİK could be handled manually, or with smarter software. (That wouldn't have been possible with my suggestion, but with actual renames it's quite possible to do, as long as we're willing to put in the effort to do it.) —RuakhTALK 18:58, 17 June 2011 (UTC)Reply
I haven't used every other Wiktionary (obviously) but as far as I know we'd be the first Wiktionary to make such a change (all combinations of uppercase/lowercase on the same page) and we'd lose consistency; other Wiktionaries will have two different entries for earth and Earth and we will have just one. It's not a primary concern, sure, but it shouldn't be wholly ignored either. --Mglovesfun (talk) 18:47, 17 June 2011 (UTC)Reply
and if such a thing existed, eArth, the Medieval historical Web portal primarily focused on the exploits of King Arthur. -- Avanu 19:25, 17 June 2011 (UTC)Reply
Another alternative, which may require software changes, is to have the title of the page simply be of the same case as the link you followed to get there (or the term typed in the search box). This way it always appears the way that the reader expects, although some of the senses appearing therein may seem irrelevant. Links will not be broken as long as the software is configured to make page titles case-insensitive (so any casing of the title would link to the same page), which again is possible but may require software changes. Dcoetzee (talk) 23:35, 17 June 2011 (UTC)Reply
yes, Wikipedia already has that, but only for the first letter, so it (the software) considered earth and Earth identical. Mglovesfun (talk) 23:41, 17 June 2011 (UTC)Reply
I still don't see this (this proposal, if it were enacted) as a victory for usability. If there were a vote on the issue, and right now that looks unlikely, I think I'd oppose it unless there were clear long term benefits that outweighed a few years of having to fix entries, and a few years of regular Wiktionary users being very confused. Mglovesfun (talk) 23:45, 17 June 2011 (UTC)Reply
  • I can't believe this is actually being discussed. In order to accommodate for 0.1% of same-language words that differ merely in capitalization, are we going to break the proper spelling of 99.9% of others? Have you all gone mad? If you're so bothered with duplication existing on entries such as [[earth]] and [[Earth]], just merge them normally, with some soft-redirect from capitalized to the uncapitalized form. --Ivan Štambuk 05:57, 18 June 2011 (UTC)Reply
Why did all of you forget that Wiktionary (or actually, the SQL software that hosts this wiki) has no post-Unicode 3.2 case pairs? Stuff like ɫ -> Ɫ will never work, period. -- Prince Kassad 09:48, 18 June 2011 (UTC)Reply
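Whether a given case pair exists depends on the version of the Unicode tables the software ships with. As a hedged illustration only (this says nothing about the wiki's own database, and the variable names are made up here), a current Python interpreter can report both its table version and the pair mentioned above:

```python
import unicodedata

small = "\u026b"         # ɫ LATIN SMALL LETTER L WITH MIDDLE TILDE
capital = small.upper()  # uses the case tables bundled with this interpreter

print(unicodedata.unidata_version)  # Unicode version of the bundled tables
print(f"U+{ord(capital):04X}", unicodedata.name(capital))
```

Software whose case tables predate the addition of the capital letter would presumably leave ɫ unchanged, which is the situation described in the comment.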
I agree with Ivan Štambuk, it’s folly to think of changing this back to the way it was in the early days (I’m sure there are some here besides me who remember how it was in the old days of case-insensitivity). These issues were discussed three ways from Sunday back in 2005 and our current treatment seems much simpler and more logical than what we had to deal with in those days. If the {{also}} solution seems clumsy and unintuitive (as someone said above), just work on that narrow issue and leave the case-sensitivity as is. Perhaps instead of {{also}} we could put a big graphic button that leads to a dab page. —Stephen (Talk) 10:09, 18 June 2011 (UTC)Reply
I agree with Stephen and Ivan Štambuk. Furthermore, it seems to me that if we were going to merge entries, it would be more useful to merge euenhede and evenhood (the same word) than to merge Ira (a name), IRA (a group), and ira (a word). I wouldn't merge any of it, though, in the way being discussed — I would continue to use advisories like {{also}}, or the graphics Stephen suggests, to point at other capitalisations. - -sche (discuss) 18:38, 18 June 2011 (UTC)Reply
I think Ivan's argument is the best, simplest common-sense reason not to implement this. --Mglovesfun (talk) 11:29, 20 June 2011 (UTC)Reply
Yes, I'm afraid I mentioned this idea, but I also disagree. ira and Ira could be merged, not IRA. Lmaltier 18:29, 20 June 2011 (UTC)Reply

Ben Zimmer at The Word (Boston Globe)

`The Word' is usually written by Jan Freeman, but she's recently been sharing duties with Erin McKean, former dictionary editor. Now Ben Zimmer, late of the NYT Magazine's `On Language' column, seems to have joined the team. He's got a nice piece in Sunday's paper (yes, I know it's not Sunday yet, but that's the date on the article). Check it out.--Brett 02:17, 18 June 2011 (UTC)Reply

Thanks. Way more interesting and important than categories. DCDuring TALK 02:30, 18 June 2011 (UTC)Reply
"For the purposes of this huge new dictionary, Gove had set down a rule that all definitions be written as one-phrase statements, carefully organized according to principles of analytic logic." - And that's how it should be done. Having usage notes, examples of usage in collocations, and any other additional material that facilitates learning/understanding of a word is not mutually exclusive with this basic principle of defining the word. I'd rather have a convoluted but fairly comprehensive definition, with as many sub-senses as can be discriminated, than a vague dumbed-down one lacking precision and missing some specific but important details. --Ivan Štambuk 05:46, 18 June 2011 (UTC)Reply

Diacritical marks of various languages

I created Catalan, French, Portuguese and Spanish sections for the entry "´". --Daniel 11:42, 18 June 2011 (UTC)Reply

Looks good. Mglovesfun (talk) 11:43, 18 June 2011 (UTC)Reply
Thanks. --Daniel 11:45, 18 June 2011 (UTC)Reply
Is there a way to explain what the mark actually means in that language? It might be useful to indicate that the acute accent denotes a stressed close or mid-close vowel in Catalan, or that it makes a consonant palatal in Polish. —CodeCat 11:48, 18 June 2011 (UTC)Reply
I wouldn't mind doing that, but perhaps this information would be better displayed at the entries of "complete" characters, such as á or ú. For starters, they are supposed to contain audio and IPA eventually, anyway. --Daniel 11:54, 18 June 2011 (UTC)Reply
Maybe, but something like palatalisation is a useful hint. It wouldn't really be very obvious if Ć just said 'palatal affricate' (and right now it doesn't) without mentioning that the acute accent palatalises many other consonants besides Ć. The same applies to other diacritical marks that indicate a specific pronunciation feature, as well. —CodeCat 12:01, 18 June 2011 (UTC)Reply
Ideally, the entries Ć and ć (both, presumably) are supposed to contain an etymology, a pronunciation section, a few written examples, audio examples as well, and something like "the acute accent (´) palatalizes consonants like this one". By placing that information only there and not at ´, we are not losing information, just organizing it.
That said, if you would like to mention things like these at ´ as well, I suggest creating a "Usage notes" section for that. --Daniel 12:13, 18 June 2011 (UTC)Reply
I've seen the new "usage notes" sections for diacritical marks. Very good. --Daniel 15:38, 18 June 2011 (UTC)Reply
Oh, and as an aside... it seems that Źź is missing from the Latin/Roman list of characters at the bottom. —CodeCat 12:02, 18 June 2011 (UTC)Reply
Erm, no. Or at least I don't think so. AFAIK (not knowing Catalan, French, Portuguese, or Spanish), these languages use not ´, the spacing acute accent, but rather the combining acute accent.​—msh210 (talk) 05:09, 19 June 2011 (UTC)Reply
I believe that, from the point of view of a computer, the Portuguese word cáspite, despite being written with an acute accent, includes neither the "spacing acute accent" nor the "combining acute accent". The character "á" is a standalone character; it encompasses only 1 byte.
Moreover, I believe that ´ is the perfect place to contain its current contents. Its name in Unicode (according to the entry) is "ACUTE ACCENT", after all. --Daniel 08:15, 19 June 2011 (UTC)Reply
Yes. Wiktionary is written by human beings for human beings after all. If the computer doesn't recognize that "acute accent" and "combining acute accent" are the same character and that á is a combination of that character with the letter a, that's its problem, not ours. Human beings know perfectly well that when you put a ´ on top of an a, you get á. —Angr 16:57, 19 June 2011 (UTC)Reply
msh210 is certainly right that these languages are using the combining acute accent (@Daniel: Unicode, which is what MediaWiki uses, defines "á" as equivalent to "a" + the combining acute), but I think we should simply redirect from the combining-acute-accent to the spacing-acute-accent entry, which can explain both, since the former causes weird browser behavior when it's used out of context. (And also for some of the reasons that Angr gives.) —RuakhTALK 18:55, 19 June 2011 (UTC)Reply
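The relationship between the precomposed letter, the combining accent and the spacing accent discussed above can be checked with Unicode normalization. This is a small Python sketch using the standard unicodedata module; the variable names are chosen here for illustration only.

```python
import unicodedata

precomposed = "\u00e1"  # á as a single code point
decomposed = "a\u0301"  # a followed by COMBINING ACUTE ACCENT
spacing = "\u00b4"      # ´ ACUTE ACCENT, the spacing character

# One code point, but more than one byte once encoded as UTF-8.
print(len(precomposed), len(precomposed.encode("utf-8")))

# Canonical equivalence: the two spellings of á normalize to each other.
print(unicodedata.normalize("NFD", precomposed) == decomposed)
print(unicodedata.normalize("NFC", decomposed) == precomposed)

# The spacing accent is a separate character; its compatibility
# decomposition is a space followed by the combining accent.
print(unicodedata.normalize("NFKD", spacing) == " \u0301")
```

So the precomposed á is canonically equivalent to a plus the combining acute, while the spacing ´ is only related to the combining mark through a compatibility mapping.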
This idea seems big, uncontroversial and very good. These are the typical ingredients that make good votings, so I created this one. --Daniel 00:11, 20 June 2011 (UTC)Reply

call#poker

In hero call and crying call I've deliberately linked to call#poker. It would be possible to make {{context}} provide anchors for such links; call has a lot of meanings, and while to someone who plays poker the meaning of call is unambiguous, someone who doesn't is going to want to find the poker meaning of call directly in one click. Thoughts? --Mglovesfun (talk) 18:14, 18 June 2011 (UTC)Reply

Splitting Serbo-Croatian categories by script

This proposal was mentioned before, so I thought maybe we should see if there is support for this. Should we split the categories for the Serbo-Croatian language(s) into two each, one that has "in Latin script" and one that has "in Cyrillic script" added at the end? —CodeCat 10:35, 19 June 2011 (UTC)Reply

There are other languages written in multiple scripts, why focus on Serbo-Croatian? If this should be decided by vote, it should be done generally.
Another thing: Serbo-Croatian has been historically also written in Glagolitic and Arabic script. We don't have any words in them yet, but they will be added in the future once certain issues are settled that have been discussed in the past (e.g. whether the distinction between uppercase and lowercase Glagolitic that exists in Unicode but not in the real world should be made, how to standardize spellings because there were no official rules of orthography back then etc.)
Regarding this and similar categorizations: I'm against them all, because they are utterly pointless. Cyrillic and Latin scripts do not overlap in category listings, and it's trivial to skip unnecessary letters using the TOC template. What this will do is simply duplicate existing categories without providing any real benefits. We already have countless irrelevant categories that nobody actually uses for browsing that need to be deleted (such as "nouns by gender" in some languages, and all the subcategories in "XXX verb forms"). --Ivan Štambuk 10:53, 19 June 2011 (UTC)Reply
Instead of splitting them, how about as additional categories? Keep all verbs in Category:Serbo-Croatian verbs but also allow Category:Serbo-Croatian verbs in Latin script. Mglovesfun (talk) 10:57, 19 June 2011 (UTC)Reply
But, why split existing or add these additional categories at all? Because it feels good? If it serves no purpose, it shouldn't exist.
The only way to do this is to manually add the respective category, because there is no way to detect the script of the {{PAGENAME}} programmatically. If this should be applied to topical categories as well, special-handling code for SC and sc= parameter should be added to every categorizing template and its appropriate call in every instance of their invocation in SC entries - which is just too much work for nothing.
I agree that some of these categories are quite pointless. The categories for Category:Bulgarian noun forms and Category:Bulgarian adjective forms are probably good examples (and I know I created them, but that was because I was deleting older categories that were even worse). But I don't think script categories are pointless. Something like a TOC is a lot less obvious to new users than a category called "in Latin script". —CodeCat 11:00, 19 June 2011 (UTC)Reply
TOC is the first thing the users see, at the top of the category page. Categories themselves are inconspicuous links at the bottom of the page. I can't fathom how one could see the latter and not the former. --Ivan Štambuk 11:13, 19 June 2011 (UTC)Reply
I agree with Ivan. Splitting by script is useful when there's overlap; if we don't split Mandarin by Simplified vs. Traditional, then a user interested in just one will have to sift through the other. But when there's no overlap, I don't see the need. (I'm not actively opposed, mind. I just don't see the need.) —RuakhTALK 13:23, 19 June 2011 (UTC)Reply
As I said at Wiktionary:Grease pit#Distinguishing categorised and not-categorised scripts, I don't see the point either. The category's TOC already sorts the entries into Latin and Cyrillic script. This proposal seems like increased complication for no added benefit. I also don't see the point of voting on the issue. Decisions at Wikimedia Projects are made by discussion and consensus-building, not by voting. —Angr 16:53, 19 June 2011 (UTC)Reply
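The remark above that the script of a page name cannot be detected programmatically concerns what templates could do. Outside the template language, for example in a bot, a rough check is possible. The following Python sketch is a heuristic under stated assumptions: the function name scripts_of is hypothetical, and reading the script off the first word of each character's Unicode name is a shortcut, not a proper script-property lookup.

```python
import unicodedata


def scripts_of(title: str) -> set[str]:
    """Return the script names found among the letters of `title`.

    Heuristic (an assumption of this sketch): the first word of a letter's
    Unicode character name, e.g. 'LATIN' or 'CYRILLIC', stands for its script.
    """
    return {unicodedata.name(ch).split()[0] for ch in title if ch.isalpha()}


print(scripts_of("voda"))  # Latin-script spelling
print(scripts_of("вода"))  # Cyrillic-script spelling
```

Whether adding categories from such a check is worthwhile is a separate question, and it is the one the discussion is actually about.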

Splitting Serbo-Croatian categories by script — Support, split the categories

  1. Support --Daniel 10:46, 19 June 2011 (UTC)Reply
    I don't speak Serbo-Croatian, so I'm happy either way. If Serbo-Croatian speakers support this proposal, I go along with them. If they don't support it, I don't, too. --Daniel 11:17, 19 June 2011 (UTC)Reply

Splitting Serbo-Croatian categories by script — Oppose, keep the scripts together

  1. Oppose Chinese can use its own system if so desired, but this doesn't really work and shouldn't be applied to other languages. -- Prince Kassad 16:56, 19 June 2011 (UTC)Reply
  2. But, why split existing or at these additional categories at all? Because it feels good? If it serves no purpose, it shouldn't exi[s]t. = good argument. It is more convenient for the reader to have both Cyrillic and Latin scripts available to choose from on one page, instead of having to jump around from one to the other. The current category splits the scripts very conveniently. Time making the proposed alterations would be better spent expanding Serbo-Croat content. Tempodivalse [talk] 17:07, 19 June 2011 (UTC)Reply

Japanese acute accent

The Japanese translation of the word water is romanized as "mizú".

What is the purpose of the acute accent in Japanese? --Daniel 21:39, 19 June 2011 (UTC)Reply

See Stephen G. Brown's first comment at #How to mark Japanese Pitch Accent, revisited (for batch import). —RuakhTALK 22:33, 19 June 2011 (UTC)Reply
It is a pitch accent and it can affect the meaning. For example, hana desu , ha desu means "is it a nose or is it a flower" ( = hana = nose; = haná = flower). —Stephen (Talk) 12:33, 20 June 2011 (UTC)Reply

That sounds clever. Thank you. --Daniel 16:35, 21 June 2011 (UTC)Reply

Wow, there seems to be ~400-500 translations with such accents. They're not terribly consistent and could probably use some clean-up. Also, isn't it a weird place to put the information? (Why not in the entries themselves?) Still nice though. Vaste 04:01, 1 July 2011 (UTC)Reply

Renaming proto-language codes

So far, the practice on Wiktionary has been to use the code for language families for their common ancestor language as well. And this has worked, mostly. But now I've run into a problem because I'm creating a template that has a parameter that can be either a language or a family. And it has to be able to tell them apart. This means that the practice of using {{proto:gem}} for Proto-Germanic and {{etyl:gem}} for the Germanic languages no longer works. I've proposed changing this before, but only now I realise how much of a problem it really is. So I would like to rename proto-language templates from {{proto:gem}} to {{proto:gem-pro}} and likewise for others. —CodeCat 12:52, 20 June 2011 (UTC)Reply

Re: "I'm creating a template that has a parameter that can be either a language or a family": That seems like a mistake. What is the template for? —RuakhTALK 15:58, 20 June 2011 (UTC)Reply
Derivation categories, which need categories both for terms derived from individual languages and from families. It works well so far, the only problem is that it gets categories for terms derived from proto-languages wrong, because it assumes the code is a family rather than that family's proto-language. I realise there is always the possibility for overlap, but it seems that the ISO 639 standard was designed so that a code always uniquely identifies only a language or a family, never both. —CodeCat 16:15, 20 June 2011 (UTC)Reply
Is it desirable for a single template to handle both language-family derivation-categories and language derivation-categories? Those two cases seem very different to me. —RuakhTALK 20:20, 20 June 2011 (UTC)Reply
{{etyl}} does that, too, and has never caused any problems. I don't really see a difference between Category:nl:English derivations and Category:nl:West Germanic derivations. And the new categories (determined by vote) with the new template are Category:Dutch terms derived from English and Category:Dutch terms derived from West Germanic languages, which are the same as well. I see no reason for having two templates to create a single unified category tree. —CodeCat 20:27, 20 June 2011 (UTC)Reply
  • Please provide more details:
    1. What existing templates are you trying to replace.
    2. What effect would the proposed change have on unnamed parameters for language families and proto-languages in {etyl}, {proto}, {lx} and others. Which would change and which not.
    3. Since derivations from language families are much rarer than from proto-languages, why not simply default to proto-languages and provide special syntax for families instead? --Ivan Štambuk 19:00, 21 June 2011 (UTC)Reply
    • The template is {{derivcatboiler}} and is needed as a result of this vote. It will replace {{topic cat}} for such categories, so there will still be a lot of family categories. The change would not really have an immediate effect, because it would just mean redirecting {{proto:gem}} to something else (and likewise for other proto-codes). Once that move has been made, all instances of the original code will need to be changed, but that should be fairly easy because most uses of these codes are either in category boilerplate templates or in calls to {{lx}} and {{termx}} (there are about 1500 transclusions of {{proto:gem}}, the most-used code). Once all uses of the old code have been replaced, the redirect will be deleted so that the code gem refers only to the Germanic language family, and not to Proto-Germanic. —CodeCat 19:13, 21 June 2011 (UTC)Reply
      I support the proposal of splitting the codes. Germanic and Proto-Germanic are different things. --Daniel 19:25, 21 June 2011 (UTC)Reply
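The ambiguity described in this thread can be sketched as a lookup problem: if one code names both a family and its proto-language, a single parameter cannot tell them apart. The Python below is an illustration only. The tables, the function name derivation_category and the exact category wording are assumptions modelled on the examples quoted above; only the codes gem and gem-pro come from the proposal.

```python
# Hypothetical code tables; only "gem" and "gem-pro" come from the discussion.
FAMILIES = {"gem": "Germanic"}
LANGUAGES = {"gem-pro": "Proto-Germanic", "nl": "Dutch", "en": "English"}


def derivation_category(target: str, source: str) -> str:
    """Build a derivation category name for `target` terms derived from `source`.

    `source` may be a language code or a family code; because the two tables
    share no keys, one lookup is enough to tell which it is.
    """
    target_name = LANGUAGES[target]
    if source in FAMILIES:
        return f"Category:{target_name} terms derived from {FAMILIES[source]} languages"
    return f"Category:{target_name} terms derived from {LANGUAGES[source]}"


print(derivation_category("nl", "en"))
print(derivation_category("nl", "gem"))      # the family
print(derivation_category("nl", "gem-pro"))  # the proto-language
```

With a shared code for both meanings, the second and third calls could not be distinguished, which is the problem the renaming is meant to remove.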

Esperanto x-system spellings allowed?

While looking through Index:Esperanto I saw that some words are listed with the "x" and "h" system spelling variants (e.g., look at Esperanto translations for shampoo). Since these are inherently nonstandard, should we disallow them? A look through WT:About Esperanto and other EO-related pages doesn't reveal any past precedent. Tempodivalse [talk] 22:03, 20 June 2011 (UTC)Reply

I think if they are attestable in the same way as other spellings, they can be included. —CodeCat 22:05, 20 June 2011 (UTC)Reply
What he said. --Mglovesfun (talk) 22:07, 20 June 2011 (UTC)Reply
Ummm...? —CodeCat 22:08, 20 June 2011 (UTC)Reply
We don't usually do transliterations, and that's all this is. They're useless; everyone knows how to transliterate out of these systems for the Wiktionary entry, and how to transliterate into these spelling systems when you're dealing with a system that can't handle the accents. It's universal; there's no point in only having entries for spellings we can attest, since for every word, ĉ becomes cx.--Prosfilaes 02:17, 21 June 2011 (UTC)Reply
Yes! That's what I was thinking, but you explained it better. So can I start going around and removing those pesky "transliterations" from the index? Tempodivalse [talk] 15:33, 21 June 2011 (UTC)Reply
If we decide not to add them, we should at least allow the search function to find them that way. Currently, if you search for "cxiu" it doesn't find anything, but if you look for "ciu" it works. —CodeCat 15:52, 21 June 2011 (UTC)Reply
The Esperanto Wikimedia projects have such a system implemented. I'd like to have it here, but am concerned that it will conflict with other languages (for instance, "aux" would need to show both the French aux and the Esperanto aŭ, instead of defaulting to the latter). Tempodivalse [talk] 16:07, 21 June 2011 (UTC)Reply
Well, if the search function already considers "c" to be a variant of "ĉ" and "ae" to be a variant of "æ" and "ä", couldn't the same be done to treat "cx" as a possible variant of "ĉ" too? —CodeCat 16:46, 21 June 2011 (UTC)Reply
Yes, I would say, go ahead and remove those "transliterations": they are not part of official Esperanto AFAIK.   AugPi 18:23, 21 June 2011 (UTC)Reply
AugPi, if that's your argument - that they are not part of official Esperanto - I want you to go nominate every slang term on wiktionary for deletion.
I don't necessarily think we *really* need entries for these, although I do believe they should be mentioned in the articles - under ===Alternative forms===, maybe unlinked (though I suppose that would be somewhat against "all words all languages"). These forms aren't uncommon, as the letters they represent in Esperanto aren't common on most keyboards. If you don't have them on your keyboard, which you probably don't, you use the x forms. I've only seen two sites that switch what you type for the "proper" forms automatically - the Esperanto Wikipedia and one Esperanto forum. These forms are not negligible, and they're one of the most important things for beginners to learn about Esperanto orthography. — [ R·I·C ] Laurent 23:37, 22 June 2011 (UTC)Reply
I don't think entries are really needed, because they are essentially predictable orthographic varieties. In German, ä, ö and ü can be replaced with ae, oe and ue in the same way. But I do think that the search should support these varieties, so that searching for cx finds words with ĉ, just as searching for "gruen" finds the German grün. —CodeCat 23:46, 22 June 2011 (UTC)Reply
I've wanted this for a long time. How would we go about implementing it? Tempodivalse [talk] 00:58, 23 June 2011 (UTC)Reply
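One way to picture the search behaviour being asked for is a lookup that tries the query as typed and also its x-system reading, so that "aux" can still reach the French entry. This is a minimal Python sketch, not a description of how the site search works; the table and function names are made up for the example, and all-capital digraphs such as "CX" are deliberately left unhandled.

```python
# x-system digraphs and the Esperanto letters they stand for.
X_SYSTEM = {"cx": "ĉ", "gx": "ĝ", "hx": "ĥ", "jx": "ĵ", "sx": "ŝ", "ux": "ŭ"}


def search_candidates(query: str) -> list[str]:
    """Return the query as typed, plus its x-system reading if that differs."""
    converted = query
    for digraph, letter in X_SYSTEM.items():
        converted = converted.replace(digraph, letter)
        # Also handle a capitalised digraph at the start of a word, e.g. "Cx".
        converted = converted.replace(digraph.capitalize(), letter.upper())
    return [query] if converted == query else [query, converted]


print(search_candidates("cxiu"))
print(search_candidates("aux"))   # both the literal and the converted form
print(search_candidates("kato"))  # nothing to convert
```

Returning both forms, rather than silently rewriting the query, is the design choice that avoids the conflict with other languages raised earlier in the thread.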

Wiktionary:Todo/quotation format

This is a list of the 500 longest lines in Wiktionary starting with '#:*'. Any quotation starting with #:* is wrong, but it's impossible to distinguish quotations from other things that might conceivably use this format, which is why I sorted by length. Nadando 01:07, 21 June 2011 (UTC)Reply
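The selection described above (lines beginning with '#:*', longest first, capped at 500) can be sketched in a few lines of Python. How the list was actually generated is not stated here, so the function name and the sample data below are assumptions for illustration.

```python
def longest_suspect_lines(lines, limit=500):
    """Return up to `limit` lines that start with '#:*', longest first."""
    suspects = [line for line in lines if line.startswith("#:*")]
    return sorted(suspects, key=len, reverse=True)[:limit]


# Made-up sample wikitext lines, not taken from any entry.
sample = [
    "# A definition line",
    "#:* short",
    "#* A correctly formatted quotation line",
    "#:* a much longer line that is more likely to be a misformatted quotation",
]
for line in longest_suspect_lines(sample, limit=2):
    print(len(line), line)
```

Sorting by length is only a proxy for "probably a quotation", as the comment itself notes, so the output would still need review by hand.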

Details of saints in etymology.

I've taken to removing mentions of famous saints from name entries, (Monica, Ciaran etc) and have had my edits reverted with the following left on my talk page. (I've no problem discussing it, but feel it's best for a wider audience).

  • There is no data of the name Monica before St. Augustine's Confessions. The mention of the saint is quite essential to the etymology. Ideally all etymologies should say when the name was first attested - compare Per. That's why saints are put into etymologies. Do you have written proof that the name Ciarán existed before St. Ciarán of Saigir? I will put back that saint too unless you provide evidence. I've the impression that you edit on the basis of your personal experience - and that's quite valuable in discussion rooms about English usage - but questions of etymology and frequency must be based on dictionaries and statistics. --Makaokalani 09:09, 21 June 2011 (UTC)Reply

My reply - Mentioning that Saint Monica was the mother of Saint Augustine is encyclopedic and does not belong in a dictionary. It does not pertain to the etymology which is to discuss how the word originated. It may well be the first mention of the name, but that is still not etymology. Unless there is a clear consensus to include this, (and I can't for the life of me see how it could be justified), then this information should be removed.--Dmol 10:19, 21 June 2011 (UTC)Reply

Popularized by 'Saint X' perhaps? Don't people only become saints after their death? So the names must exist before that. But 'popularized by'... doesn't seem 'encyclopedic' to me in an etymology. --Mglovesfun (talk) 10:21, 21 June 2011 (UTC)Reply
The only reason the obscure Punic word Monica entered the English language is that it was borne by St. Augustine's mother. This information belongs very well in the etymology section. Do not remove it. --Vahag 11:57, 21 June 2011 (UTC)Reply
+1 —RuakhTALK 15:18, 21 June 2011 (UTC)Reply
I agree with Vahag. This does not apply to all given names, not even to all those with an associated saint. Names mentioned in Old Testament are similar, though one wonders how many Hebrew names were not mentioned in the Old Testament. DCDuring TALK 16:16, 21 June 2011 (UTC)Reply
Many. (And I, too, agree with Vahag.)​—msh210 (talk) 18:43, 21 June 2011 (UTC)Reply

"Variations of..." namespace

I suggest shortening and simplifying appendices of variations this way:

--Daniel 18:43, 21 June 2011 (UTC)Reply

Could something like this be turned into a dropdown menu somehow? That would be even nicer I think. :) —CodeCat 18:51, 21 June 2011 (UTC)Reply
If you're thinking what I'm thinking, that seems like a good idea. However, drop-down menus involve JavaScript, so I would advise asking for the help of any of our resident JavaScripters (I'm a humble templatizer.) --Daniel 22:31, 21 June 2011 (UTC)Reply
I'm thinking of something like the move/delete buttons for administrators, but with text rather than just an arrow, and placed to the left. —CodeCat 22:34, 21 June 2011 (UTC)Reply
Maybe placing a button only there would be a bad idea, because the area above the title of the entry is basically reserved for editors, and ignored by other readers. However, we can repeat the button somewhere more readable for readers if necessary. I don't know yet; there are many possibilities of design. I would, at least, have to see the new button(s), if they are created, to give more accurate opinions. --Daniel 05:43, 22 June 2011 (UTC)Reply
The Serbian Wikipedia and Wiktionary have a dropdown box there to switch between scripts. I think we could do it in a similar way. —CodeCat 16:50, 22 June 2011 (UTC)Reply
I didn't know that! OK, I checked one page (http://sr.wiktionary.org/sr-ec/%D0%B0%D0%B1%D0%B0) and my fear has been confirmed. In my opinion, the idea of adding a drop-down list up there is good for anyone who knows where to look for it. Otherwise, the menu would be effectively hidden, much like the "Citations" button. Good thing we have {{seeCites}} in the middle of the applicable entries, as an additional clue. --Daniel 17:45, 22 June 2011 (UTC)Reply

I created Wiktionary:Votes/2011-06/Disambiguation: namespace. --Daniel 17:55, 22 June 2011 (UTC)Reply

Anagrams from Dutch compound words?

See note at: http://en.wiktionary.org/wiki/Talk:achterblijven regarding Anagram at: http://en.wiktionary.org/wiki/achterblijven#Dutch

This is like listing an anagram of "mail box" at entry "mailbox" except the Dutch language is very heavily loaded with compound words. For example, the separable verbs: http://en.wiktionary.org/wiki/Category:Dutch_separable_verbs -- would we want to break all of these into the two words comprising them and tag them as anagrams? (I am guessing not, but if so, it could probably be done quickly to all of them with a small script of some sort.) Neededandwanted 23:51, 21 June 2011 (UTC)Reply

This is not an issue: mail box is not an anagram of mailbox. Anagrams are words with the same letters in a different order. In French, prisa, pairs, ripas and prias are anagrams of Paris, but paris is not an anagram. Lmaltier 16:48, 23 June 2011 (UTC)Reply
I think the issue is a little different in this case. In Dutch, there are verbs that are composed of a separable part (usually an adverb) and a main verb. In the infinitive, the separable part comes first and is attached to the verb: achterblijven (achter-blijven). But in some inflected forms, the parts swap places and are separated with a space: blijven achter. There are many of these verbs, and according to the current practice those two verb forms would always be anagrams of each other. —CodeCat 17:10, 23 June 2011 (UTC)Reply
So, it's an issue, but not really a problem: there is not much harm if the anagram bot includes these "anagrams". Lmaltier 19:16, 23 June 2011 (UTC)Reply
Another example of countless true anagrams formed in a systematic way: in French, verb forms such as tomberai/retombai, tomberons/retombons, etc. Lmaltier 05:15, 24 June 2011 (UTC)Reply
Agreed on 'an issue, but not really a problem', but I disagree on retombai/tomberai: these are two different words with different meanings. They take advantage of the prefix 're-' and the suffix '-er' and are nearly as different as the English 'return' and 'turner'.

The compound words I am referring to certainly include all separable verbs. Two separate words are combined, otherwise unaltered, to form another 'word', but Dutch speakers don't even think of it as a new word, just as the non-separated form of the verb. It is just another form of the same verb. There is no direct comparison in English (see http://en.wikipedia.org/wiki/Separable_verb). So the compound words are an extreme case of non-anagrams: they are not a word or phrase created from ANOTHER word. There is no different meaning. It is entirely a grammatical construction. Neededandwanted 04:36, 28 June 2011 (UTC)
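
A minimal sketch of the anagram test discussed in this thread, written in Python purely for illustration. It is not the code the anagram bot uses; the names normalize and is_anagram, and the normalization itself (case-folding, dropping spaces and other non-letters), are assumptions chosen to match the examples given above.

```python
def normalize(term: str) -> str:
    """Case-fold the term and keep only its letters (spaces and hyphens are dropped)."""
    return "".join(ch for ch in term.casefold() if ch.isalpha())


def is_anagram(a: str, b: str) -> bool:
    """Return True if both terms use the same letters in a different order."""
    na, nb = normalize(a), normalize(b)
    # Identical letter sequences (e.g. "mail box" / "mailbox") are not anagrams.
    return na != nb and sorted(na) == sorted(nb)


if __name__ == "__main__":
    pairs = [
        ("mail box", "mailbox"),
        ("Paris", "pairs"),
        ("achterblijven", "blijven achter"),
    ]
    for first, second in pairs:
        print(first, "/", second, "->", is_anagram(first, second))
```

Under these assumptions "mail box"/"mailbox" and "paris"/"Paris" are rejected, while "achterblijven"/"blijven achter" and "tomberai"/"retombai" are accepted, which is the systematic case raised in this thread.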

toned pinyin entries - hanzi redirects

Based on the discussion we recently had about User:123abc, and the fact that most toned pinyin entries can't be attested according to Wiktionary's criteria for inclusion, I'd like to propose we make toned pinyin entries merely list and link to hanzi entries, rather than give definitions. For example, right now yánlì only lists "沿例... follow precedents" as one reading, but we should change it to a list of the different hanzi readings without giving any definitions to avoid duplication, e.g., something like:

  1. (pinyin reading of) 嚴厲 (trad.), 严厉 (simp.)
  2. (pinyin reading of) 妍麗 (trad.), 妍丽 (simp.)
  3. (pinyin reading of) 沿例
  4. (pinyin reading of) 岩櫟 (trad.), 岩栎 (simp.)
  5. (pinyin reading of) 沿歷 (trad.), 沿历 (simp.)

Now who can make a pretty template for us? ---> Tooironic 01:18, 22 June 2011 (UTC)Reply

  • "Google Books" is not the only means of attestation. We shouldn't use "Google Books" as an excuse to ban Pinyin entries. "Mandarin pinyin" is like "Min Nan pinyin": "Min Nan pinyin" sometimes also doesn't pass the "Google Books check" (please see here). Anyhow, Pinyin entries are allowed by the rules of Wiktionary. If someone wants to ban Pinyin entries, Wiktionary should first have a new rule. However, a rule shouldn't be abolished rashly; otherwise Wiktionary will be harmed. Engirst 21:30, 22 June 2011 (UTC)
{{pinyin reading of}}: when the second parameter is not present, it displays #3 above. Is this the correct behavior? Nadando 01:44, 22 June 2011 (UTC)
I'm not sure what you mean. By the way is it possible to incorporate (trad.) and (simp.) into the template? This would make it a lot easier to edit. ---> Tooironic 01:53, 22 June 2011 (UTC)Reply
I wouldn't use brackets for the text. I would use something like what {{form of}} uses instead. —CodeCat 09:43, 22 June 2011 (UTC)Reply
He means the first parameter is traditional, the second parameter is simplified, and if the simplified (second parameter) is left empty, then you get #3, meaning that it is both traditional and simplified. —Stephen (Talk) 11:00, 22 June 2011 (UTC)Reply
Can someone make the changes at yánlì? I'm not really sure how to do it. ---> Tooironic 10:18, 24 June 2011 (UTC)Reply
Does it look good now? Do you like this approach? (The current version of {{pinyin reading of}} is not set in stone; it's just one way to do it.) —RuakhTALK 12:44, 24 June 2011 (UTC)Reply
I changed the heading to Pinyin. For some reason, the category shows up as Mandarin pinyins instead of Mandarin pinyin. How do I change that? (Mandarin pinyin is an existing category.) We will also need to make this into a rule, just for 123abc's peace of mind, so that he no longer has any excuse for creating pinyin entries his way. Oh, btw, this looks exactly how I wanted it! JamesjiaoTC 22:04, 28 June 2011 (UTC)
I've modified {{pinyin reading of}} to handle the categorization, so for the headword line you can just do {{infl|cmn|sc=Latn}}. (It is possible to override the categorization in {{infl}}, by doing something like {{infl|cmn|cat=pinyin|sc=Latn}}, but that doesn't seem necessary here.) —RuakhTALK 23:52, 28 June 2011 (UTC)Reply
I think it looks really good. Shall we start a vote on this? Make this a standard practice for Mandarin entries on Wikt. JamesjiaoTC 23:25, 29 June 2011 (UTC)Reply
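
A rough sketch, in Python, of the parameter behaviour described in this thread: the first parameter is the traditional form, the second the simplified form, and an empty second parameter means one form serves for both. This is not the wikitext of {{pinyin reading of}}; the function name pinyin_reading_of and the exact output wording are assumptions modelled on the proposed list for yánlì.

```python
def pinyin_reading_of(trad: str, simp: str = "") -> str:
    """Format one definition line for a toned pinyin entry.

    trad: traditional hanzi form (first parameter in the thread's description)
    simp: simplified hanzi form; empty means the same form is used for both
    """
    if simp and simp != trad:
        return f"(pinyin reading of) {trad} (trad.), {simp} (simp.)"
    return f"(pinyin reading of) {trad}"


if __name__ == "__main__":
    readings = [("嚴厲", "严厉"), ("妍麗", "妍丽"), ("沿例", "")]
    for number, (trad, simp) in enumerate(readings, start=1):
        print(f"{number}. {pinyin_reading_of(trad, simp)}")
```

Keeping the trad./simp. labels inside the function mirrors the request to incorporate them into the template, so editors would only supply the hanzi.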

No categories

A dictionary doesn't need categories, and the current development is running amok, so I'll propose that we limit the use of the category name space to maintenance issues.--Leo Laursen – (talk · contribs) 08:12, 23 June 2011 (UTC)Reply

What about making more of them hidden so that at least users don't see them until there is more defensible logic behind each individual? Wiktionary might also benefit from some kind of criteria and review process for category membership.
Some visible elements of the category structure are helpful for making things like specialized glossary indexes, eg, the context-tag-produced categories that reflect specialized usage contexts for technical terms. The categories function just about right for that purpose. DCDuring TALK 15:10, 23 June 2011 (UTC)Reply

We need categories. Two basic examples:

  • Without categories, how would you search a Japanese word if you don't know how to enter Japanese characters with your keyboard?
  • If you know the name of a fish, but cannot remember it, how would you find it without a category dedicated to fish names in the language?

Of course, there are many other uses of categories. Lmaltier 16:42, 23 June 2011 (UTC)Reply

Just a related note: Wiktionary:Beer parlour archive/2011/February#Poll: Deprecation of topical categories failed. People want some categories. --Daniel 16:50, 23 June 2011 (UTC)Reply
If a category contains more than 200 entries, it shouldn't really exist. The overwhelming majority (my rough estimate: >90%) of the many thousands of existing categories are used by basically no one. IMHO they should be replaced by bot-generated specific indexes (i.e. lexicons) similar to those already used to generate Index:All languages. These indexes could then be fine-tuned for a specific purpose. E.g. in the case of topical indexes for foreign languages, they could contain a definition gloss. In the case of etymological derivations, the respective etymon(s) could be added and used for grouping. And we could also have reverse indexes for translations generated from English translation tables. --Ivan Štambuk 18:27, 23 June 2011 (UTC)
A search for "wiktionary category" [without quotation marks], in Google Groups, returns 39,300 results.[2] --Daniel 18:36, 23 June 2011 (UTC)
Most of these are false positives. Google search results are also not indicative of actual usage (usage as in "learning words by browsing the category"; randomly clicking a single entry is not using the category). Note also that I do not advocate abolishing categories, simply superseding them by a superior presentation format customizable through wiki markup. They are useful, but this could be done so much better. Even the simple format used for our meager glossaries beats them. --Ivan Štambuk 18:48, 23 June 2011 (UTC)
Simply excluding "wikipedia" cuts the raw count to 10,400. Most of the usage has nothing to do with Wiktionary categories. Of the portion that does, a great deal has to do with grammatical and register categories. There is little evidence there or elsewhere to support user enthusiasm for topical categories. It doesn't come up as a user complaint on Feedback or on Wiktionary discussion pages. DCDuring TALK 19:01, 23 June 2011 (UTC)Reply
@DCDuring: Our topical categories typically are barely usable and justifiable, so I wouldn't expect much positive recognition for them. I see some usage of topical categories in the initial results, such as "food" and "theology", nonetheless. --Daniel 19:15, 23 June 2011 (UTC)Reply
Note that categories are not just intended for users - they're also there to facilitate bot operation and indexing for creating dictionary databases based on our data. Therefore, no categories should be abolished without careful examination of the consequences for automated scripts. -- Prince Kassad 18:55, 23 June 2011 (UTC)
I'd be fascinated to find out about such bots. Where have the bot owners been communicating their needs? DCDuring TALK 19:05, 23 June 2011 (UTC)Reply
I personally use categories, even very large ones. Yes, even categories with 1 000 000 entries are useful, when there are links at the beginning of the category to make access easier. I have never used them in my bots, but they may be very useful to bots. Lmaltier 19:12, 23 June 2011 (UTC)Reply
There's a weird predicament where small categories and large categories aren't very useful. I have used categories for non Latin script languages like Russian where I can't easily type the word, and typing in Category:Russian adverbs and looking at the category is much quicker. As for bots, yeah I've done that with MglovesfunBot, but only really to update categories rather than fix non-category problems. --Mglovesfun (talk) 17:50, 24 June 2011 (UTC)Reply
I use categories a lot for maintenance work, and also to see if there are any words missing or to see if entries aren't formatted the way they should be. —CodeCat 11:09, 25 June 2011 (UTC)Reply

I was worried about the tendency to regard a dictionary as an extended encyclopedia of words, and the inherent misconception that words in the daily language can be defined rigidly like a scientific term. The current fixation on categories seems to emphasize that. I do mean that categories are superfluous, but naturally it was mostly an expression of my exasperation. Anyway I've left the project. It was fun, while there was a chance it would develop in the right direction. Thank you.--Leolaursen 08:07, 25 June 2011 (UTC)Reply

Bye. --Daniel 15:40, 1 July 2011 (UTC)Reply

Poll: Chinese script or Han script

The poll is here:

WT:RFM#Categories ending in "in traditional script" to "in Traditional Han script".

--Daniel 16:57, 23 June 2011 (UTC)Reply

Transferring interwikis of categories

I had a conversation with Malafaya (in Portuguese, here) about interwikis of categories.

According to the conversation, in order to keep the interwikis, we should do either or both of these things:

  1. Copy the interwikis manually.
  2. Turn the old categories into redirects.

For example, "Category:fr:Spanish derivations" could contain this:

#REDIRECT [[Category:French terms derived from Spanish]]

[[es:Categoría:FR:Palabras de origen español]]
[[fr:Catégorie:Mots français issus d’un mot espagnol]]
[[pt:Categoria:Vocábulo de étimo espanhol (Francês)]]
[[ru:Категория:Слова испанского происхождения/fr]]

That way, all the interwikis should eventually be transferred by MalafayaBot. After that, the old empty categories can be deleted. --Daniel 18:58, 23 June 2011 (UTC)Reply

Would it be possible to use a bot to add 'soft' redirects to all of the old categories instead? —CodeCat 19:08, 23 June 2011 (UTC)Reply
Soft redirects would not transfer the interwikis... Perhaps the hypothetical bot could, at least, do the transfer or create the hard redirects that result in the transfer. --Daniel 19:58, 23 June 2011 (UTC)Reply

One thing to add: if the original category doesn't have any interwikis, you don't have to bother creating or keeping a hard redirect. I believe most categories will fall into this case. Malafaya 22:56, 23 June 2011 (UTC)

Let's keep this simple- I can run movepages.py on the remaining categories with redirects turned off. Everything gets moved including the history and we don't have to worry about deletion. Nadando 15:03, 24 June 2011 (UTC)Reply
The example above would actually categorize the redirect - use {{movecat}} instead. --Mglovesfun (talk) 17:46, 24 June 2011 (UTC)Reply
If you put a colon before Category (i.e., #REDIRECT [[:Category:...), it won't be categorized. Keeping both categories (without any of them being a redirect to the other) will prevent update of interwikis with bots in auto mode as they will eventually find 2 cats here for the same foreign cat. Malafaya 19:44, 24 June 2011 (UTC)Reply

I have been "fishing" some of the new categories and updating interwikis accordingly via bot. I believe it got most of the cats created so far. Malafaya 13:14, 27 June 2011 (UTC)

Do you think you could keep doing that? It would be a lot easier than if we had to do it manually... —CodeCat 13:54, 27 June 2011 (UTC)Reply
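
For anyone scripting this, here is a small Python sketch of how the redirect text for an old category could be assembled. It is only an illustration: the function name category_redirect is hypothetical, it is not part of MalafayaBot or movepages.py, and it uses the leading colon mentioned above so that the redirect page itself is not categorized.

```python
def category_redirect(target: str, interwikis: list[str]) -> str:
    """Build the wikitext for an old category page that hard-redirects to `target`.

    target: new category name without the "Category:" prefix
    interwikis: interwiki links without brackets, e.g. "fr:Catégorie:..."
    """
    # The leading colon makes this a link to the category, not a categorization.
    lines = [f"#REDIRECT [[:Category:{target}]]", ""]
    lines += [f"[[{link}]]" for link in interwikis]
    return "\n".join(lines)


if __name__ == "__main__":
    print(category_redirect(
        "French terms derived from Spanish",
        [
            "es:Categoría:FR:Palabras de origen español",
            "fr:Catégorie:Mots français issus d’un mot espagnol",
        ],
    ))
```

If the old category has no interwikis, the list is empty and, as noted above, the redirect need not be created at all.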

Scripts of Punjabi

What are the writing systems of Punjabi? --Daniel 23:57, 23 June 2011 (UTC)Reply

See [[w:Punjabi language#Writing system]]. It uses three: Gurmukhi (the most commonly used; related to other Indic scripts); Shahmukhi (a variant of the Arabic script, via Persian and Urdu; used in Pakistan); and Devanagari (least commonly used; borrowed from Hindi, for which it's the main script). —RuakhTALK 00:12, 24 June 2011 (UTC)Reply
Thanks. Should we have a code for Shahmukhi? --Daniel 00:29, 24 June 2011 (UTC)Reply
My opinion is that we should use {{pnb}} to designate Punjabi as written in Arabic script. This is what Wikipedia does. -- Prince Kassad 00:36, 24 June 2011 (UTC)Reply

I edited the category of Punjabi to make it show Gurmukhi and Devanagari. Feel free to add Arabic there as well, if necessary. --Daniel 16:25, 24 June 2011 (UTC)

Form of templates

Would it be possible for all form-of templates to work in the same way regarding capitalization and the final full stop (that is, period)? This seems to be a matter of disagreement; can a vote on the subject be avoided? Probably not. FWIW, the French Wiktionary did away with the final full stop in such templates a few years ago, as the nodot parameter seems quite confusing; instead, if you want a full stop, erm, write one. PS: apologies for my lack of participation in this thread, as I am having a bit of a break. --Mglovesfun (talk) 15:59, 24 June 2011 (UTC)

I support this proposal to standardise the templates that are used as definition lines, either with or without a dot. - -sche (discuss) 03:56, 26 June 2011 (UTC)Reply
The dot is freakin' easy. Just take it out. If you want a dot at the end of the line, add a period after the template. None of this nodot= bologna.
Capitalization I don't care how you handle, but here's another idea to consider: {{plural of}} vs. {{Plural of}} with obvious meaning. The real question is, how would the lowercase form be used? In other words, do we need some magic where walked is easily defined as follows?
  1. {{form of|Past|past participle|walk}}.
DAVilla 19:04, 4 July 2011 (UTC)Reply
Capitalization and punctuation can't be made standard unless we forbid the inclusion of other information on the definition line in combination with the template. As it stands, sometimes a form-of entry requires a definition at the outset, or an explanation after the form information. Standard capitalization and punctuation will not be possible as long as the other information is to be included. --EncycloPetey 19:11, 4 July 2011 (UTC)Reply

Standardizing a few codes

I propose changing the codes for these 11 things, to standardize them.

The proposal is organized this way: "name", "old code" then "proposed code". (And updates are shown either with an underline or with a line-through.)

  • American English - AE. - en-usa
  • Austrian German - AG. - de-aus
  • Ecclesiastical Latin - EL. - la-ecl
  • Late Latin - Late Latin - la-lat
  • Mediaeval Latin - ML. - la-med
  • New Latin - NL. - la-new
  • Old Latin - OL. - la-old
  • Old Northern French - ONF. - fro-onf (struck) - fro-nrn
  • Provençal - prv - oc-prv
  • Shanghainese - Sha. - wuu-sha
  • Viennese German - VG. - de-vie (struck) - bar-vie
  • Vulgar Latin - VL. - la-vul

--Daniel 16:20, 24 June 2011 (UTC)Reply

So we should take a bunch of tags that are clearly unstandardized, and move them into a controlled namespace? No. la-new is not a valid language tag, but looks like one. NL. is at least clearly not a valid language tag.--Prosfilaes 17:04, 24 June 2011 (UTC)Reply
Why would it not be a valid language tag? —CodeCat 17:06, 24 June 2011 (UTC)Reply
It's not that it wouldn't be a valid language tag, it's that it isn't a valid language tag. Language tags are assigned by designated authorities. la-new could be created, but it hasn't been. (Actually, I'm not even sure it could be. Ext-lang subtags are always alternatives to regular language subtags — for example, zh-yue is equivalent to yue — and new is already a language subtag, with a different meaning. I'm not sure whether that means la-new can't ever be created, or merely that it won't be. But it amounts to the same thing.) —RuakhTALK 17:36, 24 June 2011 (UTC)Reply
We've already created many other codes in this way, though. We use {{roa-jer}} for Jerriais, and {{gmq-osw}} for Old Swedish. Maybe the only thing we need to change is that the first part should be a family? —CodeCat 17:57, 24 June 2011 (UTC)Reply
Well, I think we should reevaluate those codes. I'm not sure who exactly "we" is; I don't remember those discussions. Maybe some of them are worth keeping, despite their nonstandard nature; but even if so, that doesn't automatically mean that we should make up codes for everything that pops into our head. —RuakhTALK 19:13, 24 June 2011 (UTC)Reply
Please reply to CodeCat's question, as I'm curious too. In addition, what's the difference? We have the code "itc", which is not for a language. --Daniel 17:10, 24 June 2011 (UTC)
I agree with this proposal but there are some things to work out. Does American English include only what is spoken in the US, or also in Canada? And does it really need a separate code? For Old Northern French, I think {{fro-nor}} would be better because the 'old french' part is already contained in the 'fro' part. For Viennese German, maybe {{bar-vie}} or {{bar-wie}} would be better, because it's a variety of Austro-Bavarian. There is also Ecclesiastical Latin, which could be {{la-ecl}}? —CodeCat 17:06, 24 June 2011 (UTC)Reply
I don't mind using "American English" to refer to English spoken only in USA. I don't mind the possibility of alternative proposals, either. --Daniel 17:21, 24 June 2011 (UTC)Reply
OK. I updated the proposal like you said. Except I added "fro-nrn" somewhere, because I erroneously more-or-less associated your alternative "fro-nor" with Norman. --Daniel 17:21, 24 June 2011 (UTC)Reply
That's not erroneous. Old Northern French is another name for Old Norman. —RuakhTALK 17:46, 24 June 2011 (UTC)Reply
But what about the ancestor of Walloon? —CodeCat 19:18, 24 June 2011 (UTC)Reply
. . . interesting. Thanks for the correction. I've dug a bit further, and found that while some sources don't seem to distinguish "Old Northern French" from "Old Norman", other sources use "Old Northern French" as a slightly broader term, covering all the Oïl dialects that didn't palatalize the "c" in words like "castle". I have no idea how we're using the term; did our ONFs come from the OED, or from Webster 1913? How does that source use the term? —RuakhTALK 18:10, 25 June 2011 (UTC)Reply
I'm not sure I agree with the general principle here, but leaving that aside for a moment . . . why would we use the nonstandard tag en-usa ("American English") when the standard tag en-US ("English as spoken in the US") is widely recognized and understood? Why de-aus instead of de-AT? Similarly, why use our own nonstandard tag oc-prv instead of at least the semistandard oci-prv? (And some of those should simply be eliminated, I think. Why distinguish Shanghainese from Shanghainese/Wu? Why distinguish Viennese German from Austrian German?) —RuakhTALK 17:46, 24 June 2011 (UTC)
I don't really know why they are distinguished, but these are all existing etymology templates, which implies that there will be derivations categories for them at some point. —CodeCat 17:59, 24 June 2011 (UTC)Reply
Fair enough. I guess the point is that as long as these are just etyl: templates, they don't need to be real language codes, and they certainly don't need to be fake ones. I'm not sure whether it's necessary to distinguish Viennese German from Austrian German in etymologies, but even if it is, it's certainly not necessary to distinguish them anywhere else, so we don't need "codes" for them. —RuakhTALK 19:13, 24 June 2011 (UTC)Reply
Note: for Latin sound files, we have been using la-ecc, not la-ecl. --EncycloPetey 19:14, 4 July 2011 (UTC)Reply

A code for Crimean Gothic

It is generally agreed that this language isn't really 'Gothic' at all, and that it didn't descend from the Gothic language of the 4th century. The only thing that's known is that it is East Germanic in origin, and descends from one of the languages of the Germanic people that migrated to eastern Europe in Roman times. But despite that, we call this language Gothic on Wiktionary, as if it were the same language as its 1000 year older sister. That's why I'd like to propose that we use a different, separate name for this language, with its own code, such as gme-crg, and its own set of categories. —CodeCat 21:23, 24 June 2011 (UTC)

Linguist List uses got-cri. If we're going to use a non-standard code, I think it should be that. —RuakhTALK 21:35, 24 June 2011 (UTC)Reply
But that code implies that it is a variety of Gothic, and it isn't, really. Crimean Gothic is to Gothic what Middle Dutch is to Old Saxon, more or less. —CodeCat 21:39, 24 June 2011 (UTC)Reply
I understand that, but it's still better than making up our own code . . . —RuakhTALK 21:45, 24 June 2011 (UTC)Reply
What's wrong with making up codes? Apparently that's what Linguist List did. --Daniel 21:58, 24 June 2011 (UTC)Reply
Linguist List is the official code standard for extinct/historical languages. I believe that only covers primary language subtags, not extended-language subtags, so got-cri is still not exactly "standard", but at least the code is well-documented by a recognized authority. —RuakhTALK 22:19, 24 June 2011 (UTC)Reply
OK. You just didn't convince me. I'm going to believe CodeCat, and vote for the precision of "gme-crg", over LL's authority. --Daniel 22:30, 24 June 2011 (UTC)Reply
I'm not surprised. I'll add one other fact, for anyone else reading this — I don't expect you two to be convinced — which is that technically, CodeCat is mistaken. "Gothic" (got) is defined to include both (1) what CodeCat refers to as "Gothic" and (2) what CodeCat refers to as "Crimean Gothic". Arguably, that coding is ill-founded: CodeCat believes that "Gothic" and "Crimean Gothic" are no more related to each other than either is to other East Germanic languages. (In biological terms, CodeCat believes that got is not a clade. I'm not sure what terminology linguists use.) Therefore, while the specific code got-cri is not standard, the use of got for Crimean Gothic is standard. (Note: I'm not using the phrases "CodeCat refers" and "CodeCat believes" to imply that she is wrong; it's just a convenient shorthand for the views and terminology that she is advocating. She says that her views and terminology are "generally agreed" upon, and that may well be the case. I do not dispute it.) —RuakhTALK 18:06, 25 June 2011 (UTC)
I know that ISO defines codes in such ways, but ISO codes aren't really designed with the needs of Wiktionary in mind. Just because external bodies define codes in a certain way doesn't mean linguistic consensus is the same, and doesn't mean we have to follow. An example would be our own code {{cel-gau}} for Gaulish, because the ISO codes for Gaulish apply only to two varieties of Gaulish (Cisalpine and Transalpine), not to the language as a whole. —CodeCat 18:24, 25 June 2011 (UTC)Reply
Right. And I'm on board with such adjustments; for example, I advocated treating all forms of Hebrew under a single language header (==Hebrew==, coded he), rather than trying to follow the not-quite-coherent ISO-language-code breakdown into exactly two languages (he/heb "Hebrew" vs. hbo "Ancient Hebrew"), and I supported the B/C/S editors' decision to treat all of B/C/S/ under a single header (==Serbo-Croatian==, coded sh). In the case of Crimean Gothic, I'm on board with treating it as a separate language from "Gothic", and with using bare got to refer specifically to the latter. I just think we should use got-cri, rather than making up our own code. got-cri is genuinely meaningful to the outside world, whereas gme-crg would be our own affectation. —RuakhTALK 18:53, 25 June 2011 (UTC)Reply
To clarify: I wouldn't be so bothered about the actual name of the template, but the language code that we include in our HTML should be a genuinely meaningful language code. None of our mirrors should feel compelled to include documentation for the nonstandard language codes that we've decided to put in their HTML as some sort of ego-booster. The only reason I care about the template-name is that our infrastructure more or less requires that our template-names match our language-codes; the exception being the etyl: templates, which never get used as codes (unless you or Daniel has started using them that way, which I suppose wouldn't surprise me). —RuakhTALK 18:59, 25 June 2011 (UTC)Reply
  • Crimean Gothic "language" (those few dozen glosses recorded in haste by a non-native speaker) hardly deserves its own code. I suggest that we handle it as it is now - as a subproject of Gothic, formatted as ==Gothic== but in Latin script, with context labels sorting it into the appropriate category. Or simply dump it altogether into an appendix page. --Ivan Štambuk 16:14, 2 July 2011 (UTC)
    • We also consider Phrygian to be a separate language, even though we know only a little more about that than we do about Crimean Gothic. And we have only one entry in Frankish, out of only one known inscription in the language. —CodeCat 16:20, 2 July 2011 (UTC)
For Phrygian we have abundant original attestations, as opposed to CG, which is attested second-hand by a non-native speaker. If there is attestation, it can be added, in the original script. CG is problematic because of 1) the fact that it's mostly not an attestation (a list of words supposedly used), and 2) the quality of that list (the non-speaking author that compiled it, as well as his informants of dubious competence). The extent of our knowledge of the language itself is not a relevant factor for inclusion. --Ivan Štambuk 16:46, 2 July 2011 (UTC)
  • I suggest we handle it as a subproject of Chinese, since it's no more Gothic than it is Chinese, plus I'm sure China would be thrilled to take ownership. DAVilla 18:52, 4 July 2011 (UTC)Reply

Codes for families consisting of a single language

Families that contain only one language are usually called isolates, and within our new category structure we don't put them into families of their own. But it is different when that single language has ancestors; in theory the ancestor language belongs with its descendant in a small family of two (or more) languages. An example would be Albanian, which is a separate branch of Indo-European, but we also have Proto-Albanian on Wiktionary. Another example is Armenian, which is also a separate branch, but we also consider Old Armenian and Proto-Armenian to be separate languages on Wiktionary. For this reason, I think it would be useful to consider these small families, the "Albanian languages" and "Armenian languages", or maybe other names such as "Albanic" and "Armenic"? As codes, I would suggest {{qfa-sq}} and {{qfa-hy}}, and {{sq-pro}} and {{hy-pro}} for their proto-languages. —CodeCat 19:41, 25 June 2011 (UTC)Reply

We already have templates for the language families: {{etyl:sqj}} for Albanian, {{etyl:hyx}} for Armenian; both from standard ISO 639-5 codes. As for the proto-languages — I suppose {{proto:sqj-pro}} and {{proto:hyx-pro}}? Isn't that how we're naming proto-language templates these days? —RuakhTALK 20:02, 25 June 2011 (UTC)Reply
I didn't realise there already were codes for those families, but if they exist then we could use them. I still wonder what to do with any other similar cases, though. Cases that don't have codes yet. —CodeCat 20:04, 25 June 2011 (UTC)Reply
Though the families should preferably not have the same name as their (only) member, as this has a potential for confusion. -- Prince Kassad 15:58, 26 June 2011 (UTC)Reply

English terms with obsolete senses, etc.

Category:English rare terms was moved to Category:English terms with rare senses per WT:RFM#Category:English rare terms.

I think some categories should follow suit:

--Daniel 21:26, 25 June 2011 (UTC)Reply

For consistency, yes, they should all be one or the other. I don't really care which one though. —Internoob (DiscCont) 00:44, 26 June 2011 (UTC)Reply
I oppose until I see a convincing reasoning. The discussion at WT:RFM#Category:English rare terms, 19 June 2011, does not show anything like a consensual support for renaming, so I don't understand what made you create Category:English terms with rare senses. To the contrary, several people pointed out that "English nouns" also contains non-noun senses. I find your "was moved to" phrasing in your first sentence misleading; it was you who has done the move, and who has wrongly decided there was a consensus. I am far from convinced this increase of verbosity of category names is a good thing: we may soon arrive at "English terms with noun senses", a thing that I hope people can learn to infer from "English nouns". --Dan Polansky 11:29, 27 June 2011 (UTC) Later: On a more careful reading and thought, the RFM discussion showed something approaching at least lack of opposition for renaming, and had voices sympathetic to renaming. Furthermore, the motivation for renaming is that to call a term a "rare term" when it only has a rare sense seems wrong. I don't know any more. --Dan Polansky 11:45, 27 June 2011 (UTC)Reply
Please note that, in that discussion, everybody ultimately acknowledged "Category:English rare terms" as an inaccurate name, even the people who first compared it to "Category:English nouns" as argument contrary to renaming the category. I don't want the creation of "English terms with noun senses", and I think nobody else wants it either. When I said above I had moved a category according to a RFM discussion, I was being sincere. --Daniel 07:18, 3 July 2011 (UTC)Reply

Next votes

These votes are scheduled to start in two days:

Feel free to double-check and edit their proposals before they start. --Daniel 04:01, 27 June 2011 (UTC)Reply

English terms spelt with .

¶ Should any English words be suppressed from Category:English terms spelled with ., or is this better off without unique restrictions? --Pilcrow 06:19, 27 June 2011 (UTC)Reply

Note: His question stems from this small conversation. --Daniel 06:27, 27 June 2011 (UTC)Reply

So: no objections for allowing any English terms containing . in that category? --Pilcrow 05:03, 1 July 2011 (UTC)Reply

I think so. Lack of response is lack of objections. --Daniel 05:14, 1 July 2011 (UTC)Reply
¶ Well, I would freely go ahead and categorize such words, but I thought rushing into (this) action without discussion was bad faith. --Pilcrow 05:16, 1 July 2011 (UTC)Reply
You gave a place for people to discuss your idea, especially an idea that is arguably uncontroversial: populating a category with exactly the members mentioned in its title and description. That's what I call good faith. --Daniel 05:41, 1 July 2011 (UTC)Reply

shíyóu

Pinyin entries are allowed and shíyóu is attested (google books:shíyóu), but why was it deleted by Anatoli? Engirst 04:56, 28 June 2011 (UTC)Reply

Because there are no hits for it in running text in Mandarin. Mentions are NOT uses. ---> Tooironic 05:17, 28 June 2011 (UTC)Reply
Actually, it is used in "running text in Mandarin". One of the hits is a "reader for hanyu pinyin", i.e. a book written completely in pinyin. I guess it's probably to promote pinyin, or as part of pinyin education in school. Not a very typical book anyway. The quote: "Yï jiu liù sl nián kaishï, woguó shíyóu wánquan zljï, yong 'yángyóu' de niándài [...] Xîn yóutián de buduàn faxiàn, zhèngmíng woguó yöuzhe fëngfù de shíyóu" (with OCR-errors). With characters (I think): 1964年開始,我國石油完全自給,用「洋油」的年代 [...] 新油田的不斷發現,證明我國有著豐富的石油... Vaste 02:23, 30 June 2011 (UTC)Reply
I've added a citation (which I personally cannot read!) to shíyóu - can someone translate it? Is it just a mention? Looking at Google Books it (shíyóu) does seem to appear in running text on three occasions, but all the rest seem to be in dictionaries and text books. Mglovesfun (talk) 10:12, 3 July 2011 (UTC)Reply
Also, it has OCR-errors (should be yǒu but is yöu, etc). Vaste 00:23, 4 July 2011 (UTC)Reply
Engirst. You don't care about anybody, why should we care about you? --Anatoli 06:38, 28 June 2011 (UTC)Reply

We should follow rules.

'“Attested” means verified through

  1. Clearly widespread use,
  2. Usage in a well-known work, or
  3. Usage in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year.'

Such as shíyóu. Engirst 13:46, 28 June 2011 (UTC)Reply
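The third criterion quoted above can be sketched as a small check. This is only an illustration under stated assumptions: the `Citation` type, the `source` identifier and the 365-day threshold are hypothetical, and "independent" is approximated as "from distinct sources". Whether a quotation is a use that conveys meaning, rather than a mention, is the point disputed in this thread and cannot be decided mechanically.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Citation:
    source: str      # hypothetical identifier of the work quoted
    published: date  # publication date of that work


def meets_third_criterion(citations, min_sources=3, min_span_days=365):
    """Return True if the citations come from at least `min_sources`
    distinct sources and their dates span at least `min_span_days`.

    Assumption: each citation has already been judged a genuine use.
    """
    distinct_sources = {c.source for c in citations}
    if len(distinct_sources) < min_sources:
        return False
    dates = [c.published for c in citations]
    return (max(dates) - min(dates)).days >= min_span_days
```

Three quotations from three different works published between January 2001 and March 2002 would pass this check; three from the same quarter, or only two quotations, would not.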

I agree with Anatoli. Why do you expect us to listen to you when you won't cooperate with us? If you are going to ignore everyone, I don't think anyone will want to work with you, no matter if you are following the rules or not. You can follow the rules and still be a troll. —CodeCat 14:07, 28 June 2011 (UTC)Reply
Then I'll third what Anatoli says. You've pretty much said on your talk page that you're not prepared to follow CFI. I don't see how we can negotiate from that position. If you reject our most fundamental document, there is no way forward from there. Mglovesfun (talk) 19:41, 29 June 2011 (UTC)Reply

Wymysorys or Vilamovian?

For info on the Vilamovian language, look at this article. I'm thinking that the name Wymysorys is outdated, so either Vilamovian or Wilamowicean is okay. The language code is {{wym}}, and I just got started on importing articles from the Polish Wiktionary (a few articles were imported from the Korean Wiktionary by me). --Lo Ximiendo 06:06, 28 June 2011 (UTC)Reply

P.s. I mean the language name that is entered on an entry between two sets of two equal signs, along with other things. --Lo Ximiendo 11:49, 28 June 2011 (UTC)Reply
Now how do I move the Wymysorys language categories? --Lo Ximiendo 05:03, 29 June 2011 (UTC)Reply
Three steps:
  1. Edit Template:wym to display "Vilamovian". You already did it today.
  2. Check Special:WhatLinksHere/Template:wym, and rename all categories that use "Wymysorys". This includes renaming Category:Wymysorys nouns to Category:Vilamovian nouns.
  3. Use Special:Search to search for all entries that include "Wymysorys", and edit their L2 headers when necessary. This includes editing the entry edikys, for example. Special:WhatLinksHere/Template:wym can list the right entries too, but "What Links Here" is better for finding categories to be updated.
--Daniel 05:19, 29 June 2011 (UTC)Reply
By the way, you don't literally move or rename categories, because there is not a button to do that. Instead, the old ones have to be deleted, and the new ones have to be created. --Daniel 05:23, 29 June 2011 (UTC)Reply

Entries in non-standard scripts

The problems we have with some users highlight a problem in our current policy. We don't distinguish between attestations in a language's native or most-used script, and attestations in nonstandard scripts. Pinyin entries are not standard, but it is possible that a few pinyin entries meet CFI nonetheless even if the vast majority doesn't. And in any case, they are nothing more than alternative representations of other entries. So I think it would be a good idea to amend current policy somewhat so that it takes into account entries in scripts that are simply alternative representations of the same word in another script, and not a standard way of writing that language. Perhaps we could limit these entries only to redirects, and require that the main entry be attestable as well? —CodeCat 14:20, 28 June 2011 (UTC)Reply

Pinyin actually is standard, and I think we should allow properly-diacriticked-pinyin (henceforth: PDP) entries whenever the PDP (or other tone-marked pinyin, e.g. the style with appended numbers) is attested. For simplicity, such entries might as well simply point users at Hanzi entries, since currently Hanzi is much more widely used. (Hopefully that will change someday, but Wiktionary's neutral-point-of-view policy means that we cannot be an agent of such change.)
When a language has two scripts with a one-to-one mapping between them, I think attestations in one script should automatically count toward the other. In the case of Mandarin, we should probably count Traditional attestations toward both Simplified and Traditional, though the reverse is probably not workable. We probably cannot count Hanzi attestations toward PDP or vice versa, because some distinctions are made in Hanzi that are not made in PDP (Hanzi distinguishes many homophones, whereas PDP does not), and some distinctions are made in PDP that are not made in Hanzi (some characters, e.g. Template:Hans, have multiple non-interchangeable pronunciations; also, PDP indicates word breaks, whereas Hanzi does not).
In the case of actual nonstandard scripts, such as toneless Pinyin, transliterated Greek, etc., I think we have to take them on a case-by-case basis. There's not a bright line between a standard script and a nonstandard one.
In all cases, the editors who actually work on affected-language entries need to be the ones really making decisions. The rest of us can (and should!) offer opinions and advice, but in general, we should defer to them.
RuakhTALK 15:59, 28 June 2011 (UTC)Reply
I'm also thinking of languages like Gothic, which were written in Gothic script but are almost always written in Latin script in modern reprints. —CodeCat 16:43, 28 June 2011 (UTC)Reply
As previously discussed here, you mean? I still agree with what I said then: "if Gothic works are primarily or exclusively published in the Latin script, then we should definitely have entries for the Latin spellings, either as main entries or as alternatives." But I don't think it's a very similar case to Pinyin; on the one hand, speakers of Gothic never used the Latin script, whereas speakers of Mandarin do, but on the other hand, Gothic is primarily or exclusively published in the Latin script, whereas Mandarin is not. (Personally I support Latin entries in both cases, but the reasons are different.) —RuakhTALK 17:10, 28 June 2011 (UTC)Reply
It's non-standard as in it's not used to write Chinese, period. No sane person would use pinyin to write a text in Chinese (Mandarin, really), unless it was to prove a point, or for foreigners, language learning purposes or in a dictionary. Not that it couldn't be done, it just isn't. Of course, the same argument could be made about "ㄅㄆㄇㄈ" (bopomofo, 注音符號). It could be used as well, but it just isn't used. Thus, it's very unlikely that any pinyin word would be attestable.
"We probably cannot count Hanzi attestations toward PDP or vice versa" Why not? E.g. "干" in 你干啥? (ni3 gan4 sha2?) Here 干 could be an attestation for gan4 (but not for gan1). "衣服干了。" would work for gan1 (but not for gan4). In the same way 幹 (你干啥) and 乾 (衣服干了) could be attested via 干 respectively. Or am I missing something?
The question is if we want to include them even though they are non-attestable, as some kind of metadata (kinda like a category really, "Chinese words pronounced shi4shi4"). Vaste 10:51, 29 June 2011 (UTC)Reply
I'm not referring so much to individual pinyin syllables, as to polysyllabic words. We know (well, I don't, but you guys do) which syllable is meant by a given character in a given word, but it doesn't seem verifiable to me, unless we accept secondary sources such as dictionaries. (Actually, come to think of it, since there are books and periodicals published in Taiwan that have bopomofo ruby for every single character, a dedicated attester could probably demonstrate the bopomofo for many polysyllabic terms, and that would verify the pinyin as well.) —RuakhTALK 11:17, 29 June 2011 (UTC)Reply
How is IPA and pronunciation handled for English? Isn't it the same issue? After all pinyin is simply used to describe (and prescribe(?)) pronunciation (of Mandarin). Pinyin entries have problems with attestability simply because they aren't used that way (i.e. as a script for Chinese). Vaste 01:58, 30 June 2011 (UTC)Reply
We list pronunciations for English words, but we don't create entries for the IPA transcriptions. (I actually think that would be awesome if it were feasible, but it's not, for lots of reasons besides just attestability.) Much of the content in our entries is justified, or could be justified, by reference to other dictionaries and such, but we don't create entire entries that rely on secondary sources for justification. But anyway, I'm out of my depth here, seeing as I don't speak any Chinese at all; if you say that pinyin entries can be adequately verified without depending on secondary sources, then I defer to you. —RuakhTALK 02:09, 30 June 2011 (UTC)Reply
Okay, they exist. Just to be clear though, are you saying pinyin used that way is not extremely rare? Vaste 03:35, 1 July 2011 (UTC)Reply
The point is that it is in use. A Pinyin Bible for your reference. Engirst 03:47, 1 July 2011 (UTC)Reply
And contained in those links you gave us was a grand total of.... one self-published book entirely in pinyin! [3] What compelling evidence that pinyin is actually used in durably archived sources with running Mandarin script! ---> Tooironic 05:59, 1 July 2011 (UTC)Reply
More Pinyin books/atlas/map/fiction for your references:
<<Huanlede Hai>> (Zhongguo Wenzi Gaige Chubanshe)
<<Zhonghua Renmin Gongheguo Ditu Hanyu Pinyinban>> (Zhongguo Ditu Chubanshe)
<<Zhonghua Renmin Gongheguo Dituce Hanyu Pinyinban>> (Zhongguo Ditu Chubanshe)
A personal question Engirst, if you don't mind. You're Chinese, right? Have you yourself ever read a book all in pinyin? Why do you care so much for pinyin?
Actually, I think Chinese would benefit from starting to use pinyin as its main script. It'd be great! So, I don't mind having pinyin entries. (They're especially useful due to the large number of homonyms in Chinese.) However, I can't honestly say that I think they are attestable. I'd say you're fighting a losing battle. It'd be wiser to change strategy, and argue that pinyin entries are simply useful, even though they are not (directly) attestable. Just my 2 fēn. Vaste 14:31, 1 July 2011 (UTC)Reply
"if you say that pinyin entries can be adequately verified without depending on secondary sources": I'm saying they can't be, in general. I think they could be either 1. included for other purposes (indexing, help users find the correct entries etc) or 2. removed. Vaste 02:31, 30 June 2011 (UTC)Reply

Separate entries for reflexive verbs?

The most common practice on Wiktionary has been to give separate entries for reflexive verbs, treating them as phrasal verbs. But I noticed in many Catalan dictionaries that their practice seems to be to simply list reflexivity as a kind of context. They list adormir and adormir-se on the same page, but on one dictionary the second one has a separate header that says adormir-se on that page, and another one lists them as pron to indicate they take a reflexive pronoun. To find a reflexive verb, you'd look up the non-reflexive verb and look for the reflexive sense. So I'm wondering what would be the best practice on Wiktionary. In many cases, the sense of the reflexive verb isn't idiomatic enough to really warrant a separate entry, but still good to list as a separate sense of the base verb. "adormir" is a good example; it means 'to cause to fall asleep', and its reflexive meaning 'to fall asleep' is more or less predictable from this. Not all reflexive verbs are this way, but in general, languages that use reflexivity to create a mediopassive voice as Catalan and many other Romance languages do tend to have more regular meanings for those verbs. —CodeCat 23:02, 29 June 2011 (UTC)Reply

I like the system of having a separate page for reflexive verbs. At least you can find the conjugation of a reflexive verb in Catalan, Spanish, or Italian. But if I wanted to learn how to conjugate se marier (or any other French reflexive verb), I just get redirected to marier. It says what the reflexive meaning is, but it doesn't conjugate it. Ultimateria 00:02, 30 June 2011 (UTC)Reply
Reflexives aren't conjugated any differently from other verbs, though. The reflexive pronouns behave like normal object pronouns. There is no real grammatical difference between m'adormo meaning 'I put myself to sleep' and meaning 'I fall asleep'. They are both a combination of a pronoun and a verb. I don't really think we should treat the pronoun as part of the conjugation. —CodeCat 00:06, 30 June 2011 (UTC)Reply
That's true, but I think the conjugation tables are still helpful. Of course reflexive verbs are logical to you since you've learned how to conjugate them already, but I've seen a lot of people (including me) struggle with reflexive verbs in Spanish class. We knew next to nothing about object pronouns before learning reflexive verbs, so they were a matter of memorization, not logic. I'm sure tons of Romance language learners are taught the same way. Ultimateria 00:25, 30 June 2011 (UTC)Reply
We handle this differently for different languages. So far I haven't seen any approach that I like very much! Personally I find it confusing when a sense at the non-reflexive entry is tagged {{reflexive}}; it always takes me a few seconds to realize that it means (when reflexive), or rather (when used with a reflexive pronoun, the two taken together sometimes mean this:). One possible approach . . . for Hebrew we've recently started using a sort of pseudo-context template, {{he-wv}}, for senses that have a different vocalization from the headword. (See [[נסגר]] for an example.) And many dictionaries do something similar for idioms; for example, one sense at the OED's entry for washen is actually defining the term washen leather, indicated in bold at the start of the sense-line. I think reflexive verbs might benefit from this approach:
  1. (adormir-se) To fall asleep.
(As for conjugation — there's no reason that the non-reflexive entry can't give both conjugation tables. I don't know if that's really desirable, but it's an option.)
RuakhTALK 00:31, 30 June 2011 (UTC)Reply

In French, providing both conjugations is very useful. Some verbs are used only as reflexive verbs, some verbs may be used as reflexive verbs or not (sometimes with predictable senses, sometimes with different senses). The fr.wiktionary practice is to allow all useful entries. There is no reason to forbid useful entries. Lmaltier 18:51, 30 June 2011 (UTC)Reply

WINW

I created WT:WINW. We need it badly. It's short, but it says everything it needs. (Feel free to expand it, if the size bothers you.)

I'm going to create a vote to make it a policy. --Daniel 23:16, 29 June 2011 (UTC)Reply

I think we already have a page like that, but I forgot what it's called. —CodeCat 23:18, 29 June 2011 (UTC)Reply
WT:NOT. --Yair rand 23:26, 29 June 2011 (UTC)Reply
Ruakh added[4] a section "Wiktionary is not Wikipedia" to WT:NOT, while keeping the shortcut WT:WINW to that section. I like it. --Daniel 02:25, 30 June 2011 (UTC)Reply
The shortcut is silly. Should we also have WT:WINE, WT:WINP, WT:WINCB? But whatever. DAVilla 18:39, 4 July 2011 (UTC)Reply
I added some points that explain the most important differences between Wikipedia and Wiktionary. —CodeCat 19:23, 4 July 2011 (UTC)Reply
I like the shortcut. I don't think it is silly. It is an initialism, just like CFI. --Daniel 19:30, 5 July 2011 (UTC)Reply
Not really necessary but doesn't hurt. I think we should keep it. (Yes, evidently my feelings about the restricted WT: namespace are laxer than those about the dictionary itself.) Equinox 19:33, 5 July 2011 (UTC)Reply

Mathematical symbol

I'd like to add the header "Mathematical symbol" to these entries: , , , etc. --Daniel 09:28, 30 June 2011 (UTC)Reply

I agree. It's not a part of speech. Would we want to change ellipse to Mathematical noun? Equinox 11:29, 1 July 2011 (UTC)Reply
I agree that "Symbol" is a good part-of-speech header. I'm just saying it's not perfect for all instances. In my point of view, as a reader of Wiktionary, that's sometimes more-or-less like having a POS header "Word"; it's too generic. That's one reason why I introduced the headers "Punctuation mark" and "Diacritical mark", and I think they look really good where they are.
In my opinion, we should have "Mathematical symbol" and not "Mathematical noun", because all nouns share many characteristics of syntax, morphology, grammar, etc. regardless of their meanings: the POS header "Noun" is perfect for the word ellipse.
On the other hand, mathematical symbols have characteristics that all other symbols don't have. In the equation , you can't replace the + by a musical symbol such as 𝅘𝅥𝅯 .
If, hypothetically, "Symbol" was a perfect POS header for all symbols, and context templates (instead of other POS headers) such as {{context|maths}} should always be used to give more details about the definitions, then perhaps we would have to deprecate the header "Letter" and start using {{context|typography}} for letters instead, as well. --Daniel 15:17, 1 July 2011 (UTC)Reply
You also can't replace + in with < or . I fail to see the relevance of replaceability. —msh210 (talk) 16:36, 1 July 2011 (UTC)Reply
I agree with Dan. —msh210 (talk) 16:36, 1 July 2011 (UTC)Reply
Actually, you can replace = by <, and you can replace other mathematical symbols by |x|, if you want to represent different values. Among the characteristics that mathematical symbols share, there is the fact that they change the value of numbers: −3 and 3 are different things. They are also expected to be used with other mathematical symbols and/or with numbers: "<dog" does not make sense if "<" is a less-than sign and "dog" is the word meaning domestic canine. --Daniel 16:49, 1 July 2011 (UTC)Reply
They don't all change the value of numbers; 1 ∈ (1, dog) doesn't change numbers, and is as valid a statement as dog ∈ (1, dog) or dog ∈ (cat, dog).--Prosfilaes 17:06, 1 July 2011 (UTC)Reply
OK, thanks. I should have mentioned that somehow before. Let's see: While "<" as an algebraic symbol determines a relationship of numeric values, ∈ as a symbol of set theory determines a relationship of sets and their elements. Different fields of mathematics, somewhat different rules. --Daniel 18:00, 1 July 2011 (UTC)Reply
So would you suggest we have POS headers "Mathematical binary operator symbol" (, ), "Mathematical binary relation symbol" (, <, ), "Mathematical n-ary operator symbol" (, ), "Mathematical delimiter" (, ), "Mathematical constant symbol" (, e), "Mathematical variable symbol" (x, f), and perhaps more?​—msh210 (talk) 19:27, 1 July 2011 (UTC)Reply
No, I wouldn't. POS headers typically should be as general as possible, without being too generic like "Word". When I proposed the creation of "Diacritical mark" (examples of implementation: acute accent, trema, dakuten and macron), I did not propose these other possible POS headers: "Pitch accent", "Long vowel mark" and "Nasalization mark". Nor do I want "Countable noun" or "Transitive verb". --Daniel 19:32, 1 July 2011 (UTC)Reply

July 2011

Redirecting single-character digraphs

I suggest redirecting ǲ to Dz, and doing the same from all single-character digraphs to their attestable two-character versions. --Daniel 15:31, 1 July 2011 (UTC)Reply

Support. RuakhTALK 17:12, 1 July 2011 (UTC)Reply
Why? A reason to redirect would be nice, and none has yet been supplied (here). Moreover, (at least in absence of such reason) I oppose: A reason to have them separate is to show etymology sections stressing different aspects of their development. (The digraph's etymology section can stress how long the digraph's been in use, and the other version's can stress how long it's been in use. Etc.) Another reason is that the digraph (or two-character version) might be a letter in some language that the other version is not a letter in, so we'd need a language section for that language on the one page and not the other. These and similar reasons are why we keep alternative spellings on separate pages.​—msh210 (talk) 19:06, 1 July 2011 (UTC)Reply
I don't think ǲ and Dz are alternative spellings of each other, or that the two-character version can't be considered a letter in some language. They are the same set of characters: "D" followed by "z".
Moreover, the Unicode FAQ says[5] some interesting things:
  • A digraph, for example “xy”, looks just like two ordinary letters in a row (in this example “x” and “y”), and there is already a way to represent it in Unicode: <U+0078, U+0079>.
  • [...] the UTC has taken the position that no new digraphs should be encoded, and that their special support should be handled by having implementations recognize the character sequence and treat it like a digraph.
--Daniel 19:19, 1 July 2011 (UTC)Reply
I didn't say the two-character version can't be considered a letter. In any event, exposed to the light of your quotes from Unicode, my objections wither and die. I support redirection.​—msh210 (talk) 19:30, 1 July 2011 (UTC)Reply
I take back my support. (Am wearing flip-flops, too.) See my comment on the vote's talkpage for more.​—msh210 (talk) 17:17, 6 July 2011 (UTC)Reply
If this is supposed to be handled in the implementation like Unicode says, then shouldn't the Mediawiki software itself perform the redirect? —CodeCat 19:44, 1 July 2011 (UTC)Reply
MediaWiki doesn't do that, and I don't know why. --Daniel 19:49, 1 July 2011 (UTC)Reply
I don't understand your question. Unicode says that implementations should recognize that <ng> (two characters) in a Tagalog context is a single digraph; it doesn't say that implementations should canonicalize <ʣ> (one character) to <dz> (two). —RuakhTALK 13:49, 2 July 2011 (UTC)Reply
Oh, I thought that it was that way. Nevertheless, could something like that be done for Wiktionary? —CodeCat 13:51, 2 July 2011 (UTC)Reply
Redirecting individual entries manually can be done. I think that automatic redirects can be done, too, with JavaScript. --Daniel 10:55, 4 July 2011 (UTC)Reply
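As a rough illustration of what an automatic mapping could look like, the sketch below uses Unicode NFKC normalization from Python's standard `unicodedata` module. This is a stand-in technique, not something proposed in the thread (which mentions manual redirects and JavaScript), and the function names are hypothetical. NFKC expands only characters that carry a compatibility decomposition, so U+01F2 becomes "D" followed by "z", while U+02A3 (ʣ) is left unchanged because it has no such decomposition.

```python
import unicodedata


def two_character_form(title):
    """Return the NFKC form of `title`.

    Characters with a compatibility decomposition are expanded,
    e.g. U+01F2 becomes "D" + "z".
    """
    return unicodedata.normalize("NFKC", title)


def redirect_target(title):
    """Return the expanded title if it differs from `title`, else None.

    Hypothetical helper: None means no redirect would be suggested.
    """
    expanded = two_character_form(title)
    return expanded if expanded != title else None
```

Note that NFKC also expands characters that editors may not want redirected, such as U+20A8 (₨) to "Rs", so any list produced this way would still need review by hand.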

I created Wiktionary:Votes/2011-07/Redirecting single-character digraphs. --Daniel 10:55, 4 July 2011 (UTC)Reply

I don't think we're ready for that. I'd like to know more generally if this should always be done, and what the language implications are for using ǲ -> D + z and similar rules in any spellings in which they appear. For that it may be useful to have a list of such digraphs. DAVilla 16:57, 4 July 2011 (UTC)Reply
See Wiktionary talk:Votes/2011-07/Redirecting single-character digraphs for a start . . . —RuakhTALK 00:17, 5 July 2011 (UTC)Reply
Several of those I would cross out, like ₨. Affected languages seem to be Arabic, Armenian, and Latin. DAVilla 16:29, 6 July 2011 (UTC)Reply
And Hebrew, and Lao. --Daniel 16:58, 6 July 2011 (UTC)Reply

3Ps has two letters

Should Category:English two-letter words contain 3Ps? --Daniel 16:56, 1 July 2011 (UTC)Reply

Apparently not, because the category description (which you wrote!) is "English individual words comprised of exactly two letters". "3Ps" is not composed of exactly two letters: it's composed of two letters and a digit. I don't think the description needs to be changed. (Technically I suppose a word isn't really composed of letters — a word's spelling is composed of letters — but the meaning of the description seems clear. Oh, but if you do change the description, you might want to change "comprised of" to either "composed of" or "comprising"; this use of "comprised of" is common, but is frequently considered incorrect.) —RuakhTALK 17:10, 1 July 2011 (UTC)Reply
OK, I'm going to change the description of that category to allow "3Ps", unless someone objects in the near future.
If a word's spelling, rather than the word, is comprised of letters, perhaps the name of the category should be changed as well. Since we have Category:Japanese terms written with four Han script characters, we could have Category:English terms written with two letters. --Daniel 18:36, 1 July 2011 (UTC)Reply
I'm objecting in the near future for some value of near. I don't see the purpose in having a category for two-letter words including also words that include digits. OTOH, two-letter words sans digits form a category useful to people doing crosswords, cryptograms, and other word puzzles. Re the category name, "English two-letter words" is fine: that's what everyone calls them (at least here in Leftpondia). Moreover, "English terms written with two letters" might be read as "...with two different letters" (e.g. sass). —msh210 (talk) 18:54, 1 July 2011 (UTC)Reply
3Ps is not a two-letter word. Category:English terms written with numbers (and also letters by presumption) might be more interesting. DAVilla 18:34, 4 July 2011 (UTC)Reply
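The three readings discussed in this thread can be told apart with a short sketch. The function names are hypothetical, and Python's `str.isalpha` is used as an approximation of "is a letter"; it is not a definition taken from the category description.

```python
def letter_count(term):
    """Count the characters in `term` that are letters."""
    return sum(1 for ch in term if ch.isalpha())


def is_two_letter_word(term):
    """True only if `term` is composed of exactly two letters and nothing else."""
    return len(term) == 2 and term.isalpha()


def distinct_letter_count(term):
    """Count the different letters used in `term`, ignoring case."""
    return len({ch.lower() for ch in term if ch.isalpha()})
```

Under these assumptions "3Ps" contains two letters but is not composed of exactly two letters, and "sass" is written with two different letters although it has four, which is the ambiguity raised about the name "English terms written with two letters".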

Letters and typography

My own idea expressed in a recent discussion made me curious.

Should A, a, B, b, etc. be members of Category:en:Typography? --Daniel 18:20, 1 July 2011 (UTC)Reply

No, use a subcategory, Category:en:Letter names or the like. DAVilla 18:30, 4 July 2011 (UTC)Reply

English names of stars, etc.

Today I feel like creating this streak of new and relevant discussions. Feel free to ask me to take it easy in the future if you want fewer discussions, though I'm finished for today anyway.

Now to the proposal. Renaming categories, this way:

The proposed names are in line with the idea of deprecating language codes piecemeal from category names, which some people seem to approve, and also fits the existence of Category:English surnames, which is codeless, and contains proper nouns, as well. --Daniel 18:46, 1 July 2011 (UTC)Reply

Would these categories go under Category:English names? If so, then maybe an intermediate category like Category:English topographical names and Category:English astronomical names would be good too, so that they don't all go into the main category. —CodeCat 19:21, 1 July 2011 (UTC)Reply
Yes, that's a good idea, too. --Daniel 19:38, 1 July 2011 (UTC)Reply

I created Wiktionary:Votes/2011-07/Categories of names. --Daniel 05:28, 2 July 2011 (UTC)Reply

I oppose, while I am willing to yield to a significant majority. The proposed renaming is a first step in making the names of hyponymic topical categories needlessly long; "mammals", and "animals" are likely to follow if this naming scheme is to be applied throughout Wiktionary. Furthermore, there was a recent poll from which it was not obvious that a large majority of editors prefers to get rid of language codes: Wiktionary:Beer parlour archive/2011/May#Straw poll: Topical category languages. --Dan Polansky 10:26, 2 July 2011 (UTC)Reply

That poll ended in a very approximate draw between Category:de:Mountains and Category:de:Physics and Category:German terms relating to mountains and Category:German terms relating to physics, one of these options, the slightly more voted one, being long and written in plain English rather than making use of codes.
There was some disagreement, however, between the voters of long names, about what would be the exact longer names: people proposed "regarding mountains", "involving mountains", "relating to mountains" and even "Mountain terminology".
Concerning this relatively small but cumbersome disagreement of wording: "Category:English names of mammals" would not be an accurate option of name of a hyponymic category; "Category:English names of species of mammals" or "Category:English hyponyms of mammal" would be better.
On the other hand, "Category:English names of stars" is just accurate enough for its contents. This category should contain Sun and Aldebaran, but not red giant or supernova. In fact, the title is so specific I know it can't contain starlight as well. --Daniel 13:05, 2 July 2011 (UTC)Reply
  • These new category names look very cumbersome and needlessly worsen the usability of categories. As with the category names for etymologies, this should be handled at the presentation level with two lines of Javascript code doing the necessary substitution/prefixation. When Wiktionary finally switches to language-specific presentation (tabbed view or whatever), these "XXX names of" will become redundant. --Ivan Štambuk 16:33, 2 July 2011 (UTC)Reply
    No, not actually, they would not become redundant. The vote I created (Wiktionary:Votes/2011-07/Categories of names) says why not.
    Actually, the part "names of" would not be redundant, because "Mountains" and "Names of mountains" are different things. The part "German" would be redundant if only German categories are shown, just like how "de:" would be redundant in that case, but perhaps in a more readable manner. --Daniel 16:52, 2 July 2011 (UTC)Reply
    Users who see category names such as "Mountains" or "Continents" would expect them to contain names of mountains and continents. In fact, AFAICS, the abovelisted categories don't contain anything other than proper nouns. The new category names are more precise, but non-intuitive and cumbersome IMHO. --Ivan Štambuk 17:23, 2 July 2011 (UTC)Reply
    Allow me to disagree with you again: No, I don't think so. Some categories could be populated only with proper nouns but contain terms of other parts of speech, too: I only remember Category:Planets and Category:Gods. (and Category:Stars contains star, but that can be ignored) --Daniel 17:43, 2 July 2011 (UTC)Reply
    I don't really have an opinion about the naming, but I do like the idea of having separate categories for names of certain things as opposed to words about those things. —CodeCat 17:46, 2 July 2011 (UTC)Reply
  • For information, fr.wiktionary uses names of the kind Countries in English and Lexicon in English of (a domain) (for words used in the domain). The only drawback is that not only proper nouns, but also some common nouns, may be considered as relevant to the first case. This proposal is about this issue. Any other better proposal about it? Lmaltier 17:33, 2 July 2011 (UTC)Reply
    I learned in this page that "Mountains in Tonga" would be very ambiguous, because there is a language and a place named Tonga. That should be considered when analyzing the idea of copying the system of the French Wiktionary. --Daniel 17:47, 2 July 2011 (UTC)Reply
    This is probably the only language for which this ambiguity arises. DAVilla 18:28, 4 July 2011 (UTC)Reply

For the record, these are the current entries of Category:Planets: carbon planet, double planet, Earth, exoplanet, exosolar planet, extrasolar planet, gas giant, giant planet, Herschel, hot Jupiter, ice giant, inner planet, Jupiter, Le Verrier, major planet, Mars, Mercury, mesoplanet, minor planet, Neptune, outer planet, planet, Planet X, protoplanet, Saturn, silicate planet, sub-brown dwarf, superplanet, Teegeeack, terrestrial planet, Uranus, Venus and Vulcan.

And these are the ones of Category:Gods: Allah, Amaterasu, Discordia, Flying Spaghetti Monster, FSM, Galaxia, goddess, Haumea, Huitzilopochtli, Invisible Pink Unicorn, IPU, Izanagi, Izanami, Jah, Jehovah, Keb, Makemake, momentary god, momentary gods, Nike, Pele, Tezcatlipoca, Tyr, Wenis and Yahweh. --Daniel 05:55, 3 July 2011 (UTC)Reply

For the sake of brevity in titles, what we might need is the bi- or trifurcation of the Category: namespace. Topic:Mountains in English could allow any terms related to mountains, Names:Mountains in English just the proper nouns, leaving Category:English idioms and the like. DAVilla 18:28, 4 July 2011 (UTC)Reply

In this case, Everest would be a member of both "Topic:Mountains in English" and "Name:Mountains in English"? In my opinion, "Category:English names of mountains" and possibly "Category:English terms relating to mountains" are better, even simpler. --Daniel 18:04, 7 July 2011 (UTC)Reply

100 is a number

I propose using the POS header "Number" (and not "Numeral", "Cardinal number", "Cardinal numeral" or "Symbol") for all definitions that meet these requirements:

  • Is Translingual.
  • Is defined as a number.
  • Is written with the digits 0, 1, 2, 3, 4, 5, 6, 7, 8 and/or 9.

Examples of affected entries: 1, 8, 33, 100, 101 and 420. Other standardizations of numbers can come some other day. Today I want to standardize these. --Daniel 01:15, 2 July 2011 (UTC)Reply

If we standardise this to be called "Number", then the category for words would automatically become "Numeral" unless we agree to put them both together. —CodeCat 10:33, 2 July 2011 (UTC)Reply
I oppose this piecemeal approach to the treatment of number words. We had discussions and votes about "number" vs "numeral" that were done in the wrong order. Instead of asking the questions in the right order, you just put forward one question without explaining the implications for the overall treatment of number words. For example, if your proposal is accepted, one implication is that "four" will get the part of speech "Number" rather than "Numeral", "Cardinal number", "Cardinal numeral", or "Adjective". If "four" should have the part of speech "Number", that should be decided explicitly rather than by first drawing attention only to entries for sequences of decimal digits.

On a process note, please wait at least a week before you start acting on this Beer parlour discussion. --Dan Polansky 10:44, 2 July 2011 (UTC)Reply

About your last sentence: Yes, I typically wait a week, sometimes much more, to start acting on a BP discussion; as an exception, I don't see creating votes as acting soon per se, so sometimes I create votes quicker than that.
About the proposal: I did not propose any change to the entries one, two, three, four, etc. I just want to edit 1, 2, 3, 4, etc. I think that is clear. Other languages may have different approaches.
If you want some reasonings, here they are. These are my points of view, of course, so feel free to prove me wrong:
  • "Numeral" very often implies "single numeric symbol" or "single digit". "Number" is better than "Numeral" for being more generic, since we have numbers with two or more digits as well.
  • The POS header "Cardinal number" does not fit all the number senses of 1. The definition "A digit in decimal and every other base numbering system, including binary, octal, and hexadecimal." is not of a cardinal number, and I would like to keep all number senses of that entry together within one comprehensive enough POS header.
--Daniel 12:23, 2 July 2011 (UTC)Reply
I think there's an argument that if we don't use Transitive verb and Intransitive verb as headers, then simply 'Number' is sufficient for cardinal and ordinal numbers too. Mglovesfun (talk) 12:38, 2 July 2011 (UTC)Reply
But cardinal and ordinal numbers often differ in their inflection and part of speech. Ordinal numbers are usually adjective-like while cardinal numbers are more often uninflected and behave as a class of their own. They also behave differently in a sentence, because cardinals can generally not be used predicatively while ordinals usually can. Transitive and intransitive verbs are much more alike, they only differ in whether they can have an object. —CodeCat 12:51, 2 July 2011 (UTC)Reply
"Numbers" are by definition a made-up lexical category, comprised of all kinds of words relating to quantity. We should simply follow the respective language's grammar tradition into what classifies as a number/numeral, and what's just simply an adjective/noun specifying order, distribution, multiplication/partialness etc. because it is impossible to devise a policy that would apply to all. AFAICS, the main issue here is terminological, which of the two terms (number/numeral) should be used and where (separate header or a label). --Ivan Štambuk 16:27, 2 July 2011 (UTC)Reply
I agree with Ivan: "We should simply follow the respective language's grammar tradition into what classifies as a number/numeral, and what's just simply an adjective/noun specifying order, distribution, multiplication/partialness etc. because it is impossible to devise a policy that would apply to all."
I think we no longer need to discuss all languages together, if we can agree on analyzing "Number/Numeral/Adjective/etc." headers on a case-by-case basis. I opened a discussion about certain Translingual terms, and those are the ones I want to edit. --Daniel 16:40, 2 July 2011 (UTC)
I'd put number/numeral alongside abbreviation, acronym, initialism, phrase (etc.) as headers to be avoided wherever possible. For example NATO should have a proper noun header, not an acronym header. Similarly cinq in French is a masculine invariable noun. So noun is the correct header, but Category:French cardinal numbers is a correct category too, just like woof can be a noun, a verb and an onomatopoeia. Mglovesfun (talk) 12:09, 4 July 2011 (UTC)
Abbreviation, Acronym and Initialism should be avoided for English and other languages, because these headers don't accurately display the grammatical, morphological and syntactical characteristics of something that would rather fit a Noun, Verb, Proper noun or other header.
I already explained above why "Number" is the best option for certain Translingual numbers. As of now, the only argument opposing my conclusion would be the implication that words of other languages, such as the English four, would have that header as well. However, this argument subsequently has been disproved: 4 can have a Number header while four can have other headers. --Daniel 00:59, 5 July 2011 (UTC)Reply

A week passed. The proposal passed, too. --Daniel 14:52, 9 July 2011 (UTC)Reply

Call for image filter referendum

The Wikimedia Foundation, at the direction of the Board of Trustees, will be holding a vote to determine whether members of the community support the creation and usage of an opt-in personal image filter, which would allow readers to voluntarily screen particular types of images strictly for their own account.

Further details and educational materials will be available shortly. The referendum is scheduled for 12-27 August, 2011, and will be conducted on servers hosted by a neutral third party. Referendum details, officials, voting requirements, and supporting materials will be posted at m:Image filter referendum shortly.

For the coordinating committee,
Philippe (WMF)
Cbrown1023
Risker
Mardetanha
PeterSymonds
Robert Harris

— This unsigned comment was added by EdwardsBot (talk | contribs) at 06:00, 3 July 2011.

CT: → Citations:

I propose implementing "CT:" as an alias for the namespace "Citations:"

As a result, CT:Egyptic would automatically be a shortcut to Citations:Egyptic, CT:hydrogen to Citations:hydrogen, and so on. --Daniel 06:31, 3 July 2011 (UTC)Reply

Yes, why not. Mglovesfun (talk) 10:06, 3 July 2011 (UTC)Reply
I use citations a lot. I might be in favor of this. Still, it doesn't seem entirely essential, and there's some ambiguity with categories. DAVilla 16:50, 4 July 2011 (UTC)Reply
Would you prefer a shortcut "CI:"? (CI:water → Citations:water). --Daniel 23:29, 4 July 2011 (UTC)
No, and if there isn't a good abbreviation to use then I'd just as well not have one. Speaking of which, why do we even have a shortcut for Wikisaurus? I don't remember a discussion on it. The vote that created Wikisaurus, by the way, had overwhelmingly suggested Thesaurus: as the name. DAVilla 16:17, 5 July 2011 (UTC)Reply
Re WS shortcut: WT:Votes/2009-12/WT: redirect to Wiktionary:, WS: redirect to Wikisaurus:. --Yair rand 16:43, 5 July 2011 (UTC)Reply
Cool. 100% approval, no less. --Daniel 16:52, 5 July 2011 (UTC)Reply
Huh! Missed that one. DAVilla 04:35, 13 July 2011 (UTC)Reply

I created Wiktionary:Votes/2011-07/CT: → Citations:. --Daniel 02:47, 4 July 2011 (UTC)Reply

This doesn't seem useful to me. The WT alias is useful, because we have short names for various pages in the project namespace; [[Wiktionary:CFI]] would not be so useful a shortcut. But with the citations namespace, we're not going to shorten the pagename, so what's the point of shortening the namespace-name? Personally, I'd rather we didn't bug developers to do things that we don't actually benefit from. —RuakhTALK 17:08, 4 July 2011 (UTC)Reply
We have Wikisaurus:woman and WS:woman as a precedent. I like and use the shortcuts, so I guess "the point" is catering at least to my personal taste, unless more people would like them and use them too. Naturally, with the discussion and the vote I created, I expect to know more opinions. --Daniel 23:23, 4 July 2011 (UTC)Reply
For your own use, you could just use $(function(){$("#searchInput").keyup(function(){var q=/^ct:/i,w=this,e=w.value;if(q.exec(e)){w.value=e.replace(q,'Citations:')}})}); in your personal JS. --Yair rand 00:55, 5 July 2011 (UTC)Reply
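
For anyone who wants to try the personal-JS approach without jQuery, here is a minimal sketch of the same idea. The helper name expandCitationsShortcut is hypothetical (it is not part of MediaWiki); only the "searchInput" id is taken from the snippet above, and the sketch assumes the script runs after the search box exists on the page.

```javascript
// Hypothetical helper (not a MediaWiki API): expand a leading "ct:"
// (any letter case) to the full "Citations:" namespace prefix.
function expandCitationsShortcut(query) {
  return query.replace(/^ct:/i, "Citations:");
}

// Only wire up the handler where a DOM exists (i.e. in a browser).
// Assumes the search box uses the id "searchInput", as in the snippet above.
if (typeof document !== "undefined") {
  const box = document.getElementById("searchInput");
  if (box) {
    box.addEventListener("keyup", function () {
      box.value = expandCitationsShortcut(box.value);
    });
  }
}
```

Keeping the rewrite in a separate pure function means the prefix rule can be checked on its own; text that merely contains "ct:" somewhere other than the start is left untouched.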

Pinyin entries (do we want them and how should they look?)

In light of the recent edits by the strong-headed Engirst, I think it'd be useful to clarify our policy on pinyin entries.

Do we want them? First of all, I don't think it's possible to attest (more than a few) pinyin entries. Assuming this is true, and going only by WT:CFI, they should all be deleted. So I don't think attestability is a useful criterion for inclusion for pinyin entries.

Instead, I think they can serve:

  1. as an index (the page "lǐcài" works like "Mandarin words pronounced lǐcài")
  2. as a help for learners/users

So, assuming we want to keep them, what should they look like? I think they should include links to the character entries (the "real" entries), and only that. I.e. no long, multiline definitions, citations/examples, etymology, pronunciation etc. Only links to entries, with a short translation. Much like the policy for romaji for Japanese: WT:AJ#Romaji_entries.

(About Chinese written in pinyin: you'll find almost nothing but a few bibles, some middle school "China is a glorious country" readers and some "Why Chinese could and should be written with pinyin" articles.) Vaste 09:22, 3 July 2011 (UTC)

It's the pinyin version of a map. That work would be part of what I called middle school readers. (Or maybe it's for foreigners?)
Can you find citations for common words such as wúliáo or (dài) lǜmàozi? How about politically sensitive words such as liùsìshìjiàn? Vaste 00:23, 4 July 2011 (UTC)
The Pinyin Atlas is usually used in business organizations. Engirst 01:22, 4 July 2011 (UTC)

(moved unrelated post)

If it is possible to attest a few pinyin entries, then we should have those few. As a dictionary, we would not be doing our job if we kicked out attestable terms merely because other terms in the same class were unattestable. Keep what is attestable and throw out what is not. bd2412 T 02:06, 4 July 2011 (UTC)Reply

I disagree. Any pinyin entry is fully described by its character equivalent(s). After all, they describe the same word. Furthermore, these character versions will (probably) always be more mature, detailed and correct. I say, let's not stop at adding a few hundred or thousand attestable pinyin entries, let's add tens of thousands or hundreds of thousands non-attestable pinyin entries that are simply links to the character versions. This would be much more useful for the typical user, easier to maintain, and also consistent with what's done in e.g. Japanese. It would also better reflect how modern Chinese is actually used.
It would be easier to maintain because there is less duplication of effort, and less entries to attempt to keep in sync. E.g part of what belongs in would also be in jiào and jiāo. If is updated/corrected, then jiào and jiāo would be out of date, and possibly wrong/misleading. Then if a user goes to jiào, and since there is already a definition right there, he/she might not see the updated entry in . Vaste 04:28, 4 July 2011 (UTC)Reply
In that case, are you willing to either attest or delete the hundreds of pinyin entries User:123abc/Engirst has created over the past year? Note that this user has evaded blocking about a dozen times by changing his IP continuously. ---> Tooironic 04:19, 4 July 2011 (UTC)Reply
If we created all remaining pinyin entries with a bot, he no longer would have anything to do and he would probably go away on his own. :p —CodeCat 09:31, 4 July 2011 (UTC)Reply
Actually, though he's being a total ass about it, some of what he does is useful. I just wish he would stop caring so much about pinyin entries and put all that energy into improving our (often quite lacking) Chinese entries instead. Also, I'm a bit worried that he might just be copying copyrighted definitions (maybe from that Wenlin software he cites all over the place?). Vaste 09:39, 4 July 2011 (UTC)
"though he's being a total ass about it, some of what he does is useful" I couldn't put it better myself, thank you! Mglovesfun (talk) 12:07, 4 July 2011 (UTC)Reply
Firstly, Pinyin entries are in fact useful. The Wenlin Pinyin dictionary is a good example: its entries, sorted by Pinyin, are beneficial to users, especially learners of the Chinese language.
Secondly, the example sentences from Pinyin Bible are useful and good for references, and Bible is a well-known work. Engirst 15:56, 4 July 2011 (UTC)Reply
There have been many "Bibles" throughout history; some are well-known works (the Hebrew Tanakh, the Septuagint, the Greek and Latin New Testaments, the Peshitta, the Authorized King James Version, etc.), while others are not. Maybe the specific Pinyin edition you cite is a well-known work, but I doubt it, and I do not trust you to judge. —RuakhTALK 21:44, 4 July 2011 (UTC)Reply
It is the Authorized King James Version, and The King James Bible is Not Copyrighted. Engirst 22:15, 4 July 2011 (UTC)Reply
Nonsense. The well-known, public-domain Authorized King James Version is in English. You're talking about some sort of Chinese Bible translation in Pinyin. —RuakhTALK 23:31, 4 July 2011 (UTC)Reply
The example sentences are in Pinyin and English as well (Please see here). Engirst 00:13, 5 July 2011 (UTC)Reply
Do you have a point, or are you just trying to waste people's time? —RuakhTALK 00:16, 5 July 2011 (UTC)Reply
The English example sentences are from King James Version, and the Chinese example sentences are from Chinese Union Version. Both are royalty free and well-known. Engirst 00:28, 5 July 2011 (UTC)Reply
Your example sentences are adapted from the Chinese Union Version by transposing the Traditional characters into pinyin. The result is not well-known. —RuakhTALK 00:45, 5 July 2011 (UTC)Reply
Whether in English, Hanzi, or Pinyin, they are the Word of God (Please see here). Engirst 01:11, 5 July 2011 (UTC)Reply
I'm sorry, but that is irrelevant. Wiktionary is not a soapbox for spreading the Word of G-d. —RuakhTALK 01:14, 5 July 2011 (UTC)Reply
The example sentences are not for spreading the Word of God, but for learning Mandarin Chinese. Engirst 01:20, 5 July 2011 (UTC)Reply
Surely the CUV is not a very good source for learning modern Mandarin Chinese? Just like the KJV is not very good for English. Both are quite dated, right?
Wikipedia says:

The vernacular Chinese language has changed a lot since 1919. Indeed, CUV’s language sounds stilted to modern readers. Furthermore, a lot of Chinese characters used in the CUV have fallen into disuse and cannot be found in commonly-available dictionaries today.

It seems that the original CUV is now in PD, but it's dated. There are also slightly modernized versions, but they would be copyrighted. Exactly what version does Engirst use? Vaste 01:43, 5 July 2011 (UTC)Reply
I'm not saying the example sentences need to be removed, or anything like that; I'm just saying that they don't count toward attestation per WT:CFI. Above, you wrote, "Bible is a well-known work"; I thought you were trying to say that these example sentences satisfy the "well-known work" requirement in WT:CFI. Did I misunderstand your purpose? —RuakhTALK 02:48, 5 July 2011 (UTC)

About emoticons

I've added a "Punctuation mark" header to all entries of emoticons, because that's what they are.

I orphaned and deleted Category:Emoticons, in favor of Category:Translingual emoticons, because I could populate separate categories for Japanese emoticons and Korean emoticons. --Daniel 11:27, 3 July 2011 (UTC)Reply

Emoticons are not punctuation marks; ";)" is not a punctuation mark. --Dan Polansky 08:10, 4 July 2011 (UTC)
Do you think ";)" is something else? What is that, if not a punctuation mark? --Daniel 08:32, 4 July 2011 (UTC)Reply
It is not a punctuation mark because it does not serve to punctuate but to convey an emotion. Equinox 09:02, 4 July 2011 (UTC)Reply
OK; I think that is reasonable, yet false. However, I'm not in the mood to defend the hypothesis of "emoticons as punctuation marks" against its simple negation. I'm just asking what emoticons are, if not punctuation marks. That would help. --Daniel 09:09, 4 July 2011 (UTC)Reply
Symbols? Mglovesfun (talk) 09:05, 4 July 2011 (UTC)Reply
Does anyone have a better answer? If they really are symbols rather than punctuation marks, I'd be happy to undo the change, adding a "Symbol" header to every sense of emoticon. --Daniel 09:10, 4 July 2011 (UTC)
Symbols, yes. —RuakhTALK 17:10, 4 July 2011 (UTC)Reply

OK... Sometimes, I try to prove my points of view through long streaks of arguments, but this time I'll begin by just trying to disprove what you guys said up until now.

  • Emoticons are punctuation marks; they punctuate.
  • Emoticons are punctuation marks, punctuation marks are symbols, and emoticons are symbols. Both "Symbol" and "Punctuation mark" would be accurate headers. The latter is just more strict, thus better to my taste. (Alternatively, the header "Emoticon" would be very very strict and accurate, even natural, but too strict to my taste.)
  • "I'm happy." is an example of something that is written, not a punctuation mark and conveys an emotion. However, there are punctuation marks that convey emotions, notably "!", "??", "..." and scare quotes.

--Daniel 18:19, 5 July 2011 (UTC)Reply

IMO, emoticons do not punctuate. The only reason they are typically placed between clauses or sentences is because placing them in mid-clause or mid-sentence would disrupt the reader's flow. They do not, themselves, indicate a specific kind of grammatical break as e.g. a comma does. Yes, you could separate two sentences with an emoticon in lieu of a full stop, but that will work with any visual break (e.g. a vertical line on a poster or greeting card); it does not have a punctuational meaning and you would not know whether it was meant to be a comma, full stop, semicolon, etc. except by working it out from existing knowledge of grammar. Equinox 19:04, 5 July 2011 (UTC)Reply
I agree with Equinox that emoticons are not punctuation. The primary purpose of punctuation is to indicate a mixture of prosody or grammar (with different languages, writing systems, time-periods, and individual writers tending to put greater emphasis on one or on the other). Even something like the exclamation point, which tends to express surprise, has I think the primary purpose of marking the end of a sentence; the surprise is merely what distinguishes it from certain other punctuation marks that have the same primary purpose. (There's a spectrum of uses, of course: the exclamation point ranges from purely grammatical uses, as in "What a lovely home!", where a period would be incorrect, to purely expressive uses, as in "She ordered (!) him to [] ", where it's really acting exactly like an emoticon. So I don't think there's a bright-line test. But so far, emoticons are really only in the purely-expressive-uses part of the spectrum, so I wouldn't consider them punctuation.) —RuakhTALK 19:52, 5 July 2011 (UTC)Reply
Maybe we should just use ===Emoticon=== as the header? They don't really seem to resemble anything else in usage, so comparing them seems a bit... fruitless. —CodeCat 19:58, 5 July 2011 (UTC)Reply
My personal preference is "symbol", because (i) I doubt anyone would dispute that they are symbols (while the idea that they are punctuation is very dubious; see above), and (ii) "emoticon" seems to be getting a bit specific, as with the "mathematical symbol" (or whatever it was) that Daniel proposed. Headers are ultimately supposed to indicate the part of speech, not the category, and we don't want to turn into some kind of character-centric Unicode consortium. Of course we can use the "emoticon" gloss and category, just as we do with "math", "typography", etc. Equinox 20:00, 5 July 2011 (UTC)Reply
If we are going to treat emoticons as parts of speech, I can't really think of anything beyond 'phrase' or 'interjection' that fits. —CodeCat 20:08, 5 July 2011 (UTC)Reply

I created Wiktionary:Votes/2011-07/External links, which stems from old discussions. --Daniel 14:03, 3 July 2011 (UTC)Reply

WT:CFI question

What do we really mean by "Usage in permanently recorded media"? User:Engirst is trying to argue that this site is durably archived because it claims to have CD versions of its Bible content available - even though all that I can find on the webpage is a collection of mp3s. Surely this is not permanently recorded media, nor durably archived. See also our discussion User_talk:Engirst#copyrighted_material. ---> Tooironic 00:31, 4 July 2011 (UTC)Reply

"Usage in permanently recorded media" essentially is just Google Books and Usenet. --Daniel 02:50, 4 July 2011 (UTC)Reply
Note: Last year, I asked "What are the durably archived sources?" and got some replies. --Daniel 09:02, 4 July 2011 (UTC)Reply
What about the Internet Archive? Aren't internet pages that have been archived there also permanent? Matthias Buchmeier 12:08, 4 July 2011 (UTC)
No, they are not permanent. The archive is subject to the whim of the copyright holders. Per the Terms of Use, "if the author or publisher of some part of the Archive does not want his or her work in our Collections, then we may remove that portion of the Collections without notice." DAVilla 17:24, 4 July 2011 (UTC)Reply
The King James Bible is Not Copyrighted Engirst 19:40, 4 July 2011 (UTC)Reply
"Wordproject is an open, royalty free web page, online and on CD, which aims to make the Word of God - the Bible - available to as many people as possible, through a means that is simple, up-to-date and cheap to reproduce and use." CD is durably archived as well. Please see here. Engirst 12:17, 4 July 2011 (UTC)Reply
Show me where the CDs are. All I can see is a series of mp3s. In any case it's not a legitimate publication. If it were, that would mean that anyone could just create a site with some mp3s and call it a durably archived source! ---> Tooironic 22:16, 4 July 2011 (UTC)
Note that we consider movies "durably archived" as well. There have been cases where movie quotes were enough to verify an entry in terms of CFI. -- Prince Kassad 13:30, 4 July 2011 (UTC)Reply
How do spoken quotes work if there is no known written representation? —CodeCat 14:54, 4 July 2011 (UTC)Reply
How do written quotes work if there is no known oral pronunciation? DAVilla 17:20, 4 July 2011 (UTC)Reply
I agree that spoken word only doesn't work, we are a written dictionary only (apart from audio files). Mglovesfun (talk) 17:25, 4 July 2011 (UTC)Reply
I understand if there's some inherent ambiguity in trying to spell something oral, but who ever said we're a written dictionary only? DAVilla 16:29, 5 July 2011 (UTC)Reply
I would rule it as durably archived and therefore citable, definitely quotable even if not. DAVilla 17:20, 4 July 2011 (UTC)Reply
Not all CDs are durably archived, since you can burn something to CD without archiving it durably. (Similarly, not all books are durably archived, since you can write something in your personal diary without archiving it durably. Regardless of the medium, common sense is required.) —RuakhTALK 17:26, 4 July 2011 (UTC)Reply
I don't remember anyone calling something from Google Books not durably archived. In my experience, people seem to think that everything that comes from that website is durably archived by definition, because, when an entry is attested through it, nobody questions it. I'm ready to be proven wrong, if I am. --Daniel 22:53, 4 July 2011 (UTC)Reply
Well, I meant physical books — but no, not everything from Google Books is durably archived, either. Some of the "books" on there are print-on-demand, and there's not even any guarantee that any hard-copies exist, let alone archived anywhere durable. (I also mentioned this in the discussion you linked to above.) That said, Google Books always indicates where the book came from (e.g., if they got it from a certain library), and in cases where it was supplied directly to them by a publisher in digital form, you can usually tell from the editing quality whether it was really published or not. As long as we're aware that presence on Google Books is not necessarily sufficient, we can generally apply common sense in individual cases. —RuakhTALK 23:29, 4 July 2011 (UTC)Reply
So physical form is sufficient if it's a mass produced book, but not clear if it's a mass produced CD? I interpret the scenario as the latter, but now I'm wondering if that assumption is incorrect. DAVilla 16:34, 5 July 2011 (UTC)Reply
One more Pinyin Bible with CD for your reference. Engirst 21:59, 5 July 2011 (UTC)Reply

Control characters

I created this as a simple entry for a control character. Feel free to improve the idea of defining control characters somehow. Perhaps they should be in appendices instead; I don't know. --Daniel 17:50, 5 July 2011 (UTC)Reply

First bad result: The new entry appears among the recent changes, but can't be clicked on from there. --Daniel 17:59, 5 July 2011 (UTC)Reply
I don't think adding control characters is a good idea. But we could add their names instead, like NUL or STX. —CodeCat 21:24, 5 July 2011 (UTC)Reply
Names of control characters would include ^G and COMBINING GRAPHEME JOINER. An appendix might list them all, and their Unicode codepoints. --Daniel 21:52, 5 July 2011 (UTC)Reply
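
As a rough illustration of what such an appendix could list, here is a small sketch that maps the names mentioned in this thread to Unicode code points. The function names and the table layout are hypothetical; the code points themselves (NUL U+0000, STX U+0002, ^G U+0007, COMBINING GRAPHEME JOINER U+034F) are the standard Unicode values.

```javascript
// Convert caret notation such as "^G" to its code point by flipping bit 6
// of the character's ASCII code (so "^@" is 0, "^G" is 7, "^?" is 127).
function caretToCodePoint(caret) {
  if (!/^\^[?-_]$/.test(caret)) {
    throw new RangeError(`not caret notation: ${caret}`);
  }
  return caret.charCodeAt(1) ^ 0x40;
}

// Format a code point in the usual "U+XXXX" style.
function formatCodePoint(cp) {
  return "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
}

// Hypothetical appendix rows, using only names mentioned in this thread.
const appendixRows = [
  { name: "NUL", codePoint: 0x0000 },
  { name: "STX", codePoint: 0x0002 },
  { name: "^G", codePoint: caretToCodePoint("^G") },
  { name: "COMBINING GRAPHEME JOINER", codePoint: 0x034f },
];

for (const row of appendixRows) {
  console.log(`${formatCodePoint(row.codePoint)}  ${row.name}`);
}
```

Listing names and code points this way avoids having to create a page whose title is the invisible character itself.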

Please help clean up the topical categories!

Since the vote that created subcategories for English topical categories, a lot of entries have been left behind, still in the 'main' category. The main categories should now be empty, but there are many that aren't yet. So I'd like to ask everyone who can and is willing to help fix this, by adding the prefix en: to those categories in each entry. There is now a list of topics, which shows how many entries each category still has. Once they all show no entries, we can be satisfied. :) —CodeCat 19:41, 5 July 2011 (UTC)Reply

Is this what we want? If yes, I can contribute with my bot. --flyax 21:09, 5 July 2011 (UTC)Reply
Yes, but you have to be very careful not to add it to categories where it shouldn't be added. Category:British English and Category:German verbs should stay as they are, for example. —CodeCat 21:23, 5 July 2011 (UTC)Reply
OK. With this regex (User:Flubot/en to topical categories) there won't be any problem I guess. --flyax 08:47, 6 July 2011 (UTC)Reply
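
For readers who cannot check the bot page, here is a minimal sketch of the kind of rewrite being discussed. It is not the regex from User:Flubot; the function name addEnPrefix and the short allowlist of topic names are hypothetical stand-ins for the real list of topics. Only categories on the allowlist get the en: prefix, so categories such as "British English" and "German verbs" are left alone.

```javascript
// Hypothetical allowlist of topical category names; a real run would load
// the full list of topics rather than these three examples.
const TOPICS = new Set(["Geography", "Planets", "Gods"]);

// Add the "en:" prefix to topical categories in a page's wikitext.
function addEnPrefix(wikitext, topics = TOPICS) {
  // Match [[Category:Name]] or [[Category:Name|sortkey]].
  return wikitext.replace(
    /\[\[Category:([^\]|]+)(\|[^\]]*)?\]\]/g,
    (match, name, sortKey = "") =>
      topics.has(name.trim())
        ? `[[Category:en:${name.trim()}${sortKey}]]`
        : match
  );
}

console.log(
  addEnPrefix("[[Category:Geography]] [[Category:British English]]")
);
```

Because "en:Geography" is not itself on the allowlist, running the function twice over the same page does not add a second prefix.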

User:123abc's sockpuppets User:Ddpy and User:Engirst

I propose that we delete all pinyin entries created by both these user accounts since the vast majority of these hundreds of entries cannot be attested. ---> Tooironic 00:32, 6 July 2011 (UTC)Reply

Does "cannot be attested" mean "not a real word"? I would think we'd have trouble attesting almost all pinyin entries, since the large majority of Chinese literature prefers to use characters instead. Tempodivalse [talk] 01:05, 6 July 2011 (UTC)Reply
"Cannot be attested" means "not a real word yet as far as Wiktionary is concerned". - [The]DaveRoss 01:11, 6 July 2011 (UTC)Reply
Even if anyone with a solid grasp of the language would be able to tell you that the pinyin is an accurate transliteration of the word? I see some trouble down the road if we decide to be extremely stringent about "attestation". For instance, many obscurely inflected Russian, Esperanto, and (especially) Latin words we have might not be fully "attestable", even though a fluent speaker of the language will tell you it is a perfectly valid word. (That's probably something to be discussed in another topic, however, and I don't want to distract from the initial purpose of this thread.) Tempodivalse [talk] 01:20, 6 July 2011 (UTC)Reply
Tempodivalse, all valid Russian words can be attested, including slang. Russian is not a purely spoken language; it is well described and widely used in writing. I have yet to find a Russian word that doesn't exist on the internet or in dictionaries, except perhaps standard transliterations of foreign concepts that are rarely discussed by Russians. (end of distraction)
As for pinyin entries, although there are rules about spacing, capitalisation, spelling of erhua, even the tone marks and tone numbers and absence of them, pinyin is only used in learning materials, dictionaries, books for students or when hanzi can't be entered for technical reasons. Same can be said about bopomofo - it's a tool, not the proper script. Almost invariably, pinyin follows the proper hanzi (simplified or traditional) text to help with pronunciation, the primary script for Mandarin Chinese, pinyin on its own is otherwise useless. Pinyin alone is used by people who have the agenda to convince people that Mandarin can be written in pinyin, like Pinyininfo web-site and our ill-famed User:123abc and his various incarnations. I second deletion of his entries, even if they don't break rules. Keeping pinyin entries in sync with hanzi entries is made impossible due to his utter lack of cooperation with other Wiktionarians. --Anatoli 04:08, 6 July 2011 (UTC)Reply
I agree. As it is now, he does little/nothing to help us improve Wiktionary in a way that *I* (we?) care about. Vaste 04:25, 6 July 2011 (UTC)Reply
I doubt all valid Russian inflections can be attested, but that's no reason to exclude them. If words written using pinyin are not counted as "real words", then all pinyin should be excluded regardless of attestation. If that's not the case, then I don't see why they should be deleted. Considering words of different writing systems to require attestation separately doesn't make sense to me. --Yair rand 04:27, 6 July 2011 (UTC)Reply

Pinyin entries

Could we keep only the pinyin section of the entries? This way we handle it like romaji in Japanese instead. I.e. a list with short definitions and links to entries (in characters). See example for shū:

Pinyin

shū (with tone numbers: shu1)

  1. 書, 书: book, letter, document; writings
  2. 叔: father's younger brother
  3. 梳: comb; brush
  4. 舒: open up, unfold, stretch out; comfortable, easy
  5. 疏: neglect; careless, lax
  6. 輸, 输: transport, carry, haul

Useful, to the point, and zero problems with attestation. Vaste 04:25, 6 July 2011 (UTC)Reply
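
The "with tone numbers" form in the example above (shū to shu1) can be derived mechanically. Here is a minimal sketch, assuming a single syllable as input and a hypothetical function name toToneNumber: it decomposes the syllable with Unicode normalization, reads the tone off the combining mark, and keeps the diaeresis of ü.

```javascript
// Combining marks that carry the four Mandarin tones after NFD decomposition.
const TONE_MARKS = new Map([
  ["\u0304", 1], // macron: first tone (ā)
  ["\u0301", 2], // acute: second tone (á)
  ["\u030C", 3], // caron: third tone (ǎ)
  ["\u0300", 4], // grave: fourth tone (à)
]);

// Convert one tone-marked pinyin syllable to its tone-number form.
// A syllable without a tone mark (neutral tone) is returned unchanged.
function toToneNumber(syllable) {
  let tone = "";
  let base = "";
  for (const ch of syllable.normalize("NFD")) {
    if (TONE_MARKS.has(ch)) {
      tone = String(TONE_MARKS.get(ch));
    } else {
      base += ch; // plain letters and the diaeresis of ü are kept
    }
  }
  return base.normalize("NFC") + tone;
}

console.log(toToneNumber("sh\u016B")); // the syllable from the example above
```

Multi-syllable words would need to be split into syllables first, which this sketch does not attempt.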

It would be perfect if the entries were maintained the way you suggested but we have a case where a wayward editor does with pinyin entries what he thinks is right, not what is being discussed and agreed on. Do you see the difference? --Anatoli 05:42, 6 July 2011 (UTC)Reply
I think we could clarify the guidelines we have. Right now, about pinyin entries it says:

Pinyin entries: The entire simplified phrase and the entire traditional phrase should be hyperlinked to allow for easy navigation to the simplified and traditional entries (which often contain additional information that is lacking in the Pinyin entry).

This to me implies that pinyin entries such as the ones Engirst are creating are perfectly okay (if they fulfill WT:CFI). (I.e. "additional information" doesn't have to be "lacking" in the Pinyin entry.) I would like this changed to limit the scope of pinyin entries to something like the example above. Do you agree? Vaste 07:07, 6 July 2011 (UTC)Reply
Yes, I agree. The scope of pinyin entries should be limited. All examples, etymology, usage notes, etc. should go into the main entry but pinyin entries should list possible hanzi (simp. and trad.) that could have the given reading. Engirst (being so keen to write pinyin) could fix some hanzi entries where pinyin is missing. --Anatoli 07:46, 6 July 2011 (UTC)Reply
I just noticed that WT:About Sinitic languages also says this:

Headwords that are romanizations point to both the traditional and simplified forms, but do not duplicate all entries with that pronunciation, instead having a “Pinyin” L3 heading (likewise for other romanizations), linking to characters with that reading; see ài.

(Note: when added the article looked like this: ài)
I've always felt that this ultimately should depend on a technical solution. It was attempted several years back under the name WiktionaryZ, but ended up going nowhere. In a nutshell, I should be able to fill out a single form for any word or phrase, including all orthographies and meanings on that form, and be able to search for that term using any of the orthographies that were entered. We are so far from that at Wiktionary, it's not even funny. It is absolutely insane for me to be creating duplicate entries for each and every word (simplified and traditional). But that's currently the only way available to me if I want to ensure a consistent user experience. Now, we're debating the wisdom of creating a third version (the pinyin). There should be a button on the screen that I can press to toggle between traditional, simplified and pinyin. If you have an iPhone, iPod touch or iPad, check out Pleco. Now that's what we should be working toward here. -- A-cai 00:05, 7 July 2011 (UTC)
P.S. I stand corrected. wiktionaryZ eventually became OmegaWiki. -- A-cai 00:09, 7 July 2011 (UTC)Reply
So what would we need in order to simplify the process? Some kind of automatic synchronization system tied to whenever anyone clicks the save button? That wouldn't be all that hard to produce with javascript, but with pinyin it appears that it's not one-to-one correspondence between entries. --Yair rand 07:58, 7 July 2011 (UTC)Reply
One solution to keep the information in one place only could be: create the "real" entry as a subpage somewhere (e.g. of the traditional entry), and then include that as a template in both the traditional and the simplified page. A special link to edit the subpage could be included. Quotations etc (given in trad and/or simp) could then be conditionally included, depending on if it's the simplified or traditional page importing it. Perhaps a bit complicated though.
It isn't one-to-one with trad/simp either. trad -> simp is almost many-to-one (the only exception I know of is trad 著 matching simp 着 and 著). Vaste 02:55, 8 July 2011 (UTC)Reply

The one-to-one problem is a legitimate hurdle.  There are a number of ways one could tackle this.  Ideally, the user should only have to enter the term in one orthography.  The computer should do the rest.  However, it is not a straightforward process.  In a nutshell, the computer needs access to two key/value lists (or one big list):

  1. simplified/traditional
  2. Pinyin/character

Because it is not a one-to-one correlation, a partial list of key/value pairs would look like:

  • 书 書
  • 字 字
  • 云 云雲
  • 发 發髮
  • ken3 肯啃恳垦懇墾
  • ken4 掯

In cases where there is no one-to-one correlation, the computer would need to prompt the user for the correct choice.  A similar technique is used by most modern Pinyin input method editors.

Anyway, the first step is to create the key/value lists from some source on the Internet (CEDICT might be an option as a source).  Next, a process would need to be put in place whereby the lists  are made available to some kind of JavaScript or python call etc.  The rest of the process would involve a bot auto creating the missing components from the seed entry.  Please let me know if you're having trouble seeing where I'm going with this.  Thanks.  -- A-cai 23:19, 7 July 2011 (UTC)Reply
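
As a rough illustration of the lookup described above, here is a minimal Python sketch. The table contents are only the sample pairs listed in this comment, and the names SIMP_TO_TRAD, PINYIN_TO_CHARS and candidates are hypothetical; a real table would have to be built from an external source such as CEDICT, and nothing here reflects an existing Wiktionary tool.

```python
# Hypothetical sample tables built from the pairs listed above.
# A real implementation would load these from an external source.
SIMP_TO_TRAD = {
    "书": ["書"],
    "字": ["字"],
    "云": ["云", "雲"],
    "发": ["發", "髮"],
}
PINYIN_TO_CHARS = {
    "ken3": list("肯啃恳垦懇墾"),
    "ken4": ["掯"],
}


def candidates(key, table):
    """Return (choices, needs_prompt) for a key in a key/value table.

    needs_prompt is True whenever there is not exactly one value,
    i.e. a human would have to pick (or supply) the right one.
    """
    choices = table.get(key, [])
    return choices, len(choices) != 1


for key in ["书", "发", "ken4", "ken3"]:
    table = SIMP_TO_TRAD if key in SIMP_TO_TRAD else PINYIN_TO_CHARS
    choices, needs_prompt = candidates(key, table)
    print(key, choices, "prompt user" if needs_prompt else "automatic")
```

Only the keys with exactly one value can be handled without asking the editor; everything else falls back to a prompt, which matches the input-method comparison made above.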

Modern Pinyin input method editors work on the assumption that their users know characters; but I don't think we can assume that an editor entering a word in Simplified characters knows which of the corresponding Traditional characters is correct. Can we? —RuakhTALK 00:42, 8 July 2011 (UTC)Reply
Correct, you would have to know which one to choose. I don't see how else it could be done. Automation only gets you so far. Language expertise would have to carry you to the finish line, I think. BTW, we have the same problem with static entries. At the end of the day, we need language experts to do the heavy lifting. -- A-cai 01:33, 8 July 2011 (UTC)Reply
I'd say we cannot assume that. Though perhaps not the best example, a simplified editor might know that "里头" -> "裡頭" (or "裏頭"), and assume that "千里" -> "千裡" (which is wrong, should remain "千里") or "乡里" -> "鄉裡" (this is typically "鄉里").
The most serious attempt to tackle this problem that I know of can be seen at Chinese Wikipedia. Vaste 02:54, 8 July 2011 (UTC)Reply
I think you're right about Chinese Wikipedia.  The only one that doesn't seem to require any human intervention whatsoever is when you are going from traditional to simplified.  However, even there, you still have the problem of multiple pinyin readings (ex. 好 hao3,hao4).  On the other hand, if an interface could be developed whereby the user is prompted to select the correct reading or character in such cases, it would at least reduce the amount of work that would need to be done by a human language expert (although it probably wouldn't eliminate it).  In any case, the process that I described above is rather basic compared to what they do at Chinese Wikipedia, so I'm pretty sure that the process of conversion can be optimized over time, thereby minimizing the number of choices that would need to be made by humans.  Furthermore, you do actually get one-to-one correlations most of the time, so for the vast majority of entries, the computer could do the whole thing without breaking a sweat.  -- A-cai 10:01, 8 July 2011 (UTC)
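
The 里 examples in this thread suggest one way to cut down the number of prompts: check a table of known whole words before falling back to per-character lookup. The following Python sketch shows the idea under stated assumptions. Both tables hold only the examples mentioned in this thread, the names WORD_TABLE, CHAR_TABLE and to_traditional are hypothetical, and this is not a description of how Chinese Wikipedia actually performs its conversion.

```python
# Hypothetical tables holding only the examples from this thread.
WORD_TABLE = {
    "里头": ["裡頭", "裏頭"],
    "千里": ["千里"],
    "乡里": ["鄉里"],
}
CHAR_TABLE = {
    "里": ["里", "裡", "裏"],
    "头": ["頭"],
    "乡": ["鄉"],
    "千": ["千"],
}


def to_traditional(word):
    """Return the list of possible traditional forms for a simplified word.

    A known whole word wins; otherwise every combination of the
    per-character candidates is returned, so a longer list means a
    human has more choices to resolve.
    """
    if word in WORD_TABLE:
        return WORD_TABLE[word]
    forms = [""]
    for ch in word:
        options = CHAR_TABLE.get(ch, [ch])  # unknown characters pass through
        forms = [prefix + opt for prefix in forms for opt in options]
    return forms
```

With these tables, 千里 resolves to a single form because the whole word is listed, while a bare 里 still yields three candidates that an editor would have to choose between.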

"yuan is a nonstandard spelling of yuán"?

Previous vote: Wiktionary:Votes/pl-2009-12/Treatment of toneless pinyin syllables

It is not a fact. yuan is written in Renminbi, it is a standard spelling. It is a toneless spelling of yuán, but not a nonstandard spelling of yuán. Please see the banknote of Renminbi for your reference. Engirst 09:54, 6 July 2011 (UTC)Reply

You mean how it says "20 yuan" on the banknote, right? Isn't that in English? One side has English, one side has Chinese, right? Vaste 10:04, 6 July 2011 (UTC)Reply
There is no English in Renminbi banknote. Please see the banknote of Renminbi for your reference. Engirst 10:16, 6 July 2011 (UTC)Reply
But the pinyin appears in the same place as the other languages (zhuang etc). Isn't it just a way to write something meaningful for non-Chinese to read?
Then again, toneless pinyin is much more commonly seen in Chinese than toned pinyin in general. I wouldn't call it a "standard" way to write Mandarin though. Maybe "toneless variant of yuán" would be more appropriate? Vaste 10:24, 6 July 2011 (UTC)Reply
So, the wording of "toneless Pinyin is a nonstandard spelling of toned Pinyin" is inappropriate. Engirst 10:56, 6 July 2011 (UTC)
I disagree, and so far as I can recall we had a vote to settle this question. Toneless pinyin (like accentless Hebrew) is sometimes used because the target audience is expected to know the meanings intended even without the tones, or because the authors do not appreciate the significance of tones. It is not a misspelling, per se, but is not the ideal presentation either. bd2412 T 17:02, 11 July 2011 (UTC)
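
Whatever label is chosen, the toneless form can be derived mechanically from the toned one, which may help when checking that a pair such as yuan and yuán actually correspond. Below is a minimal Python sketch using only the standard unicodedata module; the function name strip_tones is hypothetical, and the sketch assumes the input uses the usual four combining tone marks.

```python
import unicodedata

# Combining marks used for the four Mandarin tones:
# macron (1st), acute (2nd), caron (3rd), grave (4th).
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def strip_tones(pinyin):
    """Remove tone marks but keep the diaeresis of ü."""
    decomposed = unicodedata.normalize("NFD", pinyin)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)
```

The diaeresis is deliberately left out of TONE_MARKS, so a syllable such as nǚ becomes nü rather than nu.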

Linking words in definitions

Hi, I'd like to suggest that instead of piecemeal linking of "significant" words in definitions, Wiktionary silently links every word automatically. OK, there would be no blue cue, but people would get the idea fairly quickly I think if the cursor changed on mouseover? OTOH, I suppose this must have been suggested before....

There is no reason to link a, an, the, and other words that will not assist in understanding the definition. The blue links are more than clues to linking; they are also emphasis on important aspects of the definition. --EncycloPetey 18:07, 6 July 2011 (UTC)Reply
I see no downside at all (unless it's a performance one, which seems unlikely) to linking words that people are unlikely to click on. The point is that they can click on any word they want more information about, irrespective of whether someone's pre-decided that they might want to. 86.181.204.160 19:08, 6 July 2011 (UTC)Reply

Wikilook might help you. Lmaltier 18:22, 6 July 2011 (UTC)Reply

This is an interesting idea with merit. I've thought about it in passing a few times. This might be useful for Simple English Wiktionary, which is intended primarily for learners. (One Esperanto site I know employs a similar feature: each word in an article or message can be clicked on to reveal a pop-up. It really helped me boost my vocabulary and is more efficient than searching through a dictionary all on your own.) Tempodivalse [talk] 19:41, 6 July 2011 (UTC)Reply
I dislike this idea. Every word being a link makes it much more difficult to copy and paste things. Number of times I've wanted to copy and paste something from a definition: +∞. Number of times I've come across the word "the" in a definition and decided to look it up: 0. —RuakhTALK 20:13, 6 July 2011 (UTC)Reply
Wikilook does not have this drawback. The main advantage of Wikilook is that it works on all sites, not only here (but you need Firefox). Here is a link for more information: https://addons.mozilla.org/en-US/firefox/addon/wikilook/ Lmaltier 20:15, 6 July 2011 (UTC)
That looks pretty cool — and much more sophisticated than what the anon is suggesting. —RuakhTALK 20:33, 6 July 2011 (UTC)Reply
Maybe we could do the same thing with a preference setting and Javascript? —CodeCat 20:36, 6 July 2011 (UTC)Reply
Wikilook is available in WT:PREFS, but afaict it only works on Firefox and Opera. --Yair rand 01:12, 7 July 2011 (UTC)Reply
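
For the original suggestion in this thread (linking every word of a definition), a minimal Python sketch of the transformation might look like the following. It assumes the definition is plain text with no existing wiki links or templates; the function name link_words and the optional skip set are hypothetical, with the skip set standing in for the objection about words like "a", "an" and "the".

```python
import re

# A word is a run of letters, optionally joined by hyphens or apostrophes.
WORD = re.compile(r"[^\W\d_]+(?:[-'][^\W\d_]+)*")


def link_words(definition, skip=frozenset()):
    """Wrap each word in [[...]] unless its lowercase form is in `skip`."""

    def repl(match):
        word = match.group(0)
        return word if word.lower() in skip else "[[" + word + "]]"

    return WORD.sub(repl, definition)
```

Calling link_words("a small dog") links all three words, while passing skip={"a", "an", "the"} leaves the articles as plain text. This only rewrites wikitext; the silent, unstyled links proposed above would need separate CSS or JavaScript.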

five quondam Jōyō kanji, 196 recently added Jōyō kanji

In 2010 five kanji lost their Jōyō kanji status and 196 Jinmeiyō kanji and Hyōgaiji were recognised as Jōyō kanji. I have updated the information concerning the five former Jōyō kanji (勺, 錘, 銑, 脹, 匁) in their respective articles (however, the Jōyō tag remained, because I am not aware of their current status: are they now Jinmeiyō kanji or Hyōgaiji?), but how are the 196 newly added Jōyō kanji to be dealt with? Some of them are not tagged at all, others are tagged as Hyōgaiji, which is now obsolete. I suggest adding a tag (Common Jōyō kanji since 2010), but they should probably not be updated manually given the considerable number of entries concerned. The uſer hight Bogorm converſation 07:32, 7 July 2011 (UTC)

While browsing wikipedia I discovered that all 5 quondam Jōyō kanji were added to the Jinmeiyō kanji list. How about referencing the Jōyō tag with the usage notes like that in order to raise the reader's awareness of the alteration of their status? There is a source added by a user which facilitates and justifies referencing. The uſer hight Bogorm converſation 07:45, 7 July 2011 (UTC)Reply

Altaic languages

I noticed that several etymologies refer to the Altaic languages or to Proto-Altaic. As far as I know, the existence of that family is disputed, and so is the question of which languages belong to it, so should we really allow it on Wiktionary? —CodeCat 17:25, 7 July 2011 (UTC)Reply

No. Category:Altaic languages failed RFD. -- Liliana 17:28, 7 July 2011 (UTC)Reply
It looks like it failed for the same reasons. Does that mean any Proto-Altaic etymologies should be removed as well? —CodeCat 17:31, 7 July 2011 (UTC)Reply
I suppose yes, since the Altaic proposal is, as you said, not universally accepted, therefore any etymologies involving Altaic are nothing but spurious theory. -- Liliana 17:34, 7 July 2011 (UTC)Reply

I'm sorry, but that's ridiculous! Have you ever heard of the comparative method? Don't you know that all linguistic reconstruction cannot be proven? Even Indo-European cannot be. It is not spurious theory, Altaic is based on the comparative method like Indo-European, Uralic, Dravidian, Sino-Tibetan, et al. There is merely some kind of bizarre prejudice against Altaic. Have you ever read any literature on Altaic or the comparative method in general? Why should Altaic etymologies be ignored? How are they any less valid than any others? Again, no linguistic proposal with regard to a proto-language is ever universally accepted because none of them can ever be proven, as all proto-languages are, by definition, from times when there are no written attestations -- And we drown

Yes, all proto-languages are theoretical, but some are more widely accepted than others. Proto-Indo-European is very widely accepted; no serious linguist disputes it. Proto-Altaic, on the other hand, is rather controversial; it's almost as controversial as "Proto-Nostratic". Most terms should probably not be taken back further than Proto-Turkic, Proto-Mongolian, etc., though I suppose we can mention Proto-Altaic when the term is specifically mentioned in the literature as a term used to argue in favor of the Altaic hypothesis. Even then we should couch it correctly: don't just say "Proto-Turkic *XYZ < Proto-Altaic *UVW", but rather "Proto-Turkic *XYZ. So-and-so (1957) compares this form to Proto-Mongolian *RST and derives both from Proto-Altaic *UVW." We can do that for Nostratic too: if a form is "notable" for its use in the reconstruction of Nostratic, we can mention that, but we shouldn't just slap Nostratic etymologies down as if they were as noncontroversial as PIE etymologies. —Angr 17:58, 15 July 2011 (UTC)Reply
Should we have categories such as Category:Turkish terms derived from Proto-Altaic? Category:Altaic languages failed RFD some time ago. —CodeCat 18:28, 15 July 2011 (UTC)Reply
No, we shouldn't. Mentioning the fact that a term has been used in the reconstruction of Proto-Altaic doesn't mean we need to categorize the terms. —Angr 06:55, 16 July 2011 (UTC)Reply
And that creates a problem because people will use {{proto}}, like And we drown did, which automatically categorizes. In effect, while we are able to control language templates and their uses to some extent, proto-language etymologies have free rein and can be called anything at all, because the template copies the name provided. —CodeCat 13:24, 16 July 2011 (UTC)

"A long rode to ho" or is it "A long road to hoe"????????

A long rode to ho. It is not mispronounced; it is misspelled. Rode is a length of chain and rope that is put out from the ship to the anchor. A long rode is required when it is windy or stormy. To pull a rope on a ship is to ho, from the term “Heave Ho”. The group will advance on the rope on the command Heave, and sailors expressed in unison "Ho" as they pulled. If it is stormy and windy or the current is strong, the long rode to ho is hard work that takes a long time.

I would be ever so pleased if people would stop misspelling the phrase or replacing the word rode with row, as if that would make more sense. The term has been around since before we were colonies. If you look this idiom up on Google you will find that 200,000 web pages CAN BE WRONG! Sports writers are the worst. Journalists will put it in quotations but it is not a correct quote. Imagine journalists getting a quote wrong; it gives me shivers.

My 3rd grade teacher, Mrs. Samuels, told me in 1969. There are 2 versions. “The long rows are not hoed, they are plowed. Now road and rode sound alike but they are not the same, as in ‘The long rode to ho’, which means it's hard work and takes a long time. Like on a boat anchor. See why spelling is so important? Many adults spell that word wrong”. We looked the words up in the dictionary and the phrase made perfect sense to an 8 year old. Mrs. Samuels was a great teacher, and she is still teaching from my heart, this very moment.

This phrase has lost its way from its nautical roots.

— This unsigned comment was added by Rekamlias (talkcontribs) at 00:47, 8 July 2011 (UTC).Reply

This is a policy discussion page. You probably meant to post this somewhere else. —RuakhTALK 00:52, 8 July 2011 (UTC)Reply
By the way, for the record — the versions in “row to hoe” are original (attested since 1835), and still wildly more popular than the versions in "road to hoe" that started to arise after a few decades. With all due respect to Mrs. Samuels, no "rode to ho" version exists at all. —RuakhTALK 01:05, 8 July 2011 (UTC)Reply

«Derivations» in topical categories

Do we really want categories like Category:Biblical derivations to follow the en:categoryname scheme? Wouldn't a name like "English terms derived from the Bible" be more appropriate? Does the use of "derived from" apply to languages only? --flyax 06:19, 8 July 2011 (UTC)Reply

The vote that changed the derivations categories only affected derivations from languages. So there are still some categories in Category:Etymology that haven't been changed. —CodeCat 10:35, 8 July 2011 (UTC)Reply

Quotation index

I've been thinking about writing an indexer for {{quote-book}} that would generate an index arranged by author. This would make it easier to find inconsistencies in our entries, as well as hopefully pushing people to use a more standardized format for quotations. Some potential issues / questions:

  • Need to sort by first name, or by the name given in the author field. I could try to extract the last name but it would be inconsistent at best.
  • I'm only planning on doing English sections to start, due to the small number of non-English entries that use the template.
  • Is sorting by author the most convenient format to use?

So, would this be useful to anyone? Nadando 06:23, 8 July 2011 (UTC)Reply
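
To make the proposal above concrete, here is a minimal Python sketch of such an indexer. It is only an illustration under assumptions: it takes a dictionary of entry names to wikitext rather than reading a real dump, it assumes the template call uses named parameters called author and title, and the names QUOTE_BOOK, parse_params and index_by_author are hypothetical. It sorts by the author field exactly as given, which is the simpler of the two options in the first bullet.

```python
import re
from collections import defaultdict

# Matches a {{quote-book|...}} call that contains no nested templates.
QUOTE_BOOK = re.compile(r"\{\{quote-book\|([^{}]*)\}\}")


def parse_params(body):
    """Split 'a=b|c=d' into a dict of named parameters."""
    params = {}
    for part in body.split("|"):
        if "=" in part:
            name, value = part.split("=", 1)
            params[name.strip()] = value.strip()
    return params


def index_by_author(pages):
    """Map each author string to a sorted list of (entry, title) pairs.

    `pages` maps an entry name to its wikitext.
    """
    index = defaultdict(list)
    for entry, wikitext in pages.items():
        for match in QUOTE_BOOK.finditer(wikitext):
            params = parse_params(match.group(1))
            author = params.get("author", "(no author)")
            index[author].append((entry, params.get("title", "")))
    return {author: sorted(refs) for author, refs in sorted(index.items())}
```

The sketch skips template calls that contain nested templates and would mis-split values containing piped links, so a real indexer would need a proper wikitext parser. Grouping quotations without an author under one key is one way to surface the inconsistencies the index is meant to find.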

Personally, I don't use {{quote-book}}, and don't plan to start; and my general impression is that the majority of well-formatted quotations are not using it, either. So the index would be rather permanently incomplete. —RuakhTALK 10:04, 8 July 2011 (UTC)Reply
I disagree, I think quote-book does a good job. --Mglovesfun (talk) 12:25, 8 July 2011 (UTC)Reply
What do you disagree with? —RuakhTALK 13:19, 8 July 2011 (UTC)Reply
I've never used {{quote-book}}. I tend to look for Latin quotes or 18th-century English quotes through Wikisource, and use templates set up for particular oft-used sources only. These typically lack ISBNs and have peculiar formatting or linking that can't be tied in through the {{quote-book}} template. --EncycloPetey 14:56, 8 July 2011 (UTC)

Bosnian, Croatian and Serbian translations

I've been thinking about this issue for a while. The merge debate on WT:RFM was about Serbo-Croatian categories, while translation templates don't categorize anything. Also there are Bosnian, Croatian and Serbian Wiktionaries, so occasionally converting a translation to Serbo-Croatian will remove a valid link. Thoughts? Mglovesfun (talk) 11:24, 8 July 2011 (UTC)Reply

Maybe the translation template could have three links instead of one for Serbo-Croatian? —CodeCat 11:26, 8 July 2011 (UTC)Reply
There is a Serbo-Croatian Wiktionary, so {{t|sh}} is also correct. Mglovesfun (talk) 11:36, 8 July 2011 (UTC)Reply
I meant that when you type {{t|sh}} the result shows three (or four) links to languages: (bs) (hr) (sh) (sr) —CodeCat 12:23, 8 July 2011 (UTC)Reply

Slang senses that are not in widespread use

We have an appendix for protologisms, but those only really consider new words that are created in the hopes that they will be used. But there are also quite a few cases where an existing word is used in a sense that isn't widely known outside a certain group of people. In other words, it's not the word that's new, it's the meaning. In many cases these terms are in widespread use, but only within that community or context, which makes them hard to attest elsewhere or even at all. So I am wondering if there is a way to define such senses at all on Wiktionary? Is there an appendix for such slang senses? —CodeCat 12:52, 8 July 2011 (UTC)Reply

Why not use LOP?​—msh210 (talk) 15:23, 8 July 2011 (UTC)Reply
If a word already exists, can it still be a protologism? —CodeCat 15:28, 8 July 2011 (UTC)Reply
I dunno, but a bunch are already there. Starting at the beginning, you'll soon find "aardvark" and "abdicate". (Also "a" and "aa", though those are basically separate words that happen to be spelled the same as existing ones, so maybe they don't count.) —RuakhTALK 15:32, 8 July 2011 (UTC)Reply
But there is still a difference. Does a term really belong there if it couldn't be meaningful outside of the context of a certain community of speakers? For example, if someone coined a word that was simply not useful outside of Facebook, could it still be listed on that page? —CodeCat 17:18, 8 July 2011 (UTC)Reply

Category:Biblical derivations and Category:Fictional derivations

These two categories, and also their subcategories, are the only categories that are left from the 'old' set of derivations categories. They were not affected by the recent vote because they are not languages, so they would not belong under Category:English terms derived from other languages. Now that things have settled a little I think we can try to move these categories as well. I would like to propose the following names:

CodeCat 20:07, 8 July 2011 (UTC)Reply

Category:English terms coined by J. R. R. Tolkien would be more informative for the last one. "Tolkien's legendarium" is too fanspeak-ish. --Daniel 20:22, 8 July 2011 (UTC)Reply
There are also:
--flyax 20:56, 8 July 2011 (UTC)Reply
Category:en:Australian Aboriginal derivations should really be emptied and deleted, and there is only one entry in it. Does anyone know from what Australian language family it derives? —CodeCat 21:04, 8 July 2011 (UTC)Reply
Anyhow, it should use {{etyl|aus}}, and that's what it does now. -- Liliana 21:17, 8 July 2011 (UTC)Reply
Category:eo:Fictional locations seems to be ok, the word 'derivation' does not appear in the title. Mglovesfun (talk) 12:38, 9 July 2011 (UTC)Reply
But it is inside Category:eo:Fictional derivations, so that can't be deleted until Category:eo:Fictional locations is deleted or removed from it. —CodeCat 12:43, 9 July 2011 (UTC)Reply

This is just a small question but... what exactly is the usual practice on the 'see also' links at the top of the page? I noticed it is used to redirect between spellings that look the same. But can it also be used between words that may sound the same? For example, would it be useful to link between dança and dansa, given that these two words are pronounced identically in several languages and one could easily be mistaken for the other? I'm thinking of cases where a learner of Catalan hears dansa but assumes it is spelled dança based on English spelling. —CodeCat 22:23, 8 July 2011 (UTC)

No, it's not used for homophones or near homophones. Pronunciation varies so much between languages that we'd have a serious headache if we attempted that. Someone searching a particular language by spelling can use that language's Index of entries to search alphabetically. --EncycloPetey 22:26, 8 July 2011 (UTC)Reply
It would be interesting to have a lookup based on the pronunciation of a word instead of the spelling, e.g. IPA:dansa. One major problem is that inflections important in one language may not matter in another. At least for non-tonal languages, it might make sense to omit these in the title, only indicating stress in the language section where it applies. I don't know if there is a minimal set for tones. Aside from that, the same sound can have different interpretations in different languages. The transcriptions would have to be pretty narrow, distinguishing for instance the aspiration on b and p in English. But the narrower the transcriptions, the less likely they will overlap. There would have to be see alsos for similar pronunciations. Because of the way phonemes can group many adjacent phones for arbitrary languages, this would make for some long lists. DAVilla 04:57, 13 July 2011 (UTC)Reply

How to treat participles on Wiktionary

I'd like to continue the discussion started above at Inflected German participles, but this time not only for German, but cross-linguistically since it turned out to be a problem that concerns many languages. Ok, so the basic question is how participles are to be treated best on Wiktionary. An example for participles in English would be playing as the present participle and played as the past participle of play. Let me sum up the previous discussion. Traditionally, participles are treated as verb forms, so they normally appear in inflection tables of verbal infinitives (see here for German spielen), in Wiktionary too (German, Dutch, French...). The tricky point is: Often participles are used as adjectives in sentences (and can then be declined like normal adjectives). This goes so far that, for example, the German present participle cannot be used as a verb, only as an adjective (or adverb). This might apply to other languages and really calls into question whether such participles should be put under "Verb" headers (as is currently done in German and probably most other languages), and even whether they should appear in verb inflection tables.

All in all it seems that the current "German" way of treating participles is rather bad. I know of two other possible solutions. One was proposed by Dan Polansky above. When participles only appear as adjectives (such as German present participles) they don't get a Verb but an Adjective header. When participles are used both as verbs and as adjectives (such as most German past participles), they get both headers. Personally, I think this solution makes sense except for two problems: First, for almost any verb we'd have a verb as well as an adjective section for its past participle. To me this seems redundant, but I also understand the contrary attitude that it's more clear-cut. A more serious problem would be that there appear to be cases where participles are used ambiguously so one cannot tell for sure whether they are verbs or adjectives -- e.g. German das Haus ist gebaut, Dutch het huis is gebouwd (thanks to CodeCat), French Il a sucré son café, puis a bu le café sucré (thanks to Lmaltier). If it's true that participles are something in between verbs and adjectives here, another solution might be appropriate, and that solution is already being used in Latin. For this language, there's a separate Participle header which subsumes the different Latin participles. See āctus for an example. What's the downside of such an approach? As I said, participles can be inflected, and such inflected participle forms (such as āctī) are also under a Participle header. This misses the fact that those forms are completely unambiguously used as adjectives (ambiguous cases can still be inflected, as in Spanish la casa está construida, thanks CodeCat), and "participle" is probably not a proper part of speech either.

That's quite complicated, and if anything's unclear or if I put something wrongly, I'm looking forward to your comments. So, how do participles behave in other languages? How are they treated on Wiktionary, and do you think it makes sense? What do you think about the Latin way? Is there possibly a uniform way to represent participles on Wiktionary independent of language, or should we continue to have language-dependent ways of treating them? But, as I said, all the current ways I know of have flaws. At the end of the discussion, of course I'd like to have a good solution for German, but if other languages benefit, so much the better. Longtrend 10:37, 9 July 2011 (UTC)Reply

The inflected forms are not always unambiguously adjectives either. In French, for example, when a past participle is used to form the perfect tense, it still inflects based on the gender and number of its direct object. So they could arguably be considered 'declined verb forms'. —CodeCat 11:40, 9 July 2011 (UTC)Reply
Thanks for the notice, I missed that. In languages that inflect for case, it would be more appropriate to say "non-nominative past participle forms are used unambiguously as adjectives". Longtrend 12:11, 9 July 2011 (UTC)Reply
That may not be accurate either. In early Old Norse, the agreement in the perfect tense was actually the accusative, which later became specifically neuter accusative, but still agreed in gender in earlier texts. This example is found in Völuspá (with the agreement in bold): hverir hafði lopt alt lævi blandit , eða ætt iotuns Óðs mey gefna , with the first agreement being neuter nominative/accusative, but the second is feminine accusative. This is because the combination of participle and object was still considered an object of 'to have' in that language, and was therefore placed in the accusative case. That is, a sentence like 'I have painted a door' was not distinguished grammatically from 'I have a painted door' or 'I have a door painted'. —CodeCat 12:33, 9 July 2011 (UTC)Reply
That's interesting. I think I'd better not try another generalization :) But your example seems to be a strong argument in favor of the thesis that participles are (or can be) something in between verbs and adjectives -- that is, if we are going to treat participles uniformly across languages; otherwise it's at best an argument for Old Norse. Longtrend 12:57, 9 July 2011 (UTC)
Hungarian has present, past, future, and adverbial participles. The Etymology section contains the information that this entry is the participle of a verb. There can be adjective and noun sections to illustrate the appropriate usage and declension. See for example nevelő, the present participle of nevel (to educate). --Panda10 13:11, 9 July 2011 (UTC)Reply
Latin participles also come in past, present, and future, and have voice (active or passive) as well. There are some Latin participles that were used as adjectives, but since Classical Latin did not always clearly distinguish between adjectives and nouns (they had the same inflectional endings), this means that some participles were used as substantive nouns. In fact, the future passive participle eventually came to replace gerunds and infinitives to function as a noun. However, it still had a verb function in the passive periphrastic conjugation, and was never used in the nominative (you had to use a verbal infinitive for that). In other words, the situation was rather complicated as to what part of speech these things were. For Latin, we've chosen simply to recognize "Participle" as a separate part of speech because it simplifies everything. Other languages are free to make a similar choice in how they handle their parts of speech, but I don't think there's a single way to handle everything that will work across all languages. --EncycloPetey 14:12, 9 July 2011 (UTC)
We should not invent anything: words should be addressed according to traditions of each language. In French, it's clear that participles are verb forms, not adjectives, and that adjectives are not participles, are not verb forms. I provided an example of a sentence with an ambiguous meaning. This sentence shows that this is not always an easy distinction, and this is a good reason to make it as clear as possible here, this is not a reason to blur the difference. Lmaltier 15:22, 9 July 2011 (UTC)Reply
"words should be addressed according to traditions of each language" -- so in your opinion, we should treat German present participles as verb (form)s, even though they are never used as such, just because they are traditionally regarded as verb forms? "In French, it's clear that participles are verb forms, not adjectives" -- how come past participles inflect for gender in predicative use then, a behavior you can only find in adjectives otherwise? Longtrend 12:03, 10 July 2011 (UTC)Reply
French past participles are inflected in some cases, yes, but this does not make them adjectives. Actually, I think that the distinction between participles and adjectives is exactly the same in English and in French. I also think that all German verbs have compound tenses, and, therefore, that all German participles are actually used as verb forms. Am I wrong? Lmaltier 13:56, 10 July 2011 (UTC)
Something that's still not clear to me is just when something is a verb and when it's an adjective. I can understand that finite verb forms are verb forms... but what about non-finite forms? Why are they verb forms? Etymologically they are often not verb forms at all (like in the Old Norse example; Romance participles have a similar history), so why do we call them verb forms now? —CodeCat 14:11, 10 July 2011 (UTC)Reply
Because (1) we speak English, (2) English has become less inflected and so its grammar has changed, (3) the original categories for parts of speech were set up by the Romans and Greeks, and (4) we have a better understanding of grammar in the 21st century. --EncycloPetey 14:29, 10 July 2011 (UTC)
That still doesn't answer my question though. Why are they verb forms now when they were not originally? What about them makes us consider them verb forms? —CodeCat 14:35, 10 July 2011 (UTC)Reply
In English, our classification of -ing forms and -ed forms and specific senses thereof depends on such things as whether there is a corresponding base form, and whether the forms behave like adjectives or nouns. The verb form is assumed to exist because it is hardly ever possible to find such forms never modified by any adverb. If derived from transitive verbs, they usually take complements just like other forms of the verb.
The conversion process of denominal verbs seems to sometimes begin with -ing and -ed forms. For example, one can be coffeed out or coffeed up, but instances like "He coffees himself up every morning" are more rare.
The answer to the question seems to be simple: when you think of the verb when using the word (when you think of the action expressed by the verb), it's a participle, a verb form (even when an ellipsis blurs this fact); when you don't think of any action, only of a characteristic of the thing (not of how the thing got this characteristic), then it's an adjective. See adjective and verb for definitions. Lmaltier 16:09, 10 July 2011 (UTC)
@Lmaltier: I still don't quite understand your analysis of French participles. You probably know what I was talking about, but to be sure here's an example: Le café est sucré vs. La sauce est sucrée (excuse me if those sentences are wrong -- I just have some very basic knowledge of French, but you get what I mean). Correct me if I'm wrong, but sucré(e) behaves just like an adjective and nothing like a verb here -- you can perfectly well replace it with a "proper" adjective but not with a "proper" verb. So what makes you think it's a verb other than 1) tradition and 2) the fact that it's obviously derived from a verb (which is not sufficient, as Dan Polansky convincingly demonstrated above -- the fact that in English almost every verb can be "agentivized" (for lack of a better word) by -er doesn't make the new forms verbs)? As for German: Yes, you are wrong in your assumption that all participles are used in compound tenses. Present participles are never used in such constructions. In English, I perfectly agree with the analysis that present participles are (or can be) verbs, since there are such cases as I am playing -- however, there is no equivalent form *Ich bin spielend in German, nor any other complex verbal construction with a present participle. Longtrend 17:00, 10 July 2011 (UTC)
You misunderstand me. In your examples, used alone, they are not verbs: very clearly, both sucré(e) are adjectives. They refer to a characteristic of the thing. In Il a sucré le café or (passive form) La sauce a été sucrée avant d'être servie, it's also very clear that they are not adjectives, they are verb forms. The same applies to present participles (this is an easy case, as present participles are never inflected in French: when they can be inflected, then the words are not present participles, they are adjectives). For German, I was thinking of past participles. But, for German too, I think that the criterion should be: do you think of the action expressed by the verb or not? The difference between an adjective and a verb is not related to a suffix or anything of the kind, it's related to how it is used and what is meant by people using it; do people want to use the verb (to refer to an action), or do they want to use an adjective (to refer to a characteristic)? Lmaltier 18:43, 10 July 2011 (UTC)
Your criterion is semantics, which is not valid. Expressing an action is neither a necessary nor a sufficient condition for being a verb. Verbs can also express characteristics ("shine") and nouns can express actions (just take the word "action") -- whether in some language there are adjectives that express actions I can't tell, but probably there are. We define parts of speech not semantically, but syntactically. Back to Le café est sucré, couldn't that also be a passive sentence (perhaps continued by "par...")? In this case the participle could be analysed as a verb, couldn't it? Longtrend 19:14, 10 July 2011 (UTC)Reply
But the definition of verbs and adjectives includes important semantic considerations! If you forget them, you won't be able to make the distinction in difficult cases. Of course, some verbs are not action verbs, but they probably don't cause problems. You are right: in Le café est sucré, sucré is an adjective, but in Le café est sucré par mes soins., it's a verb. It's exactly like sugared in English. Lmaltier 19:30, 10 July 2011 (UTC)Reply
Many east Asian languages have verbs that express states or properties rather than actions, as does Esperanto ("mi estas blua" and "mi bluas" both mean 'I am blue', "mi estas bluinta" means 'I have been blue'). Several old Indo-European languages also have stative verbs, which are semantically very much like a copula and a participle in English. —CodeCat 19:40, 10 July 2011 (UTC)Reply
English and French too have verbs that express states. But are there examples of an unclear status (verb(participle) or adjective?) for these verbs? Lmaltier 19:53, 10 July 2011 (UTC)Reply
There is when dealing with Latin. There is a whole set of Latin deponent verbs whose meaning can only be conveyed in English using adjectives. A Latin scholar would identify the Latin translation as a verb, but only because it has verb endings and not because of any functional or semantic distinction. Latin participles are likewise not always verbs, but primarily for the reason that they take the endings of an adjective, inflecting for gender, which Latin verbs don't do. And yet the "participial form" is listed as a verb form in most texts and conjugation tables, and forms part of certain compound conjugations. So, in Latin the "verbness" of a participle comes from its tense and context, but its "adjectiveness" comes from its gender and inflectional endings. --EncycloPetey 20:21, 10 July 2011 (UTC)
How to treat participles on Wiktionary — AEL
· [de-indenting] I agree with Lmaltier about sucré. Let me give a similar example, but in English. Take the sentence “At 3:00 PM, the window was closed”: it can mean either “At 3:00 PM, someone closed the window”, or else “At 3:00 PM, the window was not open”. When it has the former sense, it's a use of the participle: “was closed” is just “closed” cast into the passive voice. When it has the latter sense, it's a use of the adjective: “was closed” means “was a closed window”. The important point is that this ambiguity is specific to the word closed. English has a lot of participial adjectives, but it also has a lot of participles that do not double as adjectives. “At 3:00 PM, the window was opened” has only one meaning. (The analogous alternative meaning would be expressed as “At 3:00 PM, the window was open.”) So it's hard to imagine a solution that uses just a single POS header for words like closed: even though participles are often called "verbal adjectives", we still must distinguish between those that double as real adjectives and those that do not. The former clearly need an ===Adjective=== POS header in addition to whatever POS header the latter have; and I think it's clearly a bad idea to use ===Adjective=== for words like "opened".
· I agree also with Lmaltier that we should generally follow language-specific traditions. That doesn't necessarily mean following two-hundred-year-old theories of grammar; there are current active linguistic traditions for all of these languages. If all of the linguists working on German describe the present participle as a verb form, then we should at least figure out why that is, before just deciding that we know better!
RuakhTALK 20:46, 10 July 2011 (UTC)Reply
Thanks for your input. Actually, I agree with you on almost all points. sucré was probably a very bad example to argue for my position, since it has developed a new adjective meaning and usage independent of the participle. Just like closed, it falls under the category of what I dubbed "lexicalized participles" in the initial discussion above (which, surprisingly for me, seemed to be rather unintuitive to many). I absolutely agree with you that such lexicalized participles indeed need two sections -- one Adjective section for the lexicalized usage, and one for the participle, and I think we only need to discuss the latter, since (from my point of view) in many cases it's really unclear whether we are dealing with verbs or with adjectives here (or perhaps even with ===Participle===s?). As an example, imagine English had gender, and in the sentence At 3:00 PM, the window was opened the word opened agreed in gender with the subject window. Would we still be so sure that opened was a verb, given that it would be declined for gender, after all? It's more than a hypothetical situation: this is exactly what we find in Spanish and French and probably many other languages: La respuesta está obviada "The reply is avoided" -- obviada has feminine gender here, which comes from the feminine respuesta, and as far as I know it is not the case that "obviado" has developed adjectival meaning and usage. So what about cases like that?
As for German current linguistic tradition, it's certainly not the case that all linguists describe the present participle as a verb form. It's what you learn at school, and in many cases present participles are listed in verb conjugation tables. For example, the Institut für Deutsche Sprache describes past participles as inflected forms of elements of the word class verb and present participles as adjectives formed from verbs by word formation. canoo.net, on the other hand, lists present participles in its grammar as infinite verb forms, but then says that "all present participles have the form and the function of adjectives" and also lists them as adjectives in its dictionary (e.g. spielend). I'll see if I can consult some printed grammars. Longtrend 09:16, 11 July 2011 (UTC)Reply
@Longtrend, re: verb forms and gender: A word form's having a gender that matches the subject of the sentence does not speak against the form's being a verb form. Czech simple past tenses of verbs show the gender of the subject of the sentence, as in the verb dělat (to do) with its masculine simple past tense dělal, its feminine simple past tense dělala, and its neuter simple past tense dělalo. The same thing is seen in Russian, in its делать, де́лал, де́лала, and де́лало. Unlike these languages, German simple past tense machte does not show gender. --Dan Polansky 10:06, 11 July 2011 (UTC)Reply
Sorry, I was unclear here. Of course claiming that verbs cannot inflect for gender would be wrong. My point is that in the languages under consideration, inflection for gender does not happen (I hope this is correct), except for the dubious cases of participles, so we'd have to assume that for some reason verbs inflect for gender in that kind of construction and only there. But maybe that's not too good an argument, since in Czech gender agreement on verbs only seems to happen in simple past forms, too. Still: even if there is no strong evidence that we are dealing with adjectives here, is there any evidence that they are verbs? Or is it possibly adequate to say that participles in such positions are something "in between"? Longtrend 10:24, 11 July 2011 (UTC)Reply
Czech forms that have a similar function as English past participles (called Czech "passive participles" per W:Czech conjugation, whyever) show gender equally well as Czech simple past tense forms: dělán m, dělána f, děláno n, of dělat. They resemble their corresponding adjectival forms: dělaný m, dělaná f, dělané n. For example, "je dělán" corresponds to German "wird gemacht" and English "is made" or "is being made". --Dan Polansky 11:33, 11 July 2011 (UTC)Reply
@Longtrend, re gender: I see no contradiction whatsoever in saying that a participle (a "verbal adjective", as they're often called) is a non-finite verb form that (often) has various adjective-like properties, including (often) agreeing in gender/number/case/definiteness/etc. with a modified noun. And there's no need to imagine a hypothetical English-With-Gender; in actual English, verbs do not agree with their subject at all ("I/we/you/he/she/it/they went") -- except for present-tense verbs, which display a bit of agreement, and be, which displays a bit more agreement. Do we therefore say that be is a different part of speech -- say, ===Copula=== rather than ===Verb=== -- and that present-tense verbs are a weird in-between form that has properties both of a ===Verb=== and of a ===Copula===? --RuakhTALK 12:20, 11 July 2011 (UTC)
As you already said, participles are sometimes called "verbal adjectives", and some experts don't even give a POS for them but say simply that they are "lexical items" that have "characteristics and functions of both verbs and adjectives" (see here). So discussing a ===Participle=== header is not as absurd as your analogy with English present tense verbs suggests (nobody doubts their verbal status). Of course inflection is only one criterion, there are other criteria that solidly confirm that English present tense verbs are verbs, such as position in the sentence. But I still miss any such criteria for past participles, let alone for present participles. I could agree very well with the approach to treat participles as verbs if they are used to form complex tense or voice constructions. This is the case in English with both present and past participles, so personally I would not change anything about the "English way" (unless we are going to find one solution for all languages). But this doesn't help for German present participles, since they are neither used as stand-alone verbs nor to form complex constructions. So what are they? Longtrend 16:57, 11 July 2011 (UTC)Reply
If, when you use them, you think of the verb, of the meaning of the verb, and you feel that you are using the verb, then they are verb forms. In French too, the phrase adjectif verbal is used by some authors, but it's misleading, because they are not verbs at all; their only relationship with verbs is etymological. And these authors don't use this phrase for participles... Lmaltier 18:15, 11 July 2011 (UTC)
That doesn't always work either. When I think of verwarring in Dutch, I definitely think of verwarren. The form with -ing is very predictable like this in Dutch. But it's not a present participle like in English, it's a verbal noun. I've never heard of this form being considered a verb form at any time, but I still think of the verb when the word is mentioned. --CodeCat 18:23, 11 July 2011 (UTC)
Well, in some languages (Bulgarian...), such forms, even nouns, are traditionally mentioned in conjugations. This is why traditions of the language are important. Your reference is right when stating that participles share characteristics of verbs and adjectives. Actually, they are verb forms with some characteristics of adjectives. But it's wrong when stating "In English, participles may be used as adjectives" (cf. opened, see above). Lmaltier 18:29, 11 July 2011 (UTC)Reply
Are there participles that can not be used as adjectives? Or can all participles behave as an adjective in all languages that have them? It seems more economic to me to say 'participles are adjectives that may sometimes be used as verb forms' than 'participles are verb forms that can always be used as adjectives'. —CodeCat 18:52, 11 July 2011 (UTC)Reply
I just answered: opened is a participle, and is not an adjective. And, in French too, corresponding adjectives don't exist for all participles: they're rather common, but not systematic at all, for past participles, and much less common for present participles (note that, for present participles, derived adjectives often have the same pronunciation as the participle, but not the same spelling, e.g. intriguant is a participle, intrigant is the adjective derived from the participle). Lmaltier 19:07, 11 July 2011 (UTC)Reply
I'm sorry, that's not what I meant. By 'adjective' I meant 'showing adjective-like behaviour', not necessarily having 'adjective' as its part of speech. Opened can be used like an adjective: the opened door. So my question is, are all participles able to be used as adjectives? Are they all able to be used in non-adjectival ways (which apparently implies 'as a verb form')? —CodeCat 19:10, 11 July 2011 (UTC)Reply
By that approach, we might as well list all words as ===Adjective===, since all words show adjective-like behavior. —RuakhTALK 19:20, 11 July 2011 (UTC)Reply
@Lmaltier: You honestly think that English participles cannot be used as adjectives? So all these are wrong? Do you have any reason for asserting that apart from your "emotional" analysis? If not, what is that analysis you're proposing based on? Even if we accept a semantic analysis, it's really fuzzy. When I say watcher as a nominalization of watch, I certainly think of the action expressed by the verb. So, is watcher a verb in your opinion? All the syntactic evidence suggests it's a noun, and we treat it as a noun. Longtrend 19:24, 11 July 2011 (UTC)Reply
In my opinion, in the opened door, opened is used as a participle, not as an adjective. I think that it can be considered as an ellipsis for the door which has been opened. But you probably know better than me.
@ CodeCat: I already answered your first question just above. I add that, in French, past participles of 100 % intransitive verbs are never inflected, it would be quite absurd to consider that they behave as adjectives. Second question: yes, in French, participles can always be used as verb forms (as they are verb forms). In English too. Most typical uses in French (not the only ones) are in compound tenses for past participles, and in the "en + participle" form for present participles. These forms are clearly verb forms.
About watcher: of course, you don't feel that you use a verb when you use watcher, you feel that you use a noun derived from the verb. Of course, it's not a verb form. Lmaltier 19:34, 11 July 2011 (UTC)Reply
I just fixed intrigant: I removed the verb form section for French (it was a Tbot mistake). As you can see, considering that participles = verbal adjectives leads to serious mistakes. Lmaltier 19:34, 11 July 2011 (UTC)Reply
Do you care to explain why "of course" watcher is not a verb form but participles undoubtedly are? I'm sorry, but your criterion just seems to be circular and fuzzy. Why does a word belong to a certain POS? Because you feel it. Why do you feel that it belongs to the POS? Because it does. Longtrend 19:59, 11 July 2011 (UTC)Reply
I never claimed that all words directly derived from verbs are verb forms. I even explained that adjectives derived from participles are not verb forms, and that verb forms are not adjectives, even if they share some characteristics. Lmaltier 21:19, 11 July 2011 (UTC)
@ Longtrend: I don't speak German, so it's impossible for me to judge; but there are other things you can look for. For example, in English, a transitive verb's present participle can take a direct object even outside of explicit progressive/continuous constructions: “while heating the milk, continue checking the temperature and consistency”. (There are a few adjectives that take directly construed complements, as in “it was worth every penny”, but that's very unusual among adjectives, but absolutely universal among transitive verbs' present participles.) —RuakhTALK 19:20, 11 July 2011 (UTC)Reply
Yes, this is possible in German as well. The Institut für Deutsche Sprache already quoted above states, in my translation: "The present participle -- unlike the past participle -- is never used as a part of analytical verb forms but only in contexts where adjectives occur otherwise. However, present participles show a verbal 'heritage' through their valency". So on the one hand, valency is an argument for verb status of present participles, but on the other hand, both inflection and distribution are arguments for adjective status. (Besides, I'm not sure why you accept valency as an argument for verb status of present participles [there are only few other adjectives taking direct objects] but at the same time reject gender agreement as an argument for adjective status of past participles [there are no other verbs inflecting for gender in French or Spanish].) Longtrend 19:46, 11 July 2011 (UTC)Reply
There are also many languages in which past participles can inflect as adjectives even if they are from an intransitive verb. I think Latin is an example, and so is modern Icelandic: hann er kominn (he has come) but hún er komin (she has come), the endings of 'come' differ based on the gender of the subject. This is apparently unlike French (it would literally translate as il est venu and elle est venue), but it just shows how much variation there is in each language. —CodeCat 20:00, 11 July 2011 (UTC)Reply
Sorry if I'm misunderstanding you, but « il est venu » and « elle est venue » are exactly how you say it in French. I guess you're thinking that in French it would be *« il/elle a venu »? Most French verbs form the perfect by using avoir and an uninflected past participle, and that's the case we were talking about above, but a bunch of common ones, including venir, form it using être and an inflected one. (Lmaltier erred when he wrote that "past participles of 100 % intransitive verbs are never inflected", unless he was rounding to the nearest percent. :-) ) Some verbs, by the way, can go either way, depending on syntax or semantics or speaker preference. And some use être and an uninflected past participle, for reasons that make sense if you know French but aren't worth going into if you don't. --RuakhTALK 20:46, 11 July 2011 (UTC)
If that's the case, then it seems to me that such a sentence is just a subject, copula and an adjective, much like 'elle est verte'. Venu is simply an adjective that means 'in a state of having come' (also etymologically), parallel to 'in a state of being green'. —CodeCat 20:49, 11 July 2011 (UTC)Reply
Yes, I was wrong: venir is an intransitive verb with an inflected past participle (but I meant verbs that are always intransitive, not 100% of intransitive verbs). What I had in mind was only verbs using avoir, the common case. And, no, in this sentence, venue is not an adjective; no Francophone would consider it an adjective. It's part of the "passé composé" of the verb. Lmaltier 21:10, 11 July 2011 (UTC)
[after e/c] @CodeCat: No, sorry. I see why you would say that, and that may well be the origin of the construction; but in everyday Modern French « elle est venue » can simply mean "she came", without any implication about present circumstances. (And even in literary French, which retains a separate preterite construction « elle vint » for that sense, one can write something like « elle est venue trois fois », meaning "she has come three times", where I think it's a bit farfetched to posit a state of "having come three times". Certainly in English you can't say "the window is open three times".) —RuakhTALK 21:20, 11 July 2011 (UTC)Reply
@Longtrend: I'm not rejecting gender agreement as an argument for adjective status, I just don't see it as conclusive. In French and Spanish, it is not only adjectives and sometimes past participles that show gender agreement, but also determiners (la femme, la mujer) and many pronouns (elle, ella; la tienne, la tuya); and many animate nouns come in masculine–feminine pairs that resemble gender agreement (adjective japonais(e) vs. noun un(e) Japonais(e); adjective japonés/esa vs. noun un(a) japonés/esa). And of course, many Slavic, Afro-Asiatic, and other languages have gender agreement even in finite verb forms, so it's not like it's unheard-of. --RuakhTALK 21:57, 11 July 2011 (UTC)
How to treat participles on Wiktionary — AEL 2

I'm not sure about languages other than English, but in English, there are some simple syntactic clues to tell whether a participle form has split off and become a full adjective. If it can be modified by very, it certainly exists as an adjective (and continues to exist as a participle). You can't say, for example, that the sandwich was *very eaten, that the letter was *very typed, or that the world was *very created. I suspect a similar test would work in French. Would très créé, très dactylographié, or très mangé be acceptable? Of course, this doesn't work all the time because not all adjectives are gradable. Another test is to see whether it can be the complement of certain linking verbs other than be (particularly become), for example he became closed, the movie became interesting, and the muscles became bruised, but not *the letter became typed, *the sandwich became eaten, or *the world became created. --Brett 01:45, 12 July 2011 (UTC)

Yes, the sense of adjective is exactly the same in English and in French. Lmaltier
The test with 'became' only works for English, because in Dutch de boterham werd gegeten (the sandwich became/was eaten) is not just valid, it's very common. The test with 'very' doesn't always work either, because there are certain verbs that indicate a progressive action. These are especially common in Dutch, where they begin with ver- (although not all verbs in ver- have this progressive aspect). In these verbs, very would simply indicate that the progress had continued to an exceptional degree. decomposed is a good example: it was very decomposed. This does not necessarily indicate an adjective, since you could easily imagine that the decomposition process had progressed to a significant degree. There are probably a lot of other verbs like this. I'm not arguing that this means decomposed is a verb form in such cases, I'm just saying that the test is ambiguous. --CodeCat 10:26, 12 July 2011 (UTC)
As I said, I was making the specific point for English, but it seems likely that, in Dutch or other languages, there would be certain modifiers that will modify verbs and not adjectives or adjectives and not verbs. It might not be the equivalent of very, but there may be something. Similarly, while the Dutch word for become may take both verbs and adjectives as complements, there is likely some verb that will take only adjectives (or AdjPs) as complements.--Brett 11:09, 12 July 2011 (UTC)Reply
I know nothing about Dutch, but would de boterham schijnt/lijkt gegeten be grammatical?--Brett 12:33, 12 July 2011 (UTC)Reply
It would be grammatical even though it sounds a little strange, mostly because people would not say it that way. Dutch has a separate verb opeten which is used when something is eaten completely. It's also more usual to add te zijn after 'schijnen' or 'lijken' and an adjective: de boterham schijnt/lijkt opgegeten te zijn (the sandwich seems to be eaten up), just like de boterham schijnt/lijkt rood te zijn (the sandwich seems to be red). But de boterham schijnt/lijkt gegeten is not really wrong, because people will understand 'gegeten' as an adjective. --CodeCat 12:52, 12 July 2011 (UTC)
That's true in English as well; participles can productively be turned into adjectives. (Just as you can reply to "Are you inside yet?" with "Very inside", even though "inside" is a preposition rather than an adjective and "very inside" doesn't have a single specific meaning, you can reply to "Is it eaten yet?" with "Very eaten", even though "eaten" is a participle rather than an adjective and "very eaten" doesn't have a single specific meaning. For example, it could mean that even the crumbs got eaten; or it could just mean that it was eaten a long time ago: "Am I too late? Is the cake eaten yet?" "Very eaten. You're about a week too late." That doesn't mean that eaten is normally an adjective, only that participles can be stretched into use as adjectives.) —RuakhTALK 13:37, 12 July 2011 (UTC)Reply
The discussion is going a bit in circles right now. If they can be used as adjectives in all cases (not including cases that some 'known' adjectives lack, such as comparison), then why are they not adjectives after all? It doesn't really matter if they have extra properties that most other adjectives don't. Do they meet all the minimum requirements to qualify as adjectives? —CodeCat 14:45, 12 July 2011 (UTC)Reply
All words can be used as adjectives. The point of parts of speech is not "is it remotely possible to use this word in this way?", but rather, "is this how this word is normally used?" It is possible to press participles into service as adjectives, and this is a fairly productive process: plenty of normal adjectives (tired, interesting, closed) began life as participles. But most participles are not normally used this way. —RuakhTALK 14:58, 12 July 2011 (UTC)Reply
Historically it's actually the opposite. The oldest participles in English actually began life as adjectives and only later became used as verb forms. Proto-Indo-European had no periphrastic tenses (or even tenses at all!), and even in Proto-Germanic participles were still mostly adjectival (compare the Old Norse and Icelandic examples above, which closely reflect the PG situation). I realise this doesn't really change the situation for English as it is currently spoken, but it does point out that the question of 'which was first' is definitely 'adjective'. The productive process eventually came to be reversed, but it was not always so. I think if you go back far enough in history, you'll find that many old English participles were originally adjectives, then became participles, and (maybe?) had adjectives formed from them again. —CodeCat 15:05, 12 July 2011 (UTC)Reply
You'll forgive me for not just taking your word for that, given that you also think that participles today are definitely adjectives. Just because they're not used in any periphrastic verb constructions, doesn't mean they're not verb forms. (I'm certainly not saying you're wrong. I'm just not confident that, if I knew more about those languages, that I would agree with you.) —RuakhTALK 15:55, 12 July 2011 (UTC)Reply
In PIE, the distinguishing feature between verb forms and verbal adjectives is that the former are based on aspect stems (stative, perfective and imperfective) while the latter are based directly on roots. Strictly, only aspect stems form verbs in PIE, since they are conjugated while roots are not (unless it's an athematic root verb such as Template:termx, but those are rare). The English weak past participle and the Latin perfect passive participle both derive from a verbal adjective in *-tos which was attached directly to the root and had no aspect-forming infix originally. Irregular weak participles like brought are still remnants of that. —CodeCat 16:11, 12 July 2011 (UTC)Reply
@Ruakh: Isn't there a third group of "original" participles between those that you just mentioned (participles that cannot be used as adjectives or just in such a way that all words can, and lexicalized participles -- tired etc. -- that are now true adjectives independent of the original verb): participles that are regularly used as adjectives and are not in any way peculiar in such constructions. I'm thinking of such cases as the opened window (I'm not even sure whether this is grammatical -- please correct me if it's not!). It is not lexicalized as an adjective here (compare open), but it's not just a weird way to use an adjective either (compare *the cried child). Longtrend 15:11, 12 July 2011 (UTC)Reply
I wonder why 'cried child' is strange but 'fallen child' is fine, especially since both cry and fall are intransitive. There must be something inherent in the meanings of these participles that makes them different somehow. Maybe some participles like fall are active by nature while cried is passive? —CodeCat 15:14, 12 July 2011 (UTC)Reply
Is fallen child really acceptable in the sense "child that fell"? Or is it rather only acceptable under a lexicalized interpretation of fallen? Longtrend 15:24, 12 July 2011 (UTC)Reply
@Longtrend: I believe "the opened window" is a reduced passive; you can also say "the just-opened window", for example, meaning "the window that had just been opened", or "the next-opened window", meaning "the window that had been opened next". It's not really an adjective; you can't say *"the very opened window", even though semantically that would make sense. —RuakhTALK 15:55, 12 July 2011 (UTC)Reply
Okay, I think this makes sense for English. In German there is the exact same kind of construction (das geöffnete Fenster) and you can also say das gerade (just) geöffnete Fenster but not *das sehr (very) geöffnete Fenster. Here, however, the participle inflects just like an adjective. That is, unlike in the discussion we led above, it doesn't just take one category typical of adjectives (gender), but inflects according to a whole adjective paradigm. Would you still say the participle is a verb there, given that info? Longtrend 16:35, 12 July 2011 (UTC)Reply
Yes, that's what I'd say: lexically speaking, it's a non-finite verb form, and grammatically speaking, it differs in consistent ways from true lexical adjectives, so it's best thought of as a ===Verb=== rather than as an ===Adjective===. But I'd say it very cautiously, doing my best to make very clear that (1) this is my tentative opinion based on almost no knowledge of the language at all and (2) I mean, I'm not a linguist or anything. I'm just doing my best to understand what linguists have figured out. —RuakhTALK 17:28, 12 July 2011 (UTC)Reply
Okay, I appreciate your assessment anyway. What I don't like about that solution is that we'd weirdly have an adjective declension table under a Verb header. I wouldn't even know how to handle this. Longtrend 14:04, 14 July 2011 (UTC)Reply
When the word is not an adjective, it's not an adjective declension table, it's a verb form declension table... This may be included in the conjugation table. Lmaltier 19:46, 15 July 2011 (UTC)Reply

In Greek, the participle is one of the ten parts of speech, at least according to school grammars. Its special character of being something that shares qualities of both verb and adjective makes it worth distinguishing from other POS. On el.wiktionary we follow this distinction and use μετοχή as an L3 header for Greek words. I see that there is in use a Participle L3 header for "some Russian, Lithuanian, and many Latin entries" (Wiktionary:Entry_layout_explained/POS_headers). So I think that we could also discuss the possibility of a more extended use of this header. --flyax 15:29, 12 July 2011 (UTC)Reply

That's what I originally considered the best possibility (or rather after Prince Kassad's comment in the initial discussion) since there appear to be cross-linguistic problems of assigning participle forms to parts of speech, but at the moment I tend to a language-specific approach (I'll give my arguments later). That doesn't mean, though, that it's impossible that more languages use a Participle header, let alone that the header is wrong for the languages that already use it. Longtrend 15:39, 12 July 2011 (UTC)Reply

Since this discussion is currently inactive (thank you all for your contributions!), I'll try to sum it up and draw my personal conclusions from it. If there is one thing that we all agree on, I think it's the fact that the matter is very complicated and not easy to handle. Put more concretely, it is not desirable to simply have a linguistically universal Participle header for everything that is traditionally called a participle. Even if there seem to be cross-linguistic problems of assigning participles to a POS, each language should be considered separately and carefully.
For German, after this discussion and checking out some grammars, my personal impression is that the introduction of a Participle POS header should be taken into consideration. I'll give my arguments for that impression, which might also be relevant for other languages.

  • First of all, it should be questioned whether different kinds of participle in one language even form a more or less homogeneous class, or if they should be treated separately: e.g. for German, should present (pr.p.) and past participles (pa.p.) be treated the same or differently? Opinions differ slightly here: Peter Eisenberg's grammar Grundriss der deutschen Grammatik treats only pa.p. as infinite verb forms, and pr.p. as adjectives. But most grammars agree in putting pr.p. as well as pa.p. into the same class (mostly infinite verb forms). There is an interesting article by Heinrich Weber (unfortunately in German) discussing the classification of German participles on the basis of twelve criteria that help distinguish verbs from adjectives (such as including a verbal lexeme, governing accusative and/or oblique cases, usability as an adverbial, gradability). He comes to the conclusion that of those, pr.p. and pa.p. have eight characteristics in common, pr.p. and the infinitive six characteristics, pa.p. and infinitive also six, but only five common characteristics for pr.p. + adjectives / pr.p. + finite verbs and four common characteristics for pa.p. + adj. / pa.p. + finite v. So present and past participles have more characteristics in common both with each other and with the infinitive than with either finite verbs or adjectives. This is an argument in favour of treating German pr.p. and pa.p. basically the same, whatever that solution may look like.
  • So what header should we use for German participles: Verb, Adjective or what? All grammars I checked out agree in that pa.p. are to be treated as a verb form, but that most can also be used as an adjective. For pr.p., there is less of a consensus: For Peter Eisenberg and the Institut für deutsche Sprache (IDS), pr.p. are not (infinite) verb forms but adjectives that are merely formed from verbs. All other grammars I know of classify them roughly as verb forms, but some then weirdly say that they are used only as adjectives (such as canoo.net or the Duden-Grammatik which states that pr.p. aren't conjugational forms of verbs). Since pa.p. are also used to form complex tenses, I think we can agree that putting both pr.p. and pa.p. solely under an Adjective header makes no sense.
    What solid arguments are there against using a Participle header for German (for both pr.p. and pa.p.)? Traditionally, "participle" was considered a separate part of speech. This has changed, now they are often regarded either as verbs or as adjectives, so this might be an argument against the Participle header. But I believe this to be simply due to basic differences between grammars and such dictionaries as the Wiktionary. We here at Wiktionary are forced to assign each word form to a POS. This is not the case for grammars. If we can't decide for a POS after considering all relevant aspects, why not recognize that what we need may be a separate POS? It might seem that pr.p. in German can be perfectly treated as adjectives, according to syntactic distribution and morphological inflection. But then they govern arguments like verbs, are generally not prefixable by un- or gradable, etc. They simply don't fit either category. And the same is true for pa.p., which might seem to be clearly verbs. But then they can be used attributively, decline like adjectives, are sometimes governed by other verbs (unlike finite verbs, but like adjectives), etc. Let's assume we use a Verb header for German participles despite the adjectival characteristics. How would we solve the dilemma of needing to have an adjective declension table under a Verb header?

For those reasons, it seems to me that introducing a Participle header would be the best option for German. We could put declension tables there without a contradiction (as there would occur for declined "verbs") and at the same time link to the verbal origin. Just for clarification, lexicalized participles such as wütend or verrückt that are now true adjectives would of course be unaffected. All those who disagree with me: in which points exactly do you think I'm wrong or I drew wrong conclusions? I'd be very glad to hear your comments, especially since I really want to reach a consensus. I'm well aware that introducing a new POS to a language needs more justification than keeping the status quo -- but the status quo in this case is not an option, since currently we have no way at all to treat declined participles (AFAIK there is not a single such entry on Wiktionary yet). Longtrend 18:54, 15 July 2011 (UTC)Reply

I imagine Dutch will be treated the same, because its participles are more or less identical to German ones. Is the situation for the Romance languages much like German as well (apart from the fact that they show gender agreement in predicates, which German doesn't)? —CodeCat 19:04, 15 July 2011 (UTC)Reply
French and Spanish (the only Romance languages I speak) differ from German in important ways: (1) French distinguishes blatantly and obviously between present participles, which are very restricted in their uses and which do not inflect for gender or number, and adjectives derived therefrom, which are normal adjectives and often spelled differently from their participles; (2) Spanish has two different constructions that could be called "present participles", of which one (the gerundio; here we call it the "adverbial present participle") is considered to be a verbal adverb and does not inflect for gender or number, and the other (the participio presente; dunno if we have a name for it here) is no longer productive, but rather survives only as various nouns and adjectives; (3) neither French nor Spanish requires the declension tables that have Longtrend so bothered, since their adjectives and past participles inflect only for gender (masc/fem) and number (sing/pl), not for definiteness or position or case. The closest thing to that is Spanish forms like dándo-, which we're currently not worrying about SFAICT, and which anyway are further evidence for ===Verb===ness. Personally I still suspect that ===Verb=== is the way to go for German as well, but many of Longtrend's reasons for using ===Participle=== for German don't apply to French and Spanish anyway. —RuakhTALK 20:17, 15 July 2011 (UTC)Reply
Thanks for the analysis and information you provide. It clarifies things much for German. My conclusion is that a Participle POS is no more justified in German than in English or in French. Why? Because specialists call them either verb forms or adjectives. It's possible to treat the declension of adjectives in an adjective declension table, and the declension of verb forms in the conjugation table. Lmaltier 19:57, 15 July 2011 (UTC)Reply
But that would mean that some verb forms can have an adjective declension section. Do we want that? —CodeCat 20:05, 15 July 2011 (UTC)Reply
@Lmaltier: Since when do you listen to specialists' analyses rather than the speakers' emotions? Or is it just because it's a convenient way to prove my point wrong? I already said why I think it is that participles are often treated either as verb forms or as adjectives. Until you respond to my arguments, I see no reason to take over your point of view instead. While responding, keep in mind that experts by no means agree on whether participles are verbs or adjectives. Longtrend 20:12, 15 July 2011 (UTC)Reply
I always said that we should not invent anything (and this is one of the basic principles of the Foundation), and that we should follow specialists, traditions of the language. And verb forms cannot have an adjective declension section, as they are not adjectives. The best place for the declension of these forms is the conjugation table. Also note that I don't propose anything on how to deal with the question in German (this is not easy if opinions differ among specialists, and it's true that a decision should be taken). I only think that, in German, we can do with the verb and adjective POS, according to what you explain. Lmaltier 06:30, 16 July 2011 (UTC)Reply
You still haven't responded to my argument about the difference between grammars and Wiktionary. What linguists do agree in is that German participles have characteristics of both verbs and adjectives, so I have a bad feeling about just squeezing them in one of these groups. (And putting them in both groups would suggest that in one usage they are clearly verbs, while in the other they are clearly adjectives, which does not seem to be the case either.) I don't really see a problem about a Participle header, which to the contrary would solve those problems. This is my impression specifically for German, my proposal would not affect any other languages, since I know too little about them -- we should refer to linguists' analyses there as well. If you worry that Participle is not a proper POS, well, "proper noun", "prefix" and "symbol" aren't either, as is even in the ELE. Do you think the Participle POS is inappropriate in Latin, too?
You probably know what I meant by "verbs would have adjective declension sections": this was short for "verbs would have a declension section that would include exactly the same forms as adjective declension templates". And this would be a problem in my opinion. You propose to include those forms in conjugation tables. Just to make sure I understand you correctly: you want to change the verb conjugation template so it includes all the declined forms? But that's declension, not conjugation as the header would suggest. The contradiction remains. Participles inflect for completely different categories than normal verbs. Longtrend 09:56, 16 July 2011 (UTC)Reply
For Latin, I don't know: when I was learning Latin, participles were considered as verb forms, but this tradition may be different in different countries, and may change with time. The tradition to be adopted is the one currently used for Latin in English-speaking countries. In French, nobody considers that it's a problem to consider aimée, aimées, aimés as conjugated forms of aimer. I don't see why it is a problem to decline a verb form. Lmaltier 11:19, 16 July 2011 (UTC)Reply
Discussing this topic would be a lot easier if you replied to my arguments and all my questions... Longtrend 12:21, 16 July 2011 (UTC)Reply

Inferring structure of entries

Currently, it is almost always the case that left-aligned emboldened text is a header (the main exception being inflection lines). I find this to be a helpful clue to inferring the structure of entries, especially when the ToC isn't visible (e.g. when explicitly hidden or when too far above). Do others find this useful? If so, are there easy layout tweaks we can do to make this more standard? Inflection lines could be adorned somehow, which has the ancillary benefit of drawing attention closer to the definitions. I think most other left-aligned emboldened text is used to segment sections such as "Derived terms", where we could just as easily use italics. Is there some way we can increase usability here? --Bequw τ 05:08, 10 July 2011 (UTC)Reply

Maybe indentation would be useful as well? If everything within ==English== were indented a bit (including the L3 headers), everything within ===Verb=== were indented a bit further (including the L4 headers), and so on, that might help clarify the structure as well. Unfortunately, I don't think that would be achievable with just CSS; I think we'd have to use some DOM-inspecting JavaScript. So maybe it's not worth it. —RuakhTALK 12:24, 11 July 2011 (UTC)Reply
I like indentation for this purpose. It would enable us to consider smaller fonts for headings which would be more economical of vertical screen space.
Also, there are a number of entries that have bold headings not sanctioned by WT:ELE that are created by starting the line with ";". They would seem to interfere with human users' ability to make structure inferences reliably. DCDuring TALK 13:03, 11 July 2011 (UTC)Reply
It'd be doable with just CSS if we had consistent header levels. That is, if the POS were always L3 (not sometimes L4 because "Etymology 1" is L3) and "See also" were always L3 (not sometimes L4 under a POS and sometimes L5 under an L4 POS and sometimes L3 under the language), etc., then we should be able to use CSS and templates to indent all definitions a certain amount, all derived/related/'nymous/translation terms a certain amount more, and so on. But the way we do things now, I agree with Ruakh that doing it with CSS is far more trouble than it's worth (though it's still not impossible).​—msh210 (talk) 16:14, 11 July 2011 (UTC)Reply
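To make the header-level problem concrete, here is a minimal sketch in Python (not anything we actually run on the site) that reads the header lines of an entry's wikitext and derives an indent from the header level alone. The function name `indent_levels`, the `step_em` parameter and the sample entry are all made up for illustration. Note how the POS header lands at a different depth depending on whether an "Etymology 1" header sits above it, which is exactly why a purely level-based rule gives inconsistent indentation.

```python
import re

# Matches a wikitext header line such as "===Noun===" (levels 2 to 6).
HEADER_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")


def indent_levels(wikitext, step_em=1.0):
    """Return (title, header level, indent in em) for each header line.

    Hypothetical helper: L2 (language) headers get no indent and each
    deeper level adds one step.
    """
    result = []
    for line in wikitext.splitlines():
        match = HEADER_RE.match(line)
        if match:
            level = len(match.group(1))
            result.append((match.group(2), level, (level - 2) * step_em))
    return result


# Invented sample entry: the POS headers are L4 because "Etymology 1" is L3.
sample = """==English==
===Etymology 1===
====Noun====
=====Translations=====
===Etymology 2===
====Verb====
"""

for title, level, indent in indent_levels(sample):
    print(f"L{level} {title}: indent {indent}em")
```

In an entry without numbered etymologies the same "Noun" header would be L3 and get half this indent, so any real solution would need either consistent levels or a look at the actual nesting.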
I despair of our ability to make headers other than L2 consistent. In English it is not too far-fetched to imagine having some kind of etymology, possibly with trivial content, for every single-word lemma entry. But, even in English, misspellings, multiword entries, and inflected forms and other form-of entries would be a challenge. Foregoing the cognitive advantages of grouping etymologically related PoSes under a shared Etymology heading seems a bad exchange even for the layout improvements under discussion. DCDuring TALK 20:10, 11 July 2011 (UTC)Reply
Maybe we could try putting the part of speech on the headword line and remove the header altogether. In most cases, the headword line is the most important defining element of an entry, not the headers. So it makes sense to let it stand out as much as possible. —CodeCat 20:16, 11 July 2011 (UTC)Reply
I don't think we can just remove the POS header, since then it would no longer be evident that we're breaking the word up by part of speech; and in the MediaWiki and HTML structure of the page, we'd then have all the verb definitions inside the noun's ====Translations==== section. But it could work if we merged the POS sections — putting all definitions in a single list regardless of part of speech — with the headword lines just kind of interspersed. That's what Dictionary.com does. —RuakhTALK 21:34, 11 July 2011 (UTC)Reply
The problem seems to be that we want the table of contents to behave as if the POS header exists, but we really want the headword line to take its place and show no header. Is there a way to do that? Aside from that, I don't think the current structure is very useful. We organise terms by etymology, but that is counterintuitive for most people who are just looking up a word. —CodeCat 21:39, 11 July 2011 (UTC)Reply
Maybe that's what you want, but it's not what I want. :-/   But yes, it could be more-or-less achieved by using CSS, and 100% achieved by using JavaScript. —RuakhTALK 22:02, 11 July 2011 (UTC)Reply
Grouping all the parts of speech together has additional structural problems besides just the translations. In inflected languages, the inflection/conjugation then is not separated by headers that will indicate which definitions go with which inflection pattern. It could/should be possible to set a preference that does something like that, but it would mean that the POS headers would not show up, however all the subheaders under that POS would still show up. --EncycloPetey 18:16, 12 July 2011 (UTC)Reply

{{l}}

I've noticed that BigDom uses the template {{l}} in his Luxembourgish entries to link to the English translations. While this is not officially mandated by ELE, I do think it is a nice idea, especially for pages starting with a Translingual entry, or terms which are the same in two languages (like water). I've been wondering if we should introduce that practice. -- Liliana 23:14, 11 July 2011 (UTC)Reply

So, for example, we might define important as # Having {{l|en|relevant}} or {{l|en|crucial}} value.? I'm not a big fan of that idea. I like that {{l}} is reserved for mentions of a term, as distinct from mere linkified uses. It's an important distinction for us, and I think it makes sense to promote it even in our wikitext. (And the generated HTML is different, in such a way that readers can customize the display of {{l}} if they like.) Also, I'm guessing that such an approach would add needless headache for downstream entities that use our definitions. —RuakhTALK 01:45, 12 July 2011 (UTC)Reply
I only started yesterday and it just so happened that when I looked at an entry to find out how to create an entry, it used the {{l}} template, so I just assumed that all the entries used it. If it's not common practice, I can stop using it if people want me to. BigDom 13:03, 12 July 2011 (UTC)Reply
I've only ever used the {{l}} template in lists of terms, which is where I understand the name "L" came from. It's most useful for lists of non-English terms in a non-English section where there are to be linked Derived terms, Related terms, Synonyms, and so forth, and you want to link to the correct language section. --EncycloPetey 18:19, 12 July 2011 (UTC)Reply
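For anyone unsure what "link to the correct language section" amounts to, here is a rough Python sketch of the idea. It is an approximation only: the real template also wraps the link in language-specific markup, and the `expand_l` function and the three-entry `LANGUAGE_NAMES` table below are invented for illustration.

```python
import re

# Hypothetical, deliberately tiny code-to-name table; the real list is far longer.
LANGUAGE_NAMES = {"en": "English", "lb": "Luxembourgish", "de": "German"}

# Matches the simple two-argument form {{l|code|term}} only.
L_TEMPLATE_RE = re.compile(r"\{\{l\|([^|{}]+)\|([^|{}]+)\}\}")


def expand_l(wikitext):
    """Rewrite each {{l|code|term}} as a wikilink to that language's section."""

    def replace(match):
        code, term = match.group(1), match.group(2)
        name = LANGUAGE_NAMES.get(code)
        if name is None:
            return match.group(0)  # leave unknown language codes untouched
        return f"[[{term}#{name}|{term}]]"

    return L_TEMPLATE_RE.sub(replace, wikitext)


print(expand_l("# Having {{l|en|relevant}} or {{l|en|crucial}} value."))
```

The point of the section anchor is that on a page like water, which has many language sections, the reader lands on the right one rather than at the top of the page.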
I always understood it to mean 'link'. So I've used it in any situations that call for linking to a specific language. —CodeCat 18:51, 12 July 2011 (UTC)Reply
Ditto, I always thought it was for 'link'. --Mglovesfun (talk) 18:01, 15 July 2011 (UTC)Reply

Prepositional phrase as a POS

Some pages have Prepositional phrase as their POS (e.g. like a lamb to the slaughter). But I think that this is not a POS at all. Fortunately, in most cases, prepositional phrase pages use Adverb, etc. (and they are categorized as prepositional phrases). I propose to change all Prepositional phrase POS to the appropriate POS (most often, it should be Adverb). Lmaltier 07:27, 16 July 2011 (UTC)Reply

sorry, but it was voted on: Wiktionary:Votes/pl-2010-01/Allow "Prepositional phrase" as a POS header. -- Liliana 12:03, 16 July 2011 (UTC)Reply
Thanks, I was not aware of this vote. I don't understand this decision: to me, it's like adopting Phrase instead of Noun or Verb for both red fox and fill the bill (Noun phrase and Verb phrase would be acceptable too, but not Phrase alone). I understand it better after reading the discussion page, but I still fully agree with DAVilla. Note that, to take the example I gave, I find uses of very like ..., such as But then, in his circle, an innocent would have been very like a lamb to the slaughter. (samanthalucas.com/books.php?title=bodyheartandsoul). Lmaltier 13:43, 16 July 2011 (UTC)Reply
"Like" is only sometimes a preposition. In the case you cite, it functions as an adjective. Modification by "very" is an indication, as is its use as a predicate. Just like "worth" (adjective), it takes an NP complement. It is specifically analyzed as such in CGEL. "Like" is (almost ?) always a preposition when it has an NP complement and is an adjunct: "Like his brother, John writes left-handed." DCDuring TALK 14:41, 16 July 2011 (UTC)Reply
I think in that example, 'very' modifies the entire clause that follows it. 'like a lamb to the slaughter' seems to me like an adjectival phrase that 'very' modifies in its entirety. —CodeCat 14:53, 16 July 2011 (UTC)Reply
You may well be right — I don't know whether the modifier attaches above the complement or below it — but I don't think that changes anything. The term "adjectival phrase" gets applied by traditional grammar both to phrases that are actually headed by adjectives (which very can modify) and to other phrases that modify nouns (which very cannot easily modify). For example, you can say "the restaurant is slightly outside the town", but not normally *"the restaurant is very outside the town". True adjective phrases, however, don't have this restriction: "the restaurant is very far outside the town" is fine. Whether that's because very is just modifying the adjective that heads the phrase (like or far), or because the phrase as a whole retains its head's ability to be modified by very, I don't know, but the diagnostic works either way. —RuakhTALK 18:05, 18 July 2011 (UTC)Reply
You could say that "slightly outside" is a prepositional phrase as well, so that instead of saying that "slightly" modifies "outside the town" you could also read it so that "slightly outside" has "the town" as its antecedent. The second example is a little more awkward, but I think most people would read it as "very much outside the town". But what happens when you leave the antecedent out? "slightly outside" as an adverb is more usual than "very outside", so this may hold for its use as a preposition as well, in which case it's not really a good example. —CodeCat 18:19, 18 July 2011 (UTC)Reply
When you leave the antecedent out, it's an intransitive preposition; hence *"very outside" is ungrammatical even then. It's actually a perfect example. (Traditional grammar calls it an "adverb" in that case — and that's how nearly all dictionaries handle it — but as you've observed, it behaves similarly whether or not it has an antecedent, so modern linguists recognize it as a preposition.) Re: how most people would understand it: that's exactly the point. Most people would read *"he stupid" as "he is stupid", because only the latter is grammatical Standard English. —RuakhTALK 18:47, 18 July 2011 (UTC)Reply
@Lmaltier: You advocate ===Adverb===, but then you give an example where it's modifying a noun! Syntactically, prepositional phrases don't behave quite like adjectives or adverbs, and I think ===Prepositional phrase=== is exactly the POS. (I, too, agree with DAVilla's comment that we should use real POSes rather than stuff ===Abbreviation===; but ===Prepositional phrase=== is the real POS.) —RuakhTALK 18:05, 18 July 2011 (UTC)Reply
I think this specific case can't be an adverb because 'like' can actually mean two things. It can mean either 'resembling' or 'similar to', or it can mean 'in the manner of'. But such phrases behave very differently, compare 'he remained like a deer in headlights' and 'he stared like a deer in headlights'. The first case describes a property of something and so it's more adjective-like, while the second case describes a manner so it's more like an adverb. You can say 'he remained green' but not 'he stared green', while you can't say 'he remained quickly' while 'he stared quickly' is ok. —CodeCat 18:27, 18 July 2011 (UTC)Reply
Nearly all prepositional phrases have both adjective-like uses and adverb-like uses; that's not particular to phrases headed by like. (Like is unusual, though, in actually being an adjective sometimes, rather than merely heading an adjective-like phrase. On this, traditional grammar and modern linguists agree; for example, the OED gives the quotation “The fixed stars are like our sun in every point in which it is possible to compare them” under one of its adjective senses, not one of its preposition senses.) —RuakhTALK 18:47, 18 July 2011 (UTC)Reply

Rollback

Hi, I was wondering whether someone could give me the rollback tool. I've been fighting the vandalism for a little while now and wanted a review into it and maybe some advice on how to improve please. Thanks a lot.
Here are some useful links:

  • My contributions: 1
  • My reports of vandalism: 2

I really like this place, it is very community based and I hope to help out and contribute here for a long time to come.
-- PoliMaster talk/spy 13:19, 16 July 2011 (UTC)Reply

First tell us, are you a clone of User:Razorflame? --Vahag 13:23, 16 July 2011 (UTC)Reply
No I am not a clone user of Razorflame. Now can you give me some advice please? -- PoliMaster talk/spy 13:29, 16 July 2011 (UTC)Reply
I find it a bit odd, a user who fights vandalism, but doesn't do anything else. It seems your second ever edit was on User talk:SemperBlotto to ask how to fight vandalism. Not 'bad' or 'negative', just a bit 'odd'. --Mglovesfun (talk) 13:36, 16 July 2011 (UTC)Reply
Well he's been talking to me in the IRC chat too, and it seems he really wants to do something. -- Liliana 13:37, 16 July 2011 (UTC)Reply
@Mglovesfun Different users help in different ways to the project, I'm not so fantastic with my language but still satisfactory so my way to help and contribute back to Wiktionary was to help with vandalism fighting. Hope this is OK? - PoliMaster talk/spy 13:41, 16 July 2011 (UTC)Reply

Getting a rollback right changes nothing about what you can do, and almost nothing about how you can do it. You are welcome to fight vandalism, and you don't have to do anything else if you don't want to. However, if somebody participates only to get special rights (as this sometimes happens), he should not get them, in my opinion. Lmaltier 17:15, 16 July 2011 (UTC)Reply

Hi Lmaltier, no that wasn't the case, the tool will make it easier for me to revert the vandalism meaning that I can revert quicker and get it reported quicker. Here is an updated list of the vandals I have had blocked. Also they've not re-vandalised after their block. I will help either way the decision goes. :-D Merci beaucoup. -- PoliMaster talk/spy 17:04, 18 July 2011 (UTC)Reply
What I wrote is the result of experience. Yes, some people participate only to get and use special rights. This is not a good thing. Lmaltier 20:18, 18 July 2011 (UTC)Reply
What you seem to be saying is that the tool won't give him the ability to do anything he hasn't been able to do before, although it will make what he's been doing (without the objection of anyone here) easier for him to do. So am I correct in saying that this is not a question of new privileges or trust, that it's simply a matter of turning on a feature that could only be abused by people who don't know how to use it? If so, then who really cares if he gets it or not? We would still have the ability to patrol the edits, right? DAVilla 20:26, 18 July 2011 (UTC)Reply
Privileges have a psychological dimension as well as a practical one. (Which is mostly unfortunate, but it's not entirely a bad thing; for example, new Wiktionarians who have experience with other WMF projects are likely to know that they can seek out an administrator if they need help from a trusted editor who's knowledgeable about the project.) A user whose only goal is to obtain a privilege, however minor, may well be a user who will abuse the psychological dimension of that privilege. Re: "We would still have the ability to patrol the edits, right?": Right! I didn't think that was true, but I just tested, and yeah, if you're in the "rollbacker" group but not the "autopatroller" group, then clicking "rollback" will create an unpatrolled edit. —RuakhTALK 21:25, 18 July 2011 (UTC)Reply
@Lmaltier I understand what you're saying. This is a tool to be used in order to make tasks easier and quicker to perform and to increase performance quality when going about “vandalism fighting”. It isn't a right, it is a privilege and a tool to be used properly.
@DAVilla The tool can be switched on and off so easily. If the tool is misused, which I do not intend to do, then surely it should be removed. The tool gives me the ability to make my tasks quicker and much, much easier though. My intention has never been to patrol my edits; I can't do that now and didn't put in a request for that. My intention is to fight vandalism and unconstructive, disruptive edits, which is why rollback is being requested.
@Ruakh As I said above I never had an intention for patrolling my edits and that can be left to other users or admins. My request was to get rid of the mess and disruption that gets left by vandalisers.
Thanks. :-) - PoliMaster talk/spy 22:02, 18 July 2011 (UTC)Reply

Definition editing options trial

There wasn't much response to the suggestion in the earlier discussion on enabling the definition editing tool for a trial period, so I'm going to assume that there aren't any objections to temporarily enabling the script for one month. If anyone objects to the trial, the trial can be ended right away. To disable it for personal use, click the button below.

...

--Yair rand 18:27, 17 July 2011 (UTC)Reply

I object to it being available by default without more explanation. How many folks have used it and not disabled it? That option, inviting people to use it voluntarily, seems like the first deployment step. If no one keeps using it or accepts that invitation, that seems to be a reliable indication that it is not a good solution to a meaningful problem. DCDuring TALK 19:47, 17 July 2011 (UTC)Reply
It's been available opt-in for months, and I've lost count of how many times people have been invited to try it out. I've stopped the trial. Since it's targeted at users who never see the beer parlour, knowing how many Wiktionary regulars use it is irrelevant. --Yair rand 20:09, 17 July 2011 (UTC)Reply
Does it address any need stated by actual users? Does it bring us up to some industry standard? Do our veteran users want it and keep it? DCDuring TALK 22:23, 17 July 2011 (UTC)Reply
That depends on what you mean by "actual users", no, and I have no idea. The point of the tool is to make it easy for people to edit. A simple way to figure out whether it will be successful at doing that would be to trial it. --Yair rand 19:09, 18 July 2011 (UTC)Reply
I think that such trials should first be made opt-out for all administrators, and only later (if ever) rolled out to everyone. In the specific case of definition editing, the lack of objections in the above-linked section may not be meaningful, given that (1) most of that discussion was about a different feature entirely and (2) I'm betting that most active editors never actually bothered to try it out. People are lazy. Also, it might help if you explained what exactly this feature is supposed to do. I find that even when I turn it off at Special:Preferences, I still get the little pencils to the left of definitions that let me edit them; the main differences I actually see when I turn the gadget on are that I get a little "Add language" box in the language tabs, and I get many more "(−)(±)" things in the list of categories (rather than just a "(+)" at the end). Either this is the least-accurately-named feature ever, or I'm seeing some sort of bug, possibly relating to different preferences I have set in different places (?), but since it's not clear what the intended behaviors are, it's impossible to report deviations from them. —RuakhTALK 19:53, 18 July 2011 (UTC)Reply
The "(−)(±)" category editing buttons are built into tabbed languages, not at all connected to definition editing options. I don't know why they might sometimes not appear. The "Add language" box is also part of tabbed languages, but since it's dependent on the definition editing tool to allow the user to edit the definitions in the new section, the button doesn't appear unless the tool loads before tabbed languages, which will always happen if the gadget is on (gadgets load in order before anything else), but only sometimes happen if they're loaded through prefs or the button. If turning it off through Special:Preferences still leaves the edit definition buttons there, you probably also have it enabled through WT:PREFS or the button. Re an opt-out trial for admins, that wouldn't really give information on whether newbies will be able to edit more easily. (And I really don't see how a ten-pixel icon next to definitions could be all that harmful...) --Yair rand 20:22, 18 July 2011 (UTC)Reply
There are too many ways to turn these things on and off! I looked at WT:PREFS and saw that it wasn't checked; I didn't look at the button, because I didn't remember that a single button controlled both features (and I had tabbed-languages turned on in other ways, so it wasn't obvious that I had that button pressed as well). In the past I thought that giving people more ways to turn something on was a good thing, but my experience with these two features of yours has taught me that it's not. (By the way, isn't your proposed way of turning it on for all users equivalent to the button, rather than to the Gadget? Doesn't that imply that the behavior of tabbed-languages+definition-editing will be nondeterministic if and when they're both turned on for all users?) Re: opt-out trial for admins: It would tell you whether other admins think it will make newbies able to edit more easily, which is a start. And I doubt you'll get admins to support the tool until they've tried it out and decided for themselves that it seems likely to be useful for newbies. —RuakhTALK 21:38, 18 July 2011 (UTC)Reply
If tabbed languages and the definition editing tool are enabled by default, then the check for the availability of the definition editing function could be removed, as there won't be situations where it doesn't get loaded. --Yair rand 22:09, 18 July 2011 (UTC)Reply
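For readers following the load-order point in this subthread, here is a toy model of the dependency as described: the "Add language" button appears only if the definition editing tool has already loaded when tabbed languages loads. This is an illustrative Python sketch, not the actual gadget code; the function name and the script labels `definition_editing` and `tabbed_languages` are hypothetical.

```python
def add_language_button_shown(load_order):
    """Toy model: return True if the 'Add language' button would appear.

    load_order is a list of hypothetical script labels in the order they load.
    """
    definition_editing_loaded = False
    for script in load_order:
        if script == "definition_editing":
            definition_editing_loaded = True
        elif script == "tabbed_languages":
            # Assumption: tabbed languages checks for the editing tool
            # once, at the moment it loads, and never re-checks later.
            return definition_editing_loaded
    # Tabbed languages never loaded, so there is no button at all.
    return False


# Gadget case described above: the editing tool loads first.
print(add_language_button_shown(["definition_editing", "tabbed_languages"]))
# Prefs/button case: the order can be reversed, so the button is missing.
print(add_language_button_shown(["tabbed_languages", "definition_editing"]))
```

Under this model, the nondeterminism raised above comes entirely from the order of the list; if both scripts were always loaded in a fixed order, the availability check would always succeed and could be dropped, as suggested.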
I would hope that in the final version it doesn't apply one style sheet and then overwrite it seconds later with another. In other words, if this is to be changed for everyone, make it a planned core change at that time, and thus only apply it with the certainty that it works. You have my support for broad testing, meaning it's forced but only on those of us who would know or bother finding how to turn it off if we needed to. Maybe include the link to opt out in the news heading. DAVilla 20:11, 18 July 2011 (UTC)Reply