Wiktionary:Beer parlour


Welcome, all, to the Beer Parlour! This is the place where many a historic decision has been made and where important discussions are being held daily. If you have a question about fundamental Wiktionary aspects—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don't make personal attacks, don't change other people's posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page. There are various other discussion rooms which may serve the idea behind your questions better. Please take a look to see which is most appropriate.

Sometimes discussion identifies an issue as an idea for policy development or rewriting. Such discussions may be taken out of the Beer parlour to a relevant page, or a brand new page may be created. Usually, the active policy pages will be listed in one of the sections below. See also the policy development page and the votes page.

Questions and answers will not remain on this page indefinitely, as it would very soon become too long to be editable. After a period of time with no further activity (usually a couple of weeks), information will be moved to the archives. We make a point to preserve all discussions that were started here in the archives. However, talk that is clearly not intended for this page may be moved and will not end up in the archives. Enjoy the Beer parlour!



February 2014

Citations page vs Quotation

When to put quotes in the main namespace and when to put them in the citations namespace? --kc_kennylau (talk) 07:40, 1 February 2014 (UTC)

I usually put them in the main namespace if there are only a few (say, up to three). Any more than that I put in the Citations: namespace. —Aɴɢʀ (talk) 07:45, 1 February 2014 (UTC)
Thanks. So they are redundant, I assume from your answer? --kc_kennylau (talk) 08:19, 1 February 2014 (UTC)
Well, not in cases where there are fewer than 3 quotes and the Citations: tab is still a red link. But there certainly are cases where a quote is found both in the main namespace and in the Citations: tab. Then there's the redundancy within the main namespace between putting the quotes directly under the sense and putting them in a ===Citations=== header. —Aɴɢʀ (talk) 08:26, 1 February 2014 (UTC)
I see. Thank you. --kc_kennylau (talk) 08:28, 1 February 2014 (UTC)
For the record, the correct header to use in the mainspace is ====Quotations====, and it is usually level 4 (per WT:"). - -sche (discuss) 22:17, 1 February 2014 (UTC)
I think they should have separate and distinct uses. Quotations should be a sample from what is available on the citations page, and should be used as usage examples. —CodeCat 14:11, 1 February 2014 (UTC)

Welsh plurals or Welsh noun forms?

I've noticed that Welsh plurals are currently split into a plurals category and a noun forms category. Which one is the correct one?

I'm asking because I think the noun form category would be better split into categories for plurals, mutated nouns, and mutated plurals. This would allow more specialised templates than the current cy-noun-form template which does not allow many details to be specified. For instance, for the plural 'cŵn' (dogs), cy-noun-form generates:


I think that it would be more helpful if, for plurals, it could generate something like:

cŵn m pl

EdwardH (talk) 20:26, 1 February 2014 (UTC)

Mutation is the same regardless of what part of speech or inflection a word is, isn't it? Then a single Category:Welsh mutated forms might be preferable. —CodeCat 20:30, 1 February 2014 (UTC)
The issues of categories and templates are separate: multiple templates can add the same category, and the same template can produce different categories. Categories are a tool for finding entries that have something in common, not for classification. Would people need to look specifically for mutated nouns and mutated plurals? Chuck Entz (talk) 20:59, 1 February 2014 (UTC)
No, I doubt they would. In which case, it would probably be better to follow CodeCat's suggestion and create a single mutated form category. EdwardH (talk) 09:31, 2 February 2014 (UTC)

Arabic compound diacritics

Does everyone else see what I see: حَمَّدَ (ḥammada)? (See محمد.) What I see is shadda-kasra above the first letter, and fatha above each of the next two letters. However, that would be incorrect (and impossible), and it is not what is actually written. What is actually written is this: fatha above the first and last letters, and shadda-fatha above the middle letter. For some reason, the shadda, although correctly typed, appears to be shifted to the previous letter. —Stephen (Talk) 09:47, 1 February 2014 (UTC)

I see it normally. But I have in the past encountered problems where Hebrew diacritics would appear on adjacent letters on Macintosh computers and iPhones. --WikiTiki89 09:40, 2 February 2014 (UTC)
I’m using Firefox in Windows XP-Pro. I have never seen this happen before. It’s weird. —Stephen (Talk) 09:56, 2 February 2014 (UTC)
It's strange that you see it on the first letter (by which I assume you mean the rightmost letter). Since the shadda is encoded after the fatha, it would make more sense if it were displayed on the next letter rather than the previous one. Can you possibly post a screenshot so we can see more details about the problem? --WikiTiki89 10:12, 2 February 2014 (UTC)
I see fatha above the first and last letters, and shadda-fatha above the middle letter, but I do recall there having been problems in the past with Hebrew and Arabic diacritics displaying in the wrong order, on the wrong letters, etc. (One such problem is discussed here.) - -sche (discuss) 10:03, 2 February 2014 (UTC)
I see it normally too. --Z 10:21, 2 February 2014 (UTC)

SCREENSHOT: File:Hammada.PNG, by Stephen

  • What I see on my screen is different from what is shown in the screenshot provided by Stephen. I don't see what is on the screenshot. What I see is such that the 1st and the 3rd letters have something resembling "/" used as a diacritic, whereas the 2nd letter has "/" and small "w" below it used as a diacritic, approximately speaking. My letter counting is from the left. --Dan Polansky (talk) 10:47, 2 February 2014 (UTC) Made myself less ambiguous. --Dan Polansky (talk) 11:08, 2 February 2014 (UTC)
    So you are experiencing the same problem as Stephen. By the way, it's a big pic, it's better to simply link to it. --Z 10:53, 2 February 2014 (UTC)
It appears that everybody sees it correctly except me. (I would simply link to the image, but I don’t know how. If you know how, create the link.) —Stephen (Talk) 10:56, 2 February 2014 (UTC)
I thought the screenshot was by Dan Polansky, as you didn't sign under the image, and I didn't read his message completely. You can link to an image by adding ":" before the image's title. --Z 11:06, 2 February 2014 (UTC)
My display is different from the screenshot, too. On my display the little seagull-looking thing is over the middle letter. —Aɴɢʀ (talk) 11:28, 2 February 2014 (UTC)
The seagull over the middle letter is the correct view. I haven’t changed anything, except for receiving all of the Firefox updates, so maybe this sudden problem has something to do with my most recent Firefox version. Hopefully it will be corrected in the next version. —Stephen (Talk) 11:36, 2 February 2014 (UTC)
Recently User:Widsith raised an issue on my talk page; he possibly has a similar problem (I say possibly because I'm still not sure whether the problem was that he was not aware of a particular rule about Arabic diacritics, or that he was seeing the diacritics incorrectly). --Z 11:51, 2 February 2014 (UTC)
I'm using Firefox 26.0; isn't that the most recent version? I'm running it on Windows 7. —Aɴɢʀ (talk) 11:55, 2 February 2014 (UTC)
Another potential problem is that we recently changed the default font for Arabic in MediaWiki:Common.css. It is now a webfont, meaning that it is downloaded when you load the page if you don't have it installed. So the problem could be that there is a bug in this font (Iranian Sans) when used on Windows XP. We tested it on Windows 7 and Mac OS, but not on Windows XP. But that is pure speculation at this point. A way to test this theory is if Stephen could copy and paste the Arabic text into Microsoft Word (or any other word processor) and see if the problem is replicated there, trying various fonts including Iranian Sans. --WikiTiki89 21:37, 2 February 2014 (UTC)
I also have XP-Pro, but in Firefox it appears correctly (ḥammada). However in Chrome it shows up as fatha, shadda-kasra, fatha (ḥammida). Weird. --Dijan (talk) 22:52, 2 February 2014 (UTC)
Then it could be that a bug was just now introduced into the font file and if you use a browser that you haven't used in a while or update your usual browser, it will re-download the font and get the buggy version. But that's also just speculation. --WikiTiki89 23:04, 2 February 2014 (UTC)
I personally hate Iranian Sans anyway. It appears very small and it's hard for me to read any text in Arabic script unless I increase the size of the text. --Dijan (talk) 23:08, 2 February 2014 (UTC)
When I copy and paste حَمَّدَ into MS Word, the error disappears. (Except that the shadda-fatha over the middle letter gets reversed, but that is an old problem and one that does not seem to have a solution). I don’t have Iranian Sans installed, but I have a number of Arabic fonts, including Tahoma, Traditional Arabic, Arial Unicode MS, Andalus, Arabic Transparent, Microsoft Sans Serif, Simplified Arabic, and all of my installed fonts give the same result (fatha-shadda over the middle letter, which looks like shadda-kasra). Dijan’s error is just the old problem of reversed diacritics due to normalization, and fatha-shadda (the reversed order) has the appearance of shadda-kasra. It’s the same thing that I see in MS Word. —Stephen (Talk) 23:09, 2 February 2014 (UTC)
Do you see the problem here (with Arial font): حَمَّدَ? --WikiTiki89 23:21, 2 February 2014 (UTC)
I don't see the problem when using Firefox with the Arial font. However, in Chrome, I have to increase the text size to 250% in order to see the fatha over the shadda. Otherwise, it just appears without the second fatha in Chrome. --Dijan (talk) 23:28, 2 February 2014 (UTC)
حَمَّدَ looks perfect to me. —Stephen (Talk) 00:27, 3 February 2014 (UTC)
Looks correct on my Mac in Safari, Chrome, and Firefox. In my Safari edit field, in Opera, and in Safari/iOS 5, the seagull is ducking below the arrow, but still over the rolling wave in the middle. Michael Z. 2014-02-03 02:14 z
So we've narrowed it down to some obscure font issue. --WikiTiki89 05:08, 3 February 2014 (UTC)
Oops, my mistake. It looks fine where it has been pasted into this discussion. But the red link at the beginning of this topic, and all of the instances in the entry محمد have the seagull ducking below the arrow instead of flying over it.
I can fix it by removing Arial Unicode MS from the font stack. But the next font, Code2000 looks pretty bad, with large overlapping diacritics, and should be removed too. (The Iranian Sans webfont doesn’t seem to be working for me, and Traditional Arabic is absent on my Mac.)
Yes, it only works with no added fonts on my Mac. The Helvetica Neue font applied by the Typography Refresh doesn’t have the Arabic letters, but whatever font my Safari/Mac is falling back to is better than our additions. Michael Z. 2014-02-04 19:44 z
Update: the above applied only in Safari/Mac. Firefox/Mac and Chrome/Mac don’t fix the diacritic display in the default font. Michael Z. 2014-02-04 19:56 z
I’m not sure I understand what you mean by "arrow" when you say "the seagull ducking below the arrow instead of flying over it". If by arrow you mean the short, usually slanted, hyphen-like diacritic known as the fatha, the correct appearance has the fatha positioned ABOVE the seagull. Unfortunately, since normalization has reversed the order of the compound diacritics (from shadda-fatha to fatha-shadda), most Arabic fonts do not understand it and display it with the seagull on top (which makes it look exactly like shadda-kasra instead), and some fonts simply overlap them. The kasra must always be beneath the seagull, and the fatha must always be above the seagull. Same with double-fatha and double-kasra. —Stephen (Talk) 23:52, 4 February 2014 (UTC)
From my experience, most Arabic fonts handle it fine, but many still do not. --WikiTiki89 00:01, 5 February 2014 (UTC)
None of my Arabic fonts handle it correctly. —Stephen (Talk) 00:09, 5 February 2014 (UTC)
Rendering of محمد in Safari/Mac
The image here is what I see in Safari/Mac (also similar in Firefox & Chrome). If the word is rendered correctly in the etymology and headword, then this is good news (but Code2000 should probably be removed from the font stack for .Arab). Does Arabic properly have boldface in any languages? That headword would look less ugly if we set it to font-weight: normal. Michael Z. 2014-02-05 04:43 z
Yes, the Safari/Mac screen shot is correct. Arabic has both bold and regular fonts, but should not be italicized. —Stephen (Talk) 04:59, 5 February 2014 (UTC)
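The reordering that Stephen attributes to normalization above can be checked outside the browser. The following is a minimal sketch, assuming Python's standard unicodedata module and using only the first letter of the word with its two marks; the variable names are made up for illustration. It shows that Unicode normalization puts the fatha before the shadda regardless of the order in which they were typed, because the fatha has the lower canonical combining class.

```python
import unicodedata

HAH = "\u062D"     # ARABIC LETTER HAH
FATHA = "\u064E"   # ARABIC FATHA
SHADDA = "\u0651"  # ARABIC SHADDA

# Canonical combining classes drive the reordering: marks with a lower
# class are sorted before marks with a higher class.
fatha_class = unicodedata.combining(FATHA)    # 30
shadda_class = unicodedata.combining(SHADDA)  # 33

# Shadda typed first, then fatha (the order a typist would expect).
typed = HAH + SHADDA + FATHA
normalized = unicodedata.normalize("NFC", typed)


def describe(text):
    """Return the Unicode character names of a string, one per code point."""
    return [unicodedata.name(ch) for ch in text]


print(describe(typed))
print(describe(normalized))
```

If this is right, the stored text is always fatha-shadda after normalization, so whether the pair displays correctly depends on how each font and rendering engine stacks that order, which would fit the different results people report per browser and font.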

hamzat al-waṣl (ٱ) and Allah ligature (لله)

I had no problems seeing the characters above, but I have issues combining hamzat al-waṣl (ٱ) and the Allah ligature (لله). The letter ٱ is normally unnecessary on individual words with an elidable alif (not pronounced at all if it follows a word ending in a vowel), but occasionally it's important to show that it's actually dropped (elided), e.g. in the sentence هُوَ ٱلْمُعَلِّمُ (huwa l-muʿallim(u)), huwa + al-muʿallim(u)=huwa l-muʿallim(u). The symbol ٱ doesn't seem to work with the ligature لله. If I write them together: ٱلله, the ligature loses shadda and alif hanjariyya (dagger alif) and looks simply like lām+lām+hāʾ. The name عَبْدِ اللهِ (ʿabdu llāh(i)) has such an elision, but adding the diacritic to demonstrate it spoils the display of neighboring characters. It seems the ligature can only display on its own لله (on some systems) or with a simple alif الله (on most systems) with no hamza and no waṣla. Is this expected? Should this ligature display diacritics in any position when three Arabic letters (lām+lām+hāʾ) are joined together? --Anatoli (обсудить/вклад) 12:56, 5 February 2014 (UTC)

I can only see now الله (allāh) with shadda (seagull) and a dagger alif (short vertical stroke) when it follows a simple alif ا. The diacritics should definitely show in لله (li-llāh(i)) (it doesn't on my home PC). --Anatoli (обсудить/вклад) 13:01, 5 February 2014 (UTC)
It's expected behavior. Currently it doesn't show in any of these words on my current system either, but as far as I remember, those diacritics (shadda and alif) are shown only if you write alif + lam + lam + ha'. If you add any extra diacritic on any of these letters, the ligature's own diacritics disappear. In other words: if you'd like to see that word in fully vocalized form and also with hamzat al-wasl, you should use it like عَبْدُ ٱللّٰه --Z 15:48, 5 February 2014 (UTC)
Thank you! I don't know what you did but I can see full diacritics now, even with a final kasra عَبْدُ ٱللّٰهِ. Could you describe what you did, please? I have updated Abdullah and عَبْدُ ٱللّٰهِ now. --Anatoli (обсудить/вклад) 22:29, 5 February 2014 (UTC)
NP, I simply wrote hamzat al-wasl + lam + lam + shadda + dagger alif + ha' --Z 08:59, 6 February 2014 (UTC)
I see. There's a new problem, though. On iPad, عبد الله is seen with two rows of diacritics - shadda-dagger alif and shadda-dagger alif. One generated automatically and one manual. I guess you can't win. --Anatoli (обсудить/вклад) 09:43, 6 February 2014 (UTC)
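For anyone who wants to reproduce the sequence Z describes (hamzat al-wasl + lam + lam + shadda + dagger alif + ha'), here is a small sketch that builds it from explicit code points and prints the character names. It assumes Python's standard unicodedata module; the names SEQUENCE, word and survives_nfc are made up for illustration.

```python
import unicodedata

# Code points in the order Z describes typing them.
SEQUENCE = [
    "\u0671",  # hamzat al-wasl (ARABIC LETTER ALEF WASLA)
    "\u0644",  # lam
    "\u0644",  # lam
    "\u0651",  # shadda
    "\u0670",  # dagger alif (ARABIC LETTER SUPERSCRIPT ALEF)
    "\u0647",  # ha'
]
word = "".join(SEQUENCE)

# List each code point with its Unicode name, to compare against an entry.
for ch in word:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Shadda followed by dagger alif is already in canonical order,
# so NFC normalization leaves this spelling unchanged.
survives_nfc = unicodedata.normalize("NFC", word) == word
print(survives_nfc)
```

Because the diacritics are typed explicitly here, a system that also draws its own shadda and dagger alif for the ligature could plausibly show them twice, as in the iPad report above; that part is a guess, not something this sketch can confirm.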

How to link in inflection tables

Is there any established preference for how to link to terms in inflection tables? At {{de-decl-noun-n}}, kc_kennylau (talk • contribs) and I are going back and forth between using bare links (i.e. [[...]]; his preference) and using {{l-self}} (my preference). I don't want to keep edit-warring about it, especially if there isn't a consensus that my way is the preferred way. I thought there was, though. —Aɴɢʀ (talk) 14:38, 2 February 2014 (UTC)

Refer to the other declension tables, i.e. {{de-decl-noun-m}} and {{de-decl-noun-f}}. They detect whether the form is different from the page name, and add a link if it is different. Otherwise, they use bare links. Notice how your edit causes {{l-self|de|{{PAGENAME}}}} which is, well, yeah... --kc_kennylau (talk) 14:41, 2 February 2014 (UTC)
I don't encourage bare links (because they don't mark the text with the language), but I don't think it matters whether you use {{l}} or {{l-self}}. I'm not sure why there would be a problem with {{l-self|de|{{PAGENAME}}}} though. —CodeCat 14:45, 2 February 2014 (UTC)
Me neither. Kenny, can you be more specific than "well, yeah" about your problem with {{l-self|de|{{PAGENAME}}}}? —Aɴɢʀ (talk) 14:52, 2 February 2014 (UTC)
It is virtually useless trying to detect whether a link, that by definition is linking to itself, is linking to itself or not. --kc_kennylau (talk) 16:40, 2 February 2014 (UTC)
Isn’t it good to be, like, really, really sure? Michael Z. 2014-02-02 18:53 z
I still don't understand what you mean. {{l-self}} doesn't know in advance that its parameter is the same as the page name. That's why it checks. —CodeCat 13:22, 3 February 2014 (UTC)
The code literally contains {{l-self|de|{{PAGENAME}}}} which is redundant and kinda defeats the purpose of calling the template. Why don't you just bold it instead. --kc_kennylau (talk) 15:03, 3 February 2014 (UTC)
Bolding isn't the same as what linking templates do. But in any case, I don't see {{l-self|de|{{PAGENAME}}}} anywhere in {{de-decl-noun-n}}. —CodeCat 15:21, 3 February 2014 (UTC)
Because I removed all of them. NB: An edit war was going on between me and him, if you see the history. --kc_kennylau (talk) 15:30, 3 February 2014 (UTC)
I think you were both right. Angr was right to use {{l-self}}, but you were right to not put it in {{de-decl-noun-n}}. I think it should go in {{de-decl-noun}}. —CodeCat 15:33, 3 February 2014 (UTC)
I don't think it's necessary to add extra links because the m, f and n templates already add a link if the form is not identical to the pagename. Please look at their source code. --kc_kennylau (talk) 15:54, 3 February 2014 (UTC)
But they add bare links rather than German-specific links. That means (1) clicking on the link won't necessarily take you to the German section if the page has more than one language on it, (2) your browser doesn't know the word is in German (which can make a difference to blind people with screen readers, for example), and (3) those of us who have set our preferences to show links to nonexistent language sections in orange don't see orange links but rather blue links. —Aɴɢʀ (talk) 19:56, 3 February 2014 (UTC)
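The behaviour being argued over in this thread can be sketched roughly as follows. This is not the actual code of {{l-self}} or of any declension template; link_or_bold, LANGUAGE_NAMES and the HTML it emits are all hypothetical, written in Python only to make the logic concrete: compare the form with the page name, show plain bold text when they match, and otherwise emit a link that is tagged with the language and points at the language's section.

```python
from html import escape
from urllib.parse import quote

# Hypothetical lookup from language code to section name.
LANGUAGE_NAMES = {"de": "German"}


def link_or_bold(lang_code, term, page_name):
    """Return bold text for the page's own form, else a language-tagged link."""
    text = escape(term)
    if term == page_name:
        # Same form as the page title: no link, just emphasis.
        return f'<strong class="selflink" lang="{lang_code}">{text}</strong>'
    # Different form: link to the language section and mark the language.
    section = LANGUAGE_NAMES[lang_code]
    href = f"/wiki/{quote(term)}#{section}"
    return f'<a href="{href}" lang="{lang_code}">{text}</a>'


print(link_or_bold("de", "Haus", "Haus"))
print(link_or_bold("de", "Hauses", "Haus"))
```

In this sketch the check costs nothing when the caller already knows the form equals the page name, which is kc_kennylau's point, while the language tagging and section anchor are what Angr's three arguments above are about.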

How would I process special characters in Python?

Discussion moved to Wiktionary:Grease pit/2014/February.

Ugh. We're now punitively proscriptive about additions to our most-general, entry-level community communications channel. - Amgine/ t·e 17:20, 2 February 2014 (UTC)

  • Punitively? This looks to me more like moving a technical discussion to the technical forum, where it's more likely to be seen by people who might be able to answer the question. ‑‑ Eiríkr Útlendi │ Tala við mig 18:08, 3 February 2014 (UTC)

Proposal for minor change in the functionality of Template:etyl

I propose that {{etyl|xx}} should function exactly the same way as {{etyl|xx|-}}, rather than categorizing as "English terms derived from Language X". Then we can deprecate {{etyl|xx|-}} in favor of {{etyl|xx}}. I think that on average {{etyl|xx|-}} is used more times per page than any other use of the template, and removing the need for the "|-" will make it much easier to type up cognate lists in etymology sections. --WikiTiki89 03:46, 4 February 2014 (UTC)

I agree with making this change. — I.S.M.E.T.A. 18:32, 7 February 2014 (UTC)
The change makes sense considered in isolation, but it would be very hard to find and fix all the missing instances of [Category:English terms derived from <language>] after the change is made. I think we would need to do that before the change while we can still distinguish them by the missing "-", so that it can be done by bot and not manually, entry by entry.
Also, because the old way and the new way have all the same inputs, but with different interpretations, automatically tagging incorrect uses of the template without the parameter to mean English would seem to me to be impossible. How do we tell the difference between someone deciding not to categorize a derivation and someone erroneously assuming that they're categorizing it as English? Chuck Entz (talk) 19:03, 7 February 2014 (UTC)
Well, we can deprecate {{etyl|xx}} for a while, possibly having it display an error, until people stop using it. --WikiTiki89 19:12, 7 February 2014 (UTC)
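The bot pass Chuck Entz asks for, done while the two forms can still be told apart, might start from something like the sketch below. It is only an illustration in Python: the function name classify_etyl, the two regular expressions and the sample wikitext are assumptions, and a real bot would also need to cope with whitespace, nested templates and named parameters.

```python
import re

# {{etyl|xx}} with exactly one parameter: currently categorizes as English.
BARE_ETYL = re.compile(r"\{\{etyl\|([^|{}]+)\}\}")
# {{etyl|xx|-}}: currently suppresses categorization.
SUPPRESSED_ETYL = re.compile(r"\{\{etyl\|([^|{}]+)\|-\}\}")


def classify_etyl(wikitext):
    """Split one-parameter and dash-suppressed etyl calls by language code."""
    return {
        "implicit_english": BARE_ETYL.findall(wikitext),
        "uncategorized": SUPPRESSED_ETYL.findall(wikitext),
    }


sample = (
    "From {{etyl|la}} ''aqua''. Cognate with {{etyl|it|-}} ''acqua'' "
    "and {{etyl|fro|en}} ''ewe''."
)
print(classify_etyl(sample))
```

The "implicit_english" list is the set that would need an explicit language parameter added before the default is changed; calls that already name a target language, like the third one in the sample, are left alone.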

accents and qualifiers

Should we change all instances of {{a|adjective}} to {{qualifier|adjective}} in all pages here? --kc_kennylau (talk) 07:14, 4 February 2014 (UTC) Fixed grammar. --07:15, 4 February 2014 (UTC)

As well as here, here, here, here and here? --kc_kennylau (talk) 07:17, 4 February 2014 (UTC)
I'd be in favor of such a change. Or maybe {{sense|adjective}}? —Aɴɢʀ (talk) 13:30, 4 February 2014 (UTC)
Absolutely, except for Standard. Those must be fixed one by one as Standard may be labelling pronunciation in the standard accent (it would be a better practice to use the name of the standard accent though). — Ungoliant (falai) 13:39, 4 February 2014 (UTC)
@Angr: I don't think {{sense|adjective}} is appropriate, since IMHO sense is meaning and adjective isn't a meaning, just IMHO. --kc_kennylau (talk) 14:42, 4 February 2014 (UTC)
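A replacement along the lines proposed here, with Ungoliant's caveat that labels like Standard need a human decision, could be sketched as below. Everything in it is an assumption for illustration: the function name retag_pos_labels, the list of part-of-speech labels, and the restriction to single-parameter {{a|...}} calls.

```python
import re

# Assumed set of labels that are parts of speech rather than accents.
PART_OF_SPEECH_LABELS = {"adjective", "noun", "verb", "adverb"}

# {{a|label}} with exactly one parameter.
ACCENT_TEMPLATE = re.compile(r"\{\{a\|([^|{}]+)\}\}")


def retag_pos_labels(wikitext):
    """Rewrite {{a|<part of speech>}} to {{qualifier|...}}; leave the rest."""

    def swap(match):
        label = match.group(1)
        if label.strip().lower() in PART_OF_SPEECH_LABELS:
            return "{{qualifier|" + label + "}}"
        # Accent names and ambiguous labels such as Standard stay untouched.
        return match.group(0)

    return ACCENT_TEMPLATE.sub(swap, wikitext)


print(retag_pos_labels("* {{a|adjective}} {{IPA|/x/}}"))
print(retag_pos_labels("* {{a|Standard}} {{IPA|/x/}}"))
```

Using an explicit allow-list means anything not recognized as a part of speech is skipped rather than guessed at, so the remaining cases can be fixed one by one.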

elevated / lofty style

What context label can I use to mark the elevated or lofty style in a definition?

This is interesting for me, because in Russian Wiktionary there is the context label "высок." ("высокий" or "высокий стиль") and I want to find the corresponding context label in English Wiktionary. -- Andrew Krizhanovsky (talk) 10:07, 4 February 2014 (UTC)

German Wiktionary has de:Vorlage:geh., standing for gehoben, as well, which comes to the same thing. I sometimes use {{context|formal}} for this, but I'm not 100% sure "formal" is really identical to "elevated style". Of course you can always use {{context|elevated}}, but that doesn't categorize. —Aɴɢʀ (talk) 13:29, 4 February 2014 (UTC)
We also have {{context|literary}}, which categorizes. But, given that our user base skews heavily toward those with graduate degrees, maybe 'lofty' is normal. DCDuring TALK 14:58, 4 February 2014 (UTC)
In my experience, in practice, {{cx|literary}} is the en.Wikt equivalent of de.Wikt's gehoben; {{cx|formal}} might also be appropriate; and literary is even more elevated than something which is merely formal. In theory (i.e. according to the definitions provided by Category:English literary terms and Category:English formal terms), any words that are rarely used in speech are literary, by which criterion a lot of textspeak (e.g. 404 for "I don't know") might qualify as literary! - -sche (discuss) 18:33, 4 February 2014 (UTC)
<snort> "Fear and Loathing in Las Vegas" is literary, too. And other than the dated drug jargon/slang usage I doubt one can find many rare terms within it. - Amgine/ t·e 18:57, 4 February 2014 (UTC)
I agree that {{context|literary}} is essentially the equivalent of Russian "высокий". They don't imply exactly the same thing, but they are close enough. --WikiTiki89 19:13, 4 February 2014 (UTC)
Thank you very much for your answers.
There is the translation книжный = "literary", "bookish" (O.S. Ahmanova, "The dictionary of linguistics terms" (in Russian), 1969, page 198). But "книжный" is another context label in the Russian Wiktionary. Therefore, I cannot use "literary", because it is an equivalent to "книжный". OK, I will use {{context|elevated}} as the translation of "высокий стиль". -- Andrew Krizhanovsky (talk) 11:15, 7 February 2014 (UTC)
"bookish" is not exactly the same as "literary", at least the way I see it. --WikiTiki89 14:15, 7 February 2014 (UTC)
Please don't! Especially since (I suspect) the vast majority of readers of en.wikt won't know what it means. —msh210 (talk) 16:08, 11 February 2014 (UTC)
What is ru.Wikt’s equivalent of “formal”? Michael Z. 2014-02-07 15:19 z
Seriously, why the heck would you create a non-standard label “elevated” for высокий, when the synonym formal is used in every English-language dictionary? Michael Z. 2014-02-09 05:51 z
Because "formal" is not a correct translation of "высокий". --WikiTiki89 06:08, 9 February 2014 (UTC)
Thanks. What is ru.Wikt’s equivalent of “formal”? Michael Z. 2014-02-10 15:45 z

You can see a list of equivalents of context labels in Russian Wiktionary and English Wiktionary: ru:Участник:AKA MBG/Статистика:Пометы.
See the second table: "Labels added by hand". This correspondence was created by my research group last year.

The table "Labels found by parser" was created by the wikokit parser, and the data was extracted from the Russian Wiktionary. During this year we will extract context labels and create the same tables for the English Wiktionary.

P.S. You see in this table that "formal" corresponds to "офиц." (официальный).

P.P.S. You are welcome to write your comments and ideas on the discussion page. Of course, we will change our equivalents if you propose better ones. -- Andrew Krizhanovsky (talk) 05:31, 12 February 2014 (UTC)

Russian label abbreviations are here: ru:Викисловарь:Условные_сокращения. "высок." is something like "stilted". I don't agree with the explanation "высокое", it's more like "высокопарное". --Anatoli (обсудить/вклад) 05:43, 12 February 2014 (UTC)
I remembered an exact English equivalent of "высокий": "high(er)-register". --WikiTiki89 01:03, 13 February 2014 (UTC)
Thank you, Wikitiki89! -- Andrew Krizhanovsky (talk) 10:30, 11 March 2014 (UTC)

Chinese character entries in Translingual sections

Discussion moved from Wiktionary talk:Per-language pages proposal#Chinese character entries.

What is called "Translingual" at present essentially refers to "Chinese", eg. , , , including glyph etymology, definitions, derived characters (The character is written like that, and various characters are derived from it, because of homophony or near-homophony in Chinese). This bit should be got rid of completely, and Mandarin should be renamed "Chinese" and moved up the top, in which glyph etymology/word etymology/derivations/semantic development etc. are explained (eg. ). Wyang (talk) 00:48, 7 February 2014 (UTC)

The reason it is "Translingual" is because it also applies to non-Chinese languages, such as Japanese, Korean, Vietnamese, and pretty much any other East Asian language. --WikiTiki89 00:56, 7 February 2014 (UTC)
The semantic development is also language-specific. There are plenty of examples of characters that have different meanings in different Chinese languages, or that only exist in some of them. —CodeCat 00:59, 7 February 2014 (UTC)
The way I see it, the Translingual entry is the entry for the character. The characters have meaning independent of the language, due to the pictographic/logographic origin of the script, and that meaning should be listed as the meaning of the character. Any further semantic developments should be listed under the relevant language. --WikiTiki89 01:04, 7 February 2014 (UTC)
The characters did not have meaning independent of the language, otherwise why would people create them originally? WRT further semantic developments - in what way is the definition of in Korean, Japanese or Vietnamese different from Chinese? Wyang (talk) 01:10, 7 February 2014 (UTC)
Exactly, it's not. But sometimes it is. That's exactly what I mean by independent of language: the character has essentially the same meaning in all of those languages. --WikiTiki89 01:13, 7 February 2014 (UTC)
See below. Wyang (talk) 01:28, 7 February 2014 (UTC)
Yes, but they are only a small proportion out of the hundreds of thousands of characters that exist, and a context label could easily accommodate that. There are plenty of examples of English words that have different meanings in different dialects, or that only exist in some dialects too. Wyang (talk) 01:06, 7 February 2014 (UTC)
@Wyang, I'm not arguing against a "Chinese" header. I'm arguing for a "Translingual" character entry. I think that the "Chinese" and "Translingual" sections could co-exist. --WikiTiki89 01:09, 7 February 2014 (UTC)
I was replying to User:CodeCat. I don't think they should coexist. Even if they do, "Translingual" should not have etymology, definition (because they are the same as in Chinese), or derived characters. It should only contain coding information. Wyang (talk) 01:28, 7 February 2014 (UTC)
I agree with this. The translingual section shouldn't be used to give meanings, those should go in the section for each language. Yes, even if it means the same in every language. land means the same thing in many Germanic languages too, but we don't skimp out and put that under "Translingual" either. I don't see why this should be any different. —CodeCat 01:33, 7 February 2014 (UTC)
In alphabetic languages, the letters don't have meaning. In pictographic languages, the glyphs have the same meaning regardless of the morphology of the language that uses them. --WikiTiki89 01:38, 7 February 2014 (UTC)
Except that's not always true. No two languages using Chinese characters use the exact same set of characters, with the exact same meaning. There are differences. Furthermore, among the Chinese languages, characters are also tied to historical words. Words with different origins get a different character. That means that when a word falls out of use in one Chinese language, the character that was used to write it goes with it, and it's replaced by another synonymous (!) character. But in another Chinese language, the older word might still be in use and the alternative word could mean something else, or it could be that word that disappeared instead. Thus characters can be very much "dialectal" and have language-specific usage and meanings. For some examples, see w:Written Cantonese#Cantonese words. It's often said that Chinese is written the same regardless of language, but I'm pretty sure that written Cantonese would not be perfectly understood by a Mandarin speaker, because of these differences. —CodeCat 01:47, 7 February 2014 (UTC)
Most of the examples in the link above are archaisms, which are perfectly understandable for someone unfamiliar. Using the example of "to eat" there, it can be easily explained in if there is a Chinese header, using
# {{cx|Cantonese|Hakka|Min Nan|Min Dong|Min Bei}} to [[eat]]
or something similar. And the alternative, , would have
# {{cx|Mandarin|Jin|Wu|Gan|Xiang}} to [[eat]]
. Wyang (talk) 02:30, 7 February 2014 (UTC)
@Wikitiki89: Because all the other languages borrowed the fossilised form from one language. They have the same meaning as the Chinese language which originally used these glyphs to write the language. The meanings are basically dead outside the fossilised borrowings, eg. no Vietnamese person would use đại to mean "replace, replacement" or "era, generation", and none of the readings in 弋#Vietnamese would be understood as "catch, arrest" or "shoot with bow". Wyang (talk) 01:52, 7 February 2014 (UTC)
Which is not true. Glyph-wise derived from , not because they were near-homophonic in Japanese at any stage or level (Kun, On). Wyang (talk) 01:02, 7 February 2014 (UTC)
I don't see how that's relevant. Etymology is etymology. The point is the glyph has the same basic meaning in Japanese as it does in Chinese. --WikiTiki89 01:07, 7 February 2014 (UTC)
  • There's also the etymology of the individual characters, which presumably is specific to the character glyph itself, regardless of the semantics or sound shifts of each language. Some have argued that this belongs under a Chinese heading, but that information is applicable (or at least potentially interesting and linguistically useful) to all languages that use that character. —This unsigned comment was added by Eirikr (talkcontribs) at 01:21, 7 February 2014‎.
    Exactly. Now I think we should move the majority of this discussion to somewhere more relevant. Only the beginning belongs here. --WikiTiki89 01:23, 7 February 2014 (UTC)
That etymology is not translingual if it can not be said to be true for all languages then. The character has essentially the same meaning, as in Chinese, in all of those languages, since those languages kept all the meanings in Sinoxenic compounds. It is in Chinese that the character evolved semantically, and produced those meanings, and the other languages merely borrowed the fossilised characters and compounds. It is not a translingual occurrence that the character was coined, given those meanings, and was used to derive other characters. It was in Chinese that the character was coined, and was assigned the correspondence with a native Chinese word, which evolved semantically through time, acquiring various meanings, and the character was used to write other characters in Chinese. Wyang (talk) 01:28, 7 February 2014 (UTC)
@Wyang. No-one argues that Han characters were created by Chinese but character and word definitions become different, they are also different parts of speech across languages (e.g. is also a suffix in Japanese and Korean but it's never a suffix in Chinese) and they are shared. Translingual definitions can often be copied into Mandarin to give Mandarin definitions (many Mandarin single character entries still lack simple definitions!) but Translingual (or something with a new name as CodeCat suggested) should stay. Good Chinese and Japanese dictionaries have separate entries for characters and words. Note that translingual sections don't have noun, verb, other parts of speech. The new header, rather than Translingual may have something to permanently remind that these characters are Chinese, such as Han character (漢字/汉字) if it's your concern but existing subheaders should probably be renamed and standardised (kanji, hanja, etc.). --Anatoli (обсудить/вклад) 01:51, 7 February 2014 (UTC)
The fossilised meanings are also no concern. Language specific entries could focus on semantics, grammar, pronunciation, not stroke order or generic meaning. --Anatoli (обсудить/вклад) 02:02, 7 February 2014 (UTC)
There is no point in repeating the definitions in Translingual (or whatever the name is) if it is going to be nearly 100% identical with Chinese. There are differences in the PoS across languages, but again I don't agree with the idea of PoS for isolating languages, especially for characters which almost always cover a broad range of PoS in a single sense. Anyway the particle-ilisation of 的 is a phenomenon which happened in Chinese, which used 的 to represent an increasingly common possessive and adjectival particle at that time, and Japanese/Korean borrowed this to systematise their Sinoxenic borrowings, converting nouns into something similar to an adjective. Wyang (talk) 02:07, 7 February 2014 (UTC)
As I showed above, there is no such thing as "Chinese" when it comes to meanings of characters. Each Chinese language can and sometimes does assign different meanings to the same character. If we put some meanings under Translingual and others under Mandarin/Cantonese/Min Nan etc, that is just going to confuse people because then they won't know which one is right, or what's more specific etc. —CodeCat 02:11, 7 February 2014 (UTC)
See above. "There is no such thing as 'Chinese' when it comes to meanings of characters" - well, what are these character dictionaries about then? Hanyu Da Zidian, Zhonghua Da Zidian, Kangxi Dictionary. Wyang (talk) 02:26, 7 February 2014 (UTC)
If we have a generic meaning of "eat", "eating" under "Han character", without the pronunciation section but stroke orders, etymology and the exact modern usage under Mandarin Chinese section (and other languages) (the character is now hardly used as a verb in Mandarin), then it would be easier to see how individual CJKV languages use it today. Lumping everything under Mandarin/Chinese would be a mess. --Anatoli (обсудить/вклад) 02:24, 7 February 2014 (UTC)
No, it won't be. A context label suffices. See the example above. Wyang (talk) 02:26, 7 February 2014 (UTC)
I agree with Wyang and have thought the same for a long time. Kaixinguo (talk) 12:17, 2 March 2014 (UTC)

ISO 639-3 updates

SIL has posted a summary of ISO 639-3 code changes made for change requests from 2012. Usually enwikt implements changes that aren't controversial (often for codes that we have no entries in). It's also a good time to debate the changes that are controversial or non-trivial. Check to see if your favorite unappreciated language has been affected! --Bequw τ 15:10, 7 February 2014 (UTC)

Looks like we can change Old Lithuanian from bat-olt to olt. --WikiTiki89 15:17, 7 February 2014 (UTC)
All of the changes to codes look good, except that we should adopt ygs as "Yolngu Sign Language" rather than "Yolŋu Sign Language". And we can note that gev goes by "Viya" in addition to "Eviya". I'll examine the changes to language names later. - -sche (discuss) 18:42, 7 February 2014 (UTC)
I have updated the modules to remove, split, and add all the codes the ISO removed, split, and added, respectively. I have also switched all our uses of bat-olt to olt and removed 'bat-olt' from the exceptional code module. Thanks, Bequw, for alerting us to the updates. - -sche (discuss) 04:51, 10 February 2014 (UTC)

A proposal to treat Old Latin as a separate language from the other chronolects of Latin

In a short discussion in Module talk:labels#Old Latin data, CodeCat and I agreed that it would be a good idea to treat Old Latin (la-old) as a separate language from Latin (la) on the English Wiktionary. w:Old Latin "presents some of the major differences" between Old Latin and Classical Latin. The corpus of Old Latin dates from the fragments of the Carmen Saliare (700 BC) to the conventional boundary date between it and its Classical descendant language (75 BC). Saliently, much of the earlier Old Latin corpus was unintelligible by the writers of the Classical period; thus, according to the standard linguistic criterion of mutual unintelligibility for differentiating languages, this arguably renders Old Latin and ≥Classical Latin different Abstandsprachen. Practically speaking, there are more orthographic, phonological, and morphological differences between Old Latin and Classical Latin than there are between Classical Latin and the later Latin chronolects; that Old-Latin–specific information can't easily be shoehorned into (Classical) Latin entries without unduly obscuring the information about the language's Classical form, which, I believe, is what most people who look up Latin words will be after. For all these reasons, I propose that we treat Old Latin (loosely defined as everything in the Latin continuum prior to 75 BC, with the ISO code la-old) as a separate language from Latin (loosely redefined to exclude everything in the Latin continuum prior to 75 BC, retaining the ISO code la) on the English Wiktionary. — I.S.M.E.T.A. 19:25, 7 February 2014 (UTC)

Sounds good to me. —Aɴɢʀ (talk) 20:17, 7 February 2014 (UTC)
The proposal to treat Old Latin as a separate language sounds good to me, too; in the past, User:Metaknowledge explained the dramatic differences between Old and newer Latin to me.
I do have a technical question, though: I don't mean to nitpick, but is "la-old" an ISO code? If it isn't, but we still want to upgrade "Old Latin" from "etymology-only language" to "regular / L2-having language", shouldn't we create an exceptional code the way we usually do (as documented on WT:LANG and demonstrated in Module:languages/datax), which is by using the family code, a hyphen, and three letters that approximate the language's name (so: "itc-ola")? - -sche (discuss) 20:49, 7 February 2014 (UTC)
You're right. However, may I suggest that we use itc-lao instead (by analogy with fro for Old French, for example)? — I.S.M.E.T.A. 21:28, 7 February 2014 (UTC)
Sure. :) - -sche (discuss) 21:40, 7 February 2014 (UTC)
I prefer itc-ola. All of our exceptional codes for old languages are named that way, and some in ISO are too (like the new olt, and existing odt, osx, ofs, orv). —CodeCat 22:13, 7 February 2014 (UTC)
Hm, that's also a good point. Alright, we're back to itc-ola. - -sche (discuss) 22:29, 7 February 2014 (UTC)
Based on the discussion that -sche linked to, I don't think we should split off Old Latin at 75 BC. It seems that if anything it should be the "Primitive Latin" that Metaknowledge talks about that should be split off. --WikiTiki89 22:45, 7 February 2014 (UTC)
I think instead of using a particular date, we can use a particular progression of sound changes. That would simplify things for us, because we can limit how many alternative forms and inflections we need to handle. Obvious choices would be s > r, weakening of unstressed vowels, monophthongisation, loss of final -d, and maybe even the stress shift itself (if we can somehow find out when it happened). (Speaking about codes I noticed we have roa-ptg for Old Portuguese. Maybe we should change that to roa-opt?) —CodeCat 00:01, 8 February 2014 (UTC)
This looks more sensible to me. Otherwise, we would have to create an Old Latin section for every word that occurs in Terence's works (for example), which wouldn't make much sense. A sentence such as non dici potest quam cupida eram huc redeundi, abeundi a milite vosque hic videndi, antiqua ut consuetudine agitarem inter vos libere convivium doesn't present "striking differences" from Ciceronian Latin. --Fsojic (talk) 00:20, 8 February 2014 (UTC)
Is Cicero's Latin what is normally used as the "base" for classical Latin? If so, then anything less than 100 years before his writing probably is similar enough to our idea of "Latin" to call it that. It's more practical to look at spelling, that's why I think sound changes give a better cutoff, because of how they influence spelling. w:Senatus consultum de Bacchanalibus from 186 BC is a good example of a text I would consider "old" but not "really old". It has the weakening of unstressed vowels, and probably the classical stress pattern as well, but it still has diphthongs which make quite a lot of words look rather different, final -d in the ablative, DV rather than B, GN rather than N, and a few other things. —CodeCat 00:41, 8 February 2014 (UTC)
Yes, Cicero and Caesar are the base for it. --Fsojic (talk) 13:00, 8 February 2014 (UTC)
I've made the change. I don't know if I caught all the cases that used "OL.", so there might be a module error or two. We'll need to keep an eye out. A lot of links to Old Latin terms use "la" as the code too, so we need to look out for those and update them. —CodeCat 00:38, 14 February 2014 (UTC)
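
The exceptional-code convention described in this thread (family code, a hyphen, then three letters approximating the language name) can be sketched in Python. This is only an illustration: the function name and the validation rules are assumptions made for the example and are not taken from Module:languages/datax.

```python
import re


def make_exceptional_code(family_code: str, suffix: str) -> str:
    """Build an exceptional language code of the form '<family>-<xxx>'.

    Hypothetical helper: it assumes family codes and suffixes are each
    exactly three lowercase ASCII letters, as in 'itc-ola' or 'roa-opt'.
    """
    if not re.fullmatch(r"[a-z]{3}", family_code):
        raise ValueError(f"unexpected family code: {family_code!r}")
    if not re.fullmatch(r"[a-z]{3}", suffix):
        raise ValueError(f"suffix must be three lowercase letters: {suffix!r}")
    return f"{family_code}-{suffix}"


print(make_exceptional_code("itc", "ola"))  # Old Latin, as agreed above
print(make_exceptional_code("roa", "opt"))  # CodeCat's suggestion for Old Portuguese
```

Under these assumptions a code such as "la-old" would be rejected, because "la" is a language code rather than a three-letter family code.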

rare terms

(I think this has already been discussed before, but I couldn't find where)

I did this edit, but I'm not satisfied with the categorization it brings about. This is not a "term with a rare sense" (which would imply it had several senses), it's only a "rare term". --Fsojic (talk) 23:54, 7 February 2014 (UTC)

A rare term is a term, all of whose senses are rare. Therefore, a "rare term" is also a "term with a rare sense". There isn't much use for a category of terms that only have rare senses. And it would also be very difficult to implement context labels in a way that would support that. --WikiTiki89 00:02, 8 February 2014 (UTC)
From a technical perspective it'd be simple to have {{cx|rare term}} vs {{cx|rare sense}}. But getting people to use the labels correctly could indeed be difficult, in part because some people would probably continue to use {{cx|rare}} for both cases even if we deprecated it or made it a redirect to only one of the two. - -sche (discuss) 03:44, 8 February 2014 (UTC)
You're right that this has been discussed before, and some users have taken the same view as you, that "terms with rare senses" misleadingly implies that some senses are not rare. Others have taken the view that it doesn't explicitly say any senses are not rare, and so is more palatable than the former category ("rare terms", which nevertheless contained common terms with rare senses). Here are a few past discussions: WT:RFM#Category:English_terms_with_obsolete_senses, WT:Grease pit/2011/June#Bad_category_name_generated_by_template_obsolete, WT:Beer parlour/2011/June#English_terms_with_obsolete_senses.2C_etc.. In general, the way we handle rare/obsolete/etc terms, terms with rare/obsolete senses, and rare/obsolete forms is a mess, because we mix all of those things incompletely. For example, some entries use {{cx|obsolete|lang=en}} {{alternative form of|foo|lang=en}} while others use {{obsolete form of|foobar|lang=en}} — and that's not even necessarily wrong, because some have argued that there is a meaningful difference between the two. - -sche (discuss) 03:44, 8 February 2014 (UTC)

{{head}}'s links to e.g. Wiktionary:Hebrew transliteration

Currently, {{head}} prefaces the headword's transliteration with a bullet symbol linking to e.g. Wiktionary:Hebrew transliteration. I think I understand the motivation here, but I think the result is not quite right, for four reasons:

  • Pages in the Wiktionary: (Project:) namespace aren't part of our actual dictionary content: they're supposed to be for editors, not readers. If we want to document our transliteration schemes for readers, then we should do so in the Appendix: namespace.
  • I've looked through Wiktionary:Hebrew transliteration and several others, and in all cases, it's clear that they were written for use by editors who know the original script and want to write a transliteration. That means that they're categorically not useful to put between the original script and the transliteration when both are present. (If anything, that information would be useful when the transliteration is missing, if we want to prompt readers to become editors and add it.)
  • I can't think of what would be given in reader-oriented documentation of a transliteration scheme. I suppose it would give information about how to go from a transliteration to a pronunciation, but in that case it seems like {{head}} is one of the least useful places for the link, since the headword is already likely to have an associated ===Pronunciation=== section. (I don't feel strongly about this part, though.)
  • It's really not obvious that the bullet symbol is a link; and even if someone notices the color and correctly infers that it's a link, they're not likely to guess what it's a link to, since it doesn't seem closely tied to the transliteration. (The transliteration is wrapped in parentheses, and the bullet is outside those parentheses, so the link is actually closer to the original script than to the transliteration.)

I think it might be best to just remove the link, but an alternative would be to (1) change it to point to an appendix rather than a project page and (2) move it after the transliteration, inside the parentheses, as something like <sup>''[[Appendix:Hebrew transliteration|help]]''</sup>.

RuakhTALK 01:39, 9 February 2014 (UTC)

After thinking about it, you do have a point. But I have to point out that not everything on Wiktionary is only for "readers". In fact, the idea of a wiki is that all "readers" are also potential editors. The "[edit]" links, for example, also do nothing to help readers. On the other hand, the link does not help a reader add a missing transliteration since it does not appear when the transliteration is missing. And I agree that it wouldn't be the end of the world if we simply got rid of the link altogether. --WikiTiki89 06:18, 9 February 2014 (UTC)
Re: "not everything on Wiktionary is only for 'readers'": Absolutely; and I addressed that in my second bullet-point. But given the context of the link, I don't think its intent can have been anything like that, anyway. —RuakhTALK 06:48, 9 February 2014 (UTC)
If we standardize transliteration-appendix pagenames (as we do About Languagename pagenames for example), then {{head}} can check for existence and link to such.​—msh210 (talk) 18:08, 9 February 2014 (UTC)
Yup, that's what it currently does, just with the Wiktionary namespace (checking for e.g. Wiktionary:Hebrew transliteration, which redirects to Wiktionary:About Hebrew#Romanizations) instead of the Appendix namespace. But. —RuakhTALK 18:30, 9 February 2014 (UTC)
Come to think of it, your bullet point 4, above, Ran, is a very good point. Perhaps if this is kept it should be as <small>[[appendix:Language transliteration|translit.:]]</small> fú (or some such) within the transliteration parens.​—msh210 (talk) 18:37, 9 February 2014 (UTC)
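
The existence check discussed above (link to a standardized transliteration page only if that page exists) could look roughly like the following Python sketch. The function name and the `existing_pages` set are hypothetical stand-ins for whatever page-existence test {{head}} actually uses; the emitted markup follows Ruakh's suggested form.

```python
from typing import Optional, Set


def transliteration_help_link(
    language_name: str,
    existing_pages: Set[str],
    namespace: str = "Appendix",
) -> Optional[str]:
    """Return wikitext for a small 'help' link, or None if no page exists.

    Assumption: transliteration pages are named
    '<namespace>:<Language name> transliteration'.
    """
    title = f"{namespace}:{language_name} transliteration"
    if title not in existing_pages:
        # No standardized page for this language: emit no link at all.
        return None
    return f"<sup>''[[{title}|help]]''</sup>"


pages = {"Appendix:Hebrew transliteration"}  # illustrative data only
print(transliteration_help_link("Hebrew", pages))
print(transliteration_help_link("Arabic", pages))
```

Passing `namespace="Wiktionary"` would mimic the current behaviour described by Ruakh, with only the target namespace changed.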

Request for permission to merge

Discussion moved from WT:RFDO. --kc_kennylau (talk) 12:14, 9 February 2014 (UTC)

I would like to request for permission to merge the following templates to {{de-decl-noun-n}}:

If permitted, I will do the orphaning and merging by myself. --kc_kennylau (talk) 17:24, 7 February 2014 (UTC)

Bot permission

I'd like to request permission to run User:Asturianbot legally. He is to do the same thing that User:Asturbot did. Luckily, KassadBot sorts the page into the correct alphabetical order. --Back on the list (talk) 12:30, 10 February 2014 (UTC)

@Back on the list: I am aware of the statement on the userpage saying that it has done testing on the page lladrar, but the page history of lladrar shows no evidence of Asturianbot making any changes to it. --kc_kennylau (talk) 12:58, 10 February 2014 (UTC)
It created entries for the inflected forms of lladrar. — Ungoliant (falai) 14:06, 10 February 2014 (UTC)

When is English not English?

I've noticed that "Gorsedd" is described as an English word, and its etymology is shown as "from Welsh"; similarly "satyagraha" is categorised as "English, derived from Sanskrit", although it seems to me that Gorsedd is simply a Welsh word denoting a Welsh institution, and Satyagraha is a Sanskrit word denoting a Hindu concept.

What is the reasoning behind this? Any language can borrow and eventually absorb foreign words, but if a word becomes English as soon as it is used in an English sentence, then every word in the world could be categorised as English!

For heaven's sake, we English-speakers are insular enough already!—This unsigned comment was added by Hoffoholi (talkcontribs).

WT:Criteria for inclusion describes how we decide what words are part of the language. Short version: a word has to be used three times. Michael Z. 2014-02-10 23:13 z
Good question. English has a mechanism for indicating a word as foreign: italics. Thus, if a well-edited book (which italicizes foreign words) uses a word without italics, it's using that word as English. As Michael Z. notes, we use those citations to determine whether the word should be listed as English in the dictionary. On the other hand, on Usenet, for example, where (at least traditionally) italics aren't used, it's hard to show whether a word would be in italics were the medium conducive to it, so citations from Usenet that show use of a traditionally foreign word in English are not useful in determining whether the word is English or not.​—msh210 (talk) 16:13, 11 February 2014 (UTC)
Also, italics are used for other things, like emphasis, so if a word is printed in italics, it isn't always clear whether the author italicized it for emphasis or because he considers it foreign. —Aɴɢʀ (talk) 18:20, 11 February 2014 (UTC)
Indeed, I remember a Latinate (but ==English==) term being RFVed a year or two ago, which was found to be used in several old books, which italicized it — but also italicized month names like "July" and other English words! - -sche (discuss) 22:14, 15 February 2014 (UTC)
Even when italics are used, it's a use of the word nonetheless. Very often, people use italics when they don't find the word in their favourite dictionary. There cannot be any other criterion for inclusion for a given language than the use (not only the mention) in the language. Lmaltier (talk) 21:34, 15 February 2014 (UTC)
Note that italics are not (and cannot be) used in speech, and so I don't think they are a reliable factor in determining Englishness. --WikiTiki89 22:16, 15 February 2014 (UTC)
Well, italics are an indication that a word is “not fully naturalized” in English. This includes nonce borrowings, but also many words that have a long English history, and are clearly within our criteria for English. If a word is usually italicized, we should include a usage note about this. Some print dictionaries will italicize the headwords of such entries (including the COD and CanOD – perhaps we should do so).
There are other characteristics of foreignisms, like indeterminate spelling, usually being accompanied by a gloss, etc. But there is no clear proof of acceptance in the language, except maybe a sufficient frequency of use. Michael Z. 2014-02-16 02:05 z
Indeterminate spelling doesn't mean much either. The word "Hannukah" can be spelled a million different ways, but there is only one accepted pronunciation. A word like "varenyky", on the other hand, likely has many variant pronunciations, and I would be more inclined to call it unnaturalized. --WikiTiki89 02:30, 16 February 2014 (UTC)

Proposal for Template:ttbc to accept unrecognized language names

Inspired by recent edits to [[ostrich]], I think {{ttbc}} should accept unrecognized language names. Not all users know all of our language name conventions and I think we should allow translations to be added such as "{{ttbc|Sami}}: [[struhcca]]" when the specific Sami lect is unknown. It would presumably categorize separately to something like Category:Translations to be checked (language unknown). --WikiTiki89 13:46, 11 February 2014 (UTC)

It originally did work. I tracked down the problem to Template:langrev/Sami, which User:Kc_kennylau created a few days ago with the contents "smi". Because "smi" is a family code rather than a language code, it fails. I wonder why he created that subpage if he clearly didn't understand what it did. —CodeCat 13:54, 11 February 2014 (UTC)
In that case, I think that if there is any sort of error in determining the language, then it should treat it as an unknown language. --WikiTiki89 14:25, 11 February 2014 (UTC)
There are plans to replace {{ttbc}} with something else anyway, and then that won't be a problem anymore. Right now we're just waiting for someone to update WT:EDIT so that it supports the new format. There's a discussion about it on the GP. —CodeCat 14:44, 11 February 2014 (UTC)
I've been commenting out most of the Sami translations, since they're the result of indiscriminate copying from another website. Chuck Entz (talk) 14:38, 11 February 2014 (UTC)
Commenting out is a bad idea. Things that are commented out will never be looked at again. There is no reason not to just move them to the checktrans section. --WikiTiki89 14:49, 11 February 2014 (UTC)
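
The fallback behaviour proposed in this section (treat any failure to resolve the language as "unknown" instead of raising an error) can be sketched in Python. The lookup table is a tiny illustrative subset, and the per-language category name format is an assumption; only the "language unknown" category name comes from the proposal above.

```python
# Illustrative subset of language names mapped to codes; not the real data.
KNOWN_LANGUAGES = {
    "Hebrew": "he",
    "Northern Sami": "se",
}

UNKNOWN_CATEGORY = "Category:Translations to be checked (language unknown)"


def ttbc_category(language_name: str) -> str:
    """Pick a tracking category for a translation that needs checking.

    Names that do not resolve to a language code (for example a family
    name such as 'Sami') fall back to the 'language unknown' category.
    """
    if language_name in KNOWN_LANGUAGES:
        return f"Category:Translations to be checked ({language_name})"
    return UNKNOWN_CATEGORY


print(ttbc_category("Hebrew"))
print(ttbc_category("Sami"))
```

With this approach an entry like "{{ttbc|Sami}}: [[struhcca]]" would still be categorized for review even though "Sami" names a family, not a single lect.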

Template to format quotations

Currently we only have {{usex}} to format quotations and usage examples alike. User:Angr said that there might be objections against using {{usex}} for quotations, but I'm not sure why exactly. Either way, we don't really have a template that can format quotations properly the way WT:QUOTE describes it. So I think that we should add parameters to {{usex}} that allow us to show sourcing information. We could also make a dedicated template like {{quote}}, {{quotation}} or something similar, but I'm not sure what the benefit of having a separate template would be over extending an existing one. —CodeCat 20:58, 11 February 2014 (UTC)

What's wrong with {{usex}} the way it is now? --WikiTiki89 21:01, 11 February 2014 (UTC)
We have at least 3 different templates to format quotations ({{Q}}, {{quote-book}}, {{reference-book}}). The benefit is, having separate templates allows us to easily distinguish quotations that we made up from quotations that actually occur in the wild. That's not to say that both templates couldn't use the same back end, but they should absolutely be kept separate in entries. DTLHS (talk) 21:02, 11 February 2014 (UTC)
Yes, but they are mainly for formatting the citation line of the quotation, not the quotation itself. In fact {{quote-book}} is very bad at formatting the quotation itself for any language that doesn't use the Latin script. --WikiTiki89 21:04, 11 February 2014 (UTC)
Right, and I'm all for anyone that feels that they can unify the multiple templates we already have into something better. DTLHS (talk) 21:08, 11 February 2014 (UTC)

Why are quotations hidden?

Currently we have a script that auto-hides quotations in a collapsible box, but I don't really know the reasoning behind this. I consider quotations to be a type of usage example; one taken from existing citations instead of made up on the spot. So I don't think quotations should be hidden, just like we don't hide usage examples that aren't quotations. If the size of the text is a concern, then we could decide to hide only that part of the quotation that is not also part of the usage example. Then again, usage examples the way they're produced by {{usex}} can become quite long too; they can have text, transliteration, translation: that's 3 lines already. —CodeCat 20:58, 11 February 2014 (UTC)

However long a usex can be, a quotation can be even longer. --WikiTiki89 21:00, 11 February 2014 (UTC)
Then why not just choose a shorter one? —CodeCat 21:12, 11 February 2014 (UTC)
Because sometimes we don't have a large number of attestations to pick and choose from, and sometimes you need some context to demonstrate the meaning you're attesting. I try to use whole sentences where I can, which (especially with 19C sources) doesn't help with brevity any. --Catsidhe (verba, facta) 21:18, 11 February 2014 (UTC)
There isn't a requirement to have verbatim-citable attestations in the entry, though. If there are no attestations that are well-suited to being used as a usage example, you could shorten it or do something else with it to make it fit better into the entry. —CodeCat 21:22, 11 February 2014 (UTC)
I think lengthier quotations are often necessary to demonstrate the meaning of the word. --WikiTiki89 21:27, 11 February 2014 (UTC)
Maybe, but then the same would apply to a usage example that isn't a quotation from somewhere else, right? So the problem is not specific to quotations alone, and we need to consider why quotations are hidden but other usage examples are not. They should really be collapsed based on length, not based on source. —CodeCat 21:32, 11 February 2014 (UTC)
  • I am very much a fan of consistent behavior. So either have quotations / usexes always hidden, or have them never hidden. Changing the display state depending on length sounds like a recipe for confusion and frustration. ‑‑ Eiríkr Útlendi │ Tala við mig 21:44, 11 February 2014 (UTC)
@CodeCat: This seems so obvious. Usexes are brief and usually serve to orient the user at least as well as the definitions. Quotations have all the overhead of sources, dates, titles, etc. and may be present only for purposes of attestation. Putting them on the Citations pages just makes them a little harder to get to and runs the risk of losing the connection with specific senses for polysemic terms. As anyone who cites entries knows, we do not live in a world in which there are citations good for all usage example purposes and for all attestation purposes, especially not for every sense of every polysemic term. DCDuring TALK 21:49, 11 February 2014 (UTC)
Usage examples don't have to be brief, because quotations are an example of usage examples that are not always brief. And like Wikitiki said, sometimes you need a lengthy example to show the meaning, whether it's quoted or made up is not relevant then. I don't understand your point about Citations pages. Is "being hard to get to" a reason for making our entries less accessible to the average user? Usage examples (and therefore also quotations) should serve the purpose of illustrating usage to normal users who want to know what the word means and how to use it. They're not there to quench the curiosity of the few users who are more interested in the documented historical record, because that's a minor use case that isn't really relevant to knowing what the word means. Documenting usage is what citations pages are for, and have always been for. If they're not good enough for their intended purpose, then fix them where they are, rather than having the problem spill over into the entries themselves where they affect a lot more of our users. Keep citations on the citation pages, keep usage examples with the senses. —CodeCat 21:58, 11 February 2014 (UTC)
I guess I fail to see what the harm is here. We have usexes to give the reader an at-a-glance feel for the word, how it's used, what it means, etc. We have quotations to demonstrate real usage, track the word's journey through time, etc. Quotations are simply bigger, and of less interest to the general user, but at the same time utterly indispensable to someone who really wants to drill in. It seems to me that the only argument you've offered in support of the assertion that anything's wrong is that it irks you in some vague way, which seems to me as insufficient impetus to make a major change to our presentation. -Atelaes λάλει ἐμοί 22:14, 11 February 2014 (UTC)
Leave quotations hidden, usage examples unhidden. Quotations are too long and less useful to most users. --Anatoli (обсудить/вклад) 22:49, 11 February 2014 (UTC)
@CodeCat, usage examples can be intentionally constructed in a way that conveys the meaning with less context. Quotations are more likely to be longer, because their primary purpose usually was not to demonstrate the meaning of the given term. --WikiTiki89 00:40, 12 February 2014 (UTC)

I am in favour of hiding both the citations and usexes, because entries can get overly long like this. --Vahag (talk) 05:25, 12 February 2014 (UTC)

Oh, you may consider using {{der-top|example of using ձի}} or similar. --Anatoli (обсудить/вклад) 05:37, 12 February 2014 (UTC)
No, that's not standard or allowed. --Vahag (talk) 05:43, 12 February 2014 (UTC)
Maybe a separate template could be used for a large number of examples? Up to 15 is probably OK without hiding. --Anatoli (обсудить/вклад) 05:47, 12 February 2014 (UTC)
It is possible to make a template/script that automatically hides lines when their number exceeds a given threshold, for example displaying only the first 5 out of 20 by default. Dakdada (talk) 09:38, 12 February 2014 (UTC)
Notwithstanding CodeCat's apparent desire to conflate contributor-authored usage examples and published quotations by eliminating the distinct names of usage examples and quotations (or citations), it is very useful to shun her idiolect and maintain the names and the distinction made in Wiktionary discussions and in the current use of the word "quotations" in the control to allow or suppress display of quotations. The distinction is particularly meaningful for Wiktionary because "quotations" count for attestation and "usage examples" do not.
@Darkdadaah: I don't think we actually have any instances of more than half a dozen usage examples. Certain English function words have so many. It is an editorial decision as to how many such usage examples should be displayed. There may sometimes be too many. Assuming that we are designing Wiktionary for actual normal human users, there is much to be said for keeping human input in the design of entries and not allowing bot-implemented edits to supersede it. DCDuring TALK 13:55, 12 February 2014 (UTC)
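
Dakdada's suggestion above (show only the first few lines once their number exceeds a threshold) amounts to a simple split. The Python sketch below is illustrative only: the function name and the default threshold of 5 are assumptions, and a real implementation would live in a template or site script, not in Python.

```python
from typing import List, Sequence, Tuple


def split_for_display(
    lines: Sequence[str], threshold: int = 5
) -> Tuple[List[str], List[str]]:
    """Split example lines into (visible, collapsed) parts.

    If there are at most `threshold` lines, everything stays visible;
    otherwise only the first `threshold` lines are shown by default.
    """
    if len(lines) <= threshold:
        return list(lines), []
    return list(lines[:threshold]), list(lines[threshold:])


examples = [f"usage example {i}" for i in range(1, 21)]
visible, collapsed = split_for_display(examples)
print(len(visible), len(collapsed))
```

Choosing which lines count as the "first" ones remains an editorial decision; the sketch only automates the cut-off, not the ordering.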

Should quotations be normalized?

The quotations rules are too strict, IMHO, especially for foreign languages. The photographic image in quotations of the original text + transliteration may even be misleading for some language learners with less than native/advanced knowledge of a language, e.g. Russian words with letter "ё" are spelled with "е", Arabic text may lack hamza or "ي" written as "ى", word stresses are missing, the original orthography may be imperfect, dated or even wrong. --Anatoli (обсудить/вклад) 22:49, 11 February 2014 (UTC)

Actually, I think exposing users to the way things are actually written is much more helpful than helping them read it. --WikiTiki89 00:40, 12 February 2014 (UTC)
If that were the case, we wouldn't need transliteration (manual or automated), pronunciation sections, word stresses and other things used in foreign language entries. IMO, your comment just confirms that quotations are not there to help with the language but provide attestation that a term is/was actually used. --Anatoli (обсудить/вклад) 00:55, 12 February 2014 (UTC)
No we do need pronunciation sections because one reason to look up a word is to find out how it's pronounced. Quotations are not there for pronunciation, but as examples of real usage. Showing real unenhanced examples of language helps users learn to read the language. --WikiTiki89 01:19, 12 February 2014 (UTC)
@Anatoli, re "the original orthography may be imperfect, dated or even wrong": de.Wikt routinely normalises quotations for this reason. I don't necessarily agree with that practice, but I recognise that even en.Wikt normalises quotations quite a bit (e.g., we don't reproduce books' use of different font colours, and we don't often reproduce their use of different font sizes or fonts (Fraktur vs Antiqua), and if they are Latin-script, we don't reproduce their use of obsolete ligatures). - -sche (discuss) 01:42, 12 February 2014 (UTC)
We should omit purely stylistic typographic expression, but include anything with lexicographical or semantic significance. Modern use of italics, for example, often carries meaning. I don’t think I’d agree with a blanket ban on historic fonts or glyphs. Michael Z. 2014-02-12 04:08 z
Another point is, Wiktionary is a dictionary, not a legal firm or an archaeological company. It's about words and grammar, not facts. Of course, learners need to be exposed to the real life situations but that's not the purpose of dictionaries. --Anatoli (обсудить/вклад) 01:58, 12 February 2014 (UTC)
That's a valid point. But I think that we should avoid non-trivial interpreting of quotations if no real problem is being solved. And I don't think that unvocalized text is a "problem". --WikiTiki89 02:05, 12 February 2014 (UTC)
I see no reason to reproduce obvious typos, for example, especially in modern languages. Unvocalised text is not a problem (hamza and dotted yāʾ are not part of vocalisation but a more standard and strict way of writing, as is the Russian letter "ё"). Adding Arabic diacritics is cumbersome but adding word stresses and standardising spellings a bit is not a big issue, really. The same text may appear in a reprint/example for children/foreigners with word stresses and vocalisation. --Anatoli (обсудить/вклад) 02:11, 12 February 2014 (UTC)
Wiktionary’s quotations are there to demonstrate the original author’s usage. Not some amateur lexicographer’s improved version of it. If you use a quotation with a typo, then make an editorial note. Michael Z. 2014-02-12 04:08 z
Thanks - I thought I should ping you on this as someone who opposes any changes to citations. I'll give you an example. The entry оне́ (oné) demonstrates the usage of the archaic Russian pronoun (they, feminine plural), which is now они́ (oní) (they, both genders) in all modern reprints.
  1. Original (A. Pushkin): ...и завидуютъ онѣ государевой женѣ (...i zavidujut oně gosudarevoj ženě)
  2. Modern reprint: ...и завидуют оне государевой жене (...i zavidujut one gosudarevoj žene)
  3. "Improved":...и зави́дуют оне́ госуда́ревой жене́ (...i zavídujut oné gosudárevoj žené)
Providing stresses doesn't change anything and "оне́" is the same as "оне", they demonstrate that оне́ (oné) rhymes with жене́ (žené) (dative of жена́ (žená)). Because of the rhyming and the word stress the word cannot be replaced with modern они, as it was done with other occasions of оне. Word stresses (and other phonetic markings) are especially critical for poems. Long foreign language native citations are difficult and not so helpful without any small normalisation like this. As I said, it's not a legal document but a dictionary, it's about words, not facts. The so-called "improved" or "amateurish" version may appear in children's books or books designed for foreigners (Wiktionary is also designed for people of all ages and language levels). Finding such reprints using Google is not easy but I have seen many of them.--Anatoli (обсудить/вклад) 04:40, 12 February 2014 (UTC)
I think I have mentioned before that for poetry it's more ok to add stress marks. Adding stress marks to poetry is much easier, much more useful, and much more likely to actually be attestable with stress marks. For prose, I think adding stress marks is completely ridiculous. --WikiTiki89 17:28, 12 February 2014 (UTC)
I don't see how poetry is different from prose when users need to be able to read a quotation. Are vocalised quotations from Torah or Qur'an any different from unvocalised? Stress marks, vowel points, Japanese furigana don't add any meanings or change the style of the original, they are simply there to help to read words correctly. There's nothing ridiculous about stress marks in the prose, you should probably see Russian books designed for children and foreign learners. I see stress marks in the Slavic languages, vowel points in abjad languages, Japanese furigana, Mandarin pinyin (used in various editions) are used interchangeably to help to pronounce words correctly. If I find two versions of Harry Potter in Japanese (with and without furigana), I'll post here. --Anatoli (обсудить/вклад) 22:32, 12 February 2014 (UTC)
But children's books usually don't contain ordinary prose. They're usually filled with simple rhymes like На́ша Та́ня гро́мко пла́чет: / Урони́ла в ре́чку мя́чик, and even then don't always have stress marks (for example, here's a page from the same Букварь that I grew up with). --WikiTiki89 23:33, 12 February 2014 (UTC)
It's a recent story book, not an ABC-book, which I have :) (it only has 20 pages or so). Old and some new букварь's all use stress marks and diaeresis, they don't have to. I'm just saying they may. Not just schoolbooks but stories and novels. --Anatoli (обсудить/вклад) 02:10, 13 February 2014 (UTC)
Here's an example of a textbook for foreigners: [1] --Anatoli (обсудить/вклад) 02:45, 13 February 2014 (UTC)
I confused it with this, which (I think) is what I had as a kid. Of course it's been ages since I've seen it. --WikiTiki89 06:27, 13 February 2014 (UTC)
But more to the point, it's not always possible to tell what the intended stress was in words that can be stressed in more than one way. In poetry, it is easy to tell where the stress is (even if the stress is technically in the wrong place) because of the meter and the rhyme. In prose, there are absolutely no hints to the stress. So of course if the original author included stress marks, such as in your textbook for foreigners example, then there is no problem. But if the original author did not include stress marks, then for all you know he could have intended "документы" to be mispronounced as "доку́менты". --WikiTiki89 06:33, 13 February 2014 (UTC)
See my comments below about the quotes from Qur'an. My point is, despite what WT:QUOTE says about "cafe" vs "café", providing stress marks, vowel points or ruby (Japanese, Mandarin) doesn't change the original style or meaning. It only serves to help in reading unknown texts. Words having multiple accents, like "до́гово́р", "апо́стро́ф", etc. can be marked with dual stresses or just by deciding on the most common/standard or intended stress (if it was pronounced) or, if really in doubt left without any stress. I'm not suggesting to make stress marks mandatory but "За сове́ты Чи́чиков благодари́л, говоря́, что при слу́чае не преми́нет и́ми воспо́льзоваться, а от конво́я отказа́лся реши́тельно, говоря́, что он соверше́нно не ну́жен, что ку́пленные им крестья́не отме́нно сми́рного хара́ктера, чу́вствуют са́ми доброво́льное расположе́ние к переселе́нию и что бу́нта ни в како́м слу́чае между́ ни́ми быть не мо́жет." is more helpful linguistically than "За советы Чичиков благодарил, говоря, что при случае не преминет ими воспользоваться, а от конвоя отказался решительно, говоря, что он совершенно не нужен, что купленные им крестьяне отменно смирного характера, чувствуют сами добровольное расположение к переселению и что бунта ни в каком случае между ними быть не может." and doesn't violate the original text in any way (apart from being normalised with the modern orthography). (N. Gogol, Dead Souls, 1842) --Anatoli (обсудить/вклад) 07:00, 13 February 2014 (UTC)
Quotations also demonstrate usage. And I’m not sure what you mean by style, but altering the text as you advocate changes the style of writing and typography. Quoting means not editing.
Have readers been asking us to “help” them read quotations?
If you are unhappy with WT:QUOTE, please propose some changes. Michael Z. 2014-02-13 18:12 z
So a quotation from an un-cited, un-dated (20th-century?) reprint, misleads our readers into thinking that the 1831(?) usage of Pushkin’s publisher was “оне́” and not “онѣ”. Is this not an example of how to fail as a descriptive dictionary and as a historical dictionary? Michael Z. 2014-02-12 18:04 z
I understand what you mean but the reality is that Russian language reform happened nearly 100 years ago. No-one uses old spellings and obsolete letters but the old literature, including Pushkin's "Сказка о царе Салтане" is available for today's readers. If it is important to quote the pre-reform spelling, it's a different story (e.g. to demonstrate original spelling rules, or obsolete letters, if they are available). Besides, the quote is for "оне́", not "онѣ́". Confucius's works are available in simplified Chinese and would be quoted in a simplified form on a simplified Chinese entry, even if simplified was not used in his time. I'm sure it's a similar situation with many languages. --Anatoli (обсудить/вклад) 22:32, 12 February 2014 (UTC)
How is that a reason to change quotations or include misleading citations? Michael Z. 2014-02-12 22:44 z

What do you mean by "misleading"? Which reason do you mean? "и завидуют оне государевой жене" in books. Or search "如杀无道"... (simplified)/ "如殺無道" ... (traditional) in Google books (+ Confucius/孔子). --Anatoli (обсудить/вклад) 22:58, 12 February 2014 (UTC)

The so-called quotation in оне implies that Pushkin wrote “оне́”. But in fact he wrote “онѣ”, the Bolsheviks melted down the ѣ types decades later, so some book printed “оне”, and then a Wiktionary editor enhanced the quotation by indicating stress to yield “оне́”. Here is a quotation whose purpose is to demonstrate usage, showing 21st-century schoolbook style, with no date, and the title of an 1831 publication. Changing quotations and leaving out accurate citations lacks integrity and is poor academic practice. It is also against our guidelines.
  1. “The date corresponds with original authorship, the time that the citation was put into the exact words quoted.” – Please don’t omit dates in citations. Please don’t quote a 2000s children’s edition and cite it as if it were published in the 1830s. (If someone updated the orthography, then it is a new edition, not a reprint.)
  2. “Reproducing the spelling is important” – please don’t change 1830s quotations to 1960s spellings. Don’t change ѣ to е, or е to ё.
  3. “The presence or absence of diacritics, and which diacritic(s) are used is important” – please don’t add schoolbook stress marks to quotations. If it is important to indicate stress, add them to the transliteration, or indicate stress in editorial notes.
 Michael Z. 2014-02-12 23:21 z
We do the same thing in English with Shakespeare quotes, we use the modern normalized spellings rather than the original. --WikiTiki89 23:33, 12 February 2014 (UTC)
Says who? I see Shakespeare’s “star-cross'd” quoted in star-crossed.
We should strive to quote and cite accurately and in detail. As a historical dictionary, we should be useful to anyone studying Shakespeare or Pushkin, or the history of English or Russian usage. Michael Z. 2014-02-13 00:17 z
That's not a counterexample. If you've ever looked at older printings of Shakespeare (old as in hundreds of years), you would see things like (as printed here):
Gregorie, of my word Ile carrie no coales.
2 No, for if you doo, you ſhould be a Collier.
1 If I be in choler, Ile draw.
2 Euer while you liue, drawe your necke out of the collar.
I added emphasis to the spellings that would today be normalized, not including trivial differences like u/v and s/ſ. --WikiTiki89 00:44, 13 February 2014 (UTC)
  • I must agree with Michael to some extent that a quotation should be a faithful quote -- no editing, no spelling fixes.
That said, to provide another (hopefully useful) point on the graph, when I dig up old Japanese for quotation purposes, I try to find the original rendering. For quotes from the w:Man'yōshū, for instance, this is entirely in kanji, using different kanji from most modern usage, and with no kana at all. I then make use of extra lines to give 1) a transliteration into kana, 2) a transliteration into the Latin alphabet, and 3) a translation into modern English. See [[鶴#Noun]] or [[牙#Noun]] for two such examples.
For quotations in other languages, if desirable, I would support the addition of an extra line just below the quote itself to provide a modernized rendering of the quote. And if the quote in question is in a non-Latin script, I would also strongly recommend that we add a transliteration into the Latin alphabet, if at all possible.
With regard to WikiTiki's point, I think what Michael objects to is wikieditors altering quotes. Shakespearean terms can be found in published works in many different forms, so choosing one such published example should suffice. What I would certainly object to would be finding a quote and then altering it yourself as you type it in here to change spelling or diacritics, etc. If needed for clarity, add the altered version after the quotation, but not as the quotation. ‑‑ Eiríkr Útlendi │ Tala við mig 01:04, 13 February 2014 (UTC)
I agree, but I think you are wrong about Michael's point. The Russian examples that Anatoli gave are the exact equivalent of my example of Shakespeare. The modern orthography has been used in almost all reprints of older texts since the Russian Revolution, and the citation was of a reprint. --WikiTiki89 01:10, 13 February 2014 (UTC)
  • Perhaps I got muddled? I was responding (I think) to the mention of added diacritics to show stress, which seems to have been done by a wikieditor. That should (in my view) go on a second line, and not be given as the quotation itself. ‑‑ Eiríkr Útlendi │ Tala við mig 01:36, 13 February 2014 (UTC)
@Eirikr: I wonder what you think of furigana in Japanese quotations? Do you really believe that furigana or ruby would somehow modify the original meaning or style? Would a quotation like this really break any rules of quotation?
恐(おそ)れ入(い)ります
osore irimasu
I beg your pardon, I'm very much obliged
(I'm using usex templates for sample quotations.) The published books can definitely go both ways, with or without (usually without). Russian word stresses and Arabic/Hebrew diacritics are nothing but a means to help reading. They don't change the style or meaning of the original. Should I quote the Qur'an? بسم الله الرحمن الرحيم and بِسْمِ ٱللهِ ٱلرَّحْمٰنِ ٱلرَّحِيمِ (bi-smi llāhi r-raḥmāni r-raḥīmi) is available with or without diacritics.
I find original Shakespeare impossible to read. Russian hasn't changed so badly but most people don't understand archaic letters. It doesn't matter now who made the language reform, Communists or Nazis; the reform went ahead. Russian is not a dead language, so words said by someone in the past are now available and more common in reprints and new editions with the currently standard spelling, letters and glyphs. --Anatoli (обсудить/вклад) 02:10, 13 February 2014 (UTC)
A quotation of Shakespeare in modern orthography, without a real citation, is a usage example (of modern Shakespeare), not a quotation. It has its place, but this doesn’t contribute to our role as a historical dictionary.
If our reader is to learn about Shakespeare’s English and how it has changed over four and a half centuries, then we must provide accurately dated, verbatim quotations from various periods. Michael Z. 2014-02-13 18:30 z
So you would be in favor of changing the quote at star-crossed to "From forth the fatall loynes of theſe two foes, / A paire of ſtarre-crost Louers tooke their life:"? --WikiTiki89 18:37, 13 February 2014 (UTC)
I would be in favour of having quotes on every citation page showing a term’s introduction, development, and the breadth of its meaning and usage. Of course I would love to see its first use and a Shakespeare quotation from every century.
Specifically, I don’t think a citation with 21st-century spellings dated “1592” belongs in a historical dictionary. Michael Z. 2014-02-14 04:16 z
To complicate matters further, the spellings I gave above are not Shakespeare's original spellings, but normalizations made by the publishers at the time. --WikiTiki89 04:25, 14 February 2014 (UTC)
Well, most published work has also been edited. Although citing different editions could show the range of spellings once acceptable that are now more standardized. Michael Z. 2014-02-15 00:49 z
@Atitarev:
Sorry I missed that earlier. Furigana as-is in usexes, I'm fine with -- usexes can often be created by wikieditors, and don't purport to be sourced from any particular published text. For quotations, I really don't think we should be putting furigana right in the quote. See again my preferred format for providing quotation readings as illustrated at [[鶴#Noun]] or [[牙#Noun]].
@Atitarev, Wikitiki89, and others:
To reiterate (and hopefully clarify), I feel quite strongly that the main line of the quotation should give the text as-is from the published work. If the published work has typos, misspellings, non-standard orthography, missing vowel points, etc., that should be faithfully reproduced in the main line of the quotation. Since such oddities may indeed be difficult for users to read, I also feel quite strongly that we (wikieditors) should have the option of also including a second line just beneath the quotation, wherein we can give reading assistance -- kana, diacritics, modern spellings, stress markers, what have you. And if the quotation is in a non-Latin script, we should then give a transliteration into the Latin alphabet, followed finally by the translation.
If an editor doesn't like the format of a quotation (in terms of missing diacritics, obsolete spellings, etc.), they have the option of finding a published work that provides the quotation without such shortcomings -- so long as we are sourcing from a published work (and appropriately referencing / pointing to that work), I think we're fine. For example, if we can find a published Qur'an that includes the vowels and other diacritics, we can choose to use that version, so long as we reference it. Quoting the Qur'an with vowels and diacritics, but referencing a version that doesn't have those, sounds to me like an error.
I hope that makes my position clearer? ‑‑ Eiríkr Útlendi │ Tala við mig 18:59, 13 February 2014 (UTC)

for fun

Just for fun, how many languages can you pick out by ear?: http://greatlanguagegame.com/ —Stephen (Talk) 03:25, 13 February 2014 (UTC)

Cool! I just played it several times and my highest score was 850. --WikiTiki89 07:26, 13 February 2014 (UTC)
I scored 700. The only languages I mixed up were either closely related (Czech/Serbian, Urdu/Panjabi) or share areal features (Tamil/Gujarati). —Aɴɢʀ (talk) 10:29, 15 February 2014 (UTC)
Czech/Serbian wasn't a problem for me, but Macedonian/Serbian was. But what confused me most were the Indian, African, and the non-CJKV East Asian languages. --WikiTiki89 10:38, 15 February 2014 (UTC)
I admit I only guessed Tigrinya correctly because I heard the speaker say something that sounded like Afrika and there were no other African languages suggested. —Aɴɢʀ (talk) 11:41, 15 February 2014 (UTC)

Bot vote

Hi all. I'd appreciate some discussion about the vote from my Asturian inflectobot. I'm the only one who's touched the page so far! --Back on the list (talk) 10:29, 14 February 2014 (UTC)

We need a good Asturian inflection bot, but there does not seem to be much confidence in your encoding logic. I can’t write bots myself, so I can only rely on what others think. —Stephen (Talk) 22:54, 15 February 2014 (UTC)
It doesn't matter too much anyway. There are other ways to mass-add the conjugated forms. At the moment, there are thousands upon thousands of semi-automatically-created Asturian conjugated form entries. The result is essentially the same, just it's slower and less fun. --Back on the list (talk) 16:02, 21 February 2014 (UTC)

Request edit for Module:labels/data

Please remove these lines:

labels["in singular"] = "in the singular"
labels["in dual"] = "in the dual"
labels["in plural"] = "in the plural"

Please replace them with these lines:

labels["in the singular"] = {display = "in the [[singular]]"}
aliases["in singular"] = "in the singular"
labels["in the dual"] = {display = "in the [[dual]]"}
aliases["in dual"] = "in the dual"
labels["in the plural"] = {display = "in the [[plural]]"}
aliases["in plural"] = "in the plural"

--kc_kennylau (talk) 08:14, 16 February 2014 (UTC)

Done. Keφr 10:24, 22 February 2014 (UTC)
@Kephir: Thanks. --kc_kennylau (talk) 11:04, 22 February 2014 (UTC)

Arabic script for w:Dungan language?

I ran into Category:Dungan terms needing native script in Special:WantedCategories, and started to create it. Then I noticed that both members were already in Cyrillic, which we treat as the only script for the language. Further digging turned up the fact that it was originally written in a Perso-Arabic-based script referred to by its Chinese name of w:Xiao'erjing, but that the Soviet Union required speakers to switch to Cyrillic (with a Roman alphabet in between). The {{rfscript}} was added by User:Hippietrail, so it was a serious request, not the work of some POV-pushing Islamist IP.

I think we should add it as a secondary script, but the question is: does our infrastructure associated with the Arab script code cover Dungan Xiao'erjing (apparently truly alphabetic) adequately, or do we need to create a separate script code? Chuck Entz (talk) 00:42, 17 February 2014 (UTC)

Our aim is to include all words in all languages, but also from all of their stages. So before and after every spelling reform, writing system change, etc. We did already have at least one or two Dungan terms in Arabic script before I added these requests. Let me see if I can find them ... — hippietrail (talk) 03:29, 17 February 2014 (UTC)

  1. حُوِ ذَو
  2. شِيَوْ عَر دٍ

hippietrail (talk) 03:32, 17 February 2014 (UTC)

Suggestions for Chinese tone contour categories

I'm going to start adding categories for "tone contours" for multi-character Chinese terms. I'd like to hear some opinions from other contributors.

By tone contours I mean grouping together all two-character words that have rising tone for the first character and falling tone for the second character, etc.

The obvious name for the categories is "Chinese tone contours 2-4", etc. But maybe there are better ideas.

Tone symbols could be used but that would only make it trickier to enter and use and the tone numbers are widely used and understood, and the method will work for other tone languages that use tone numbers.

The second issue is the neutral tone. Would "5" or "0" be preferred? I always liked "0" better, but "5" seems more popular.

A third issue is whether to group or separate traditional and simplified. I prefer to have just one category and the trad. and simp. forms should sort next to each other anyway. But for that would we need to include explicit sort information?

The worst issue though might be tone sandhi. The point of these categories is to help learners with tones. Most learners can reproduce correct tones in isolated syllables but even many fluent foreign speakers of Chinese end up with very flat tones in continuous speech. So I have been told both by native speaking friends and advanced learners.

So it seems that to make the category most useful for this purpose we should categorize by tones after sandhi. But to fit in with how we and other dictionaries generally treat tone sandhi we should categorize by tones before sandhi.

Maybe a compromise is to categorize by tones before sandhi and explain the process in the cat. boiler and link to any other categories which would end up with the same tone contour after sandhi.

So let's hear other people's thoughts. — hippietrail (talk) 01:53, 18 February 2014 (UTC)
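To make the proposal concrete, here is a minimal Python sketch of how a contour key such as "2-4" might be derived from tone-numbered pinyin, with the neutral-tone digit left as a parameter and the before/after-sandhi question reduced to the familiar two-syllable third-tone case (3-3 pronounced as 2-3). The function names and the input format are assumptions made for illustration; they do not describe any existing module or bot.

```python
import re


def tone_contour(numbered_pinyin, neutral="5"):
    """Build a contour key such as "4-2" from tone-numbered pinyin.

    `numbered_pinyin` is a space-separated string like "wen4 ti2".
    A syllable with no digit, or with 0 or 5, is treated as neutral
    tone and written with `neutral` (the "5" vs "0" choice).
    """
    tones = []
    for syllable in numbered_pinyin.split():
        match = re.search(r"([0-5])$", syllable)
        tone = match.group(1) if match else neutral
        if tone in ("0", "5"):
            tone = neutral
        tones.append(tone)
    return "-".join(tones)


def after_sandhi(contour):
    """Apply only the disyllabic third-tone rule: 3-3 becomes 2-3.

    Longer words depend on how syllables group together, which this
    sketch does not attempt to model.
    """
    return "2-3" if contour == "3-3" else contour


print(tone_contour("wen4 ti2"))                # prints "4-2"
print(tone_contour("ma1 ma", neutral="0"))     # prints "1-0"
print(after_sandhi(tone_contour("ni3 hao3")))  # prints "2-3"
```

Keeping the pre-sandhi key as the stored value and computing the post-sandhi key on demand is one way to support the compromise of categorising before sandhi while still linking contours that end up sounding the same.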

I think the information about what the pre- and post-sandhi tones are is best covered in the entries themselves. Categories would be most useful to link those that have the same pattern of change in the tones: 33->23 or whatever. Chuck Entz (talk) 02:42, 18 February 2014 (UTC)
I don't like this idea. Wiktionary is a dictionary, not a language textbook. There is no Category:English words with counterintuitive pronunciations, Category:English pronunciations containing θ, or Category:English words with dark l. Wyang (talk) 08:37, 18 February 2014 (UTC)
Yes but you don't like many established features of Wiktionary. Wiktionary is a dictionary. It's a dictionary with lots more stuff in it than in some other dictionaries. This would be a type of index or appendix and Chinese dictionaries already have several kinds of indices and appendices that English dictionaries don't use. I could easily list three categories I personally don't like too. But that would be a separate discussion. And like any feature I don't care for I don't use those and don't contribute to those. Simple.
By the way I don't think any of your examples are bad ideas for categories. Certainly no worse than some categories I've seen and probably useful to some people. — hippietrail (talk) 09:58, 18 February 2014 (UTC)
The fact that I don't like many of the current practices doesn't mean my viewpoint should be viewed as more likely to be ridiculous than other people's. How much benefit will be gained by language learners from the existence of these categories, considering less than 5% of the current entries actually have audio samples? Wouldn't it be immeasurably easier for learners to resort to actual audio material elsewhere to get a feel for the native intonations than to look at these categories trying to figure out which tonal contour group a word belongs to? Who is going to be populating these categories with the 30000+ currently existing entries and who will be maintaining them, and how? Do these apply to disyllabic combinations only or do they apply to other multisyllabic combinations too (>4*5*5=100 possibilities for trisyllabic, >4*5*5*5=500 possibilities for tetrasyllabic)? Is the categorisation "Words with the tonal contour 4-1-3-2" meaningful? What about other Chinese varieties, eg. Min Nan (>8*8*8=512 trisyllabic for Taiwanese), or Cantonese (>9*9=81 disyllabic, >9*9*9=729 trisyllabic), especially ones with drastically different mechanisms of tone sandhi from Beijing Mandarin (eg. Min Nan, Shanghainese)? I don't mean to be picky but here are just some complications associated with such categorisations, which are the reasons I disliked this idea. Wyang (talk) 10:22, 18 February 2014 (UTC)
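The products quoted above can be checked with a few lines of Python. This is only the arithmetic: each list gives the number of tone choices per syllable as stated in the comment, and the result is the number of conceivable sequences, not the number of contours actually attested in entries. The helper name `contour_count` is made up for this sketch.

```python
from math import prod


def contour_count(tone_options):
    """Multiply the number of tone choices available for each syllable."""
    return prod(tone_options)


print(contour_count([4, 5, 5]))     # Mandarin trisyllabic: 100
print(contour_count([4, 5, 5, 5]))  # Mandarin tetrasyllabic: 500
print(contour_count([8, 8, 8]))     # Min Nan trisyllabic: 512
print(contour_count([9, 9]))        # Cantonese disyllabic: 81
print(contour_count([9, 9, 9]))     # Cantonese trisyllabic: 729
```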
I suspect you never had to learn Chinese as an adult so you can only guess at what is easy or difficult or useful. You're not the target audience so don't need to be annoyed with a feature you would never need or use.
Quoting what's missing only tells us what can be improved. We don't have enough audio. It's not directly relevant. One thing is that only native or very good speakers can contribute audio whereas anybody with some resources can contribute to the suggested categories. It could perhaps be that because native speakers wouldn't benefit from audio that they don't feel compelled to put the effort into creating them. I can point at any number of fluent nonnative speakers with very poor tones who had audio materials but didn't get a feel for the native intonations in polysyllabic words. Because the materials don't abstract the contours.
The fact that you imagine learners looking at categories trying to make guesses shows how hard it is for a native speaker who has native intuition of tones to imagine the difficult learning process of a foreign learner. This is not what they'd do at all. They would look at the category, find some terms they recognize or think are useful, then ask somebody to say each of them to try to tune their ear to their shared aspect, the tone contours.
Can you point to a resource elsewhere which groups audio resources by tone contours? I agree that would be very useful indeed. I haven't been able to find one.
Open source is populated by people with an itch to scratch. People who are interested in populating the categories populate them, just like everything else on Wiktionary was created. You don't have to do any of it.
It would apply to any Mandarin terms of more than one syllable. Calculating numbers of combinations and permutations doesn't illustrate the futility of a category any more than you could do the same to illustrate the futility of attempting any kind of open source / data project. You could've used such an argument to illustrate that nobody would create Wiktionary or Wikipedia. It would've been wrong. There don't need to be categories for every possible sequence of tones because not all possible sequences are in use. Just as we don't have entries for pinyin syllables which are not in use.
If anybody wants to create similar categories for another sinitic or another tonal language I would support them. If nobody wants to create them, nobody will create them. There is no problem.
Another aspect of the supposed combinatorics problem is that it would be quite an easy project for a bot, if it were to capture anybody's attention. — hippietrail (talk) 12:04, 18 February 2014 (UTC)
Instead of categories, why not create appendices? A link of the appropriate appendix could be added to the actual entry under the See also header. While I like categorization, unfortunately, categories do not show up in the mobile view. A bot could monitor the appendices to make sure that the appendix items and the actual entries are in sync. --Panda10 (talk) 13:40, 18 February 2014 (UTC)
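The consistency check suggested above could look roughly like the following Python sketch, which compares the terms listed in an appendix with the entries that link back to it. How a bot would actually fetch either list is left out, and the function name, argument names, and sample titles are illustrative assumptions only.

```python
def sync_report(appendix_terms, linking_entries):
    """Compare an appendix listing with the entries that link back to it.

    Both arguments are iterables of page titles. Fetching them (API
    calls, dumps) is deliberately outside the scope of this sketch.
    """
    listed = set(appendix_terms)
    linked = set(linking_entries)
    return {
        "listed_but_not_linked": listed - linked,
        "linked_but_not_listed": linked - listed,
    }


# Made-up sample data: three terms in the appendix, three linking entries.
report = sync_report(["火车", "汽车", "世界"], ["火车", "世界", "问题"])
print(report)
```

Two empty sets would mean the appendix and the entries agree; anything else is a list of pages for a human or bot to fix.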
Yes I intended to fall back to indices if there was substantial objection to categories. Categories are generally better though because they do some of the work for you and work in a uniform manner.
It could be a good policy to not allow new categories that have not first been implemented and approved as indices though. And perhaps not even allow new public indices until they have first been implemented and approved as private indices.
As for category support lacking in the mobile app, that is disappointing to hear. But forcing people not to implement features because implementations are lacking is a bit like refusing to buy a smartphone because they don't have good Wiktionary apps. The correct approach is to use the better features and push to have implementations support them. See if there's a bug report to watch, or if not then file a bug report.
Since it's a rainy day today I'll look at making some sample private indices using my personal vocabulary study lists as subpages of my user page. — hippietrail (talk) 07:16, 19 February 2014 (UTC)

The following tables are excerpted from the article on the Tianjin dialect that I wrote a while ago for Chinese Wikipedia. The dialect is spoken ~100 km east of Beijing and is characterised by tonal values and tone sandhi patterns distinctly different from those of Beijing Mandarin. The Beijing dialect is much simpler than this. I think the amount of information contained within these tables (at most, with in-table audios and examples in Table 2) is optimal for appendix content for tone sandhi patterns on Wiktionary, if it is to exist.

Tone sandhi patterns of disyllabic words in the Tianjin dialect

| 1st syllable \ 2nd syllable | Dark level (21) | Bright level (45) | Rising (24) | Departing (53) |
|---|---|---|---|---|
| Dark level (21) | RL (24 21): 观音 开车 | 中华 金银 | 天主 生产 | 金库 希望 |
| Bright level (45) | 桃花 回家 | 红糖 长城 | 鞋底 良好 | 罗汉 鞋店 |
| Rising (24) | 火车 紧张 | LH (21 45): 主人 找钱 | HR (45 24): 选举 总理 | LD (21 53): 手段 讲话 |
| Departing (53) | HL (45 21): 汽车 送书 | 汽油 问题 | 怕死 市长 | LD (21 53): 世界 运动 |

Tone sandhi patterns of trisyllabic words in the Tianjin dialect

| First two syllables \ 3rd syllable | Dark level (21) | Bright level (45) | Rising (24) | Departing (53) |
|---|---|---|---|---|
| (21 21) | (21 24 21) | (24 21 45) | (24 21 24) | (24 21 53) |
| (21 45) | (21 45 21) | (21 45 45) | (21 45 24) | (21 45 53) |
| (21 24) | (21 24 21) | (21 21 45) | (21 45 24) | (21 21 53) |
| (21 53) | (21 45 21) | (21 53 45) | (21 53 24) | (24 21 53) |
| (45 21) | (45 24 21) | (45 21 45) | (45 21 24) | (45 21 53) |
| (45 45) | (45 45 21) | (45 45 45) | (45 45 24) | (45 45 53) |
| (45 24) | (45 24 21) | (45 21 45) | (45 45 24) | (45 21 53) |
| (45 53) | (45 45 21) | (45 53 45) | (45 53 24) | (45 21 53) |
| (24 21) | (45 24 21) | (24 21 45) | (24 21 24) | (24 21 53) |
| (24 45) | (21 45 21) | (21 45 45) | (21 45 24) | (21 45 53) |
| (24 24) | (45 24 21) | (45 24 45) | (45 45 24) | (45 21 53) |
| (24 53) | (21 45 21) | (21 53 45) | (21 53 24) | (24 21 53) |
| (53 21) | (53 24 21) | (45 21 45) | (45 21 24) | (45 21 53) |
| (53 45) | (53 45 21) | (53 45 45) | (53 45 24) | (53 45 53) |
| (53 24) | (53 24 21) | (53 21 45) | (53 45 24) | (53 21 53) |
| (53 53) | (21 45 21) | (21 53 45) | (21 53 24) | (45 21 53) |

Wyang (talk) 12:49, 19 February 2014 (UTC)
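
If such an appendix were also wanted in machine-readable form, the disyllabic table above could be encoded as a small lookup. The Python sketch below is only an illustration: it records the six cells that carry an explicit sandhi value and assumes, without confirmation from the table itself, that the remaining cells keep their citation tones. The names `SANDHI` and `surface_tones` are hypothetical.

```python
# Cells of the disyllabic table that list an explicit sandhi value.
# Keys: (first-syllable tone, second-syllable tone) in citation form.
# Values: the surface tones given in the table.
SANDHI = {
    ("21", "21"): ("24", "21"),  # dark level + dark level, e.g. 观音
    ("24", "45"): ("21", "45"),  # rising + bright level, e.g. 主人
    ("24", "24"): ("45", "24"),  # rising + rising, e.g. 选举
    ("24", "53"): ("21", "53"),  # rising + departing, e.g. 手段
    ("53", "21"): ("45", "21"),  # departing + dark level, e.g. 汽车
    ("53", "53"): ("21", "53"),  # departing + departing, e.g. 世界
}

def surface_tones(first, second):
    """Look up the surface tones of a disyllabic word.

    Assumption: combinations with no listed value are pronounced
    with their citation tones unchanged.
    """
    return SANDHI.get((first, second), (first, second))

print(surface_tones("21", "21"))  # dark level + dark level
print(surface_tones("45", "45"))  # no sandhi listed for this cell
```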

Pre-Roman substratum language code nargery

Following up on a question I posed last year: should und-ibe ("pre-Roman (Iberia)") and und-bal ("pre-Roman (Balkans)") be moved from Module:languages/datax to Module:etymology language/data? Module:languages is for languages that are allowed entries (whether in the main namespace or in appendices); Module:etymology language is for lects that are only mentioned in etymologies and are not allowed entries. Given that both und-ibe and und-bal potentially represent multiple unrelated languages, would it ever be appropriate to have entries/appendices in either? If so, then I guess they should stay in Module:languages; if not, then it seems like they should be moved to Module:etymology language. "Pre-Greek", which seems conceptually related, is in Module:etymology language. - -sche (discuss) 06:06, 18 February 2014 (UTC)

Support. Incidentally, is it possible to make these codes categorise as pre-Roman (foo) but only display pre-Roman? — Ungoliant (falai) 23:19, 24 February 2014 (UTC)
I've moved the codes.
I don't think there is a way, within the current framework, to make pages display one name and categorise using another. Besides, so long as we consider "pre-Roman (Iberia)" and "pre-Roman (Balkans)" separate enough to warrant separate codes and categories, I think that it's appropriate that they also display distinct names. - -sche (discuss) 07:06, 1 March 2014 (UTC)
Semi-related. Has the question of a general pre-Ide. category ever been brought up (i.e., not paleo-Balkan or pre-Italic but a general category usable for the whole of Europe)? I think I searched for this a while ago but didn't come up with any discussions. At present there are a couple of entries with a (referenced) proposed pre-Ide. etymology (zaķis is one that I recall). I could potentially have one Livonian entry with a detailed reference ultimately tracing it to a possible pre-Ide. substratum (via Curonian though) and I'd love to have it classified as such.
Pre-Ide. is notorious for attracting pseudo-science but since such a category would have maybe a couple dozen items at most I think it would be pretty easy to check it for offenders and ax them on sight. If this hasn't been discussed in the past ad infinitum, ad nauseam (which didn't seem to be the case) perhaps it would be better to create a new section in BP? Neitrāls vārds (talk) 23:08, 13 March 2014 (UTC)
I use the qfa-sub (substrate) code in such cases, e.g. in մուր (mur). --Vahag (talk) 07:02, 14 March 2014 (UTC)
Thanks, what I needed! The only problem – the w: link directs to a redirect to w:Stratum (linguistics) which has zero relevance, it should link to w:Pre-Indo-European languages, could this be changed? Neitrāls vārds (talk) 04:48, 15 March 2014 (UTC)
No, qfa-sub is not just for Pre-Indo-European languages. It is the generic code for any substrate language. --Vahag (talk) 07:37, 15 March 2014 (UTC)
But what is there besides ine? Uralic is the other well represented family and I have yet to see any concrete speculations of "pre-Uralic" etymologies (they always stop at "...source is not clear.") Maybe I could inquire with User:-sche on the possibility of introducing a general pre-ine code. My Livonian word is via Baltic so as I said wouldn't even touch "pre-Uralic" (which seems nonexistent) it supposedly has cognates in Romanian so I could use pre-Balkan but that would be mildly retarded because Eastern Baltic is not Balkans, lol. Neitrāls vārds (talk) 04:27, 16 March 2014 (UTC)
Substrate languages occur world-wide, and can potentially be detected anywhere the historical phonotactics have been worked out. In general, it's not a good idea to make categorical assumptions based on the extremely fragmentary nature of our information on most of the languages of the world. How well do you know Sino-Tibetan or Afro-Asiatic or Niger-Congo or Pama-Nyungan or Uto-Aztecan or Na-Dené or Algic or Northwest Caucasian, or even Turkic? Can you guarantee that none of them will ever have substrates detected that are unidentifiable to family? Chuck Entz (talk) 06:05, 16 March 2014 (UTC)
I'm not sure what you're getting at. I don't think anyone is proposing to get rid of the generic "qfa-sub". Rather, Neitrāls vārds wants an additional, more specific code for "pre-Indo-European", on the model of "pre-Roman (Iberia)". I suppose such a code would be as useful as pregrc (sic) and und-ibe and und-bal. As I commented in a previous thread, we should perhaps develop a naming scheme for these codes, though; you'll notice how variant the codes currently are. Perhaps "pre-grc", "pre-ine", "pre-rib" (pre-Roman (Iberia)) and pre-rbk (pre-Roman (Balkans))? Or perhaps three-part codes are in order, "sub-pre-grc", "sub-pre-ine", "sub-pre-ibe", "sub-pre-bal"? - -sche (discuss) 06:36, 16 March 2014 (UTC)
I was (over-)reacting to the first sentence, not the rest of it. Chuck Entz (talk) 07:03, 16 March 2014 (UTC)

(reset indent) Lol at the last comment :D (well, the "what is there besides ine" was intended to be semi-facetious as I'm myself mostly interested in Uralic.) Yes, -sche, that is exactly what I'm thinking – pre-ine as a "hypernym" to the already existing pre-Italic and pre-Balkan. Would be great if that could be added. Maybe as a safety measure an extra sub- (as in sub-pre-ine) could, indeed, be added (as things like "pre-German" can sometimes be encountered.) Neitrāls vārds (talk) 21:06, 16 March 2014 (UTC)

Preventative block for User:DCDuring

Half an hour ago I noticed I got blocked out of the blue, with no stated reason. I doubt DCDuring would really do something like that, so I'm thinking that the account may have been compromised and I've issued a preventative block of a day so that we can figure out what's going on. —CodeCat 01:06, 19 February 2014 (UTC)

Although that is odd, it should be mentioned that his immediately previous edit was on your talk page, saying "you are dead wrong" (re Ossining). Equinox 01:17, 19 February 2014 (UTC)
I saw that too, but I didn't really know what to make of it. Did he block me for reverting him? That alone would be bad, but I actually reverted my own revert. So was it out of spite? —CodeCat 01:23, 19 February 2014 (UTC)
Dunno. Try an e-mail! Equinox 01:45, 19 February 2014 (UTC)
Ok, I sent an email asking him to shed some light on it. —CodeCat 01:51, 19 February 2014 (UTC)
He replied, saying "You reverted my reversion of SB's reversion on entry for Ossining for no stated reason." I reverted DCDuring in the first place because I thought it was a simple mistake; sometimes people mis-click. But then I saw comments on User:SemperBlotto's talk page and realised it was intentional, so I undid my own revert. DCDuring blocked me half an hour after that. So apparently that's enough for a block? I'm not seeing it, it seems petty to me. I've removed the block in any case. —CodeCat 03:30, 19 February 2014 (UTC)
I am finding it really hard to be interested in this issue but you have both been around for a zillion years and I am sure you can sort it out on talk pages. Block schmock. <3 Equinox 04:06, 19 February 2014 (UTC)

Universal Language Selector will be enabled by default again on this wiki by 21 February 2014

On January 21 2014 the MediaWiki extension Universal Language Selector (ULS) was disabled on this wiki. A new preference was added for logged-in users to turn on ULS. This was done to prevent slow loading of pages due to ULS webfonts, a behaviour that had been observed by the Wikimedia Technical Operations team on some wikis.

We are now ready to enable ULS again. The temporary preference to enable ULS will be removed. A new checkbox has been added to the Language Panel to enable/disable font delivery. This will be unchecked by default for this wiki, but can be selected at any time by the users to enable webfonts. This is an interim solution while we improve the feature of webfonts delivery.

You can read the announcement and the development plan for more information. Apologies for writing this message only in English. Thank you. Runa

Codes the ISO has split or merged (first batch)

In 2012 and 2013, the ISO retired several codes by merging them into other codes or splitting them up. Thirty of these retirements appear to have escaped our notice. Here are the first fifteen, plus my thoughts on them; I'll post the rest another day. If you know a reason we should or shouldn't follow the ISO in a particular case, please comment! - -sche (discuss) 06:09, 20 February 2014 (UTC)

merging the Tanudan Kalingas

The ISO had granted separate codes to "Upper Tanudan Kalinga" (kgh) and "Lower Tanudan Kalinga" (kml), but in 2012 they merged them into kml as "Tanudan Kalinga" and retired the code kgh. I suggest we follow suit. (Kalinga is a dialect continuum; we could consider merging even more Kalingas later.) - -sche (discuss) 06:09, 20 February 2014 (UTC)

Done. - -sche (discuss) 04:59, 1 March 2014 (UTC)

merging the Wemales

The ISO had granted separate codes to "South Wemale" (tlw) and "North Wemale" (weo), but in 2012 they merged them into weo as "Wemale"; they retired the code tlw. I suggest we follow suit. Side note, does anyone know how "Wemale" is pronounced? - -sche (discuss) 06:09, 20 February 2014 (UTC)

Done. - -sche (discuss) 19:07, 25 February 2014 (UTC)

splitting Garawa and Wanyi

In 2012, the ISO retired gbc (the code they had used for this language), splitting it into wrk for Garawa proper (which they call by the less common name "Garrwa", and which also goes by "Karawa") and wny for Wanyi (also spelt "Wanji", "Waanyi"). Garawa and Wanyi are closely related, but enough scholarly literature distinguishes them that I think we should follow the ISO in splitting them. - -sche (discuss) 06:09, 20 February 2014 (UTC)

Done. I've gone ahead and made the split, because there is actually some Wanyi content that I've wanted to add. - -sche (discuss) 23:10, 21 February 2014 (UTC)

splitting Kadu and Kanan

In 2012, the ISO retired kdv, the code that had been used for the Kado/Kadu variety of Sak, and split it into zkd (Kadu proper) and zkn ("Kanan"). I can find no information about Kanan. One might say "well, let's assume the ISO know what they were doing", but compare Aghu Tharrnggala / Gugu Mini below! - -sche (discuss) 06:09, 20 February 2014 (UTC)

Not split at this time. WT:LANGTREAT updated accordingly. - -sche (discuss) 22:52, 8 March 2014 (UTC)

splitting Paku and Mobwa Karen

In 2012, the ISO retired kpp (the code they had used for Paku Karen), assigning it the new code jkp at the same time as they split off jkm, "Mobwa Karen". Both Paku and Mobwa are dialects of S'gaw Karen, as is wea ("Wewaw"). - -sche (discuss) 06:09, 20 February 2014 (UTC)

Not split at this time. WT:LANGTREAT updated accordingly. - -sche (discuss) 22:52, 8 March 2014 (UTC)

splitting Mudburra and Karranga

In 2013, the ISO retired Mudburra's code mwd, splitting it into dmw ("Mudburra" proper, also spelt "Mudbura") and xrq ("Karranga", "Karrangpurru"). Quoth WP, "McConvell suspects Karrangpurru was a dialect of Mudburra because people said it was similar. However, it is undocumented and thus formally unclassifiable." If that was the basis for incorporating it into Mudburra, I suppose we should follow the ISO in making the split. We're not going to have any content in Karranga either way: to quote Mark Harvey, "there is no linguistic material directly on Karranga". - -sche (discuss) 06:09, 20 February 2014 (UTC)

Done. - -sche (discuss) 04:30, 1 March 2014 (UTC)

splitting Jiwarli and Thiin

In 2013, the ISO retired djl, which had been Jiwarli's code, and split it into dze ("Jiwarli" proper; they used the markedly rarer spelling "Djiwarli", and it also goes by "Tjiwarli") and iin (Thiin). Jiwarli and Thiin are closely related; they and two other lects are sometimes considered to form the dialect continuum Mantharta. WP says the varieties (all extinct) "were distinct but largely mutually intelligible". We could either make the split, or not, or go as far as to not only keep Thiin and Jiwarli unified but also unify Mantharta's other two dialects, dhr (Dhargari) and wri (Warriyangga). - -sche (discuss) 06:09, 20 February 2014 (UTC)

Not split at this time. WT:LANGTREAT updated accordingly. - -sche (discuss) 18:01, 23 March 2014 (UTC)

Aghu Tharrnggala et al

In 2013, the ISO retired ggr (the code they had used for this lect) and split it into gtu (Aghu Tharrnggala proper — they use the less common spelling "Aghu Tharnggala"), ggm (Gugu Mini), and ikr (Ikarranggal). They retired ggm a year later upon realizing that it was not a specific language but rather a cover term for various languages. I can't find evidence of Ikarranggal, either. I suggest we go along with the recoding of Aghu Tharnggalu as gtu (there being no reason to continue using a retired code when an up-to-date one exists for the same language). We already deleted ggm. ikr was already added to Module:languages; I suppose we could let it be. - -sche (discuss) 06:09, 20 February 2014 (UTC)

Done. - -sche (discuss) 07:15, 1 March 2014 (UTC)

merging things into Rakhine

In 2012, the ISO merged "Yangbye" (ybd) and "Chaungtha" (ccq) into Rakhine (rki). I propose we follow suit. Rakhine, also called Arakanese, is sometimes considered a dialect of Burmese. We can discuss whether or not to merge rmz (Rakhine's other major dialect) into rki, or even rki into Burmese, at a later date. - -sche (discuss) 06:09, 20 February 2014 (UTC)

Done. - -sche (discuss) 22:11, 24 February 2014 (UTC)

splitting Yendang and Yotti

In 2012, the ISO retired yen, the code which had been used for this language; they split it into ynq for Yendang proper and yot for Yotti. - -sche (discuss) 06:09, 20 February 2014 (UTC)

Not split at this time. WT:LANGTREAT updated accordingly. - -sche (discuss) 18:16, 23 March 2014 (UTC)

splitting Yir-Yoront and Yirrk-Mel / Yirrk-Thangalkl

In 2013, the ISO retired Yir-Yoront's code yiy and gave it the new code yyr at the same time as they split off yrm for "Yirrk-Mel". Ethnologue and the ISO are rather "splittist" when it comes to Australian languages, and the Yir-Yoront Lexicon speaks of it and Yirrk-Mel as having been merely "sister dialects"... but Yir-Yoront and Yirrk-Mel did have somewhat different phonological inventories... I don't really have an opinion on whether to split them or not. Note for whoever updates the language codes+names: "Yirrk-Mel" is also called "Yirrk-Thangalkl", "Yir Thangedl", "Yirr-Thangell"; "Yir-Yoront" is also spelt "Yir Yoront", "Yirr-Yoront", "Yirr-Yorront". - -sche (discuss) 06:09, 20 February 2014 (UTC)

Not split at this time. WT:LANGTREAT updated accordingly. - -sche (discuss) 18:16, 23 March 2014 (UTC)

merging Baagandji

The ISO had granted separate codes to bjd (which they called by the uncommon name "Bandjigali") and drl (which they called by the placename "Darling"). In 2012, they merged them into drl, which they now call "Paakantyi", although its most common name seems to be "Baagandji". (It also goes by "Baagandji".) I suggest we follow the ISO in merging bjd into drl, and call the end result "Baagandji". - -sche (discuss) 06:09, 20 February 2014 (UTC)

Done. - -sche (discuss) 06:18, 24 February 2014 (UTC)

merging Atamanu / Yalahatan

The ISO had granted separate codes to two dialects of Atamanu, named for the villages where the dialects were spoken: hrr for the variety spoken in Horuru / Haruru, and jal for the one spoken in Yalahatan. In 2012, they merged hrr into jal. Ethnologue says there were only "slight dialect differences reported between the 2 villages". (It also offers the curious comment that "the name Atamanu is not currently known".) I suggest we follow the ISO in merging hrr into jal. - -sche (discuss) 06:09, 20 February 2014 (UTC)

Done. - -sche (discuss) 04:52, 1 March 2014 (UTC)

merging Ibilo into Okpamheri

In 2012, the ISO merged Ibilo (ibi), "an Okpamheri dialect [...] spoken at the northern foot of the hills on which Oloma and Emhalhe (Somorika) are spoken" (quoth Ben Elugbe, writing in Current Approaches to African Linguistics, volume 6, 1983, ISBN 9070176572), into Okpamheri (opa). I see no reason not to follow suit. - -sche (discuss) 06:09, 20 February 2014 (UTC)

Done. - -sche (discuss) 22:07, 24 February 2014 (UTC)
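
For anyone updating entries after this first batch, the outcomes recorded in the threads above can be summarised as a mapping from retired codes to the codes now used. The Python sketch below is only a convenience summary; `FOLLOWED`, `NOT_SPLIT` and `resolve` are hypothetical names, and the threads do not state which code is kept for the lects marked "not split", so the sketch simply returns those codes unchanged (an assumption).

```python
# Retired ISO codes whose merger or split was followed ("Done" above),
# mapped to the code or codes that replace them.
FOLLOWED = {
    "kgh": ["kml"],         # Upper Tanudan Kalinga -> Tanudan Kalinga
    "tlw": ["weo"],         # South Wemale -> Wemale
    "gbc": ["wrk", "wny"],  # split into Garawa and Wanyi
    "mwd": ["dmw", "xrq"],  # split into Mudburra and Karranga
    "ggr": ["gtu"],         # Aghu Tharrnggala recoded; ikr left as it was
    "ybd": ["rki"],         # Yangbye -> Rakhine
    "ccq": ["rki"],         # Chaungtha -> Rakhine
    "bjd": ["drl"],         # merged into Baagandji
    "hrr": ["jal"],         # Horuru variety merged into jal
    "ibi": ["opa"],         # Ibilo -> Okpamheri
}

# Retirements recorded above as "Not split at this time".
NOT_SPLIT = {"kdv", "kpp", "djl", "yen", "yiy"}

def resolve(code):
    """Return the code(s) to use in place of `code`.

    Assumption: codes in NOT_SPLIT, and codes not mentioned at all,
    are returned unchanged.
    """
    return FOLLOWED.get(code, [code])

print(resolve("gbc"))  # a split: two candidate codes, to be chosen by hand
print(resolve("kdv"))  # not split: returned unchanged
```

A split returns more than one code on purpose: deciding which successor applies to a given entry still has to be done by hand.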

Amendment to the Terms of Use

Phase out the synonyms of {{a|GenAm}} and {{a|RP}} (namely {{a|US}} and {{a|UK}})

For a long time, we have tolerated {{a|US}} as a synonym of {{a|GenAm}} and {{a|UK}} as a synonym of {{a|RP}}. This discussion in the Tea Room is the latest in a long line of discussions that have made clear how misleading this is: the labels "US" and "UK" seem to cover the entireties of those countries, yet we use them only for very specific accents. I propose that we have a bot replace all instances of {{a|US}} with {{a|GenAm}}, and all instances of {{a|UK}} with {{a|RP}}. Alternatively, if we wanted to check all uses of "US" and "UK" to ensure they were not being used in ways other than as synonyms of "GenAm" and "RP" (out of our 3.5 million pages, I did see one once that used both "US" and "GenAm"), we could simply change it so that "US" displayed "GenAm" and "UK" displayed "RP". Then, at our leisure, we could go through all the occurrences of those templates, checking them by hand; furthermore, they would still exist as redirects for editors to use, without misleading readers by displaying text that implied they covered all of the US and UK, respectively. Thoughts? - -sche (discuss) 21:34, 24 February 2014 (UTC)

  • I've been doing this manually for months whenever I see {{a|UK}} and {{a|US}}, so obviously I support getting a bot to do it. —Aɴɢʀ (talk) 22:00, 24 February 2014 (UTC)
To me it looks like yet another way of driving away normal users and making it a dictionary of little use to any but the kind of folks who contribute here. I would have thought that the right direction is precisely the opposite, with links and footnotes to explain the relationship of the instantly understandable and the 'accurate enough' "US" and "UK" labels to the technically precise and very impressive looking 'General American' and 'Received Pronunciation', which those who are not students of language don't instantly understand. That a casual user will follow links goes against our experience and against human nature. To assume that they will or should is fully in accord with our practice and with human nature, but nonetheless wrong. DCDuring TALK 22:59, 24 February 2014 (UTC)
Is our mission to be "accurate" or "accurate enough"? —CodeCat 23:06, 24 February 2014 (UTC)
after edit conflict:
  • With the right notes and descriptions, we can be both easy to use (DCDuring's "accurate enough") and linguistically academically precise ("accurate").
I'm considerably more up on labels and WT than casual users, and I must confess that I too am somewhat put off by the {{a|RP}} label. “UK”, especially in juxtaposition with “US”, is clearly the United Kingdom, but “RP” leaves me confused and uncertain until I click through. If we're expecting users to click through to find stuff (an assumption that is often flawed, as any longer observation of WT:Feedback will confirm), then why not put the extended explanation that "In this context, the label 'UK' refers to the received pronunciation common in Britain," etc., on the linked page? I'm not a fan of obtuse and opaque labels. ‑‑ Eiríkr Útlendi │ Tala við mig 23:19, 24 February 2014 (UTC)
From my perspective, "UK" is a more obtuse label than "RP", because contrary to your notion, RP is not common in the UK. Someone noted in the Tea Room the statistic from Peter Trudgill that it's used by only 3% of the population. The fact that all (or, see below, almost all) of our entries use "UK" to gloss RP pronunciations is thus obtuse ("intellectually dim-witted", "indirect or circuitous") — neither "accurate" nor "accurate enough". It needs to change regardless of whether our goal is to stop misleading our readers the way we currently do, or start misleading them in some new way (like by using "UK" to gloss some never-used "conflation pronunciation"/"abstraction" of Cockney plus RP plus Scottish plus Estuary plus Northern Irish plus Welsh, as proposed below). - -sche (discuss) 19:00, 25 February 2014 (UTC)
  • What about just using the full labels, then? It occurs to me to wonder why we're torturing ourselves by coming up with such obtuse initialisms. “RP” in a pronunciation section leaves me scratching my head, but “Received Pronunciation” is much more clear; even better, “Received Pronunciation accent (England)” with a link to w:Received Pronunciation. Likewise with “GA” or “GenAm” versus just writing out “General American accent” with a link to w:General American. WT:NOT #4: Wiktionary is not paper. Since we don't have to worry about ink or page space, why don't we just write out the full labels? ‑‑ Eiríkr Útlendi │ Tala við mig 19:40, 25 February 2014 (UTC)
  • (United States of America) IPA: /tuː/
  • (United Kingdom of Great Britain and Northern Ireland) IPA: /tuː/
  • (Commonwealth of Australia) IPA: /tuː/
  • (Republic of Ireland) IPA: /tuː/
Abbreviations are not just for saving ink.  Michael Z. 2014-02-26 21:30 z
I support spelling out "RP" as "Received Pronunciation" and "GenAm" as "General American". (I think "General American accent" would be unnecessarily wordy, and "Received Pronunciation accent" sounds like it has tautologitis, a condition related to PNS syndrome.) All it would take, I think, is one edit to Template:accent:RP and another to Template:accent:GenAm. (PS, note that "GA" is not used; the use of labels like "NY" means it would be too ambiguous, potentially meaning "Georgian accent" rather than "GenAm". See the discussions in the Whatlinkshere of Template:accent:GA if you're really bored.) - -sche (discuss) 19:35, 26 February 2014 (UTC)
  • Oppose. I have no problem with adding {{a|RP}} and {{a|GenAm}} in addition to {{a|UK}} and {{a|US}}, but I see these as all meaning different things and oppose the removal of pronunciations tagged as {{a|UK}} and {{a|US}}. My ideal pronunciation sections are as follows:
    Obviously, there will ideally also be other dialects such as Australian and South African included. --WikiTiki89 23:18, 24 February 2014 (UTC)
  • What WikiTiki said.​—msh210 (talk) 23:28, 24 February 2014 (UTC) More precisely: I agree with WikiTiki that we should have as many dialects as possible, with a "general" American and British listed atop the other American and British (respectively) accents where such can be determined. I don't know, however, what delimiters we should use.​—msh210 (talk) 02:39, 25 February 2014 (UTC)
    I'm not sure about the delimiters either. In a past discussion, someone once suggested using //double slashes// for diaphonemic transcriptions and /single slashes/ for ordinary phonemic transcriptions. --WikiTiki89 02:46, 25 February 2014 (UTC)
    I think GenAm and RP should continue to be given in /broad/ transcription rather than [narrow] and your US and GenAm transcriptions of cannot seem to be wrong; our entry and all the other dictionaries I've checked have the first vowel as /æ/. But as far as I can tell, it would require us to do what I propose — systematically clear out all the current uses of "US" and "UK" — because it would not seem to be possible or consistent (and I would oppose) to start using "US" and "UK" to mean something other than "GenAm" and "RP" while all current uses of them are still as synonyms of "GenAm" and "RP". - -sche (discuss) 23:59, 24 February 2014 (UTC)
    Re pronunciation of cannot: In my experience it is more common in the US to put the stress on the second syllable, thereby reducing the first vowel. Stress on the first syllable is less common and I didn't think it was worth including here (it is listed at the entry itself). --WikiTiki89 00:16, 25 February 2014 (UTC)
    If the first syllable is reduced, wouldn't it just be a schwa? [ɪ] is commonly an allophone of /ə/, but then it shouldn't be in broad transcriptions (since it doesn't contrast with a schwa)... - -sche (discuss) 01:38, 25 February 2014 (UTC)
    There's a lot more to it than that. Since I have not actually done research on this, I will give examples from my idiolect. In my idiolect (and I believe this to be common in America and perhaps worldwide), the velars /k/ and /ɡ/ are palatalized by front vowels (everything from /iː/ to /æ/, and even by /aʊ/, which is realized closer to [æʊ], but not by /aɪ/). In my idiolect, the /kə/ in caboose (/kəˈbuːs/) is contrasted with the /kɪ/ in kibbutz (/kɪˈbuːts/) and this distinction is very noticeable due to the palatalization of the /k/ in kibbutz and lack of palatalization in caboose. Since the unreduced vowel after /k/ in cannot is /æ/, which palatalizes, it is reduced to the palatalizing /ɪ/ rather than the unpalatalizing /ə/. --WikiTiki89 01:52, 25 February 2014 (UTC)
  • Re 'all current uses of them are still as synonyms of "GenAm" and "RP"': I don't think this is true. Maybe the majority, but definitely not all. --WikiTiki89 00:16, 25 February 2014 (UTC)
    Given that "US" = "GenAm" has been the case since long before I started editing, I imagine that fewer entries use "US" to mean something other than "GenAm" than use {{head|foo|adjective}} under a ===Noun=== header. Which is to say, there are certainly a few of our 3.5 million entries that do, but they're not standard (and there's no way to know if the way they do use "US" is the way you want to start using it). - -sche (discuss) 02:08, 25 February 2014 (UTC)
    Maybe so, but I know I've seen and even added ones that are not. But I agree that either way, we should not do this by bot without checking the pronunciations in some way. --WikiTiki89 02:15, 25 February 2014 (UTC)
    Whatever bot changes the current uses of "US" and "UK" (because it would not be feasible to do except by bot) could be programmed to list and then ignore pages that used both "US" and "GenAm", or "UK" and "RP". - -sche (discuss) 18:44, 25 February 2014 (UTC)
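
A minimal Python sketch of the behaviour described above, for illustration only: it rewrites single-label {{a|US}} and {{a|UK}} templates and leaves untouched, but flags, any page that already uses both a synonym and its target label. It is not an existing bot; `migrate` and `REPLACEMENTS` are hypothetical names, and the sketch assumes the accent template is called with exactly one label (multi-label calls are not matched and so stay unchanged).

```python
import re

# Synonym labels proposed for replacement, with their targets.
REPLACEMENTS = {"US": "GenAm", "UK": "RP"}

# Matches {{a|LABEL}} with exactly one label (assumption).
ACCENT = re.compile(r"\{\{a\|([^{}|]+)\}\}")

def migrate(wikitext):
    """Return (new_text, needs_review) for one page's wikitext.

    A page that uses both a synonym and its target label is returned
    unchanged with needs_review=True so that it can be listed and
    checked by hand.
    """
    labels = set(ACCENT.findall(wikitext))
    for old, new in REPLACEMENTS.items():
        if old in labels and new in labels:
            return wikitext, True

    def swap(match):
        label = match.group(1)
        return "{{a|%s}}" % REPLACEMENTS.get(label, label)

    return ACCENT.sub(swap, wikitext), False

print(migrate("* {{a|US}} {{IPA|/tuː/}}"))
print(migrate("* {{a|US}} ... * {{a|GenAm}} ..."))
```

Collecting the pages for which the second value is true would give the list of pages to review manually.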
    I don't think it's common at all for them to both be used. In the cases that US does not refer to GenAm, it is because the pronunciation is more generalized, such as by using symbols such as /ɒ/, /ɑː/, /iː/, /ə(ɹ)/, etc. --WikiTiki89 19:29, 25 February 2014 (UTC)
  • or
  • It's been suggested before that our pronunciation sections for English terms show in the first instance just RP and GenAm and that everything else gets put in a collapsible box. I think that's a good idea. —Aɴɢʀ (talk) 01:05, 25 February 2014 (UTC)
    Why not use a diaphonemic approach? List the diaphonemic realisation(s), then RP and GA, and then any others. That way, someone who understands the diaphonemes knows how to derive their own dialect from this. We do it this way for Dutch already (which is also a language with two more or less "standard" pronunciations), and probably many other languages as well. —CodeCat 01:14, 25 February 2014 (UTC)
    Because that is essentially what I'm suggesting except that I didn't use the word "diaphoneme". --WikiTiki89 01:18, 25 February 2014 (UTC)
    @Angr, Your example right there is exactly what I want to avoid. (And as a side note, hotdog is pronounced [ˈhʌtˌdɒɡ] in New England.) So here's how I would correct it:
    hot dog:
    Theoretically, I think we should group Canada with US, but then we would lose the simplicity of a simple "US" tag in favor of something like "North America" (because "NA" would not be understood by the
    But your method of avoiding the problem is to provide incorrect broad transcriptions. IPA(key): /ˈfʊtɚ/ is not the correct broad transcription for most of England and all of Scotland, nor are IPA(key): /ˈhɒtˌdɒɡ/, /ˈhɒtˌdɔːɡ/ the correct broad transcriptions for most of North America. Being wrong is simply too high a price to pay for being easy. —Aɴɢʀ (talk) 01:29, 25 February 2014 (UTC)
    That's because you're missing the point. If you take a particular individual dialect, the pandialectal transcription is most likely gonna be wrong. The pandialectal (or diaphonemic) transcription is the bigger picture. It gives the overall phonemic structure of the word that is common to all dialects, even if those dialects merge or shift some of the vowels. --WikiTiki89 01:36, 25 February 2014 (UTC)
    ... so, in other words, what Angr said: the pan-dialectal transcription is wrong and not an accurate representation of how all or most of English's dialects actually pronounce things. - -sche (discuss) 01:54, 25 February 2014 (UTC)
    It's not "wrong", it's just more general and abstract. You have to stop thinking of it as a direct phonetic transcription (which it wouldn't be anyway because we don't represent allophones). --WikiTiki89 02:01, 25 February 2014 (UTC)
    Phonetic transcriptions attempt to transcribe the actual sound speakers articulate. Phonemic transcriptions abstract from that and describe only the features that contrast within a particular speaker's dialect. So each dialect has its own set of phonemes. Diaphonemic transcriptions go further still, and try to describe correspondences between the phonemes of speakers across many dialects. So for example a diaphonemic description of English would acknowledge that /əʊ/, /aʊ/, /ɛʊ/ and so on are all different manifestations of the same underlying diaphonemic unit, depending on dialect. Diaphonemic transcriptions do not describe sounds, therefore, but correspondence sets, and are not all that different from correspondence sets used in historical linguistics; in a sense, they are a synchronic proto-language. Diaphonemic transcription sometimes runs into problems when it's not just the realisations of phonemes that change, but also the different phonemic contrasts (the many splits/mergers in English for example). In cases like that, usually the contrast is maintained in the transcription if it's made in at least some of the dialects. For English, that would mean that if we include Scottish English as part of the English diaphonemic system, then /x/ is an English diaphoneme separate from /k/, and there will also be contrasts among vowels that the majority of English speakers don't distinguish. Diaphonemic systems become progressively more abstract as you include more dialects, because by necessity that means digging further back into the history of the language to the point of departure between various dialects. For the British-vs-American split, that still lies in the modern period, but for many dialects within Britain itself, the splits may be Middle or even Old English in origin. —CodeCat 17:24, 25 February 2014 (UTC)
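
The idea of a diaphoneme as a correspondence set, as described above, can be pictured with a small Python sketch. Everything in it is a placeholder: the diaphoneme symbol, the dialect names and the assignment of realisations to dialects are hypothetical and make no claim about any real accent; only the three realisations themselves are taken from the comment above.

```python
# One hypothetical correspondence set: a diaphoneme and its
# realisation in three placeholder dialects.
CORRESPONDENCES = {
    "oʊ": {"dialect_a": "əʊ", "dialect_b": "aʊ", "dialect_c": "ɛʊ"},
}

def realise(diaphonemes, dialect):
    """Turn a list of diaphonemic symbols into one dialect's phonemes.

    Symbols without a correspondence set are passed through unchanged.
    """
    return "".join(
        CORRESPONDENCES.get(symbol, {}).get(dialect, symbol)
        for symbol in diaphonemes
    )

word = ["ɡ", "oʊ"]  # hypothetical diaphonemic transcription
for name in ("dialect_a", "dialect_b", "dialect_c"):
    print(name, "/" + realise(word, name) + "/")
```

The lookup runs from the abstract unit to each dialect, which is why a reader needs the correspondence table before a diaphonemic transcription tells them anything about their own pronunciation.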
    That means it'd be worse than unhelpful — it'd actually be harmful to the project. In the words of DCDuring, it "seems like pure contributor-community self-indulgence". It would only serve to mislead and confuse our readers. - -sche (discuss) 18:44, 25 February 2014 (UTC)
  • If we take up more than an inch of vertical screen space, what parts of the pronunciation would be best hidden to allow folks to get to the definitions, which is what those few users who still come by say they want? Do we have any idea who actually wants pronunciations and how many of them can read IPA?
This seems like pure contributor-community self-indulgence. DCDuring TALK 01:33, 25 February 2014 (UTC)
Perhaps the nested transcriptions could be hidden the same way we hide quotations, leaving only the top-level ones by default. --WikiTiki89 01:36, 25 February 2014 (UTC)

RP ≠ UK. Consider ally, pronounced /ˈælaɪ/ in RP and /ˈalʌɪ/ in UK. — I.S.M.E.T.A. 14:54, 25 February 2014 (UTC)

Your point is right, but your evidence is wrong. Those are two different transcriptions of the same pronunciation from different editions of the OED. --WikiTiki89 17:04, 25 February 2014 (UTC)

@Wikitiki89, CodeCat: Having first read the word diaphoneme here, used by the two of you, I did some research and created the aforelinked entry. Could you check it for accuracy, please? — I.S.M.E.T.A. 23:49, 25 February 2014 (UTC)

Could we not be both clear and precise by using simple terms and providing our specific meaning in a link? Most dictionaries do this for usage labels and no one ever minds. If we absolutely must be literalist in our labelling, then add a supplementary abbreviation:

  • (British [RP])
  • (North American [GenAm])

Entries for Japanese verb forms?

It seems like there's some demand for entries for Japanese verb forms. Somewhere down the years, I wound up with the impression that verb forms for Japanese were not to be added. More recently, I've been over WT:AJA and WT:ELE and I don't see any policy text stating that we shouldn't create these. Moreover, we do have entries for verb forms for numerous other languages.

Would any other editors have strong feelings in opposition to the creation of verb form entries for Japanese? If so, why? ‑‑ Eiríkr Útlendi │ Tala við mig 22:49, 24 February 2014 (UTC)

I think it is probably about creating verb forms for agglutinative languages in general. Wyang (talk) 01:17, 25 February 2014 (UTC)
Personally, I support creating entries for a few basic verb forms, as given in the tables. I suppose it can be difficult to draw the line, but that would be for the Japanese editors to decide. As a beginner, I initially found it difficult to guess the lemma form correctly, and although I already find that I am not in need of this, I can certainly see how it would have been useful. —Μετάknowledgediscuss/deeds 08:32, 25 February 2014 (UTC)
I strongly support the addition of Japanese verb forms. My Japanese is poor, so I'm not aware of how many verb forms any given verb can have. That said, I offer a Latin verb for comparison: Consider scīscō. It has 104 non-participial conjugated forms and four participles; its present active participle, future active participle, perfect passive participle, and future passive participle themselves inflect, the p.a.p. having 24 declined forms and the other three participles 36 declined forms each. That means that the one verb scīscō has 236 verb forms or forms of verb forms, all of which get entries; of course, not all of those forms are heteromorphic, so fewer than 236 pages are affected by it, but you get some idea by this of the extent of Latin conjugation. Even if Japanese verbs end up with a comparable number of conjugated forms, I still don't see what the problem is. — I.S.M.E.T.A. 17:00, 23 March 2014 (UTC)
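The count of 236 above can be checked with a few lines of arithmetic. This is only a sketch of the sum as described in the comment (the four participles are counted through their declined forms); the variable names are made up for illustration.

```python
# Figures taken from the comment above on Latin scīscō.
non_participial_forms = 104          # conjugated forms that are not participles
present_active_participle = 24       # declined forms of the p.a.p.
other_participles = 3 * 36           # three further participles, 36 forms each

total = non_participial_forms + present_active_participle + other_participles
print(total)  # 236
```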

Entries for Japanese adjective forms?

  • If the community is accepting of verb form entries, we might need a new header template.
Regular verb forms take a ===Verb=== header, followed by a template such as {{ja-verb form}}, and then the definition line containing the inflected verb form, a description of how it is inflected, and a link to the lemma.
Japanese also has a class of term generally described as an adjective in English, or more specifically as an i-adjective for English-speaking learners of Japanese, or as a 形容詞 (keiyōshi) in Japanese. These terms can be used as predicates, and do inflect for aspect.
As such, when creating an entry for a Japanese adjective form, how should we proceed? Do we use the ===Adjective=== header, and then create some new {{ja-adj form}} template for the second line, to avoid erroneous categorization? ‑‑ Eiríkr Útlendi │ Tala við mig 18:44, 19 March 2014 (UTC)
@Eirikr: I think your proposals are bang-on. Presumably, you'd only want lemmata in Category:Japanese verbs and Category:Japanese adjectives, so it would be a very good idea to have separate Category:Japanese verb forms and Category:Japanese adjective forms categories as well. — I.S.M.E.T.A. 17:00, 23 March 2014 (UTC)

AWB application

I would like to apply for permission to use AWB on the English Wiktionary. I will be using it for two purposes if the application is successful:

  1. Template:Pinyin-IPA. Check all main namespace pages using this template, and remove syllable delimitations, since the template now accepts unsyllabilised input.
  2. Obsoleting Template:zh-hanzi and Template:Hani-forms, since they have been superseded by Template:zh-hanzi-box.

Thanks, Wyang (talk) 02:51, 25 February 2014 (UTC)

Actually my bot could do the second one, but still support. --kc_kennylau (talk) 12:48, 25 February 2014 (UTC)
If no-one objects or beats me to it, I'll add you to the check page in a day or two. (Ping me if I forget to.) - -sche (discuss) 04:20, 26 February 2014 (UTC)
@Wyang: I have added you to the check page; you may now use AWB. Cheers, - -sche (discuss) 04:33, 28 February 2014 (UTC)
@-sche: Thanks heaps. I just tried about 50 edits, and they are alright. There are still >3000 pages remaining for task #2, and >30000 pages remaining for task #1. Would it be possible to get some sort of special flag which allows the page to be automatically saved and not flood RC? I will be using Wyangbot (talk • contribs). Thanks, Wyang (talk) 00:51, 4 March 2014 (UTC)
If the tasks are fully automatable, then yes, it makes more sense to do them by bot than by AWB. You can draft a vote like this one to request a bot flag. There are instructions at the top of WT:V to guide you. The flag is not what causes the page to be saved automatically, though; you must write and run a script to do that... or, supposedly, there is a way to turn on an 'automatic' mode in AWB, but I don't know what it is. Maybe someone can enlighten the both of us! - -sche (discuss) 02:29, 4 March 2014 (UTC)
Thanks, I have drafted a vote here. Wyang (talk) 03:43, 4 March 2014 (UTC)

Request AWB for my bot

I would like to apply for permission to use AWB with my bot Kennybot (talk • contribs). --kc_kennylau (talk) 10:16, 26 February 2014 (UTC)

Given that you already have AWB privileges on your user account, and a bot flag on your bot account, I don't imagine any objection to this request. If no-one objects or beats me to it, I'll add you to the check page in a day or two. - -sche (discuss) 19:13, 26 February 2014 (UTC)
Your bot now has AWB privileges. I added it to the "Users" section because I assume you will be using AWB in "regular (manual-review) mode" rather than "fully automatic mode". If you in fact intended to use AWB in full auto mode, let me know. - -sche (discuss) 04:38, 28 February 2014 (UTC)

Different Spellings of Ladino

Discussion moved from Category talk:Ladino headword-line templates#Different Spellings of Ladino.

Hi, I'm working with Ladino Wikipedia and Wiktionary and I also want to contribute to the Ladino words in the English Wiktionary. A lot of words are misspelled, e.g. sefardí is not Ladino; it's Castilian. In Judaeo-Spanish we say sefaradí (sefaradi in Aki Yerushalayim and Turkish spelling and sefaradhí in the Multidialectal spelling).

Today, Judaeo-Spanish (Ladino) is spoken in 36 countries (although in 16 of them there are fewer than 100 speakers). Historically, until the mid-19th century, it was always written in the Rashi script, which cannot currently be used here due to technical difficulties. However, with the rise of nationalism and for other reasons, most people started writing in the Latin alphabet. The conventions they use depend more or less on the country they live in and on the type of schools they attended (such as those of the Alliance Israélite Universelle). Thus today there are around 20 different orthographical norms in use for this language (17 according to some and 22 according to others)!

Not all of these 20 are very common. For example, those that use the Greek and Arabic scripts, those that use German orthographical conventions (including ẞ; only very old Sephardim of Hamburg descent), Dutch spelling (Jews of Curaçao), Portuguese spelling (Jews of Brazil), etc. are not very widespread, though still in use. If we were to write each word in each kind of spelling, it would be way too much. However, reducing it to just 2, Latin and Hebrew, would not do justice to the other widely used spellings.

The most common/important 6 spelling systems are as follows (alphabetically ordered - and the possible letter to use in the template):

  1. Aki Yerushalayim (a - Autoridad Nasyonala del Ladino)
  2. French (f - Vidas Largas)
  3. Hebrew (h)
  4. Multidialectal (m - Ortoǵrafía Unida)
  5. Rashi (r)
  6. Turkish (t)

The next 6 important spelling systems are as follows (not necessarily to be used here, again alphabetically):

  1. Cyrillic (c)
  2. Italian (i)
  3. Jaquetía (j)
  4. Nehama (n)
  5. Old Spanish (os)
  6. Spanish (s - Arias Montano)

These twelve spelling systems are the ones in which we are most likely to find citations. However, I suggest we use the 6 most common, and as we can't use Rashi for now, let's just use the 5 (a, f, h, m, t) instead of the 2 (l, h). However, giving four alternative spellings in the same row doesn't seem logical to me, so I suggest we give them under the heading Alternative forms.

If you guys can help me change the template accordingly, that would be very nice. WikiTiki, maybe you could help me out, or direct me towards the right people who know how to do this?

One more thing, this parameter should be optional, because not every word is spelled in a myriad forms:

  • papel (m=a=f=t) and פאפיל (h) → We could mark papel (m) and פאפיל (h)
  • parâ (m), para (a=f=t) and פארה (h) → We could mark parâ (m), para (a) and פארה (h)
  • justo (m), djusto (a=f), custo (t) and גֿוסטו (h) → We could mark justo (m), djusto (a), custo (t) and גֿוסטו (h)
  • muchacho (m=a), mutchatcho (f), muçaço (t) and מוגֿאגֿו (h) etc.
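
The optional marking in the examples above could be sketched roughly as follows. This is a minimal Python illustration, not an existing template or module: the function names `group_spellings` and `short_labels` are hypothetical, and the priority order m, a, f, t, h is an assumption read off the examples (a shared spelling is labelled with the first system that uses it).

```python
# Hypothetical priority order for labelling a shared spelling (assumption).
PRIORITY = ["m", "a", "f", "t", "h"]

def group_spellings(spellings):
    """Group orthography codes by identical spelling, in PRIORITY order."""
    groups = {}
    for code in PRIORITY:
        if code in spellings:
            groups.setdefault(spellings[code], []).append(code)
    return groups

def short_labels(spellings):
    """Label each distinct spelling with only the first code that uses it."""
    return [(form, codes[0]) for form, codes in group_spellings(spellings).items()]

# Data from the "justo" example in the list above.
justo = {"m": "justo", "a": "djusto", "f": "djusto", "t": "custo", "h": "גֿוסטו"}
print(group_spellings(justo))  # djusto is shared by a and f
print(short_labels(justo))     # one label per distinct spelling
```

A word with a single Latin spelling, like papel, collapses to one Latin form plus the Hebrew one, so the parameter only needs to be given when the spellings actually differ.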

Thank you in advance,

Friendly --Universal Life (talk) 10:38, 27 February 2014 (UTC)

I agree with everything you said except what you said about Rashi script. We consider Rashi script and Hebrew script to be the same script. Rashi script is just a different font. It will never have separate Unicode code points. Therefore, I see no reason why we can't have entries in the Hebrew/Rashi script. If a Rashi-script font ever becomes available with the proper licenses, we will be able to integrate it into Wiktionary. But until then there is nothing wrong with using the square-script fonts that we use for Hebrew and Yiddish. --WikiTiki89 19:14, 27 February 2014 (UTC)
I feel like this discussion got lost in the midst of the others. Can anyone offer any input? --WikiTiki89 17:50, 1 March 2014 (UTC)
I think the orthography used by the extant regulatory body (bodies?) of Ladino should be used for lemmata, except for terms not citable with it. Naturally, any citable spelling should have an entry as an alternative form. — Ungoliant (falai) 18:15, 1 March 2014 (UTC)

French Verb Usage With Prepositions

Currently the meaning of French verbs is listed with a note about whether the meaning applies in the transitive, intransitive, impersonal, reflexive case etc. I have found that often this information is insufficient to be able to correctly use the verb in a sentence, particularly for the intransitive case. In French, intransitive verbs often change meanings depending on which preposition is used, à or de, or only one preposition can be used for that particular verb. Whether a verb takes à or de is often arbitrary and cannot always be worked out from context. Regardless of whether the meaning changes, whether a verb takes à or de drastically changes that verb's usage in sentences where objects are replaced with pronouns.

Some examples:

parler à - to talk to someone
parler de - to talk about something
arriver à - to go to somewhere
arriver de - to come from somewhere
penser à - to imagine
penser de - to have an opinion about
jouer à - to play a game or sport
jouer de - to play an instrument
être - to be
être à - to belong to
rire - to laugh
rire de - to laugh at

I propose that entries for French verbs should contain more information about the prepositions that go with them. This information would be extremely helpful as it provides clarity about when the verb takes on particular meanings and is vital for understanding how to use the verb in a sentence. Other online dictionaries such as Oxford Dictionary and Word Reference do contain this type of information. Some verbs (very few) already contain this type of information but they seem to be the exception rather than the rule, e.g. faire.

The current parler entry looks like this:

  1. (intransitive) To speak or talk.
    Il ne s'est mis à parler qu'à l'âge de quatre ans.
    Ils parlèrent plusieurs heures avant d'aller se coucher.
  2. (transitive) to be able to communicate in a language; to speak
    Elle parle couramment français. - She speaks French fluently

I imagine it would need to change to something like this:

  1. (intransitive) To speak or talk.
    Il ne s'est mis à parler qu'à l'âge de quatre ans.
    Ils parlèrent plusieurs heures avant d'aller se coucher.
  2. (intransitive, ~ à) To speak or talk to someone.
  3. (intransitive, ~ de) To speak or talk about something.
  4. (transitive) to be able to communicate in a language; to speak
    Elle parle couramment français. - She speaks French fluently

—This unsigned comment was added by Spuzzdawg (talk • contribs) at 20:29, 27 February 2014.

Yes, we do that in some entries (see débattre). We should make a point of doing it everywhere it's necessary. --WikiTiki89 20:34, 27 February 2014 (UTC)
The {{context}} template isn't really designed for that, though. And I'm not sure if this is the best way to show it, either. After all, context labels give specific contexts in which the word has a certain sense. But in the phrase parler de, "parler" does not mean "speak about" when it is followed by "de", it's the combination "parler de" as a whole that has that meaning. —CodeCat 20:38, 27 February 2014 (UTC)
I am trying to show something similar in Hungarian entries, although the Hungarian language uses suffixes instead of prepositions. See the verb tartozik for the format. It would be good to come up with a format that other languages could use, too. --Panda10 (talk) 21:02, 27 February 2014 (UTC)
Many languages have things like this. In English, you have "talk about" and "talk to", Dutch has "praten over" and "praten tegen". The preposition to be used is often unpredictable and should be idiomatic by our standards. In many older Indo-European languages, cases fulfilled this role as well, like in Gothic 𐌱𐌹𐌳𐌾𐌰𐌽 (bidjan, to ask), which took an accusative object for the person being asked, and a genitive object for the thing desired. Modern Finnish still uses cases like this, thanks to its elaborate declension system. So if we come up with a consistent way of indicating this, we should also include case usage. —CodeCat 21:15, 27 February 2014 (UTC)
I'm a complete newbie at wiktionary editing, so I'm not particularly familiar with the purposes of certain templates. I don't really have any strong opinions about what mechanism would best convey this information, just that this information needs to be conveyed. --Spuzzdawg (talk) 08:22, 28 February 2014 (UTC)
There are a couple of options with or without templates. No matter which option you go with, it would be useful to create categories for French verbs using a specific preposition. e.g. Category:French verbs with preposition de. Possible formatting options:
  1. Create a separate entry for the French parler de similar to the English talk about. Add parler de to parler under the ====Derived terms==== section.
  2. Or keep one entry for parler. Add the French preposition after the corresponding English preposition: To speak or talk (about something de).
  3. A template such as {{fr-prep}} would be useful to display the de part because it keeps the formatting the same and it can automatically categorize the entry. --Panda10 (talk) 15:25, 28 February 2014 (UTC)

The first approach would not work very well for all languages. Take pożyczyć:
  • With a direct object (thing) in accusative case and an indirect object (person) in dative, it means "to lend";
  • With a direct object (thing) in accusative case and an indirect object (person) in genitive case and preceded by od, it means "to borrow";
  • With a direct object in genitive case and an indirect object (person) in dative it means "to wish" (though this is rarely used).
One would probably like to keep "to lend" and "to borrow" meanings together, but with this approach there would be two headwords on pożyczyć for "to lend" and "to wish", and pożyczyć od for "to borrow". (I conjecture similar issues for other Slavic languages. Анатоли? Ivan? Vahagn? hell, Dan Polansky even? Damn, we have lots of Slavicists.) It does not suit English well either — does "I talked to him about the situation" use "talk about" or "talk to"? Also, preposition stranding is something we probably should do away with.
I would suggest creating a new template to be placed at the end of definition lines. Say, {{+obj|pl||acc|object being borrowed}} {{+obj|pl|od|gen|person being borrowed from}} would render as something like [+ [accusative]: object being borrowed] [+ od [genitive]: person being borrowed from]. Or maybe a few templates even, each for a different grammar: {{+cobj}} (inflection only, e.g. Finnish), {{+prepobj}} (preposition only, e.g. French), {{+prepcobj}} (preposition and inflection, e.g. Slavic languages), {{+postpobj}} (postposition only, e.g. Japanese). Keφr 17:14, 28 February 2014 (UTC)
I was just about to suggest something very similar. Only I don't see why we need separate templates, when optional parameters can handle all that. This will also be useful in defining prepositions that have different meanings when used with different cases. --WikiTiki89 17:18, 28 February 2014 (UTC)
I do like this idea because it's flexible enough to handle a wide variety of situations. It also prevents us from having to stuff it all into the context label, where it doesn't really belong. Concerning prepositions, we probably shouldn't hard-code things too much. Finnish has many postpositions, and languages may have circumpositions as well. Word order might also be different for different languages, so that in an SOV language the verb comes last in the phrase. For example Dutch has: over[prep] (object) heen[adv-prep] komen[verb] "to recover (emotionally) from", which contains the circumposition overheen and the verb in final position. —CodeCat 17:47, 28 February 2014 (UTC)
I intended the separate templates to be there mostly for the sake of convenience (to make markup terser, and to save the tedium of specifying empty arguments for languages that never need them). The whole set of templates may be implemented by a single procedure in a single module (with args in the #invoke frame telling it how to handle .args in the parent frame). Keφr 21:23, 28 February 2014 (UTC)
That's what optional named parameters are for. --WikiTiki89 23:12, 28 February 2014 (UTC)
These are more verbose, though. But whatever. Anyone willing to implement this, this form or another? Keφr 10:04, 2 March 2014 (UTC)
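
For anyone picking this up, the rendering proposed above for {{+obj}} could be sketched like this. It is a rough Python illustration of the output format only, under stated assumptions: the function name `render_obj`, the `CASE_NAMES` table and the French gloss are hypothetical, and a real implementation would presumably be a Lua module behind the template.

```python
# Hypothetical table of case codes; only a few codes are listed here.
CASE_NAMES = {"acc": "accusative", "gen": "genitive", "dat": "dative"}

def render_obj(lang, prep="", case="", gloss=""):
    """Render one object annotation; an empty prep or case is simply omitted.

    `lang` mirrors the template's first parameter but is unused in this sketch.
    """
    parts = ["+"]
    if prep:
        parts.append(prep)
    if case:
        parts.append("[" + CASE_NAMES.get(case, case) + "]")
    text = " ".join(parts)
    if gloss:
        text += ": " + gloss
    return "[" + text + "]"

# The two Polish annotations from the proposal above.
print(render_obj("pl", "", "acc", "object being borrowed"))
print(render_obj("pl", "od", "gen", "person being borrowed from"))
# A preposition-only language, using named parameters (illustrative gloss).
print(render_obj("fr", prep="de", gloss="thing talked about"))
```

The last call shows the point about optional named parameters: one function covers the inflection-only, preposition-only and combined cases without separate templates.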

In parler de and parler à, the sense of parler is exactly the same. Therefore, there should be a single definition, but a usage note can be added about prepositions to be used in different cases. There should be different definitions only when the meaning is actually different. When, for a single meaning, several different words are used in English according to the case, I suggest that a Translations section would be appropriate (this is unusual here, but may be very useful). It would be the same case as stale, which, for a single sense, is translated differently in French for stale water, stale butter, stale news, etc. Lmaltier (talk) 19:05, 25 March 2014 (UTC)

Call for project ideas: funding is available for community experiments

I apologize if this message is not in your language. Please help translate it.

Do you have an idea for a project that could improve your community? Individual Engagement Grants from the Wikimedia Foundation help support individuals and small teams to organize experiments for 6 months. You can get funding to try out your idea for online community organizing, outreach, tool-building, or research to help make Wiktionary better. In March, we’re looking for new project proposals.

Examples of past Individual Engagement Grant projects:

Proposals are due by 31 March 2014. There are a number of ways to get involved!

Hope to have your participation,

--Siko Bouterse, Head of Individual Engagement Grants, Wikimedia Foundation 19:44, 28 February 2014 (UTC)

Languages with difficult scripts

Discussion moved to Wiktionary:Beer parlour/2014/March.

March 2014

Languages with difficult scripts

Discussion moved from Wiktionary:Beer parlour/2014/February.

I was astonished today to run into the word šambaliltu for fenugreek in the Chicago Assyrian Dictionary, which I knew by the Persian word شنبلیله (our entry says it's transliterated shambalile, but I thought it was shambalila). I had no idea the word went all the way back to Akkadian. My first thought was: do we have it in Wiktionary? After looking at Category:Akkadian nouns, I still don't know.

Cuneiform is a complex script that very few people know how to read, and that most references don't use, but all our categories are arranged in character order with no transliteration shown. If you want to find a given word, you have to either: a) browse through all the entries, one by one; b) type the transliteration into the search box and hope it matches the one in the entry; or c) try to guess the characters in the name from the transliteration using some reference, and look for them in the category.

How can we improve on this? I know we can add a sort key to the category wikilink in each entry to make the categories list in transliteration order, and we can create transliteration-of entries (Category:Akkadian nouns has exactly one of those). It would be nice to have an automatically-generated index of transliterations for each category, and/or have the transliterations visible in the category listing itself. The second option looks like it would require assistance from the developers, but what about the first? Does anyone have other ideas? Chuck Entz (talk) 04:31, 1 March 2014 (UTC)
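
An automatically generated transliteration index of the kind described above might be sketched as follows. This is only an illustration in Python: the function names are hypothetical, the page title for šambaliltu is a placeholder (whether we have the entry is exactly the open question), and stripping diacritics for sorting is an assumed convention, not an adopted one.

```python
import unicodedata

def sort_key(translit):
    """Strip combining diacritics so that š sorts with s and ū with u."""
    decomposed = unicodedata.normalize("NFD", translit.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def build_index(entries):
    """Return (transliteration, page title) pairs in transliteration order."""
    pairs = [(translit, title) for title, translit in entries.items()]
    return sorted(pairs, key=lambda pair: sort_key(pair[0]))

# Page title -> transliteration; the second title is a placeholder.
entries = {
    "<page for šambaliltu>": "šambaliltu",
    "𒀀": "mū",
}
for translit, title in build_index(entries):
    print(translit, "->", title)
```

The same `sort_key` string is the sort of value one would put after the pipe in each entry's category link to make the category itself list in transliteration order.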

I support allowing romanisation entries for Akkadian (and any language that uses an obsolete script). — Ungoliant (falai) 04:36, 1 March 2014 (UTC)
@Chuck Entz: Pro tip: It's March already. --WikiTiki89 04:51, 1 March 2014 (UTC)
Not here in California, but in wiki-land, I suppose it is. Topic moved. Chuck Entz (talk) 05:22, 1 March 2014 (UTC)
I support allowing romanizations as soft redirects, possibly with their own categories. --WikiTiki89 05:27, 1 March 2014 (UTC)
This is how Ogham is organised. viz: Category:Primitive Irish nouns --Catsidhe (verba, facta) 06:41, 1 March 2014 (UTC)
Romanized entries are already allowed for Etruscan, Gothic, Lydian, Oscan, Phoenician. Akkadian can be added to the list if a standard and referenced transliteration scheme is adopted and described somewhere. --Vahag (talk) 09:11, 1 March 2014 (UTC)
A more general solution would be some kind of fuzzy search to supplement (certainly initially) or even replace our main search capability. I am thinking of a search that could be restricted to search only within a given language, language family, or group of languages using a given script or script family.
I certainly favor any effort to expand the usefulness of our entries to those without great knowledge of the range of scripts in which they are entered. DCDuring TALK 13:40, 1 March 2014 (UTC)
One major difference between how Gothic and Primitive Irish are treated is that Gothic transliterations have no part of speech. Just "Romanization". I'm not sure why the same isn't done for PI. —CodeCat 02:06, 16 March 2014 (UTC)
On refreshing myself with Cuneiform, it's a hard problem. Akkadian cuneiform is a syllabary, but the normalised transcriptions (in Wiktionary, anyway) do not show the syllables which spell the word. (A·KA·AS·SA·PU·TU > Ákassaptu, as a completely made-up example.) More to the point, a symbol may be read as Akkadian, as Sumerian (which I think is shown by the differing transcriptions in parentheses (Akkadian reading) and brackets [SUMERIAN]), or as a determinative. I don't think an automatic transcription is possible, much less fuzzy searching as described above. Moreover, I think that an Assyriologist needs to go through and normalise these, with a better description of what's going on. Something like
  • Akkadian
    𒀀 (mū) f. (plural)
    1. water
    Usage notes
    Most often complemented with MEŠ (link to whatever the cuneiform is for MEŠ)
Sumerian readings should be listed in the Translingual section. The Akkadian section would give variant transcriptions, possibly with a link to an appendix describing the process and conventions (such as transcribing in lower case for Akkadian readings, and UPPERCASE for Sumerian readings, which are not uncommonly combined within a word), and possibly even the transcription and the normalisation. The cuneiform would be a lemma, and the normalisation would also be a lemma, each linking to the other, but the diplomatic transcription, while given in both places, would not be a lemma. So...
TL;DR: It's a hard, and probably not automatable, problem to solve properly. --Catsidhe (verba, facta) 22:29, 1 March 2014 (UTC)

Request for having archaic third-person singular form for English verbs

I don't know if somebody has requested it before, but I think that we may want to include the third-person singular form of English verbs in the headline (goeth, cometh, hath). The details of implementation can be discussed later. What are your views? --kc_kennylau (talk) 16:41, 1 March 2014 (UTC)

I'd be against including them in the headword line, though of course they need to be listed somewhere in the lemma. —Aɴɢʀ (talk) 16:46, 1 March 2014 (UTC)
What do you mean by "somewhere in the lemma"? I think that the headword line is the best place to put it, IMHO. --kc_kennylau (talk) 16:51, 1 March 2014 (UTC)
I mean goeth should be linked to somewhere within go#English, but not the headword line. Under ===Conjugation=== would be a better place for it. —Aɴɢʀ (talk) 16:55, 1 March 2014 (UTC)
Such forms confirm the impression (also otherwise justified) that we are only interested in serving antiquarians and scholars rather than normal humans. I'd like us to do what we can to maintain the illusion that we care about normal people who might still be using Wiktionary.
I strongly oppose such forms being visible by default. It may belong in related terms or in some place, even the inflection line, where it is not visible by default but can be made visible. I would really like it if it were not visible even if the user did not have JS or any other more than basic capability. DCDuring TALK 16:58, 1 March 2014 (UTC)
It would be too space-demanding if we have to have a new header for that, because the headword line already contains all the conjugations. If you do not wish to have an impression that you are "only interested in serving antiquarians and scholars", why don't you make the etymology section hidden? --kc_kennylau (talk) 17:02, 1 March 2014 (UTC)
I apologize if I'm sounding rude or anything, because my brain is now not functioning properly due to it being 1 o'clock in the morning for me, and due to the fact that English is not my native language. --kc_kennylau (talk) 17:03, 1 March 2014 (UTC)
The problem is that if we do this, we need to do it right, and there are a lot more forms we would need to include, such as the second person singular and the subjunctive. For example, at [[do]] we would need the indicative dost and doth, as well as the subjunctive doest and doeth, which is a total of four extra forms we'd need to add. --WikiTiki89 17:43, 1 March 2014 (UTC)
Please don't. It's bad enough when an entry starts off with a chain of rare, obsolete "alternative forms". Goodness knows what kind of Chaucerian gobbledygook our foreign users must be acquiring. Equinox 17:32, 1 March 2014 (UTC)
I would support moving the ===Alternative forms=== section down to where all the other related terms sections are. --WikiTiki89 17:43, 1 March 2014 (UTC)
I think alternative forms is placed where it is for reasons similar to why Wikipedia often lists several common varieties of the article name. It's there to let users know that they've found what they're looking for. —CodeCat 17:52, 1 March 2014 (UTC)
@KennyLau: I often hide portions of longer etymologies, especially lists of cognates. As our etymologies have gotten longer, I have become increasingly sympathetic to hiding them by default.
@Wikitiki & Equinox: Horizontal lists of alternative forms are less intrusive than vertical lists. CodeCat's point is true, but we have many rather obscure alternative forms that clutter the lists. I'm not sure what basis there would be for shortening the lists of alternative forms, but I've long thought that digraphs have low value. There may be other typographic alternatives that could be eliminated.
Perhaps all obsolete, archaic, and rare forms, whether in alternative forms or the inflection line could be made to appear only if a user chose to display them. DCDuring TALK 18:27, 1 March 2014 (UTC)
Just to add my voice to the chorus: obsolete forms should not go in the headword line. We shouldn't clutter the line with information which is no longer useful and, by giving it prominence, suggest it is still usable* : I've known a couple of Germans who liked to use the 'goest'-forms of verbs because they couldn't stand that English wouldn't use as full a set of verb forms as German; they never realized (much like people who use ligatured spellings of "æqual", and much like the North Korean press office which famously only had outdated English-Korean translation dictionaries for a long time) how affected they sounded; we don't need to encourage more people to do the same. The last time the presence of obsolete forms on headword lines was discussed, people seemed to favour creating conjugation tables for English verbs which would include the -eth and -est forms. (And people seemed to oppose listing obsolete forms on headword lines, so I proceeded to move all the obsolete forms I could find in headword lines down to, for lack of a better place, ====Usage notes====.)
*I realize one could say much the same of obsolete alternative forms, which I nevertheless don't support moving out of the ===Alternative forms=== section, despite that section's current prominent placement. I think the solution there is to collapse the obsolete forms under a rel-top, and perhaps move the entire Alt forms section — whereas, collapsing part of a headword line, or moving the headword line to a different part of the entry, would be a bad idea. Also, I think that obsolete forms belong with other alt forms, and can (when in Alt forms sections) be segregated onto a separate line and labelled clearly, whereas obsolete forms listed as part of a single run-on headword line can't be labelled as clearly without making the headword line into two or more lines.
- -sche (discuss) 20:43, 1 March 2014 (UTC)
I wonder if putting alternative forms in a right-floating box would help. It would keep them "out of the way" while still being at the top so they can be seen easily. —CodeCat 20:52, 1 March 2014 (UTC)
It wouldn't help if one were using rhs table of contents.
The discreet way in which quotations are concealed by default would be great for obsolete etc forms, still allowing common alternative forms to be prominently displayed. That same approach could allow the archaic inflected verb forms to be displayed just beneath the modern inflected forms.
I am OK if the archaic forms cannot go unto the headword line, but they have to be included at least somewhere in the entry, since one currently can find not the forms anywhere in the main entry, which becometh a problem when thou beest trying to conjugate an irregular verb. --kc_kennylau (talk) 13:13, 2 March 2014 (UTC)
Discussion last year showed that there was interest in having conjugation tables for English verbs, and this discussion shows continued interest, so I've deployed a conjugation table on walk and talk (adapted from something CodeCat designed in the July 2013 discussion—I made some changes to which forms were displayed). See what you think. Obviously, we'll need another table to account for verbs that add, delete or change letters when inflecting, e.g. hate, but that shouldn't be too hard, and per our usual practice, I expect that all the tables used in entries will end up being shells that use a single backend. I think be is so much more complex than every other verb (with two whole conjugations, and many forms other verbs don't distinguish) that it may need a separate template, though. - -sche (discuss) 08:51, 28 March 2014 (UTC)
  • @-sche: If be is just a one-off, why even create a template for it? Why not just code the table manually in the entry itself? ‑‑ Eiríkr Útlendi │ Tala við mig 17:32, 28 March 2014 (UTC)
  • Hm, that would also work. It will be a big table (two tables, really: one suppletive and one not suppletive, the latter now used [in the indicative] only for one sense of be), so it would make the source code of [[be]] more legible if it transcluded a table ({{en-conj-be}}) than if it were filled with screens and screens of template code. But I don't actually object to coding the table into the entry. - -sche (discuss) 18:23, 28 March 2014 (UTC)
One-off templates are often used to reduce clutter on pages. There is no reason that we shouldn't do it. --WikiTiki89 18:49, 28 March 2014 (UTC)
Would User:CodeCat/en-conj-table be good as a starting point? I still had it from another discussion some time ago. —CodeCat 19:11, 28 March 2014 (UTC)
Your template has some mistakes. I have created User:Wikitiki89/en-conj-table which both fixes the mistakes and changes the layout to what I think makes more sense. --WikiTiki89 19:25, 28 March 2014 (UTC)
  • @Wikitiki89: I was under the impression that transcluding templates increases server load and lengthens page loading times. It's perhaps a minor consideration, but it is one potential reason not to create and use templates that are only ever used in one place. ‑‑ Eiríkr Útlendi │ Tala við mig 19:14, 28 March 2014 (UTC)
    • Given the number of modules and templates we already transclude on most pages, one more is hardly going to make a difference. I agree with one-off templates because they make things more manageable in the long term. It makes it possible to track down entries by their templates, for example. Raw tables in entries have caused me quite some trouble in the past. —CodeCat 19:20, 28 March 2014 (UTC)
    • If they are ever only used in one place then their impact on server load is very insignificant. --WikiTiki89 19:25, 28 March 2014 (UTC)

Updating Wiktionary:Entry layout explained/POS headers

This page is very important as far as policy/common practice pages go, but it's also rather out of date. We should bring it up to date, but I'm not sure what the current state of affairs is on everything. I know that there's a general consensus that "acronym", "initialism" and such are no longer acceptable. We've also deprecated "cardinal number/numeral" and "ordinal number/numeral" recently in favour of plain "numeral" (for cardinals that don't fit any other part of speech) as well as "adjective" for ordinals. But there's also "idiom" which I don't think should be allowed either, although I don't know if any consensus exists on that. "Participle" seems to be gaining some ground recently; all Dutch participle entries use it now.

We should probably also go through the "Other headers in use" list and see which ones we should try to track down and fix. —CodeCat 21:09, 1 March 2014 (UTC)

Indeed. If someone more technically adept than me could generate a list of all the POS headers that are currently in use (and, if it wouldn't be too taxing, a count of how often each header is used), that would let us know where we stand, as far as which headers are in use vs which are prescribed by WT:POS. One way to generate a list of POS headers would be to generate a list of L3 headers and then just manually remove non-POS stuff like "See also". In fact, a list of all L1, L3, L4, L5, etc headers (but not L2 headers — WT:STATS already has a list of those) which are in use, with information on what level they are and how often they occur, would surely reveal some typos and other things that we'd like to change. There shouldn't be any L1 headers. And I often notice "References" at L4 not inside a numbered etymology section. But I'm getting off-topic; I apologise. - -sche (discuss) 22:04, 1 March 2014 (UTC)
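As a rough illustration of the header count requested above, here is a minimal Python sketch. It assumes the wikitext of each entry is already available as a string (for instance, extracted from a dump); the function name `count_headers` and the sample text are hypothetical, not an existing tool. It tallies every header by level and title, skipping L2 headers since WT:STATS already lists those.

```python
import re
from collections import Counter

# A header line is the same run of "=" on both sides of a title, e.g. ===Noun===.
HEADER_RE = re.compile(r"^(=+)[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def count_headers(wikitext, skip_levels=(2,)):
    """Count (level, title) pairs in one page's wikitext, skipping some levels."""
    counts = Counter()
    for match in HEADER_RE.finditer(wikitext):
        level = len(match.group(1))
        if level in skip_levels:
            continue
        counts[(level, match.group(2))] += 1
    return counts


# Hypothetical sample entry containing one misspelled POS header.
SAMPLE = """==English==
===Noun===
====See also====
===Nonu===
==Latin==
===Noun===
====References====
"""

if __name__ == "__main__":
    for (level, title), n in sorted(count_headers(SAMPLE).items()):
        print(f"L{level} {title}: {n}")
```

Summing the counters over all entries would give the frequency list; rare titles such as the misspelled one in the sample are then easy to spot and review by hand.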
User:DTLHS/headers DTLHS (talk) 23:33, 1 March 2014 (UTC)
Thank you for making the list, but in its current format it's not terribly useful. We really need to know which entries the non-standard headers occur on, so they can be fixed. At least the spelling mistakes. —CodeCat 23:52, 1 March 2014 (UTC)
I would need a list of standard headers, to avoid listing millions of Noun entries. Also you can use the search to find the less common ones. DTLHS (talk) 23:55, 1 March 2014 (UTC)
Also I just noticed one header called "Δεψλενσιον". It looks like someone had their keyboard in the wrong language! XD —CodeCat 23:54, 1 March 2014 (UTC)
Thank you! There are about as many spelling errors as I expected, lol. - -sche (discuss) 00:05, 2 March 2014 (UTC)
Σηοθλδ ςε σορτ τηε Οτηερ ηεαδερσ ιν θσε ιντο τηοσε ςηιψη αρε λανγθαγε σπεψιφιψ ανδ τηοσε ςηιψη αρε νοτ? Ανδ σηοθλδ ςε θσε Νθμβερ ορ Νθμεραλ? Ανδ Ι αμ αςαρε τηατ ςε αλσο σορτ Αββρεωιατιονσ, Αψρονυμσ, Ψοντραψτιονσ, Ινιτιαλισμ ανδ Συμβολ ιφ Ι αμ ψορρεψτ. --kc_kennylau (talk) 01:33, 2 March 2014 (UTC)
That is incredibly hard to read, so I'll transcribe it for everyone: "Should we sort the Other headers in use into those which are language specific and those which are not? And should we use Number or Numeral? And I am aware that we also sort Abbreviations, Acronyms, Contractions, Initialism and Symbol if I am correct." --WikiTiki89 02:01, 2 March 2014 (UTC)
@DTLHS: I've taken a stab at sorting your list into headers which are "standard enough" (including headers which are not standard anymore but which were once standard and which are therefore still common) and headers which IMO need to be tracked down (including both nonstandard/typo headers, and headers which may be standard but are so rarely used at the level they're at that I thought they could use review). You may want to wait for others to comment / modify the list further before generating a list of where all the nonstandard headers are used, though. - -sche (discuss) 02:16, 2 March 2014 (UTC)
I've added some notes as well. —CodeCat 02:23, 2 March 2014 (UTC)
Thanks for your input, I'll generate something more comprehensive in a day or so. DTLHS (talk) 02:53, 2 March 2014 (UTC)
I've edited the page some more, grouping headers by where they may appear. —CodeCat 13:42, 2 March 2014 (UTC)
It looks like the list is finished, but there are a lot of problems already, that I can see. —CodeCat 00:02, 3 March 2014 (UTC)
I've used MewBot to correct all the spelling mistakes, and converted some of the deprecated ones to the modern versions as far as was feasible with a bot. User:-sche also helped out. We probably need to wait now until the next dump is made, as the list doesn't reflect the current reality anymore. —CodeCat 16:43, 4 March 2014 (UTC)

Wiktionary would benefit from a more user-friendly discussion forum

I have not been editing here regularly for years, and so coming back and looking at things with fresh eyes I think that there is a need for a more modern discussion forum, with a more normal way of posting messages. I think that would help discussions to take place more easily and would allow new users and people who don't edit regularly to contribute to discussions without being familiar with the unusual way of posting a message (when compared with other online forums/'fora'). It might be easier to edit from a tablet, as well. Also, the current format is slow to edit for anyone with a very slow internet connection. Does anyone agree? Has this been raised before? Kaixinguo (talk) 12:27, 2 March 2014 (UTC)

  • I have an infinitesimal amount of hope that Flow will be this. However, its developers seemingly prefer to waste time on trying to make it look like linkbait for a certain website with an orange logo, instead of focusing on features like threads on multiple pages.
Also, we have LiquidThreads. Boring, somewhat clunky, somewhat ugly, but mostly functional. I am not sure why we do not use it more widely. Keφr 13:42, 2 March 2014 (UTC)

What would a lua-cized translation template look like?

I'm trying to continue the discussion here. --kc_kennylau (talk) 13:26, 2 March 2014 (UTC)

What would you like to discuss? DTLHS (talk) 23:29, 3 March 2014 (UTC)

Vote: CFI: Removing usage in a well-known work 2

FYI: Wiktionary:Votes/pl-2014-03/CFI: Removing usage in a well-known work 2

Let us postpone the vote as much as the discussion will make necessary. --Dan Polansky (talk)

Part-of-speech sections with multiple headword lines, lemmas with form-of definitions on the same page

Occasionally I come across entries where there is a single part-of-speech header with multiple headword lines below it. An example is mensa, in the Latin section. There are also many examples of this in Italian entries. Sometimes they are like the case of mensa where the second headword line is simply a form of the first, but I've also seen it used to distinguish masculine and feminine-gendered nouns with the same lemma form. I'm wondering what the general consensus is about entries like this. I personally think that this is wrong, and that these really should have their own headers. Masculine and feminine nouns are separate, they are not the same lemma, because they have (at least in Italian) separate types of inflections. But even the Latin mēnsā is not the same word as mēnsa; they have different pronunciations and it's only because of an orthographic shortcoming that they end up on the same page. There's also a practical consideration: having "floating headwords" with no header before them is harder for bots to parse, especially when they are formatted using the obsolete "bold headword" formatting like mēnsā is. As the page is now, a bot that tries to parse that page will come upon what seems like a random bit of bold text in the middle of the list of definitions. —CodeCat 23:26, 3 March 2014 (UTC)

Mensa should definitely not be formatted like it is, we should either remove the "mēnsā f" line (and perhaps also the following "# ablative singular of mēnsa" which simply points users to the page they're on) or give it its own ===Noun=== header. - -sche (discuss) 00:12, 4 March 2014 (UTC)
I don't think we should remove it, because it's pronounced differently from the noun above. There should be two pronunciation sections on the page, which implies two noun sections IMO. —CodeCat 15:32, 4 March 2014 (UTC)

A question that's somewhat related is about form-of definitions that are homonymous with the lemma and therefore appear on the same page, under the same etymology/pronunciation section (often under the same PoS header too and listed as one of the definitions of the lemma, but not always). mēnsā is not an example, because its pronunciation differs, but the entry also contains a definition for the vocative singular. Again I'm not really sure if this is good practice. The main reason we create form-of entries in my opinion is to let people find the lemma, and also to give pronunciation information about each form specifically. But when the lemma is on the same entry already, and has the same pronunciation, there's not really anything to be gained from listing the form-of definition there. There would presumably be an inflection table on the entry, and that would show which forms coincide with the lemma form (these often appear in black bold, too). Please note that this doesn't apply at all to sublemma form entries. These entries would need extra grammatical information, such as their own inflection table. In those cases we should create sections for both, like on vergeten, which is both a verb (infinitive lemma) and the past participle of that verb. —CodeCat 23:26, 3 March 2014 (UTC)

Hot words

Lately there has been a lot of debate about words such as olinguito and Euromaidan, which don't pass CFI but likely will in the future. I think that instead of keeping them outside of CFI, we should amend CFI to accommodate such words.

Proposal for special provisions for "hot words":

  • If a word that can be considered beneficial to Wiktionary has citations that do not span the required time period, but meet the following criteria, the word may be kept as a "hot word". The citations must be:
    • relatively recent.
    • reach a wide population.
    • from a wide variety of media.
  • An entry for a "hot word" will have a highly visible indication of its special status.
  • While a "hot word" meets the above criteria, it will have the same rights as any other entry and may even be featured on the main page.
  • A "hot word" will be reevaluated every time its lifespan doubles and if it does not still meet the above criteria, it must be deleted.
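To make the last point concrete, here is a minimal Python sketch of a review schedule in which each review falls when the term's age has doubled since the previous one. The proposal does not say when the first review happens, so the `first_review_after_days` parameter (and the function name `review_dates`) are assumptions made purely for illustration.

```python
from datetime import date, timedelta


def review_dates(first_seen, first_review_after_days, until):
    """List the review dates up to `until`, doubling the term's age each time.

    `first_review_after_days` is an assumed starting interval; the proposal
    itself does not fix one.
    """
    if first_review_after_days <= 0:
        raise ValueError("first_review_after_days must be positive")
    dates = []
    age = first_review_after_days
    while first_seen + timedelta(days=age) <= until:
        dates.append(first_seen + timedelta(days=age))
        age *= 2  # the next review falls when the lifespan has doubled
    return dates


if __name__ == "__main__":
    for d in review_dates(date(2014, 3, 6), 30, date(2015, 3, 6)):
        print(d.isoformat())
```

With a 30-day starting interval, a term first seen on 6 March 2014 would come up for review at ages 30, 60, 120 and 240 days within its first year, so the reviews become rarer as the term ages.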

--WikiTiki89 04:51, 6 March 2014 (UTC)

I agree with this, but maybe we shouldn't call it a "word" unless we intend it to apply only to single words. I'm not sure if adding all of it to WT:CFI is a good idea, that page is very long and hard to read as it is. Though that's a separate discussion. —CodeCat 14:15, 6 March 2014 (UTC)
I also tend to agree with this. It often happens that a relatively new term makes the headlines and, naturally, people will try to look it up in online dictionaries. We should do what we can to help in such circumstances. Perhaps such terms should be in a hidden category that we can check from time to time to see if it is still being used. SemperBlotto (talk) 14:40, 6 March 2014 (UTC)
I think we need this. A categorizing template that contains a date of the entry or of the first valid citation would facilitate review a year or 13 months after the date leading to conversion of a normal entry or demotion to protologism status. We have a large Appendix:List of protologisms useful for reminding us of how bad some suggestions can be. DCDuring TALK 15:04, 6 March 2014 (UTC)
I created {{hot word}}, which can be added to a page with a date= parameter. It checks whether the date was more than a year ago, and categorises the page accordingly. —CodeCat 15:35, 6 March 2014 (UTC)
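For readers unfamiliar with template logic, the date check described above can be sketched in Python as follows. This is not the template's actual code; the function name and both category names are hypothetical placeholders, and the handling of 29 February is an assumption.

```python
from datetime import date


def hot_word_category(entry_date, today):
    """Pick a tracking category depending on whether a year has passed."""
    try:
        cutoff = entry_date.replace(year=entry_date.year + 1)
    except ValueError:
        # entry_date is 29 February; assume the cutoff is 28 February next year.
        cutoff = entry_date.replace(year=entry_date.year + 1, day=28)
    if today > cutoff:
        return "hot words older than a year"  # hypothetical category name
    return "hot words"  # hypothetical category name


if __name__ == "__main__":
    print(hot_word_category(date(2014, 3, 6), date(2014, 3, 6)))
    print(hot_word_category(date(2014, 3, 6), date(2015, 3, 7)))
```

Entries falling into the second category would be the ones due for reevaluation.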
I added {{hot word|date=06 March 2014}} to the current sandbox - no obvious result. SemperBlotto (talk) 15:41, 6 March 2014 (UTC)
See [[olinguito]] for a working test. DCDuring TALK 15:55, 6 March 2014 (UTC)
We don't need a vote to try this for a while, do we? For how many terms would this apply, help? DCDuring TALK 15:58, 6 March 2014 (UTC)
Basic template defaults should allow the selection of the date of creation of the entry as a default without jeopardizing our amateur status. Riskier in that regard would be:
  1. making sure that the earlier of the creation date or the date specifically inserted as a template parameter was selected.
  2. finding (using Lua?) the earlier citation date in the applicable language section and using that.
  3. allowing for protologistic senses, not just L2s. DCDuring TALK 16:07, 6 March 2014 (UTC)
Including “hot” words is a good idea, but I think it would be better if we added them to the Appendix namespace and linked to them with a template similar to {{only in}}. This will allow us to keep entries even if they turn out to be fads, if we forget to reevaluate the entry we won’t be including a CFI-failing term and it won’t turn the main namespace into a temporary storage. — Ungoliant (falai) 19:12, 6 March 2014 (UTC)
Advantages to having them in principal namespace are direct access via normal search and tabbed access to citations. I also think that they are likely to be taken more seriously. Previously Appendix:List of protologisms was treated as a way of softening the blow of deletion of mainspace entries by new would-be contributors. We would need to create a tiered system to enjoy the benefits of both a class of 'serious' protologisms and a class of protologisms only present as a kind of consolation prize. DCDuring TALK 19:38, 6 March 2014 (UTC)
Appendix entries would still be accessible via the search, due to the template similar to {{only in}}. — Ungoliant (falai) 19:47, 6 March 2014 (UTC)
I have added a preliminary visual design to {{hot word}}. I'm open to suggestions. --WikiTiki89 19:39, 6 March 2014 (UTC)
Is protologism less clear than hot word, which we would have to define in our glossary and link to? Should we say 'This is a very popular, newly coined word that is likely to meet our criteria for inclusion in the future, but may not'? DCDuring TALK 20:26, 6 March 2014 (UTC)
"hot word" is more exciting than "protologism". Yes, we can add it to our glossary and link it to there. I think your suggestion for the text is too long and wordy to be displayed in the template, but it would be a good fit for the glossary definition. --WikiTiki89 20:32, 6 March 2014 (UTC)
How about “This English term is a hot word and may be removed from Wiktionary in the future.” Using our terminology term, and indicating the language for clarity on multilingual pages. Maybe the template should be applied to a sense rather than an entry.
Graphically, it is much too hot. It looks more important than the main heading and all other content on the page. It only needs enough differentiation to draw the eye, and a treatment to differentiate it from other messages. But as a notice regarding the existence of an entry, perhaps it belongs above entry content, not as a sidebar on the right. Michael Z. 2014-03-06 21:06 z
I like your wording. As for it being graphically "much too hot", feel free to cool it down. --WikiTiki89 21:43, 6 March 2014 (UTC)
We also need a template for hot definitions. — Ungoliant (falai) 21:13, 6 March 2014 (UTC)
Yes, we should have a {{hot word-sense}} template. --WikiTiki89 21:43, 6 March 2014 (UTC)
{{hot sense}} makes more... sense? —CodeCat 21:50, 6 March 2014 (UTC)
Maybe. It doesn't make much of a difference, although it is more concise. --WikiTiki89 21:58, 6 March 2014 (UTC)
Created. See rolezinho for an example. — Ungoliant (falai) 22:14, 6 March 2014 (UTC)
I think instead of linking to the glossary, it may be more informative to create a dedicated Wiktionary:Hot words. That way we can dedicate more space to what they are, why we include them, and exactly in what way they are exempt from WT:CFI (that is, what qualifies them for deletion). —CodeCat 22:25, 6 March 2014 (UTC)
Appendix:Hot words might be better. --WikiTiki89 22:27, 6 March 2014 (UTC)
If olinguito doesn't pass CFI then it's CFI that needs to change. We don't need to add an ugly "hot words" box to its entry. It shouldn't need to pay penance. Let's start with removing the one-year requirement for broadly reported and attested scientific discoveries and mathematical concepts. For some neologisms, the "hot words" category could be a good idea, but not for olinguito, which is highly unlikely to be a flash in the pan as far as the word's usage goes. Pengo (talk) 06:24, 9 March 2014 (UTC)
That's the whole point of this discussion, to add provisions to CFI to allow words like olinguito. --WikiTiki89 06:52, 9 March 2014 (UTC)

This (language) term is a hot word, a new term that has quickly become popular. It may soon fall out of usage, and its entry may be deleted from Wiktionary in the future.

You can help this entry stay by establishing the word's usage over a significant period of time.

  • I oppose the use of {{hot sense}} and {{hot word}}, especially since they have this incredibly ugly look. The look from this revision is not any better. They should be deleted, IMHO. If we are to include fad terms that may meet the standard CFI later, we can place them into a special category to ease tracking, but such an ugly box is uncalled for. --Dan Polansky (talk) 07:50, 16 March 2014 (UTC)
    Do you have any suggestions to making it less ugly? I think that this information needs to somehow be portrayed on the page. --WikiTiki89 07:53, 16 March 2014 (UTC)
    If not enough votes for deletion are collected at WT:RFDO#Template:hot sense and WT:RFDO#Template:hot word, {{Webster 1913}} can be used as a model of decent formatting for this kind of warning template. --Dan Polansky (talk) 08:06, 16 March 2014 (UTC)
    Now, that is what I would call an ugly template. Keφr 08:28, 16 March 2014 (UTC)

User:Wyangbot is applying for bot status (as continuation of a discussion last month)

Here. Please participate in the discussion and vote now. Wyang (talk) 03:46, 4 March 2014 (UTC)

Plural forms of proper nouns

Many proper nouns have well-attested plural forms; for example, Frances, Germanies, Caitlins and Jesuses. It seems to me that we could create a couple of templatised usage notes which would explain two major categories of proper noun plurals, namely:

  1. The plurals of personal names, which are used to refer to multiple individuals with the same name, and perhaps sometimes in the same way as placename-plurals (q.v.), as in "the two Obamas [the college-age one with X political views and the sitting president with Y political views] might not even recognise each other". And:
  2. The plurals of placenames, which are used when comparing or contrasting two historical incarnations, or two current or historical governments or social incarnations, of a place, e.g. "the border between the two Germanies", "unite the two Jerusalems", "John Edwards assailed the divide between the two Americas".

Some plurals would need more explanation than the template alone would provide — there is slightly more to say about the use of Germanies than about Frances/Estonias/Denmarks, and more to say about the use of Jesuses than about Caitlins/Barbaras/Annas/etc — but additional information could easily be provided along with or even in place of the templatised usage note.
What do you think? Would it be a good idea to have such usage notes? Where should they be placed: in France/Anna etc, in Frances/Annas etc, or in both places? What should the titles of the templates that contain the notes be? Some templatised usage notes exist in a "U:" 'subnamespace' like the "R:" one that our reference templates exist in; others are named in other ways. (Many languages use the plurals of personal and place names the same way English does, so it would seem inappropriate to use a language prefix.) - -sche (discuss) 22:43, 8 March 2014 (UTC)

Support. — Ungoliant (falai) 23:00, 8 March 2014 (UTC)

Codes the ISO has split or merged (second batch)

In 2012 and 2013, the ISO retired several codes by merging them into other codes or splitting them up. Thirty of these retirements appear to have escaped our notice. I posted one batch at Wiktionary:Beer parlour/2014/February#Codes_the_ISO_has_split_or_merged_.28first_batch.29; here is the second batch, plus my thoughts on them; I'll post the rest another day. If you know a reason we should or shouldn't follow the ISO in a particular case, please comment! - -sche (discuss) 01:04, 9 March 2014 (UTC)

merging Xiandao into Achang

The ISO merged Xiandao (xia) into Achang (acn). Xiandao is a 100-speaker dialect of Achang. A merger seems sound. - -sche (discuss) 01:04, 9 March 2014 (UTC)

Done - -sche (discuss) 09:00, 28 March 2014 (UTC)

merging Panang into Amdo Tibetan

The ISO merged Panang (pcr) into Amdo Tibetan (adx); I propose we follow suit. "Panang" is merely the name of an Amdo-speaking group. - -sche (discuss) 01:04, 9 March 2014 (UTC)

Done - -sche (discuss) 09:00, 28 March 2014 (UTC)

merging Sansu and Hlersu

The ISO merged Sansu (sca) into Hlersu (hle); we should follow suit, because Sansu is merely another name for Hlersu (per e.g. The Cambridge Handbook of Endangered Languages, ISBN 052188215X, 2011). - -sche (discuss) 01:04, 9 March 2014 (UTC)

Done - -sche (discuss) 09:00, 28 March 2014 (UTC)

merging Piru and Luhu

Following those linguists who consider Piru to be a dialect of Luhu, the ISO merged Piru (ppr) into Luhu (lcq). (The ISO records this in a somewhat unclear way, but see here.) I propose we do likewise. Note that some literature takes the opposite stance and considers Luhu to be a dialect of Piru, but the end result for us — that we have a language with {"Luhu", "Piru"} as its names field — is the same. - -sche (discuss) 01:04, 9 March 2014 (UTC)

Done - -sche (discuss) 08:52, 3 April 2014 (UTC)

merging Talur into Galoli

The ISO merged Talur (ilw) into Galoli (gal). Talur is indeed a dialect of Galoli. - -sche (discuss) 01:04, 9 March 2014 (UTC)

Done. - -sche (discuss) 23:24, 5 April 2014 (UTC)

(Southern) Yamphe/Lorung

The ISO merged Yamphe (yma) into lrr, renaming that code from Southern Lorung to Southern Yamphu. A paper published in the Australian National University's Papers in South East Asian Linguistics in 1997 describes the situation: "With two dialects, northern and southern, the Lohorong or Lorung language forms part of the Lohorong-Yamphe group. The Yamphu language occupies an intermediate position in its subgroup between Lohorong, Yamphe and southern Lohorong." (Sic!) - -sche (discuss) 01:04, 9 March 2014 (UTC)

Appendix:Orphaned words is neglected

Appendix:Orphaned words could use some TLC. Half the information is in hidden comments in the source of the page. The term "Orphaned word" seems not to be used elsewhere, and perhaps it should be called unpaired words or "cranberry morphemes" or "fossilized terms" as it seems to contain all of the above. If it weren't just seven words long I'd suggest creating an appendix for each of the above. In order to clear up confusion and stop them being added again, false examples should be listed as such, rather than be deleted (e.g. where the apparent etymology is different to the actual, or where compounding happened in another language). The centered table format seems out of place on Wiktionary. And the list seems like it could be much longer.

Anyone want to have a go at adding terms to it, renaming it, reformatting it, splitting it up, turning it into a category (or categories), merging with the list on Wikipedia's Unpaired word page, or generally just cleaning it up? Pengo (talk) 05:55, 9 March 2014 (UTC)


This script displays a notice at the top of a user's page and contributions list showing whether they are blocked. However, MediaWiki already displays this information by default, even showing the block log entry — except when viewing an existing user page. Therefore loading and executing this script is mostly a waste of bandwidth. Can we disable it? Keφr 07:24, 9 March 2014 (UTC)

How about simplifying it so it only takes up the software's slack? —msh210 (talk) 17:04, 16 March 2014 (UTC)

A question.

Exactly what is this {{rfc-header|Perfective Counterpart|lang=ru}}? After organizing the verb forms like звать, I really wonder about this --KoreanQuoter (talk) 08:03, 9 March 2014 (UTC)

The bot (Kassadbot) which enforces consistent formatting applies {{rfc-header}} tags to entries that have nonstandard headers. Which entry did you see this tag in? If there was just a tag, and no corresponding header, you can simply remove the tag. If there was a header ===Perfective Counterpart=== in one of the entries, it needs to be changed to a standard header. I would have to look at the entry before speculating about which header it should be changed to. - -sche (discuss) 18:08, 13 March 2014 (UTC)
It means that (as in зовём), you made this header: ====Perfective Counterpart====. We do not use that header. зовём should only link back to its infinitive, звать. Under звать it will list its perfective counterpart позвать. So, what {{rfc-header|Perfective Counterpart|lang=ru}} means is that you must remove that section. —Stephen (Talk) 18:24, 13 March 2014 (UTC)
I have fixed all non-standard sections. @KoreanQuoter: please see my edits, e.g. зовём. --Anatoli (обсудить/вклад) 04:20, 31 March 2014 (UTC)

Links to Wikipedia in reference templates

Hello all. Dan Polansky and I are currently in disagreement about whether to link to Wikipedia from reference templates (the ones that start with R:…). Take {{R:L&S}} as an example: I prefer this version, whereas Dan Polansky prefers this version. There are other differences between those two diffs, but the salient bone of contention is whether or not to link to w:A Latin Dictionary in the template code.

I assert that reference templates should link to the relevant cited authority's Wikipedia article; that way, an explanation for why the source is being cited as an authority is readily available for the sceptical reader on the other side of the link. Dan Polansky maintains that such links are distracting and inessential; the latter because a reader can copy the name of the reference work and paste it to Wikipedia article box thereby finding the relevant article.

You can see the (short) discussion so far at User talk:Dan Polansky#Re linking in reference templates. As far as I can tell, Atelaes and I support such linking (i.e., single links to the cited authority's Wikipedia article only), whereas Dan Polansky opposes such linking. I come here to try to obtain consensus for such linking. What do others think? — I.S.M.E.T.A. 19:39, 9 March 2014 (UTC)

P.S.: @Atelaes, Dan Polansky: I have tried to represent your views faithfully; please post corrections hereto if you feel I have misrepresented either of your positions.

Support linking. If a person doesn’t want to know more about the work being referenced, they can simply not click the link. — Ungoliant (falai) 19:49, 9 March 2014 (UTC)
I have no general opposition to linking, but linking to the Wikipedia article about a dictionary is not very useful. If it were a citation of a novel or something, that would be a different story and I would support linking, but I'm not sure what exactly the difference is. So I'm undecided. (Having said that, I definitely oppose this version). --WikiTiki89 20:13, 9 March 2014 (UTC)
Support linking to WP or the home page of the reference website, if it is online and WP doesn't have any article. I try to do this routinely.
We have a number of templates that contain two links, one to a webpage that has substantive information relevant to the headword and one to some page that explains the source in some way, either at WP or a source site page such as its home page or "About us". Having two links increases the possibility for confusion. We seem to have all possible kinds of user preferences and behavior with respect to such links in unknown proportions: not being able to find and follow blue links even when they would help, finding them distracting/following them accidentally, hitting the source-site link rather than the content page link, as well as figuring out which link is relevant to one's needs and using it appropriately.
This discussion makes me wonder whether we should reduce the size of the source link by not having the entire site title be clickable. That would seem to reduce the likelihood of following the source-explanation link when one wanted the content link. I don't see why cut-and-paste should be required for a user to satisfy a question about the source of some information, especially if there is a WP article on the source, as there often is in my experience. DCDuring TALK 20:38, 9 March 2014 (UTC)
It follows a fortiori from the argument about confusion where there are two links that I agree with Wikitiki that the version of the L&S template he cites, with its multiple links, is unsatisfactory, as it provides even more opportunities for mistakenly following links of minimal relevance to identifying and evaluating the source of the substantive information. DCDuring TALK 22:59, 9 March 2014 (UTC)
  • Oppose linking to a Wikipedia article from the reference work name. The reference template should focus the follow-link behavior on the sole link, which takes the reader directly to the page where they can find more about the word, rather than the reference work. Wikilinks are typographically inferior to black text, IMHO, so they should only be used where they add real value. Thus I prefer
    I also oppose the extraneous "New York: Harper & Brothers" added by the user in diff, whose edit summary does not indicate other changes than linking. --Dan Polansky (talk) 19:00, 10 March 2014 (UTC)
  • Support linking to a specific content page and some page that explains the source. DCDuring TALK 18:51, 13 March 2014 (UTC)

References on the page should follow the same format as references for quotations, as much as possible.

If we format references consistently, then it is less confusing to link to Wikipedia articles for titles or authors (I’d suggest limiting this to one link, to the author only if there is no article about the work). Why not use the visible shortcut to show the nature of the interwiki link?:

We do not want to link to Wikipedia articles about every mentioned city, publisher, etc. Linking every word is distracting. These are of secondary relevance to a citation, and to be found in a Wikipedia article about the work. Michael Z. 2014-03-13 19:18 z

I certainly oppose this much linking. I tend to oppose other linking (e.g. linking of author names and work titles) in reference templates, too, because users may click on the work-title links expecting to get to the cited page in the reference, and then have to backtrack. (On File pages that say "This is a file from the Wikimedia Commons. Information from its description page there is shown below.", I have on an embarrassingly large number of occasions been distracted enough to click on the "Wikimedia Commons" link and then have to backtrack and click the link to the actual file.) Linking author and/or work names when quoting a book under a sense, the way Thomas Jefferson is linked in [[liberticide]], is more acceptable, because I think users are less likely to click the links expecting to get to the citation, since they can already see the citation (quoted on the very next line). - -sche (discuss) 21:59, 13 March 2014 (UTC)
How about this format for the source link:
—This unsigned comment was added by DCDuring (talk • contribs) at 22:38, 13 March 2014.

I take on board the general opposition to overuse of wikilinking; I apologise, for I hadn't realised how distracting and potentially confusing others find it, because I don't myself find such linking particularly distracting nor at all confusing. Re the proposed alternatives to this version:

  • @DCDuring ("reduc[ing] the size of the source link by not having the entire site title be clickable" and the Rhipidistia/Palaeos format proposed above in your post timestamped: 22:38, 13 March 2014): Unfortunately, I think that your format would be more likely to cause users to follow the incorrect link (or at least, I believe it would if most users are anything like me). I say that because, when I see a blue link with either an external-link icon or a padlock (or whatever) beside it, I correctly interpret it to be an external link, whereas when I see a simple blue link, I interpret that as an internal (i.e., intra-Wikimedia) link (with dark blue for Wiktionary links and light blue for links to other MW projects); that, I expect, is what the software's developers intended. Although there are some links to sources that point to Wikisource, most source links on the English Wiktionary are external links (usually to Google Books or Usenet). The fact about sourcing here and the software developers' intent interact in the mind to produce in me (and, I assume, in others) the assumption that a link with an external-link icon or a padlock (or whatever) beside it is the source of whatever I'm looking at/for. (And it seems as if that assumption is also behind the recommended use of {{usex}}'s ref= parameter.) For me, the format you give would temporarily confuse me, until I'd hovered over the links to guess by their URLs the content of what they're pointing to. (Pace @-sche, your confusion-causing example concerns two plain blue, internal links, and not one internal link with one external link; I suspect you would be far less likely to make the same mistake were you to be faced with the latter combination.)
    • We already use fullurl links abundantly in our discussions when referring to entry revisions or diffs. Per a vote sponsored by DanP some time ago, we treat links to our sister projects under "External links", rather than, say, "See also". That you haven't been reeducated means you must have been following links on our more updated Translingual entries. DCDuring TALK 20:46, 22 March 2014 (UTC)
      • I meant in entries. Sorry to confuse the issue. I meant that the association is: single-bracket links = links to sources. — I.S.M.E.T.A. 14:17, 23 March 2014 (UTC)
  • @Dan Polansky ("I…oppose the extraneous 'New York: Harper & Brothers' added"): Come to think of it, it seems pretty unnecessary to mention the publisher at all; what do you say we remove the "Oxford: Clarendon Press" bit, too?
  • @Mzajac ("format[ting] references consistently" and "Why not use the visible shortcut to show the nature of the interwiki link?"): I agree with you regarding what I take to be your point about consistent formatting of references: that they should conform, as closely as analogously appropriate, to the standard format(s) laid out in WT:", yes? If that is indeed what you mean, why did you put the publication date immediately after the authors' names? For consistency with WT:", the date should be included in parentheses immediately after the publisher's name (see the citation from Treasure Island under WT:"#Between the definitions). As for including the ISBNs, they automatically create blue links, which many in this discussion have opposed, and they are of very marginal usefulness; I suggest we not include them for those reasons. I confess that I dislike "w: A Latin Dictionary", although largely on aesthetic grounds; would something like "w:A Latin Dictionary" be acceptable?

In the light of the discussion so far, I suggest one of the two following formats for, e.g., {{R:L&S|via|via}}:

What do y'all think? Also @Ungoliant MMDCCLXIV, Wikitiki89? — I.S.M.E.T.A. 19:37, 22 March 2014 (UTC)

  • I prefer the first example. — Ungoliant (falai) 19:53, 22 March 2014 (UTC)
  • I also prefer the first example. The little w is unlikely to clarify anything for anyone who doesn't know where the link points to anyway. --WikiTiki89 19:58, 22 March 2014 (UTC)
  • Why should the year be in brackets rather than after a comma? (I still oppose linking to Wikipedia. I don't oppose removing "Oxford: Clarendon Press".) --Dan Polansky (talk) 20:16, 22 March 2014 (UTC)
  • We really should provide links to information about the sources, so readers can better assess their reliability- even a great work like Lewis & Short has suffered from changes in scientific names and terminology over the past century or so, and there are some references that are great in some areas and downright awful in others.
I'm not happy with either format, though. We should have "(see A Latin Dictionary at Wikipedia)" set off from the rest, not an unlabeled bluelink that might be misinterpreted as a link to the main page for the dictionary at Perseus. If we don't waste space on trivia such as the publisher, there should be plenty of room to spell out the basics- especially since the template is doing all the typing for us. Chuck Entz (talk) 21:29, 22 March 2014 (UTC)
@Dan Polansky: WT:" prescribes the parenthetic format. I'd be happy for the date to be included after a comma instead.
@Chuck Entz: Hmm. What about this format?:
  • “via” in Lewis & Short’s Latin Dictionary (1879) — For more information on this source, see its Wikipedia article.
 — I.S.M.E.T.A. 14:17, 23 March 2014 (UTC)

Stop treating Nynorsk and Bokmal as languages separate from Norwegian

Previous discussions: March 2008, July 2008, February 2011, January 2012, August 2012.
On March 20th, Wiktionary:Votes/pl-2014-03/Unified Norwegian will begin so that we can (hopefully) formalize a policy on this oft-discussed subject.

At present, we treat all three of "Norwegian" (code no), "Norwegian Bokmål" (nb) and "Norwegian Nynorsk" (nn) as languages. We have 5800 ==Norwegian== entries, 4800 ==Norwegian Bokmål== entries and 7400 ==Norwegian Nynorsk== entries. I and several others think we should stop treating Bokmål and Nynorsk as languages separate from Norwegian.
As was pointed out in a previous discussion, Nynorsk and Bokmål are two standards of Norwegian, but there are other standards (e.g. Riksmål) and many dialects whose words, because they cannot be labelled Bokmål or Nynorsk, would ironically be the sole users of the plain ==Norwegian== header if we were ever to seriously consider Nynorsk and Bokmål separate languages (but in fact most of the words which currently use the ==Norwegian== header are acceptable in both, or sometimes just one, of the standards).
Bokmål and Nynorsk are mutually intelligible (see e.g. Rubén Chacón-Beltrán's Introduction to Sociolinguistics, page 135, and Joshua Fishman and Ofelia Garcia's Handbook of Language and Ethnic Identity, page 434). They are no more different than US English and Indian English or txtspk or any of the other forms of English we handle very well with context tags rather than separate L2 headers.
In a previous discussion, someone (I don't recall who) made the point that there also exists a degree of mutual intelligibility between Norwegian and Danish. The person who made this point seemed to think it constituted an argument against merging Nynorsk and Bokmål. I dismiss this slippery slope fallacy. If anyone wants to propose merging Norwegian and Danish, they can start a section about that below this one, and the merits of it can be discussed, but the question I ask in this section is: "should we stop treating ==Norwegian Bokmål== and ==Norwegian Nynorsk== as languages separate from ==Norwegian==?" - -sche (discuss) 03:56, 12 March 2014 (UTC)

Confirming my support for the merge. --Anatoli (обсудить/вклад) 04:29, 12 March 2014 (UTC)
  • Support. From my understanding, this is similar to treating people-who-write-color and people-who-write-colour as speakers of separate languages. --WikiTiki89 04:49, 12 March 2014 (UTC)
    I don't know how carefully you chose your wording, but it's particularly apt: it is like treating people who write "i 1877 forlét Brandes København" vs "i 1877 forlot Brandes København" as if they were speakers of separate languages, lol. (Meanwhile, no problems have arisen from our treatment of people who write "nevermind, I realised I have to go pick up Andy; I'll see you later instead" and people who write "nvm. realized i gtg get Andy. cu l8r" as users of a single language...) - -sche (discuss) 02:01, 13 March 2014 (UTC)
    My wording was a highly condensed form of a rant about how Canadian English would be the same language as British English, but separate from American English. Sometimes I'm careless with my wording, but this wasn't one of those times. --WikiTiki89 06:29, 13 March 2014 (UTC)
For what it's worth, we've already had this discussion, here, among other places. I strongly suggest people interested in the topic read some. I don't have a strong view myself, as I have next to no knowledge of Norwegian, but my impression is that most Norwegian contributors tend to support the split. Perhaps we can, at the very least, be a bit more civil about it this time around. -Atelaes λάλει ἐμοί 06:19, 12 March 2014 (UTC)
Bokmål and Nynorsk are different enough to be considered separate languages, because they are also different enough from Swedish and Danish. It's true that Nynorsk is skewed more towards the west of Norway while Bokmål is closer to the urban speech of the east, but that's only kind of relevant, because in reality both standards claim to be written representations of the full range of speech in all of Norway. That is, the aim of both is to be a single unified language for all Norwegians, not merely to represent some part of them. So even if they have a different dialectal base, they are not regional dialects of Norwegian; they're both Norwegian, period. That means IMO that they can't be treated as true languages; languages have phonology, and pronunciation, but Bokmål and Nynorsk don't have anything to do with speech. That in itself must mean that they are not languages, because they are written standards only. The situation is really more like that between ekavian and ijekavian Serbo-Croatian, where one is free to choose whichever. And similar also to Traditional and Simplified Chinese characters, or between Latin and Cyrillic Serbian, where different speakers within the same region might decide to use different forms depending on their preference.
Some Norwegians do claim to "speak Bokmål" or "speak Nynorsk", but what really happens there, as far as I know, is that people simply follow the vocabulary and idioms that are part of one standard or the other. The standards are fairly strict in that regard, they also tell you which words to use! But that could be considered a spelling pronunciation, and I doubt that people do this in everyday life; they'll just speak the local dialect without regard to the Bokmål-Nynorsk division. That of course shows the reality of the situation: as far as everyday speech goes, there's really no such thing as Bokmål or Nynorsk, but there's no "Norwegian" either. There's just varieties as spoken in Norway. Plenty of people will use words that are proscribed by (not included/sanctioned in) either standard. When reading Bokmål out loud, their pronunciation might actually be closer to the Nynorsk spelling, or it might match neither one closely.
So really, when we split these into separate languages, we end up with the confusing situation in which both language forms always have the same range of dialectal pronunciations and must therefore have duplicated, identical pronunciation sections. The split also leaves out any forms that are part of neither standard, which Wiktionary must include as part of its NPOV policy. We can't simply decide that all Norwegians must speak either Bokmål or Nynorsk, and whatever doesn't fit those can't be included. I think that implies that we must by necessity have a language header to account for forms found in neither standard. So that really gives us only two possible ways to handle the situation:
  • Three headers: Bokmål, Nynorsk, and a third language header for whatever words are left out of either one. This would create duplication of all pronunciation information.
  • One header: Norwegian, which encompasses both standards, as well as anything that falls outside them.
To me, the choice is not that hard. —CodeCat 14:55, 12 March 2014 (UTC)
Yeah, I never really understood the division when context tags can handle this perfectly well. --Æ&Œ (talk) 18:38, 12 March 2014 (UTC)
  • Support, this presents a clear-cut case for a single language with similar dialects. We might as well split ==English== into ==Deep South== and ==California==. bd2412 T 19:08, 12 March 2014 (UTC)
  • I too support merging Nynorsk and Bokmål back into Norwegian. They're certainly at least as similar to each other as the various standards of Serbo-Croato-Bosno-Montenegrin are, if not more so. —Aɴɢʀ (talk) 19:07, 12 March 2014 (UTC)
  • FWIW, the WP articles at w:Nynorsk and w:Bokmål describe these as "written standards" and "language forms", not languages and not dialects. I know scant little about Norwegian in general, but on that basis, and lacking other context, I would weakly support the merge (weak due to my general ignorance in this area). ‑‑ Eiríkr Útlendi │ Tala við mig 19:11, 12 March 2014 (UTC)
  • Expanding on my earlier comment: Nynorsk and Bokmål are two of the prescriptivist written standards of Norwegian; they include many, but far from all, Norwegian terms and spellings. Some Norwegian words and spellings fell out of use before Nynorsk and Bokmål came into existence, others are not accepted by either standard but are still used (e.g. in dialects), and others are prescribed by other standards, e.g. Riksmål, the standard used by Norway's largest newspaper, Aftenposten. Our descriptivism and NPOV require that we include all these words.
    If we consider Norwegian to be a language, this is simple to do: we include all the words under a Norwegian header, and use whatever context tags are appropriate: {{cx|dialectal}}, {{cx|Nynorsk}}, {{cx|obsolete}}/{{obsolete spelling of}}, {{cx|Bokmål}}, {{cx|Riksmål}}, etc.
    If, on the other hand, we consider Nynorsk and Bokmål to be separate languages, things get harder:
    - Whenever a term or sense is attested, but we cannot determine if it is acceptable in Nynorsk or Bokmål, we must have a ==Norwegian== section. Readers may expect the ==Norwegian== section to document all the uses the term has in Norwegian, but it may document only the term's obsolete or nonstandard uses.
    - In the many cases that a term/spelling is used in both Nynorsk and Bokmål, we must duplicate content in a ==Norwegian Nynorsk== section and a ==Norwegian Bokmål== section on the same page. If the spelling is also used in Riksmål, we must document this as well... apparently by having a third section on the page, namely a ==Norwegian== section with a {{cx|Riksmål}} tag.
    - If the prescriptivist body which regulates Nynorsk (or Bokmål) deprecates a particular term or spelling, or newly allows a previously unacceptable term or spelling, we must change the language we consider the term to be based on the prescriptivist body's decree. We cannot determine the language of a term we encounter in the wild without referring to the prescriptivist body which governs that "language". This is fundamentally incompatible with our descriptivism and policy of NPOV. We would never strip the ==French== header off an attested French word or spelling just because the French Academy deprecated it. - -sche (discuss) 06:15, 13 March 2014 (UTC)
    PS, in previous threads it was noted that the ISO has granted Nynorsk and Bokmål their own codes: the ISO has also granted codes to many other lects which are not independent languages, listed here, and to at least one "language" that doesn't even exist, vmf/"Mainfränkisch". - -sche (discuss) 23:28, 13 March 2014 (UTC)
  • Is it necessarily true that Nynorsk and Bokmål are, by definition, prescriptive standards? I mean, there exist real works that are written using those standards; is it necessarily true that, no matter how many Nynorsk or Bokmål works a word appears in, it's only actually "Nynorsk" or "Bokmål" if the Norwegian Language Council endorses it as such? (I've heard people make similar claims about other languages — e.g., saying something is "not French" if it's not Academically recognized — and usually we're quite happy to ignore such prescriptivist poppycock. What about Nynorsk and Bokmål will forcibly compel us to heed the Council?) —RuakhTALK 00:35, 15 March 2014 (UTC)
  • We agree that we should not strip the ==French== header off an attested French spelling just because the Académie does not accept it. If, however, we decide to have a marker (say, a header or a tag like {{cx|Académician French}}) indicating a spelling's presence in the Académie's standard, we should not apply it to spellings the Académie does not accept. Such spellings should have only the basic header (==French==) and not an {{cx|Académician French}} tag. Right?
  • In my view, our ==Norwegian== header corresponds to that scenario's ==French==, and we have headers for two specific standards (namely ==Norwegian Bokmål== and ==Norwegian Nynorsk==) that correspond to {{cx|Académician French}}. As with French spellings, I think Norwegian spellings which are not in the Bokmål and Nynorsk standards should have only the basic header (==Norwegian==), not Bokmål or Nynorsk tags or headers.
  • Just as the appearance of a particular word/spelling (say, *défoobaristiqué) in French books which otherwise use Académie-approved spellings does not override the Académie's rejection of défoobaristiqué and make it Académician French, the appearance of a word/spelling (*enfoobarig) in books which otherwise use Nynorsk-approved spellings does not, in my view, make enfoobarig Nynorsk.
  • Now, one might take a different view, and think ==Norwegian Bokmål== and ==Norwegian Nynorsk== are more like {{cx|US}} and {{cx|UK}}, i.e. dialects, than written standards. Under such a view, it would be more intelligible to see a word's presence in a "Nynorsk work" as indicating it to be Nynorsk. However, (1) CodeCat does a good job of explaining why Nynorsk and Bokmål "are written standards only", not true languages (or, I would add, true dialects). And (2) one would still have to refer to the prescriptive body's list of which words are Nynorsk—or how else would one determine whether the other spellings in the work were Nynorsk or Bokmål, in order to infer whether the work, and thus enfoobarig, was in Nynorsk or Bokmål?
  • (And so, because we're using L2 headers to do what should be done with context tags, [in my view] we reach the unusual position of having to change a term's L2 header based on a prescriptivist body's rulings. Changing a context tag based on prescriptivists' rulings is, in contrast, fairly common: if, for example, authorities proscribe a term, and we notice, we often add a {{cx|proscribed}} or {{cx|sometimes|proscribed}} tag and (on our best days) explanatory usage notes.)
  • - -sche (discuss) 19:54, 15 March 2014 (UTC)
  • I guess I'm just missing the step from "this is a written standard" to "this is wholly owned and operated by a standards body". I mean, U.S. English also has a written standard, distinct from any spoken dialect, but no single entity defines that standard, and we don't find ourselves preternaturally incapable of identifying works written in it. (My point being — if you're correct that Bokmål and Nynorsk are really nothing but prescriptivist figments, and not language varieties that people actually write in, then not only should they not be language headers, but they probably shouldn't even be context tags.) —RuakhTALK 00:24, 16 March 2014 (UTC)
  • support I knew nothing about Bokmål, Nynorsk, and Norwegian before reading the above. I just want to point out that nb, nn, and no are the only languages on Wiktionary that tangle together in our Category hierarchy. E.g. nb:Sciences and nn:Sciences are listed under no:Sciences. Having written a script to parse the category tree, I can say with some authority that no other languages have this kind of structure on Wiktionary. I've previously wondered why no/nb/nn do this. Thanks all for explaining. It would be nice if Norwegian had the same category structure as every other language. Pengo (talk) 08:19, 13 March 2014 (UTC)
  • To ensure that they notice and can join this discussion (if they want to), I now ping recently-active users who often participate in discussions of lect/code mergers, and/or indicate proficiency in Norwegian. User:Njardarlogar, User:Teodor605, User:Metaknowledge, User:JorisvS, User:Liliana-60, User:MaEr, User:LA2. Anticipating that some may feel that this issue, having been discussed before, should not be discussed again, I point to the number of users participating in this thread who were not involved in previous threads, the new arguments presented in this thread, the unanimity of the users who have participated so far, and the fact there will be a vote after this thread runs its course, which will hopefully finally result in an actual policy. - -sche (discuss) 21:44, 13 March 2014 (UTC)
I've asked a question on the vote's talk page about how grammar differences should be handled after a merge. Please comment. --Anatoli (обсудить/вклад) 01:28, 14 March 2014 (UTC)
  • support. I was just notified of this discussion by - -sche. My native language is Norwegian (Bokmål) and I speak a few other languages. I must say that my personal view is that these two languages ideally should be treated as one. But that is a practical view on it. As most contributors have pointed out, the differences between the two are small. I definitely support the use of {{cx|Nynorsk}} and {{cx|Bokmål}} tags. If you drop that altogether, you would definitely make a mess and create a lot of confusion.
However, this is quite intricate. One would ideally also need a {{cx|Riksmål}} and a {{cx|Nonstandard}} tag. Riksmål is really a subset of Bokmål. And you also need a tag that allows all the permutations of Bokmål, Nynorsk and Riksmål. I.e. (N/B/R), (N/B), (B/R) and theoretically also (N/R). You also need a tag to point out if a word lacks adequate information. In fact, that is how we have decided to do things on the Norwegian project, if not for an altogether different reason. The number of active contributors on either project is small, so instead of duplicating our effort we decided to join forces.
The general sense here in Norway, however, is to keep the two languages apart. The topic is politically quite sensitive, and there are plenty of zealots on either side. Had there been more active users on the Norwegian project I am sure we would have kept the two languages as separate projects. The grammar in the two subsets of the same language varies quite a lot and there are lots and lots of words that are spelled differently. It is a very time-consuming effort to create grammar templates that cater for all needs.
So, to sum it up, I think it is too easy to dismiss the two as one language. However, for practical reasons I think they should be treated as one on non-Norwegian projects. I sincerely hope other Norwegian native speakers will also have their say. I don’t want to be the sole voice of the ones who actually speak this language.
  • Might I also take this opportunity to ask for help on the Norwegian project itself? We are simply too few active contributors to progress in a speedy fashion. We need help with some of the features that bigger projects have implemented long ago. E.g. making translation templates that help a non-proficient user to quickly add a translation and to add that translated word in the Norwegian project itself. And helping out with programming some smart features like words lacking important sections like a grammar section, etc. I love some of the gadgets I have seen elsewhere but lack the skill to implement them. --Teodor (dc) 00:18, 15 March 2014 (UTC)
  • Note: The fact that the two things are mutually intelligible is far from sufficient for them to be the same language; Czech and Slovak are mutually intelligible. The claim that they are one language needs a better support. --Dan Polansky (talk) 08:09, 15 March 2014 (UTC)
    • I already gave this support. Czech and Slovak represent distinct dialect bases and nobody would claim that either one could cover the whole Czech-Slovak continuum. But with Bokmål and Nynorsk, there is no such geographical "tie": they are both used scattered throughout Norway. It would be as if written Czech and Slovak were both used in both countries, and each person would be free to choose which one, regardless of what dialect they actually spoke, and we'd call them "Knižný Jazyk" and "Nový Československý". —CodeCat 14:20, 15 March 2014 (UTC)
  • Keep separate. Although a lot of words are the same in Bokmål and Nynorsk, as well as Danish and Swedish, there are differences in the inflections between the two languages, especially the plural inflections; only in very few cases are the inflections for nouns completely the same, such as for snømann, or mass nouns. In some cases the noun gender can differ; a feminine noun in Nynorsk is usually both masculine and feminine in Bokmål, probably a compromise between Riksmål, which doesn't have a feminine gender, and Nynorsk.
    I have gradually been eliminating the Tbot entries, determining whether each one is Bokmål, Nynorsk, or both. Any new entries I make are split between the two languages where appropriate. It should also be pointed out that there are separate Wiktionaries for Bokmål and Nynorsk, so why should the English Wiktionary differ from this? There are also separate Wikipedias for both languages. Donnanz (talk) 19:50, 15 March 2014 (UTC)
    There is no problem at all in having separate inflection tables for different standards, nor for having different genders for different standards. This is not an obstacle to merging them. --WikiTiki89 19:57, 15 March 2014 (UTC)
    [edit conflict] I don't understand why separate inflections is an argument for keeping them separate. Look at the inflection tables for Catalan verbs on the Catalan Wiktionary for example (ca:cantar). They contain quite a few different alternative forms, depending on dialect; some are part of a standard (Catalan has multiple standards, like Norwegian), some are not. Our current inflection tables don't cover all of these standards, but they probably should at some point. That is not a reason to separate Catalan into multiple languages of course, so it can't be a reason to do that for Norwegian either. All it means is that we need to include more than one inflection table in some Norwegian entries, but that's hardly a problem. As for gender, I don't see how that's a problem either. Norwegian is not the only language where nouns may have ambiguous gender; Dutch is another example, and there are more. We haven't had any trouble with handling those cases, so again we wouldn't expect trouble in this case. Furthermore, you haven't addressed any of the issues that arise from keeping the languages separate (like the question of nonstandard forms), which are far more disruptive. —CodeCat 20:03, 15 March 2014 (UTC)
  • Re separate wikis: There are also separate Serbian and Croatian Wiktionaries and Wikipedias, yet en.Wikt found it best to have one Serbo-Croatian language rather than several. Re different inflections: English and German also have words which inflect differently, or (in the case of German) have different gender, in different varieties of the language. Joghurt#German, with its multiple genders and inflection tables, shows how this can be handled. - -sche (discuss) 20:06, 15 March 2014 (UTC)
  • It is both true and false that there is both a Bokmål and a Nynorsk Wiktionary, as Donnanz claims. The nn project is dead. Check the number of non-bot contributions over the last three years... There was a vote, in 2008 or 2009 I think, to merge them, and all of the nn content was merged into the no project. But that was because there weren't enough active users, especially on the Nynorsk project. I wasn't contributing much at that time so I didn't take part in the discussion. The no project now uses tags to mark nb, nn, Riksmål, and also non-standard. I agree with Njardarlogar when he says there are no dictionaries that treat Nynorsk and Bokmål as one language. If we have the resources to keep them separated I support his view. However, I fear that will just lead to poorer quality, especially when it comes to Nynorsk. The number of active users of Nynorsk is steadily declining, and even in the Norwegian project itself it is hard to find contributors willing to put in the necessary time to check grammar, make grammar templates accommodate Nynorsk inflections etc. Is it very likely that the English project would fare any better? --Teodor (dc) 20:32, 16 March 2014 (UTC)
  • Strong oppose It would be hypocritical to merge the two versions of Norwegian while keeping Danish, Norwegian (Bokmål + Nynorsk) and Swedish separate (because that merger is not likely to happen in the near future, is it? Being pragmatic here). I have yet to encounter a dictionary that treats Bokmål and Nynorsk as the same language, just as I have yet to encounter a dictionary that treats Danish, Norwegian and Swedish as one language (even though they are; check this list to get an idea).
We don't have separate headers for e.g. British English and American English because they are hardly ever treated as separate languages (there is no good reason why they should be, either). British and American English have the same origin; they share a common core. Nynorsk and Bokmål do not. Bokmål evolved gradually from Danish, while Nynorsk was created from scratch based on rural Norwegian dialects; so the two standards are not as related as one could initially be led to believe (think of convergent evolution in biology). Again, please take a look at the list I just linked to in order to get an idea of what we are dealing with. It is in no way such that Bokmål + Nynorsk = Norwegian while Danish and Swedish are entirely different beasts; and the list will give you a strong indication of this. --Njardarlogar (talk) 23:28, 15 March 2014 (UTC)
That still doesn't change the fact that Bokmål and Nynorsk are both standards intended to be used by speakers of all dialects in Norway. That makes them fundamentally different from Danish vs Swedish vs Norwegian. The Swedish standard language is only intended to be used by self-declared Swedish speakers in Sweden, not by Norwegian speakers in Norway. Thus, even though there are quite some differences between Bokmål and Nynorsk, the basic fact is still that they are both used in all of Norway, and there is therefore no strong correlation between location and the standard used. Two neighbours speaking the same local dialect in Trondheim might write in Bokmål and Nynorsk respectively. That, to me, makes all other arguments about their supposed differences, and comparisons with other Scandinavian dialects, irrelevant. They are two standards for the same set of dialects; i.e. those spoken in Norway. Those dialects should therefore be grouped together as "Norwegian".
Aside from that, you have not addressed any of the problems that having three (or more!) headers for each separate standard creates. Just read through this discussion and you'll get an idea. I and others have already argued that having just 2 headers is untenable with respect to NPOV, and 3 or more headers creates far more problems than it solves. No matter how many standards we treat as separate languages, we still eventually end up having to need one extra language name to cover anything that's not part of any standard. The practice of treating language standards as normative for what can or cannot be included in Wiktionary under a particular header also goes completely against all practice in this area so far. The names "Bokmål" and "Nynorsk" are by definition prescriptive. So unless you provide solutions for the problems noted in this discussion, your oppose vote really does not accomplish anything. It basically becomes "yes, we have problems, but what about my principles!". —CodeCat 23:47, 15 March 2014 (UTC)
I agree that having three headers is a rather horrible solution that could create a bad precedent in terms of headers cluttering up the pages. At the same time, there already are thousands of living languages on this planet, so a header or two extra doesn't move us into a new league when it comes to number of headers.
I don't see why we should have more than 3 headers. There are only 3 separate codes for the Norwegian language, and there are only 2 normally recognised standards of Norwegian, namely Bokmål and Nynorsk. There are two more variants of written Norwegian, and those are Riksmål and Høgnorsk. Riksmål is technically a subset of Bokmål (plus perhaps a few extra words/spelling variants that are hardly ever found in contemporary Norwegian texts. Also, see which language the Riksmål Dictionary is described as being written in in one of the biggest online Norwegian bookstores..). Høgnorsk is very uncommon, and is not a standard. Riksmål can be treated under the Bokmål header while Høgnorsk can be treated under the Nynorsk header.
Yes, we'd solve a problem with two headers. But we'd create new problems with the tags. How exactly would that work? Should every Norwegian meaning be tagged so that the reader is never in doubt whether the editor simply forgot to add the tag or is not familiar enough with Bokmål and Nynorsk to know that this meaning or word is only found in one of the two standards? And what about words that are considered dialectal in one standard and ordinary in another? (Bokmål 2 vs Nynorsk 1; hin is another example, 2 vs 1). Then there's Riksmål and Høgnorsk again. Should one expect an editor to be sufficiently familiar with these two minority variants in order to tag all entries correctly? Should only Riksmål and Høgnorsk words/spellings that are different from Nynorsk and Bokmål be tagged; or would we have to tag all words that can be said to be Riksmål and Høgnorsk?
How relevant is it really that Nynorsk and Bokmål are aimed at Norwegians and not Danes and Swedes? It doesn't sound very descriptive to mention this as an argument. I could write in Swedish and my fellow Norwegians wouldn't have any major issues with understanding what I wrote. Likewise with Danish. Danish used to be the official (de facto or otherwise) language in Norway for centuries, anyway, so it's not a completely theoretical objection. On board Norwegian airliners, if the captain is Danish or Swedish, the captain will give information first in his native Scandinavian variant, then in English. Just like Norwegian pilots give information first in Norwegian, then in English.
Then there's also the important fact that although both Bokmål and Nynorsk are supposed to be used by all speakers of all Norwegian dialects, they still have very different origins. Bokmål stems from Danish, while Nynorsk stems from rural Norwegian dialects. So by the argument you are providing, if suddenly one day Swedish started to be popular as a written language in Norway, and some proponents of it held that all Norwegians should write it, then suddenly Swedish is Norwegian; which linguistically is nonsense.
As for being prescriptive; you have to realise that there is no Norwegian language. There is a Mainland Scandinavian language with 4 official written standards: Bokmål, Danish, Nynorsk and Swedish. It's no more prescriptive to label a word as either Bokmål or Nynorsk than it is to label it as Swedish or Danish.
-sche writes We cannot determine the language of a term we encounter in the wild without referring to the prescriptivist body which governs that "language". This is fundamentally incompatible with our descriptivism and policy of NPOV. And indeed, this goes for all of Scandinavian, not just Bokmål vs Nynorsk. --Njardarlogar (talk) 09:51, 16 March 2014 (UTC)
  • At least Njardarlogar knows what he is talking about, being a native Norwegian. There are two officially recognised languages - Bokmål and Nynorsk; Riksmål isn't, and I don't know anything about Høgnorsk. Although a lot of words are similar, and believe it or not an egg is et egg (Bokmål) or eit egg (Nynorsk), there are also many words which differ fundamentally; compare ukrainer with ukrainar, eier with eigar, and kirke with kyrkje, to quote just a few. I suggest comparing a Nynorsk text with one in Bokmål (Wikipedia is a good source), it is quite obvious that the two languages are sufficiently different to be classed as languages in their own right.

I have been slowly sorting out the mess that exists at present, splitting between Bokmål and Nynorsk where possible, entering missing inflections, replacing inflection tables where they are incomplete or erroneous - I am less than enamoured with inflection tables anyway and prefer a more "in your face" presentation of inflections. But it's a slow job, and I have only tackled nouns so far. But my ideal would be to scrap the "Norwegian" (no) heading entirely, leaving us with just Bokmål and Nynorsk. I would hate to see the work I have done so far undone by a whim. It would be very disheartening, and could lead to my ceasing to contribute. Donnanz (talk) 13:32, 16 March 2014 (UTC)

Re: kirke and kyrkje, etc. Despite orthographic and grammar differences, Bokmål and Nynorsk can still have the same L2 - "Norwegian". Having {{context|Nynorsk}} and {{context|Bokmål}} would add them to Category:Norwegian Bokmål language and Category:Norwegian Nynorsk language but PoS categories without such a distinction. E.g. kyrkje:

===Alternative forms===
* {{l|no|kyrkja}}
* {{l|no|kirke}} {{qualifier|Bokmål}}

From {{etyl|non|nn}} {{term|kirkja|lang=non}}.


#  {{context|Nynorsk|lang=no}} [[church]]

* {{R:Dokumentasjonsprosjektet|lang=no}}

And kirke:

[[Image:Arneberg kirke.jpg|thumb|kirke]]

* {{IPA|/çɪrkə/|lang=no}}
* {{rhymes|ɪrkə|lang=no}}

===Alternative forms===
{{l-nn|kyrkje}} {{qualifier|Nynorsk}}


# {{context|Bokmål|lang=no}} a [[church]] (''a house of worship'')


--Anatoli (обсудить/вклад) 22:39, 16 March 2014 (UTC)

I don't think we should use context labels to specify the variety. Context labels are for sense-specific things, but the Bokmål-Nynorsk distinction is headword-specific. So it would be better placed on the headword line, like the Norwegian Wiktionary does. —CodeCat 22:13, 17 March 2014 (UTC)
But that's a wider problem. We already use context labels for that in all other languages that have this issue. --WikiTiki89 22:23, 17 March 2014 (UTC)
Which others are those? —CodeCat 22:26, 17 March 2014 (UTC)
English, Russian, German, e.g.: {{context|UK|Ireland|India|Pakistan}}, which add to appropriate categories. We could use different headwords to allow difference in grammar but if we are to unify Norwegian, it's better to stick to "no", not "nb" and "nn" naming convention. --Anatoli (обсудить/вклад) 22:35, 17 March 2014 (UTC)
  • after e/c... Heck, I thought that was the whole point of the discussion about Chinese entries -- replacing multiple lang headings that necessitate lots of duped content with a single lang heading and multiple context tags to clarify ... well, to clarify the context, in this case, which Chinese language/dialect. Perhaps I misunderstood? ‑‑ Eiríkr Útlendi │ Tala við mig 22:41, 17 March 2014 (UTC)
  • Chinese is below :). The discussion is similar but not the same, as Chinese is not inflected, so it's easier to unify but Norwegian is inflected, so headwords should cater for NB/NN differences. --Anatoli (обсудить/вклад) 22:44, 17 March 2014 (UTC)
@CodeCat Not all meanings are shared between the two standards. Even if two words ultimately have the same origin, intermediate steps in their etymology can be different. That's why I brought up convergent evolution earlier: even though Bokmål and Nynorsk may appear very similar on the surface, their inner workings can be pretty different. --Njardarlogar (talk) 22:44, 17 March 2014 (UTC)
All differences, including etymological can be handled in a unified Norwegian approach. Can they not? --Anatoli (обсудить/вклад) 23:11, 17 March 2014 (UTC)
Good luck doing it without context tags. And as I already mentioned, there are meanings that are considered dialectal in one standard and completely normal in another. How to handle that? Context tags? Usage notes? --Njardarlogar (talk) 23:18, 17 March 2014 (UTC)
There is always a way (if there's a will), e.g. {{context|Bokmål|regional Nynorsk}} The exact context name can be decided and labels/categories created. No linguistic Bokmål or Nynorsk information should and will be lost. Having one L2 header will just make it easier to create new entries in any Norwegian variety or neutral (not applicable to any). Semantics is the easy part. Perhaps grammar should be more of a concern to you but I think this can be solved as well. --Anatoli (обсудить/вклад) 23:24, 17 March 2014 (UTC)
Another detail. If Nynorsk and Bokmål words are to be considered variants of each other rather than belonging to separate languages, we'd have to provide usage examples on the wrong entry, which would be a bit weird (e.g. having a usage example of øye on auga or vice versa). --Njardarlogar (talk) 11:52, 18 March 2014 (UTC)
We'd have to do that? Why? As far as I know, we don't have to do that for other languages, nor are we in the habit of doing it. Cyrillic-script Serbo-Croatian entries have Cyrillic-script usexes, Latin-script entries have Latin-script usexes, don't they? English spellings like grey have quotations and usexes that use grey, while spellings like gray have quotations and usexes that use gray. Sometimes lemma entries have quotations from all forms of a word, e.g. wight includes a quotation that actually uses the obsolete spelling wyght, but users have said in past discussions that they felt that was good because the lemma should represent all the forms. In any case, I don't see how or why our treatment of Norwegian usexes under one ==Norwegian== header would or should differ from our treatment of Serbo-Croatian or English usexes. - -sche (discuss) 17:21, 18 March 2014 (UTC)
I don't see how or why our treatment of Norwegian usexes under one ==Norwegian== header would or should differ from our treatment of Serbo-Croatian or English usexes. Because Nynorsk and Bokmål are separate standards with different grammar, vocabulary, sentence structure etc. It's not adequate to give an example in only one of the two standards. And examples need to be attached to meanings. Your example at grey is not (even if it can be deduced which meaning is meant).
Either we duplicate the meanings in some way (such that the words are no longer treated simply as variants), or we keep the examples at the lemma page. Having references on the variant's page to meanings on the lemma page is unstable and does not strike me as a viable alternative.
Another issue is that Bokmål and Nynorsk can have two or more unique variants of a word, in which case it is not obvious where the examples of usage would be found; so there would be a need to duplicate the usage examples at up to at least 8 different entries (as an example, the Nynorsk word fliseleggja has 8 variants: fliseleggja, fliseleggje, fliselegga, fliselegge, flisleggja, flisleggje, flislegga and flislegge) --Njardarlogar (talk) 17:53, 18 March 2014 (UTC)
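To make the count above concrete, here is a minimal sketch of how the eight spellings can be seen as combinations of three independent two-way choices. The split into first element, stem and ending is only an illustrative assumption made for this example (it is not a claim about how any inflection template works), and the names `FIRST_ELEMENTS`, `STEMS`, `ENDINGS` and `spelling_variants` are hypothetical.

```python
from itertools import product

# Assumed, illustrative decomposition of the eight spellings quoted above.
FIRST_ELEMENTS = ["flise", "flis"]   # with or without the linking -e-
STEMS = ["leggj", "legg"]            # with or without -j-
ENDINGS = ["a", "e"]                 # a-infinitive or e-infinitive


def spelling_variants():
    """Return every combination of first element, stem and ending."""
    return ["".join(parts) for parts in product(FIRST_ELEMENTS, STEMS, ENDINGS)]


variants = spelling_variants()
print(len(variants))        # 2 * 2 * 2 combinations
print(", ".join(variants))
```

Under this assumption, each extra two-way spelling choice doubles the number of entries that would need a copy of the same usage example.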
The vote on this matter has now opened: Wiktionary:Votes/pl-2014-03/Unified Norwegian. - -sche (discuss) 19:48, 20 March 2014 (UTC)

A new format for Chinese entries (multisyllables)

Input needed: This discussion needs further input in order to be successfully closed. Please take a look!

At present, our Chinese (Mandarin mostly) entries are structured like this. This format has a number of drawbacks:

  1. It splits the Chinese heading into multiple headings, resulting in duplication of information, since the written form of Chinese is shared across Chinese varieties. It does not take into account the fact that simp-trad correspondences are shared by all Chinese varieties (hence hanzi-box is duplicated for every variety), the semantics is more than 99% of the time the same across dialects, and the etymology is essentially shared across dialects as well.
  2. It duplicates simp-trad conversions in hanzi-box and every headword template.
  3. It duplicates pronunciation in every headword template and in the Pronunciation section.
  4. It duplicates etymology in the hanzi-box and in the Etymology section.

As a consequence, the Chinese-language presence here has been overwhelmingly "Mandarin", which is the basis of Written Chinese, and this has limited the growth of other varieties (cf. nouns - Mandarin 20467, Cantonese 317, Wu 10). Hence a change in the format is much needed. I have created User:Wyang/歷史 and some new templates (see below). What do people think of this format? The code for an entry in such format would be (excluding the temporary userspace markers since the templates do not exist yet):

{{zh-forms|s=历史|to go through,<br>to experience|history,<br>records}}

First attested in ''{{w|Sanguozhi}}'', meaning "records of past events".

|c=lik6 si2
|w=5liq sr


# {{cx|obsolete}} [[record]]s of past events; [[historical]] records
# [[history]], [[past]]
# past [[experience]]s of a person, the history of a person
# ({{zh-l|歷史學}}) [[historiography]], the [[study]] of history

===See also===
* Synonyms: {{zh-l|過去|past}}, {{zh-l|以往|past}}
* Antonyms: {{zh-l|將來|future}}, {{zh-l|未來|future}}
* Derived terms: {{zh-l|歷史學|historiography}}, {{zh-l|歷史劇|historical play}}, {{zh-l|歷史觀|historical view}}, {{zh-l|歷史性|historic; of historic significance}}
* {{Sinoxenic-word|歷史|s=歴史|れきし|rekishi|역사|yeoksa|lịch sử}}

which I think is much more succinct than if all these varieties are created in separate headings.

Thanks for any feedback and input on this in advance. Wyang (talk) 08:37, 12 March 2014 (UTC)

Personally, I like the idea. --WikiTiki89 08:43, 12 March 2014 (UTC)
I like it too. Since, as you, @Wyang: mentioned, the presence here has been overwhelmingly "Mandarin", there is little concern about missing user examples in Cantonese, Min Nan, etc (such as examples of vernacular written Cantonese or Min Nan) but it's still possible. I suggest to use {{cx|Cantonese}} whenever we have a specific variety entry, which should add to a corresponding subcategory.
Categorising: I think specific terms, not used in standard Chinese (Mandarin), could be added to existing categories, such as Category:Cantonese nouns, which should in turn belong to Category:Chinese nouns.
This is a big change, so it will most likely require a vote; get ready for opposition. Note that we just had a vote allowing Cantonese Jyutping, which was a bit controversial, IMHO - not clear if only monosyllabic or polysyllabic entries are allowed. --Anatoli (обсудить/вклад) 10:37, 12 March 2014 (UTC)
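The categorisation idea above (variety labels feed variety-specific categories, which in turn sit under the general Chinese ones) can be sketched roughly as follows. This is only an illustration under stated assumptions: the set of variety names, the function names `categories_for` and `parent_category`, and the rule that unlabelled senses fall into the plain Chinese category are all hypothetical, and none of this is how {{cx}} or any existing module is actually implemented.

```python
# Hypothetical list of variety labels recognised by the sketch.
VARIETIES = ["Mandarin", "Cantonese", "Min Nan", "Wu"]
PREFIX = "Category:"


def categories_for(pos_plural, labels):
    """Return the categories one sense would be added to.

    Labels that name a variety produce "<Variety> <PoS>" categories;
    a sense with no variety label falls back to "Chinese <PoS>".
    """
    cats = [f"{PREFIX}{label} {pos_plural}" for label in labels if label in VARIETIES]
    return cats or [f"{PREFIX}Chinese {pos_plural}"]


def parent_category(category):
    """Map a variety-specific category to its Chinese parent, if it has one."""
    name = category[len(PREFIX):]
    for variety in VARIETIES:
        if name.startswith(variety + " "):
            return f"{PREFIX}Chinese {name[len(variety) + 1:]}"
    return None


print(categories_for("nouns", ["Cantonese"]))
print(parent_category("Category:Cantonese nouns"))
print(categories_for("nouns", ["obsolete"]))
```

Non-variety labels such as "obsolete" are simply ignored by this sketch; whether they should also categorise is a separate question.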
Re: your example entry: User:Wyang/歷史:
Every foreign script entry has transliteration in the header, apart from Lao and Burmese, which have separate transliteration tables. Since Hanyu pinyin is the standard transliteration system for Mandarin, in my opinion, each PoS header should also repeat pinyin, like this: 歷史 (lìshǐ) (even if they are multiple). I'm OK to move simp./trad. differences to the Hanzi section.
Entries should be sorted by numbered pinyin (as agreed previously), so 歷史 should be sorted by li4shi3 and appear under letter "L", not under character 歷 or radical 止. The simplified equivalent 历史 should be sorted the same way, as the current entry 历史. We now have the functionality to convert toned pinyin to numbered pinyin, please consider adding this functionality for sorting.
Toned pinyin was specifically allowed by a vote and they are useful to locate Hanzi entries. Pinyin entries can also be generated in an accelerated method from entries. This functionality should also be available in the long run, IMO. --Anatoli (обсудить/вклад) 10:50, 12 March 2014 (UTC)
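For reference, the toned-to-numbered conversion used for the sort key can be sketched like this. It is a minimal sketch and not the existing on-wiki functionality: it assumes the syllables are already separated by spaces (e.g. "lì shǐ"), leaves neutral-tone syllables without a number, and the names `TONE_MARKS`, `syllable_to_numbered` and `sort_key` are made up for the example.

```python
import unicodedata

# Combining tone marks of Hanyu Pinyin and the tone numbers they stand for.
TONE_MARKS = {
    "\u0304": "1",  # macron
    "\u0301": "2",  # acute
    "\u030c": "3",  # caron
    "\u0300": "4",  # grave
}


def syllable_to_numbered(syllable):
    """Convert one toned syllable (e.g. 'shǐ') to numbered form ('shi3')."""
    tone = ""
    kept = []
    # Decompose so that each tone mark becomes a separate combining character.
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            kept.append(ch)
    # Recompose so that ü (u + diaeresis) stays a single character.
    return unicodedata.normalize("NFC", "".join(kept)) + tone


def sort_key(spaced_pinyin):
    """Build a numbered-pinyin sort key from space-separated toned pinyin."""
    return "".join(syllable_to_numbered(s) for s in spaced_pinyin.split())


print(sort_key("lì shǐ"))
```

With such a key, 歷史 and 历史 would both sort under "L" rather than under the character or its radical, which is the behaviour asked for above.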
1. We can add additional parameters to the pronunciation template, since string parsing is now achievable. We could, for example, use
|c=lik6 si2
|w=5liq sr
, which would add the page to the following categories: 1) "Chinese nouns", "Chinese verbs", "Chinese adjectives", "zh:History|*", "zh:Insects"; or 2) "Mandarin nouns", "Cantonese nouns", ... depending on what everyone prefers. Each of these categories would be sorted by the respective sort key, eg. Pinyin for "Category:Mandarin nouns". Regardless, one can always call variety-specific categories, using codes like
in the definition.
2. Regarding headword templates - I think we can generalise it to something like Template:zh-pos, since Chinese is uninflecting and there is no point in using a different base template for each PoS. I don't think the absence of a transliteration in the template is a problem. The requirement of transliterations is to guide pronunciation. Here the Pronunciation section has made romanisations and IPA pronunciations in various varieties sufficiently clear to readers, and I think we can use the precedent of Burmese entries (which also involves multiple transliterations) in this case.
3. Regarding Pinyin: Pinyin in the pronunciation template can be made clickable. And when accelerated creation is enabled in the preferences, one can click on the uncreated Pinyin in trad-form entries to create it. Trad-to-simp conversion can be performed with hardly any errors. Wyang (talk) 11:31, 12 March 2014 (UTC)
Re: There is no point in using a different base template for each PoS. But that's how all languages are structured here, including non-inflected ones. It allows categorisation and other things, e.g. for nouns, you can add optional measure words (classifiers). If they are simple, it just makes it easy when recreating each headword for each PoS.
The suggested pronunciation section is quite big and may be even overwhelming to some users (I don't mean it's bad). What if more dialects are added? Having a simple pinyin looks much neater and that's what users see in published dictionaries. Just standard pinyin in brackets, next to Hanzi, no other schema. Please consider. Burmese has various standards but I'd prefer that they've selected at least one to use in the headword, same with Lao. Just my opinion but that's also the established practice with current Mandarin entries. (Note that Burmese and Lao, currently a bit neglected, have been maintained by one user each - Angr and Widsith but we have a bunch of editors with Chinese now). --Anatoli (обсудить/вклад) 11:51, 12 March 2014 (UTC)
Further feedback: See also, Synonyms, etc. should follow the standard format. Undecided about "Sinoxenic descendants" yet but we usually use "Derived terms". --Anatoli (обсудить/вклад) 12:00, 12 March 2014 (UTC)
Actually, I was wrong about Burmese and Lao. I've just checked. They do have transliteration. Not sure why I thought so, sorry. Thus, each non-Roman script entry has transliteration in the header (if correctly formatted). --Anatoli (обсудить/вклад) 12:10, 12 March 2014 (UTC)
Sure, I still mean using
in the headword, but make templates like Template:zh-noun call Template:zh-pos or some module, instead of making them call Template:head separately like Template:cmn-noun etc. currently do. -pos templates also exist for some other languages, and in fact I got this idea from Template:ja-pos. (Others: ko-pos, tt-pos, oj-pos)
If we convert existing entries to this format, the vast majority will just have one variety in the Pronunciation section, i.e. Pinyin. The example I created was more of an extreme case - I doubt that section will be heavily populated by readings. Four (Mandarin, Cantonese, Min Nan, Wu) probably represents the limit. For other varieties, there is simply a lack of comprehensive dictionaries (not dialectal word dictionaries) like those dedicated to these four.
Regarding "see also" content - Those are not part of the change. The preference for clustered "see also" terms is more of a personal habit - I find it a little easier to locate content when various things are grouped by relatedness.
Burmese headword templates don't have transliterations (လောက). The transliterations are handled by Template:my-roman, which is kind of similar to what the proposed template "Template:zh-pron" would do. Wyang (talk) 12:17, 12 March 2014 (UTC)
Parameterised Template:zh-pos is not a bad idea. I didn't mean it must be only Template:zh-noun, etc.
Burmese and Lao entries are inconsistent, apparently. Some do and some don't have transliterations. My preference as a user and as an editor is to have pinyin in the header, even if pronunciation sections are smaller.
If Chinese entries lose the "rs" value, it would be a great change. Showing trad./simp. only in the Hanzi box is also a great change. Not needing numbered pinyin is also good. --Anatoli (обсудить/вклад) 12:37, 12 March 2014 (UTC)
OK, thanks, I see your point about the header. I agree with the rest. Wyang (talk) 22:44, 12 March 2014 (UTC)
Moreover, I'd like to introduce Zhuyin into the header, automatically converted from Pinyin, e.g. Pinyin: lìshǐ, Zhuyin: ㄌㄧˋ ㄕˇ. Pity you don't like the idea but Japanese and Korean entries do have transliterations, so does the overwhelming majority of other non-Roman languages.
I think the pronunciation section could and should look simpler (and smaller) and topolects should be added vertically, not horizontally, similar to how US/UK English sections are organised, e.g. man. That way, it won't matter how many regional pronunciations are added. I'm sure there is research into many dialects. They may lack transliterations or sound recordings but IPA could be optionally obtained and added to the bottom. --Anatoli (обсудить/вклад) 23:12, 12 March 2014 (UTC)
I have added Zhuyin. I am not sure about the vertical arrangement.. This is what it would look like with the tables now. To me it looks not as aesthetically pleasing as the horizontal one, since the four stacking tables are limited in width. Do you mean using the default list format of Wikimedia and getting rid of the tables? But again, the arrangement at man#Pronunciation to me looks like a mess. I prefer a side-by-side tabular format.
The Shanghainese one was already running out of romanisations a bit - it is using "WT romanisation", something I created. IPA is even harder for people to become familiar with. I doubt there will be people wishing to add other pronunciations. Wyang (talk) 00:01, 13 March 2014 (UTC)
Thanks! Although it would be great if the header had 歷史 Pinyin: lìshǐ, Zhuyin: ㄌㄧˋ ㄕˇ :) Maybe yes, doing without a table, something like this but with proper bolding, linking, etc:
Mandarin (Standard Chinese, Beijing)
Pinyin: lìshǐ
Zhuyin: ㄌㄧˋ ㄕˇ
IPA (key) /li⁵¹ ʂʐ̩²¹⁴⁻²¹⁽⁴⁾/
Cantonese (Standard Cantonese, Guangzhou)
Min Nan (Taiwanese)
--Anatoli (обсудить/вклад) 00:32, 13 March 2014 (UTC)
I see three differences between the proposed format and our current format.
  1. Some templates are changed. For the most part, the changes look like improvements, but proposed/new zh-pron needs more work: the proposed horizontal arrangement is so wide that smartphones and computers with small screens will have trouble with it, but the proposed vertical arrangement pushes the actual content (definitions) several screens down. (And if additional dialects are added, it will become even more unwieldy.) A large part of its bulk comes from its tabular format. Reworking it to use our usual bulleted-list format would seem to solve these problems.
  2. The proposal lumps Synonyms, Antonyms, Derived terms, Descendants etc into the ===See also=== section. I agree with Anatoli that we should not do this. There is no reason for our treatment of Chinese antonyms to differ from our treatment of Latin ones. The proposed format would become particularly untenable whenever a word had a large number of synonyms, antonyms, and/or derived terms, like some of the words for tea probably do.
  3. The proposal groups the various Chinese languages under one header. This is the bit that may be most controversial. As spoken, the varieties are often not mutually intelligible... yet (as has been noted) they are written the same way, and Wiktionary is a written dictionary.
- -sche (discuss) 00:51, 13 March 2014 (UTC)
Thanks. Wyang has responded to your second point, it's not part of a change, so, it should probably be removed from the example page to avoid confusion. I like your bulleted example. Most entries won't have such a big variety, since we don't have editors and knowledge of dialects. The existing ones should be imported.
On the main issue of merging all topolects into one L2 "Chinese" there have been surprisingly few comments. I'm sure there will be resistance, if there is a vote. Yes, all written forms can be accommodated into one Chinese section, despite them being not mutually intelligible (when spoken out loud!). Perhaps THIS problem should be solved first. There are other important improvements. If we ignore the pronunciation section (partially automated), entries will become easier to create. The main difficulty and source of errors for beginners was the "rs" (radical sort) value, which stopped even native speakers from adding contents. --Anatoli (обсудить/вклад) 01:25, 13 March 2014 (UTC)
Thanks. Format of the pronunciation template isn't an issue; it can be easily changed when all pages link to it, whether it be a table or a list. To me the list form is not as good-looking, but it is much easier to change to. I am not sure what was meant by the unresolved problem with the merger - like said above, variety-specific labels can be applied when necessary. Looking at Category:Cantonese nouns and Category:Min Nan nouns in traditional script, most currently-existing entries in these varieties are not dialectal words. There isn't a huge number of currently-existing entries for other varieties (something less than 2000) compared with Mandarin (around 35000), and these can be manually examined one-by-one. Wyang (talk) 04:17, 13 March 2014 (UTC)
The issue is not only technical but political. I know that most existing non-Mandarin entries only differ in pronunciation. The existing written dialectal forms not included here are small in number, even if they may be frequently used. I meant that in the past, after a seeming agreement in the BP, there was a vehement opposition to certain changes. Some editors are currently away or ignoring this page, or this change may create other issues later. If we have a successful vote, then there's no worry. Apart from Pinyin/Zhuyin missing in the header (which I consider important and it matches the majority of our entries in non-Roman script) and minor formatting (please consider -sche's suggested format or similar) I personally have no issue with your proposal. Calling on other Chinese-aware editors: @Tooironic:, @Jamesjiao:, @Kc kennylau:. --Anatoli (обсудить/вклад) 05:08, 13 March 2014 (UTC)
(Cantonese native here) @Atitarev: Thanks for notifying me. I support this proposal. By the way, this is a matter of merging different dialects of Chinese, so I think that WT:RFM should be notified for this matter. {{context}} should also be modified so that it can show dialects, as I am aware that 告白 means a declaration of love in other dialects but advertisement (especially on television) in Cantonese. I support this proposal because I often see lack of definition in Cantonese entries while the definition is complete, precise and concise in their Mandarin section, and the definition in Cantonese is often the same as the definition in Mandarin. Still have to look out for dialectal differences, though. As a non-native speaker of English, I apologize for any misunderstanding that I have made. Here is what I mean for dialectal difference:
[pronunciation, etymology and rest of the stuff here]

{{zh-noun|[stuff goes here]}}

# {{context|Mandarin|Min Nan|[other dialects]}} a [[declaration]] of love, a [[confession]] of one's feelings towards someone
# {{context|Cantonese}} an advertisement broadcast on television

{{zh-verb|[stuff goes here]}}

# {{context|Mandarin|Min Nan|[other dialects]}} to [[declare]] love, to [[confess]] one's feelings towards someone
--kc_kennylau (talk) 09:03, 13 March 2014 (UTC)
Thanks. I agree with you. BTW, apart from Cantonese, 告白 also means "advertisement" in some Mandarin dialects: eg. Changli, Hebei (Jilu Mandarin) - /kau55-43 pai13/; Ganyu, Jiangsu (Zhongyuan Mandarin) - /kau51-31 pei55/, and possibly in Taiwanese: [3]. Also I think the sense of "declaration of love" is used in Cantonese too (google:告白嘅). So the page can go something like

# [[announcement]], [[public]] [[announcement]]
# [[expression]] of one's thoughts; especially, [[declaration]] of love, [[confession]] of one's feelings towards someone
# {{cx|Cantonese|dialectal Mandarin|Min Nan}} [[advertisement]], [[ad]]


# to [[pronounce]]; to [[express]] oneself; especially, to [[declare]] love, to [[confess]] one's feelings towards someone

Wyang (talk) 12:18, 13 March 2014 (UTC)

This is a huge change. How will the existing entries be dealt with? What about categories? We haven't yet resolved the issue with [*Category* in simplified/traditional script] categories. Don't get me wrong. I like where this is going. I am fully capable of creating entries in Wu, which is my native dialect, but it has been the amount of duplication that I have to deal with that puts me off creating any entries for it, so this change would be a big step forward. It will need to be carefully managed however. JamesjiaoTC 01:55, 14 March 2014 (UTC)
Yes it is a huge change, which is why it should not be made lightly. We have made such radical changes before, such as with merging Serbo-Croatian, which took several years before the last entries were merged, but nevertheless it worked out in the end. --WikiTiki89 02:16, 14 March 2014 (UTC)
Yes, the main thing is to have an agreement about the change. The problem with duplications and lack of dialectal templates makes it harder, not easier to add contents for, say Wu. It's looking very positive but I think it's desirable to have a vote. It's a pity @A-cai: is no longer active, he is a Min Nan native speaker and creator of many original Chinese templates. His view on dialects was opposite to Wyang's but he had a lot of valid points we should consider before merging. Can I throw in some counterarguments? What do we do with Dungan (Cyrillic), romanised Min Nan and Hui dialects (in Arabic script), current and future? --Anatoli (обсудить/вклад) 02:33, 14 March 2014 (UTC)
We could have entries such as Cyrillic/Arabic script form of ... that also contain all the lexical and dialectal information that is unique to the script. --WikiTiki89 02:41, 14 March 2014 (UTC)
Thanks. That sounds good. Any other opinions? I think these varieties could all go under Chinese L2 header as well. Do we need to change nesting for Mandarin translations? "Chinese:\Mandarin:" to just "Chinese:"? The dialects, if they get separate translation should continue to be nested, IMO. Please consider child#Translations or water#Translations (identical Sinitic translations without transliteration should be removed, IMHO, such as Gan: , Wu: ). --Anatoli (обсудить/вклад) 02:51, 14 March 2014 (UTC)
I think we should continue nesting, but only when the dialect differs in written form from the standard/common form. The standard/common form should be listed on the top level (i.e. after "Chinese:"). --WikiTiki89 02:59, 14 March 2014 (UTC)
That's what I meant about nesting but I think transliterated translations could be allowed (no strong opinion on this but users may complain about missing Cantonese, Min Nan translations, if they provide pronunciations). Instead of (@Beijing#Translations):
* Chinese:
*: Cantonese: {{t|yue|北京|tr=bak1 ging1}}
*: Mandarin: {{t+|cmn|北京|tr=Běijīng|sc=Hani}}
*: Min Nan: {{t+|nan|北京|tr=Pak-kiaⁿ|sc=Hans}}
We could have:
* Chinese: {{t+|zh|北京|tr=Běijīng|sc=Hani}}
*: Cantonese: {{t|yue|北京|tr=bak1 ging1}}
*: Min Nan: {{t+|nan|北京|tr=Pak-kiaⁿ|sc=Hans}}
With a language code "zh", not "cmn". --Anatoli (обсудить/вклад) 03:24, 14 March 2014 (UTC)
  • I'm going to sit out of this one. Although I'm one of the regular Mandarin editors here, my technical knowledge is close to zero, so I wouldn't be able to contribute much. However I do support the lumping of all Chinese languages under Chinese in theory. As the sandbox page currently stands, I don't like how the pronunciation boxes are so massive and keep pushing out to the right, it causes the whole page of my browser to elongate. There must be some way to condense this. I agree also that there should be separate headers for See also, Synonyms, etc. just like all the other languages' entries. Anyway, you guys are right - this is a huge proposal for change. Wyang has already contributed so much, I just hope that the transition can be relatively smooth - there are many other issues that need to be worked out first. ---> Tooironic (talk) 11:18, 14 March 2014 (UTC)
I like the unformatted list but in a collapsible format which would collapse/open the entire pronunciation section with one click. --Panda10 (talk) 17:47, 14 March 2014 (UTC)
Do you mean something like the format that is used on pecan? - -sche (discuss) 21:13, 14 March 2014 (UTC)
Yes, the pecan-style collapsing is fine. Otherwise, the pronunciation section will take up a lot of vertical space and the users will have to scroll a lot. --Panda10 (talk) 21:54, 14 March 2014 (UTC)
I like the unformatted list. It matches the format used in other entries in other languages. I have no strong feelings for or against collapsing it under a pecan-style {{rel-top}} the way Panda seems to propose. If one insists upon a tabular format, I think this is the best one (much less bulky than the other tables, while still showing the IPA and audio files, which the collapsed table hides). - -sche (discuss) 21:13, 14 March 2014 (UTC)
I see, thank you both. Other opinions? Wyang (talk) 03:52, 15 March 2014 (UTC)
Good efforts, Wyang. I like the unformatted list as well. Perhaps, it should be collapsible but only if there are more than two(?) topolects. The overwhelming majority of entries just have Mandarin, so it should be visible (expanded) by default, in my opinion. My second choice is "Collapsed blocks". A similar style is also used in Wikipedia when there are multiple languages involved. What are the opinions about transliterations in the header? --Anatoli (обсудить/вклад) 04:20, 15 March 2014 (UTC)
Thanks. How about this if there are more than two(?) topolects, and Unformatted list if there are less than three? Wyang (talk) 21:04, 15 March 2014 (UTC)
Looks great. I would add Zhuyin after a comma for Mandarin. Pinyin is no longer linked. It's okay with me but this doesn't allow for accelerated Pinyin entries. --Anatoli (обсудить/вклад) 23:13, 15 March 2014 (UTC)
Pinyin is now linked. I would probably not worry about Zhuyin, since it is now unofficial in Taiwan. Wyang (talk) 23:12, 16 March 2014 (UTC)
Thank you. The role of Pinyin and Zhuyin is a bit different in Taiwan. Fortunately, standard Pinyin is now standard in Taiwan and used for Romanisation of Chinese characters as well, but Zhuyin is used in education, to teach pronunciation at elementary schools, and in dictionaries. The fact that children's books in Taiwan are published with Zhuyin, not Pinyin, makes it more important for foreigners as well, if they wish to study the "Taiwanese way" or use Taiwanese resources. For educating foreigners both Zhuyin and Pinyin are used now. --Anatoli (обсудить/вклад) 23:39, 16 March 2014 (UTC)
OK... Zhuyin added now (User:Wyang/歷史). Wyang (talk) 02:26, 17 March 2014 (UTC)
I like how the pronunciation section is looking (unformatted, collapsible). I do have a question regarding the new etymology/hanzibox section. How would you deal with entries that have two (or possibly more) alternative traditional forms, such as 僥幸? JamesjiaoTC 21:01, 17 March 2014 (UTC)
@Jamesjiao: That's {{zh-hanzi-box}}, which should be reused, IMHO, and which can handle alternatives (I would use a Chinese comma instead of the word "or"). I also don't think the etymology of individual characters (above characters) is a sustainable feature (User:Wyang/歷史). --Anatoli (обсудить/вклад) 22:51, 17 March 2014 (UTC)
How about User:Wyang/僥幸? Wyang (talk) 04:30, 18 March 2014 (UTC)
May I request instant expand/collapse rather than the super slow sliding one? --WikiTiki89 21:48, 17 March 2014 (UTC)
@Wikitiki89: Which one is which? --Anatoli (обсудить/вклад) 22:51, 17 March 2014 (UTC)
I'm talking about the expand/collapse in the pronunciation box. Currently (at least for me), it slides in and out (it's not too slow, I was exaggerating above), but I think it should just expand and collapse instantly like our other expand/collapse boxes (translation tables, inflection tables, etc.). --WikiTiki89 04:12, 18 March 2014 (UTC)
I don't know whether it is possible to change it to a navbox-based template without significantly altering the appearance (to make it look like the one in pecan)... I seem to like this collapsible layout better, despite the slower speed. If people prefer that layout, I can do that change. Wyang (talk) 04:30, 18 March 2014 (UTC)
There has to be some way to change the expand/collapse effect without affecting the layout. --WikiTiki89 04:51, 18 March 2014 (UTC)
Agreed, though not to my knowledge... Anyway, someone familiar with collapsible tables can change the appearance of the template later on. I would like to hear more feedback on the idea and potential effects of the changes. Wyang (talk) 03:37, 19 March 2014 (UTC)
Are you asking about homophones? You seem to be able to generate a list of homophones to add to the template (by default - no homophones). Perhaps you could allow templates to have parameters for homophones? My concern is that your suggested handling of homophones seems too complicated for an average editor with few technical skills. --Anatoli (обсудить/вклад) 03:42, 19 March 2014 (UTC)
Just asking about feedback on the topic of this discussion. (WRT homophones: I think keeping the information centralised is probably best. Using a homophone parameter in templates wouldn't be as efficient (eg. one would have to update 14 pages if one wants to add another homophone of 意義). It is not technically difficult to edit the information. Just edit the page Template:Pinyin-IPA/hom/yìyì which is in "Templates used on this page" in the edit page. This discussion belongs at User talk:Wyang#Putting a homophone field in the pronunciation header template) Wyang (talk) 04:00, 19 March 2014 (UTC)
We should invite @Liliana-60: to ask for permission to allow "zh" language code, ==Chinese== to be recreated and review what else needs to be done for conversion, as there seems to be a general agreement about merging all Chinese varieties. Dungan, romanised Min Nan, Xiao'erjing, nested translations will need to be addressed as well. --Anatoli (обсудить/вклад) 04:10, 19 March 2014 (UTC)

Can people please comment on the next steps in making it happen (unified Chinese, zh, ==Chinese==, all said above) and express opposition, if any? Some help would be appreciated. --Anatoli (обсудить/вклад) 00:29, 21 March 2014 (UTC)

I have just created a template for the vote: Wiktionary:Votes/pl-2014-04/Unified Chinese for your convenience. @Wyang:, @Jamesjiao:, @Tooironic:, @Hahahaha哈:, @Kc kennylau:, @Bumm13:. --Anatoli (обсудить/вклад) 05:10, 21 March 2014 (UTC)
As I'm not a native speaker of any Chinese variety, it's not a big deal to me if the varieties (technically they're not dialects as the varieties such as Cantonese, Min Nan, etc. have their own local dialects) get merged under a "Chinese" heading. That said, it's going to have to be done right and is going to be a huge task. Bumm13 (talk) 05:52, 21 March 2014 (UTC)
One thing I think is important is that we preserve parameters for specific romanization methods (Pinyin and Wade-Giles for Mandarin, Jyutping and Yale for Cantonese, etc.) used in the {{t+|abc}} templates or whatever we would end up using as that helps provide clear information about the readings used for our CJKV word entries. Bumm13 (talk) 06:05, 21 March 2014 (UTC)
Thanks Anatoli. With translation tables, the only change would be: one can add a translation directly after the * Chinese field after the change, without having to specify a variety. The varieties nested underneath still exist, with appropriate romanisations, even though the characters might be identical. For character entries, the romanisations for varieties will be put into one template in the pronunciation section (the same as the one in User:Wyang/歷史), with different readings separated by commas. Automatic conversion should be utilised to maximum extent. There are now a number of scripts available for various transliteration or pronunciation conversions, including Pinyin-IPA (and other Pinyin analyses, eg. tone markers to tone numbers, adding syllable spacings), Pinyin-Zhuyin, Zhuyin-Pinyin, Jyutping-IPA, wuu-IPA and a fairly crude Min Nan pronunciation template (Module:nan-pron). I just wrote a script for converting Jyutping to Yale for Cantonese (see Template:Jyutping-IPA), and will be working on Min Nan tomorrow or later. Wyang (talk) 13:24, 22 March 2014 (UTC)
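Editorial illustration of one of the conversions mentioned above (Pinyin tone markers to tone numbers). This is only a minimal Python sketch, not the code of any of the Wiktionary modules named in the comment. It assumes the input is already split into space-separated syllables, and it assumes that unmarked syllables should be written with the neutral-tone digit 5; the function names are hypothetical.

```python
import unicodedata

# Combining diacritics that mark the four Mandarin tones in Pinyin.
TONE_MARKS = {
    "\u0304": "1",  # macron
    "\u0301": "2",  # acute accent
    "\u030c": "3",  # caron
    "\u0300": "4",  # grave accent
}


def pinyin_syllable_to_number(syllable):
    """Convert one tone-marked syllable to tone-number form."""
    tone = "5"  # assumption: no tone mark means neutral tone, written as 5
    base = []
    # NFD splits a letter such as "ǐ" into "i" plus a combining caron.
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            base.append(ch)
    # NFC recombines what is left, so "u" + diaeresis becomes "ü" again.
    return unicodedata.normalize("NFC", "".join(base)) + tone


def pinyin_to_numbers(text):
    """Convert space-separated tone-marked syllables to tone-number form."""
    return " ".join(pinyin_syllable_to_number(s) for s in text.split())
```

Working on the decomposed form avoids a lookup table with one entry per accented vowel, and the diaeresis of "ü" survives because only the four tone diacritics are stripped.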
I have finished the work on the Min Nan pronunciation template Module:nan-pron; please see Template:Min Nan-pron for examples. All subcomponents of {{zh-pron}} are now done. Wyang (talk) 06:18, 25 March 2014 (UTC)

OK. Thanks for all the comments above. It appears no one is in opposition to this proposal, and that our active editors of Chinese (Mandarin, Cantonese, Wu, etc.) entries are in overwhelming support for the move. The pronunciation template {{zh-pron}} is now ready as well. The vote Wiktionary:Votes/pl-2014-04/Unified Chinese will be open in two days - please express your opinion there. Thanks all. Wyang (talk) 01:52, 27 March 2014 (UTC)

The vote Wiktionary:Votes/pl-2014-04/Unified Chinese has started. --Anatoli (обсудить/вклад) 22:33, 30 March 2014 (UTC)

Thanks admins

Thanks to all the admins who didn't block me for violating the bot policy. Obviously, I've been running a bot using my own username for the last month or so. However, now "all the work" is done, so I'm essentially retiring from botting in Asturian. As is my trademark, I'll leave with a tiny batch of vandalism, and will see you all soon under a new username. WF --Back on the list (talk) 18:52, 13 March 2014 (UTC)

I have deleted all the fuckmyasses, fuckmyassáremos pages which were added. - -sche (discuss) 19:30, 13 March 2014 (UTC)
Good thing we didn't let him convince us to make him an admin again. --WikiTiki89 19:52, 13 March 2014 (UTC)

Proposed optional changes to Terms of Use amendment

Hello all, in response to some community comments in the discussion on the amendment to the Terms of Use on undisclosed paid editing, we have prepared two optional changes. Please read about these optional changes on Meta wiki and share your comments. If you can (and this is a non-English project), please translate this announcement. Thanks! Slaporte (WMF) 21:55, 13 March 2014 (UTC)

Linked terms in black

Why are some terms linked to from entries appearing in black and others in blue? See Xylopia, for which the first species is black and linkable and the others blue and linkable for me. DCDuring TALK 00:04, 15 March 2014 (UTC)

I don't see it. Do you notice this problem anywhere else? DTLHS (talk) 00:14, 15 March 2014 (UTC)
And what CSS class is around the link? DTLHS (talk) 00:15, 15 March 2014 (UTC)
I notice it in many places, but this one has: class="mediawiki ltr sitedir-ltr ns-0 ns-subject page-Xylopia_aethiopica skin-vector action-view vector-animateLayout" and a following element, with normal appearance has class="extiw". Another item with the black-seeming font is <a class="stub". DCDuring TALK 01:15, 15 March 2014 (UTC)

{{audio}} to categorize entries?

According to this Template_talk:audio#Categorisation this has been brought up (or at least was meant to?). I think terms with audio links by language categories are very important. Right now all of the members in them are by "hard coding" (actually typing out the category in the entry's body) which I think is very unwieldy.

Lua (among other things) provides the functionality to compare the first x chars of a string to a set of values (correct me if I'm wrong). What would be the chances of using the language codes used in the names of all of the pronunciation files to define a category "X terms with audio links" for that entry? E.g., de-Amerika.ogg --> de --> Category:German terms with audio links. Neitrāls vārds (talk) 05:11, 15 March 2014 (UTC)
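Editorial illustration of the prefix idea above, written as a Python sketch rather than Lua. It assumes file names start with a lowercase two- or three-letter language code followed by a hyphen, as in the de-Amerika.ogg example; the LANGUAGE_NAMES table and the function name are hypothetical stand-ins for whatever language data a real module would consult.

```python
import re

# Hypothetical, partial table from language code to language name.
LANGUAGE_NAMES = {"de": "German", "lv": "Latvian", "ru": "Russian"}

# Assumed naming convention: "<code>-<word>.<extension>".
PREFIX = re.compile(r"([a-z]{2,3})-")


def audio_category(filename):
    """Return the category implied by the file name prefix, or None."""
    match = PREFIX.match(filename)
    if match is None:
        return None  # no recognisable language prefix
    name = LANGUAGE_NAMES.get(match.group(1))
    if name is None:
        return None  # prefix is not a known language code
    return "Category:" + name + " terms with audio links"
```

Returning None for anything that does not match keeps wrongly named files out of the categories instead of guessing a language for them.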

We could also just add a lang= parameter to the template. That's how we do it with the others too. —CodeCat 14:21, 15 March 2014 (UTC)
Could you add that parameter? ...and then a bot could add the lang= parameter to existing audio templates based on the L2 header they are under? Realistically how complicated would it be to have a bot do this, could it be hooked up to some existing bot? I have some 800 files waiting on commons (I am not looking forward to actually having to add those, lol) but if I get around to doing that I could provisionally start specifying a lang=lv parameter (although it's not there yet.) Neitrāls vārds (talk) 04:01, 16 March 2014 (UTC)
It would be trivial to do if the template were to add entries lacking the parameter to Category:Language code missing/audio. But I'd prefer it if others gave their assent before changing the template. —CodeCat 04:07, 16 March 2014 (UTC)
Editors who are creating/adding pronunciation files would probably be the ones most interested in this proposal. User:Panda10 is one editor who I've noticed has been working on pronunciations. Perhaps there are more. Neitrāls vārds (talk) 21:24, 16 March 2014 (UTC)
I second the request of Neitrāls vārds and support adding a lang= parameter to the template. --Anatoli (обсудить/вклад) 22:19, 16 March 2014 (UTC)
I've also been thinking lately that adding lang= to the template is a good idea. @Neitrāls vārds: Re "Editors who are creating/adding pronunciation files would probably be the ones most interested in this proposal": I disagree. I think that editors who would have been recording and adding audio if it were easier to do are the ones who benefit the most. --WikiTiki89 22:40, 16 March 2014 (UTC)
Thanks for your input. @Wikitiki, it's actually rather uncomplicated, imo, when I had the last go at it it came out to 600+ files (reading out to commons upload) in less than 6 hrs (ofc that might be overdoing it a little 'cause I started to seriously second-guess every file "what's up with the weird intonation, is this tone right," etc., etc.) Neitrāls vārds (talk) 22:50, 16 March 2014 (UTC)
Yeah, but for me, I feel no strong need to upload audio. There are times that I just want to add audio to one or two words, but I don't want to go through the trouble of downloading an audio recorder that supports OGG or whatever. It's more of a psychological barrier than a real one, and I expect the same is true for many editors. --WikiTiki89 00:22, 17 March 2014 (UTC)
This project to develop a gadget to allow people to record and upload files directly from Wiktionary pages could help with that. - -sche (discuss) 00:55, 17 March 2014 (UTC)
Hahaha, I thought this was that discussion! That's why I was saying this. I just realized that this is actually about audio templates. --WikiTiki89 02:48, 17 March 2014 (UTC)
I've added some code to add entries to Category:Language code missing/audio if the lang= parameter is not present. That would currently include all entries that use {{audio}}. The parameter itself doesn't do anything yet, I only made the change to give the category some time to fill up.
I did notice something else about the template though. It checks for the presence of the first parameter (the sound file name), and only actually displays anything if that is present. But the first parameter doesn't default to it being empty, so {{audio}} would display something, while {{audio|}} would display nothing. This seems rather strange, so I wonder if that could be removed? Really, the first parameter should always be required, so leaving it empty should be an error; it shouldn't just cause the template to show nothing silently. —CodeCat 01:09, 17 March 2014 (UTC)
The {{audio-IPA}} template already has the lang= parameter and it's functional. I checked a few entries in Category:Terms with audio links by language and they all used the {audio-IPA} instead of {audio}. So adding lang= to {audio} makes sense. While the 'terms with audio' category appears useful, my preference really is the index with immediate audio links, so users do not have to go to each entry to listen to the audio. As an example, see Index:Hungarian/d. Click the blue arrows and it will play the sound file. Unfortunately, the index is no longer refreshed. --Panda10 (talk) 13:06, 17 March 2014 (UTC)
I'm OK with the idea of adding lang= to audio and making it obligatory. But what about all the articles with audio files without the lang= parameter? Is someone going to run a bot on these to add it? (I'm thinking specifically of Latvian entries: would someone here be willing to run a bot adding lang=lv to the {{audio}} template in Latvian entries, in case it is not there yet?) --Pereru (talk) 00:21, 18 March 2014 (UTC)
Yes, that should be very easy to do by bot, since the language can be determined from the section. --WikiTiki89 04:13, 18 March 2014 (UTC)

So, could we add the param itself to the template? My take on it would be something like this (not sure about the invoke part though): {{#if: {{{lang}}} | [[Category:{{#invoke:languages/templates|lookup|{{{lang}}}|names}} terms with audio links]] | [[Category:Language code missing/audio]] }}

And then one could start making any red categories there are (which would probably pop up right to the top of Special:WantedCategories) as that would be done manually either way, I think. Neitrāls vārds (talk) 17:36, 25 March 2014 (UTC)

Ok, I've made the change. —CodeCat 20:24, 26 March 2014 (UTC)
Would you be able to analyse the template usage by the prefix and insert "lang="? It would be much less time-consuming to remove the wrong ones than to add this parameter on thousands of entries. --Anatoli (обсудить/вклад) 21:37, 26 March 2014 (UTC)
MewBot is already running on this. It's using the cleanup category I added earlier, and it determines what code to use based on the language section that the template appears in. So if it sees {{audio}} in a L2 section called "Russian", it converts that to "ru" and then adds "lang=ru". —CodeCat 21:51, 26 March 2014 (UTC)
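Editorial illustration of the approach described above. This is a minimal Python sketch under stated assumptions, not MewBot's actual code: it handles only {{audio|...}} calls without nested templates, it treats any "==Name==" line as an L2 header, and the LANGUAGE_CODES table and function name are hypothetical.

```python
import re

# Hypothetical, partial table from L2 header name to language code.
LANGUAGE_CODES = {"Russian": "ru", "Latvian": "lv", "German": "de"}

# Matches "==Russian==" but not "===Pronunciation===".
L2_HEADER = re.compile(r"^==([^=].*?)==\s*$")
# Matches {{audio|...}} whose parameters contain no nested braces.
AUDIO_TEMPLATE = re.compile(r"\{\{audio\|([^{}]*)\}\}")


def add_lang_to_audio(wikitext):
    """Append lang=<code> to each {{audio}} call, using the enclosing L2 section."""
    current_code = None
    out = []
    for line in wikitext.split("\n"):
        header = L2_HEADER.match(line)
        if header:
            # Unknown language names reset the code to None, so nothing is added.
            current_code = LANGUAGE_CODES.get(header.group(1).strip())
        elif current_code:
            def fix(m, code=current_code):
                params = m.group(1).split("|")
                if any(p.strip().startswith("lang=") for p in params):
                    return m.group(0)  # already has lang=, leave untouched
                return "{{audio|" + m.group(1) + "|lang=" + code + "}}"
            line = AUDIO_TEMPLATE.sub(fix, line)
        out.append(line)
    return "\n".join(out)
```

Taking the code from the section rather than from the file name prefix sidesteps misnamed files, which is the concern raised in the preceding comment.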
I see, thanks. --Anatoli (обсудить/вклад) 21:56, 26 March 2014 (UTC)

Recent edits to US location entries

I've just spent close to an hour systematically undoing unhelpful edits User:Pass a Method made to entries on US states, cities, etc. These included listing North America and Western Hemisphere as holonyms. Why stop there? Why not pan out for the widest angle possible and include Earth, Solar System, Orion Arm, Milky Way, Local Group, Universe? And how is adding such detailed geographical information to entries beneficial from a linguistic perspective? -Cloudcuckoolander (talk) 22:59, 15 March 2014 (UTC)

Thank you very much. I undid some of them, but didn't have time to go through all of them. I was going to bring it up here, but we were already discussing it at the WT:Information desk#homonyms. Anyway, you seem to be the only person who actually did anything about it. --WikiTiki89 23:30, 15 March 2014 (UTC)
That's an invalid comparison. I myself was unaware of the existence of some of these holonyms until a few years ago. Therefore I consider it to be educational, since it offers a varied vocabulary not only to others but to me too; hence I'm disappointed with the reverts. Plus they deal with the ambiguity that is inherent in the language of some place names. You will notice that I usually ignore holonyms in matters unrelated to geography. This is because only in geography have such holonyms become of critical contemporary importance, with intergovernmental organizations, multiregional international organizations, and supraorganizations being organized on the basis of "detailed geographical information". I consider my addition to be akin to adding "Balkans" to the Bulgaria page. Nonetheless, I am willing to compromise and be less broad, with a sense of proportion, next time, since that seems to be the main objection. Pass a Method (talk) 00:00, 16 March 2014 (UTC)
Words like Usonia are very rare in the real world. Most English speakers don't even know that such a word exists. Teaching people that Usonia is holonym of, say, New York is not education, but misinformation. --WikiTiki89 00:12, 16 March 2014 (UTC)
Wiktionary is a dictionary, not a popularity contest. The standard method for rarer words is to tag it with the word "rare". Pass a Method (talk) 00:28, 16 March 2014 (UTC)
And did you tag it with the word rare? Essentially the bigger problem is that you are adding these instead of adding words such as United States, which are much more useful and much more common. The question of whether we even need holonyms on these pages is also controversial, as this is already covered by our system of categories. --WikiTiki89 00:33, 16 March 2014 (UTC)
I was planning to. Pass a Method (talk) 00:47, 16 March 2014 (UTC)
Even if we agreed that rare words should be listed on pages like these, then you still need to add the common words first, not plan to later. But I think very few people would agree that Usonia belongs on a page like New York. --WikiTiki89 01:02, 16 March 2014 (UTC)
You may think everyone should know terms like Usonia, and they should. The way to accomplish that is by including them in the synonym sections for US, USA, America, etc. Wiktionary is a descriptive dictionary, so it follows usage. Your favored terms are simply not used in connection with the entries you added them to in any significant amount.
There are far too many -nyms applicable to most entries for us to include all of them, so there's some level of selection involved. However much you may wish to promote knowledge of those obscure words, there are also others that may want just as much to include similar references to Al-Qaeda and terrorist in every entry remotely connected with Islam. Allowing one person's personal taste to prevail over the whole of Wiktionary is a recipe for divisiveness, partisan ugliness and edit-warring when the inevitable backlash sets in. Neutral point of view is one of our fundamental principles for this very reason.
Before you start to claim that what you're trying to push is neutral, you should consider the difference between not taking sides and trying to equalize sides. If the reality is that the vast majority of usage is biased or even bigoted, we have to describe that reality as it is, not give extra weight to minority usage that opposes the biases. Chuck Entz (talk) 01:15, 16 March 2014 (UTC)

See, this is why I generally do not go on edit‐sprees unless somebody preapproves it. There was an Ossetian entry where I changed the scripts to avoid script‐mixing, but I only did one, then asked about it in the Beer Parlour before doing any more. Turns out that their script is supposed to be Slavic with a Latin letter.

Personally, I’d rather he or she be asked to revert his or her own modifications if people didn’t agree with them. Getting pissed off at somebody over work that was never obligatory for you is considerably unfair. --Æ&Œ (talk) 01:17, 16 March 2014 (UTC)

@Pass a Method — Wiktionary is a dictionary. It's not our place to try to function as a sort of text-based atlas. There's an argument for listing "United States" as a hypernym of each of the U. S. states. But listing "Western Hemisphere" and "North America" is excessive, and opens the door to the listing of even larger, farther-removed regions to which, say, Vermont belongs (e.g. Orion Arm). It's also redundant, because the definition of, say, Vermont should be written in such a way as to communicate the fact that it's located in the United States, and the entry of the United States to communicate that it's in North America, and so forth. -Cloudcuckoolander (talk) 01:56, 16 March 2014 (UTC)

Liaison in French

How would we express that "vous" would be pronounced /vuz/ when followed by a word beginning with a vowel or a mute h? Here are some of my suggestions:

  1. IPA(key): /vu(z)/
  2. IPA(key): /vu/, (in liaison) IPA(key): /vuz/
  3. IPA(key): /vu/, (before words beginning with a vowel or a mute h) IPA(key): /vuz/

--kc_kennylau (talk) 10:49, 16 March 2014 (UTC)

All three of those variants completely ignore the fact that the /z/ is pronounced in the first syllable of the next word. For example, vous avez is pronounced /vu.za.ve/. --WikiTiki89 12:47, 16 March 2014 (UTC)
@Wikitiki89: Please make your suggestion. --kc_kennylau (talk) 14:43, 16 March 2014 (UTC)
French liaisons follow rules: all words beginning with a vowel sound can make a liaison with the previous word if it ends with a consonant. It is something that should be learned in general, not detailed in every word, except for the words that begin with an h, which may go either way (des héros = /de e.ʁo/, not /de.z‿e.ʁo/; but des habitudes /de.z‿a.bi.tyd/, not /de a.bi.tyd/), and other exceptions. Dakdada (talk) 14:59, 16 March 2014 (UTC)
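Editorial illustration of the resyllabification point made above (vous avez as /vu.za.ve/, des héros as /de e.ʁo/). This is a deliberately tiny Python sketch, not a model of French phonology: the liaison consonant and the flag saying whether the next word permits liaison are both supplied by the caller, and the function name is hypothetical.

```python
def join_with_liaison(first_ipa, liaison_consonant, next_ipa, next_allows_liaison):
    """Join two transcriptions, attaching the liaison consonant to the next word.

    first_ipa           -- the first word in isolation, e.g. "vu"
    liaison_consonant   -- the latent consonant, e.g. "z", or "" if there is none
    next_ipa            -- the following word, e.g. "a.ve"
    next_allows_liaison -- False for consonant-initial or aspirated-h words
    """
    if liaison_consonant and next_allows_liaison:
        # The consonant becomes the onset of the next word's first syllable.
        return first_ipa + "." + liaison_consonant + next_ipa
    return first_ipa + " " + next_ipa
```

For example, join_with_liaison("vu", "z", "a.ve", True) gives "vu.za.ve", while join_with_liaison("de", "z", "e.ʁo", False) gives "de e.ʁo". Both extra inputs have to be stored per word, which is exactly the information the thread is debating where to record.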
But vous does not end with a consonant, it's /vu/ as shown above. From a phonological point of view, the /z/ is unpredictable and just needs to be memorised. French grammar actually turns out to be vastly different from what we're accustomed to, if you take the pronunciation as the starting point. (Most nouns turn out to have no plural inflection anymore, for example; and for adjectives the feminine is actually the base form.) —CodeCat 15:17, 16 March 2014 (UTC)
I have always wondered if liaison in French would exist to the same extent if it weren't for the written language (i.e. spelling pronunciations). We do know that colloquial French has less liaison than formal French, and that Quebec French has less liaison than French French (but Quebecois are no less literate than the French). --WikiTiki89 18:21, 16 March 2014 (UTC)
Similar question for (American) English: but is pronounced /bət/ (let's say), but /bəɾ/ before a vowel or /h/. We don't mention that difference in the entry. Should we? Certainly not with slashes. Maybe with brackets, maybe not. But I don't know enough about French to intelligently comment on the analogue there.​—msh210 (talk) 17:12, 16 March 2014 (UTC)
The change that you mention is predictable. If you know the pronunciation of the word in isolation, then you can predict the lenition of the final consonant depending on the next word. But for French that doesn't work; you have to learn the liaison consonant separately for every word. You can't predict /vu.z/ from /vu/. —CodeCat 17:16, 16 March 2014 (UTC)
A better comparison would be a language with final devoicing, for which we indicate the devoicing in our transcriptions, but devoicing does not happen when followed by a word that starts with a vowel (let's ignore the case that the following word starts with a voiced consonant, because that could potentially voice previously unvoiced consonants). --WikiTiki89 18:21, 16 March 2014 (UTC)
That too, but I should note that not all languages with devoicing work that way. In Dutch for example it's actually morpheme-final devoicing, and it happens even to the first part of a compound. badwater (bath water) is [ˈbɑtˌʋaːtər] in normal speech. You rarely hear it pronounced with [d], but that depends on dialect and also implicitly reflects a kind of "blurring" of the morpheme boundary, and therefore of the syllable boundary. It's also relatively uncommon with stops, a bit more common with fricatives. —CodeCat 18:34, 16 March 2014 (UTC)
You can also consider the inflected forms. I'm sure baden is pronounced with a /d/, right? But that is also easy to account for as a voiced inflection paradigm and an unvoiced inflection paradigm. In standard Russian, prepositions and some particles are the only words that regularly retain their voicing before words that start with a vowel; for example над (nad), even though we transcribe it as /nat/ is often actually pronounced /nad/ even in isolation, as attested in the audio sample in our entry. --WikiTiki89 18:46, 16 March 2014 (UTC)
(In reply to CodeCat, 17:16, 16 March 2014 (UTC).) Is it predictable from the spelling, though, and therefore from the entry?​—msh210 (talk) 05:22, 17 March 2014 (UTC)
Usually but not always. For example, et (and) never causes liaison, while es (are) causes a /t/ rather than a /z/. --WikiTiki89 05:33, 17 March 2014 (UTC)
(in reply to msh210 17:12, 16 March 2014 (UTC)) This is a feature exhibited by some accents: in a given context, /t/ would be realised as [t], [ɾ] or [ʔ] depending on the accent. But this is not phonetically relevant, and it is predictable as well. Besides, the liaison may be mandatory, forbidden or optional in a given context. Whether an optional liaison is made depends on the speaker, the level of speech –the more formal, the more liaisons are likely to be made–, and the speaker’s mood; it can even, along with elision and prosody, change the meaning, making the statement solemn rather than anodyne. So it’s way more complex than having allophones. – Pylade (talk) 11:28, 27 March 2014 (UTC)
Input needed: This discussion needs further input in order to be successfully closed. Please take a look!
Since most of the time liaison is predictable from the spelling, I don't see a compelling need to regularly indicate it in the pronunciation section. For exceptional cases, we can use usage notes. --WikiTiki89 10:14, 19 March 2014 (UTC)
Does this speak your mind? --kc_kennylau (talk) 10:45, 19 March 2014 (UTC)
Yes. --WikiTiki89 11:53, 19 March 2014 (UTC)
As Dakdada said, rules regarding liaisons are complex, so the third option is unacceptable. The second option could do, with /vu.z/ to render the fact that the last consonant is pronounced in the first syllable of the next word. But I would rather agree with Wikitiki89 and just have a note for exceptions. – Pylade (talk) 22:11, 26 March 2014 (UTC)
If we list only exceptions, then the majority of entries won't have this information in IPA form. It can be predicted from spelling, maybe, but that's only if you know how to pronounce French words to begin with. So it becomes a bit of a circular argument at that point. I think we should include liaison in our pronunciation sections in some form. —CodeCat 22:18, 26 March 2014 (UTC)

Request for Comments: c: link prefix for Wikimedia Commons

There is a cross-wiki discussion in progress as to whether c: should be enabled globally as an interwiki prefix for links to the Wikimedia Commons. As your wiki has several pages or redirects whose titles begin with "C:", they will need to be renamed if this proposal gains consensus. Please take a moment to participate in the discussion. Thank you.

What this means: Our entry at c:a will instead automatically take people to commons:a once software support for the prefix is implemented. We have a couple of suggested options, like moving to a fullwidth rather than a halfwidth colon, or moving to Appendix:Unsupported titles. TeleComNasSprVen (talk) 06:09, 17 March 2014 (UTC)
Moving to a subpage of Unsupported titles seems like the best idea. (I thought entries with colons in their titles were already unsupported titles.) - -sche (discuss) 22:56, 17 March 2014 (UTC)
On further thought maybe this should have been posted to Wiktionary:Requests for moves, mergers and splits instead. TeleComNasSprVen (talk) 16:40, 18 March 2014 (UTC)
I think it was good to post it here. It's an invitation to participate in a discussion that affects not just our wiki but other wikis, and which, if it results in a consensus to use "c:" as a shortcut to commons, will force us to move "c:a". - -sche (discuss) 19:02, 18 March 2014 (UTC)
Appendix:Unsupported titles#Unsupported prefix is what we've been doing with these thus far, including for Swedish terms (as c:a is). I see no reason to change, nor, indeed, reason we should object to the interwiki prefix c: on preexisting-enwikt-entry grounds.​—msh210 (talk) 03:14, 18 March 2014 (UTC)
  • I just want to stress that replacing the colon with a fullwidth colon is a very, very, very, very bad idea. --WikiTiki89 04:14, 18 March 2014 (UTC)
    • Maybe we should add a suggestion to use fullwidth "c" for the Commons prefix. Keφr 19:17, 18 March 2014 (UTC)
The point of the shorter prefix is that it is easy to type.
The c: prefix could be used as a shortcut for our Citations: namespace. There isn’t much demand for it, but it would be nice to promote it by making it as easy to use as possible. But if it is to be a global prefix, we may as well be compatible with everyone else. Michael Z. 2014-03-31 02:50 z
But if there's no demand for it, why would we? It seems like the trouble we'll have of fixing it is more than the benefit we'll ever have from the shortcut... —CodeCat 02:55, 31 March 2014 (UTC)
The Request for Comment ended; there was judged to be consensus that "c:" should be made a shortcut for "commons:". The devs can be expected to implement that consensus. It would be foolish to start using "c:" for something else locally, since such uses will break once the devs implement "c:" → "commons:". - -sche (discuss) 03:51, 31 March 2014 (UTC)

Redirects in Wikisaurus

I would like to start using redirects in Wikisaurus because, the way that it is now, I have to search through synonyms of the word I'm looking for to find the page with synonyms of the word I'm looking for. For example, if I want synonyms for ceiling, it is quite possible that a page for ceiling does not exist, and the synonyms that I'm looking for are on the roof page. However, I cannot do this at present (without messing up the system) because if (using the previous example) I make ceiling a redirect that sends people to roof, then on the roof page the word ceiling has "[WS]" next to it, which links to the Wikisaurus page for ceiling. (Wikisaurus does this automatically.) But then the link to the ceiling page is completely useless and misleading because it redirects to roof, which is the page that the link is on.

I don't know what would be the best course of action here, but I think something should be done about this if possible. Any ideas? Thanks. —TeragR disc./con. 03:15, 18 March 2014 (UTC)

You made redirects at Wikisaurus:soda and Wikisaurus:pop to Wikisaurus:soft drink. These were really bad: pop more often than not refers to the music genre, while soda may refer to something quite unsuitable for drinking. Wikisaurus pages should have unambiguous titles (or at least titles with few closely related senses). A good thing to do would be to put a link to Wikisaurus:soft drink at soda, for example. Keφr 18:45, 18 March 2014 (UTC)
Those do not need to be red links, though. We could very well have the equivalent of a disambiguation page at Wikisaurus:soda giving readers the option of going to soda, sodium bicarbonate, Wikisaurus:soft drink, or w:soda. bd2412 T 17:29, 20 March 2014 (UTC)
I think we have it already — at [[soda]]. Keφr 17:59, 20 March 2014 (UTC)
A reader could conceivably type Wikisaurus:soda, because they are looking for a synonym of some meaning of "soda". bd2412 T 21:17, 20 March 2014 (UTC)
A reader could also conceivably type Help:What are the synonyms of soda?. We don't have to account for every situation. --WikiTiki89 21:40, 20 March 2014 (UTC)

Suffix category name conflict

The Hungarian -ika suffix can form both diminutive nouns and can be the end of regular (non-diminutive) nouns. I'd like to separate these two groups but the standard category name for both is Category:Hungarian nouns suffixed with -ika. What would be a good naming convention to resolve this with two categories? --Panda10 (talk) 20:58, 19 March 2014 (UTC)

I’ve run across the same problem in other languages. I propose that in such cases we add a short gloss to the category name, as Category:Hungarian words suffixed with -ika (diminutive), that they be subcategories of the glossless category, and that the alt1= parameter be used with {{suffix}} to suppress the gloss. — Ungoliant (falai) 23:51, 19 March 2014 (UTC)
But how do we prevent entries from being added to the "main" category? —CodeCat 00:26, 20 March 2014 (UTC)
They will have to be fixed and the user who added it told about the new system. — Ungoliant (falai) 00:45, 20 March 2014 (UTC)
Better if there's a system built in to the template so people can't just make up their own names (any system that can be abused will be abused). DTLHS (talk) 00:52, 20 March 2014 (UTC)
Separating categories is very important. Right now the derived terms in -իկ and many other entries misleadingly list all the suffixed terms for both suffixes. I propose to separate the categories by adding a superscript number, so Category:Latin words suffixed with -ceps¹, Category:Latin words suffixed with -ceps², etc. The number can be added by {{suffix}} with a special parameter. Short glosses instead of numbers are not a good idea, as in some languages there are unproductive suffixes of unknown function which you can't describe by a gloss, e. g. -որ (-or). --Vahag (talk) 07:53, 20 March 2014 (UTC)
Superscript numbers are too vague, they don't really mean anything. —CodeCat 13:51, 20 March 2014 (UTC)
Some other categories/suffixes which have the same issue are mentioned in the Tea Room. They are: Latin -ceps, English -er (although we may not want to split it), and English -est. - -sche (discuss) 00:34, 20 March 2014 (UTC)
Cf. user:msh210/Sandbox#adjectives ending in 'ly'.​—msh210 (talk) 15:34, 21 March 2014 (UTC)
How about distinguishing "ending in" from "suffixed with"? Sometimes a common ending is not an independent suffix. We could add a type=ending or altcat=ending to the {suffix} template to indicate that a Hungarian nouns ending in -ika is needed instead of Hungarian nouns suffixed with -ika. I'm not saying this will solve every situation in question, but it would be helpful in many cases.--Panda10 (talk) 14:35, 20 March 2014 (UTC)
Categories for words "ending in" would be just statistical, rather than etymological. So we'd need a whole new category tree for those, along the lines of "terms spelled with (character)" categories. —CodeCat 15:37, 20 March 2014 (UTC)
Ok, then let's return to the idea of adding a short gloss to the category name:
  • There would be the standard category Hungarian nouns suffixed with -ika as today.
  • There would be an altcat1 (and altcat2 if needed) parameter in {suffix} to allow glosses.
  • The allowable glosses would be stored in a list somewhere that the template would check.
  • If the user provides altcat1=diminutive and if diminutive exists on the allowable gloss list, then the entry would be placed in the Hungarian nouns suffixed with -ika (diminutive) category but not in the regular category.
  • If the user provides a gloss that is not on the allowable gloss list, the word would go to the regular category.
  • The regular category might have to be monitored for incorrect entries, but I don't see this as an issue. --Panda10 (talk) 18:01, 20 March 2014 (UTC)
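The bullet points above could be sketched roughly as follows. This is only an illustration in Python, not the actual {{suffix}} template or its module: the function name, the allow-list structure and the way the language name is passed in are all assumptions made for the example.

```python
# Hypothetical allow-list of glosses, keyed by (language code, suffix).
# In the proposal this list would be "stored somewhere the template would check".
ALLOWED_GLOSSES = {
    ("hu", "-ika"): {"diminutive"},
}


def suffix_category(lang_code, lang_name, pos, suffix, altcat1=None):
    """Return the category name an entry would be placed in.

    If altcat1 is on the allow-list for this suffix, the entry goes to the
    glossed category only; otherwise it falls back to the regular category.
    """
    base = f"Category:{lang_name} {pos} suffixed with {suffix}"
    allowed = ALLOWED_GLOSSES.get((lang_code, suffix), set())
    if altcat1 in allowed:
        return f"{base} ({altcat1})"
    return base


print(suffix_category("hu", "Hungarian", "nouns", "-ika", "diminutive"))
print(suffix_category("hu", "Hungarian", "nouns", "-ika", "made-up gloss"))
```

An unrecognised gloss silently lands in the regular category here, which matches the fallback described above and is why that category would still need occasional monitoring.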
Perhaps we can set up something analogous to sensids for the categories to attach to. This ID would be appended to the category name "Category:Hungarian nouns suffixed with -ika ([whatever the id is])", and would be supplied as an optional parameter to the suffix template. Contributors would know which one to use by checking the suffix entry.
This would be especially useful if it could be set up so the suffix template's module could check whether the suffix entry had at least one disambiguation id for the language in question and flag uses of the template that had one or more ids available but didn't use them.
I don't know the technicalities well enough to even guess how it could be done but maybe someone else here will. Chuck Entz (talk) 08:25, 21 March 2014 (UTC)

Template:only in with links to deleted ISO-639 appendices

Some ISO-639 language codes were listed first at rfv, then rfd (see talk:jv). It was finally decided to convert them to "only in" links to the appendices for ISO-639 codes. Now that the appendices have been deleted, we have a bunch of entries with basically nothing but explanations of why there's no entry and a referral to a non-existent appendix (see Special:WhatLinksHere/Appendix:ISO_639-1 language codes and Special:WhatLinksHere/Appendix:ISO_639-3 language codes).

Should we:

  1. delete the entries?
  2. change them to link to the WP disambiguation page for the entry name?
  3. change them to link to the appropriate WP list of ISO 639 codes?
  4. change them to link to WT:LL?
  5. do something else?

If we decide to delete the entries, there's a related issue re: {{also}} in some of these entries and whether to remove the entry names from the same template in other entries. Chuck Entz (talk) 20:59, 21 March 2014 (UTC)

If we keep them, option 3 seems best. —CodeCat 21:21, 21 March 2014 (UTC)
We should restore the two appendices. They were never properly deleted. It was just CodeCat's decision which was not voted and was far from our consensus. --kc_kennylau (talk) 22:30, 21 March 2014 (UTC)
She proposed deletion, in the appropriate forum, and waited almost eight months for feedback. During that time, exactly one other contributor commented on the proposal, and that contributor expressed agreement. In what sense is that "not voted" or "far from our consensus"? What more do you want her to have done? —RuakhTALK 23:03, 21 March 2014 (UTC)
Why would one delete something relatively harmless with so many inbound links without having made arrangements for replacement of the links? Why ever delete based solely on one's own opinion? There probably continues to be a steady stream of such "tidyings" to this day at [[WT:RFDO]]. DCDuring TALK 23:13, 21 March 2014 (UTC)
WT:RFDO exists because it's not good to delete things just based on one's own opinion. And that's why I submitted it there. If nobody actually submits their opinions after I ask them for it (except for that one person), then how is that my fault? —CodeCat 01:34, 22 March 2014 (UTC)
Whatever omissions or misnomers our appendix may have had, I thought our protocol for deletions of linked-to pages would be to first correct the links, the equivalent of depopulating a category. If that isn't our practice it should be, as our current practice makes the perfect the enemy of the better-than-nothing (The now-deleted appendix was superior to redlinks in, I'd guess, more than 98% of cases.) DCDuring TALK 20:24, 22 March 2014 (UTC)
As noted in the deletion discussion, our appendices only duplicated content that Wikipedia (and the ISO) already had, and our appendices were woefully out-of-date compared to Wikipedia's. Deleting them was the right thing to do. Changing pages which link to the appendices so that they link to Wikipedia's appendices instead is the best solution, but as a stopgap until all the relevant pages can be updated, we could make our appendices redirect softly to Wikipedia's. - -sche (discuss) 02:23, 22 March 2014 (UTC)
Special:WhatLinksHere/Appendix:ISO_639-1 language codes, Special:WhatLinksHere/Appendix:ISO_639-2 language codes, Special:WhatLinksHere/Appendix:ISO_639-3 language codes and Special:WhatLinksHere/Appendix:ISO_639-5 language codes have now been orphaned, except from a few old talk pages and sandboxes. - -sche (discuss) 06:47, 28 March 2014 (UTC)

User:Pass a Method

Throughout their entire history of editing, this editor has demonstrated what can be described as either notoriously bad judgement or POV pushing. Highlights:

The "good" edits are negligible, and almost always intertwined with some form of the above. Interacting with the user seems to be a waste of time: they are neither improving nor taking hints to leave. I propose an indefinite block. Keφr 08:52, 22 March 2014 (UTC)

I think we should give him a sort of official warning, clearly stating that if he does not start using better judgment with his editing then he will be blocked. And the block should not be immediately indefinite, we should start off by blocking him for a month, then a year, and only then a permanent block. --WikiTiki89 08:59, 22 March 2014 (UTC)
I doubt it will help. If he wants to push a POV, he will just shift to new, more stealthy ways of doing it; if he is incompetent, he is probably also incapable of recognising (a lack of) competence in himself and others; telling him to "start using better judgment" will fall on deaf ears as he will not understand what we want from him. Month-long blocks have already been handed, they did not help much either. Keφr 09:21, 22 March 2014 (UTC)
If it falls on deaf ears, then we will block him; that's the whole concept of a warning. If month-long blocks have already been tried, then I guess we can skip right to a year-long block. --WikiTiki89 09:37, 22 March 2014 (UTC)
I have taken heed of the warning. Pass a Method (talk) 09:51, 22 March 2014 (UTC)
I would say they're more willing to change than you give them credit for. The main problem has been that they tend to honor the letter rather than the spirit of the objections, and tend to slip back into their old ways after a while. They also tend to shift into other types of entries and make analogous, but different types of errors.
This reminds me a lot of Gtroy/Acdcrocks/Luciferwildcat. In both cases they identified real problems in lack of coverage for controversial and/or unpopular subjects, but both exercised poor lexicographic judgment and a sort of counter-prescriptive wishful thinking: terms should exist or have the meanings they thought they should because it made sense in their way of thinking, and the only reason they weren't attested was censorship or bias in the coverage of available sources.
The sad part is that a clear-headed search for religious and political biases in our entries was and is a good idea, and we needed and still need better coverage of terms that go against the biases of our contributor base. I just wish they had approached it with more common sense so that they wouldn't have wasted so much of their truly prodigious effort on bad edits.
For all the time wasted in rfv and rfd of stuff that should never have been added, and all the checking needed to weed out the pov and the phoniness they've added to so many entries, they have made some real contributions in increasing our coverage of religious terms outside of mainstream Judaism and Christianity. Chuck Entz (talk) 20:39, 22 March 2014 (UTC)

Accents in Greek transliteration

I've always wondered why accents are not included when transliterating Greek. We include them for Cyrillic, so why not for Greek? I think we should have them for Greek too. —CodeCat 18:48, 22 March 2014 (UTC)

If you mean Modern Greek, then AFAIK accents are included. Or do you mean Ancient Greek? — I.S.M.E.T.A. 19:10, 22 March 2014 (UTC)
Ancient Greek: cf. άνθρωπος (ánthropos) vs. ἄνθρωπος (anthrōpos). I thought we discussed it a while back and decided against it because it would be so complicated in some cases, e.g. having to transliterate ῇ as ẹ̄̂. —Aɴɢʀ (talk) 19:27, 22 March 2014 (UTC)
My main concern in the past was the difficulty of producing combinations of macrons and accents. Since we have a module to do the heavy lifting now, I think it's worth reconsidering. Chuck Entz (talk) 19:32, 22 March 2014 (UTC)
As I'm sure everyone is tired of hearing, I support a simple, intelligible transliteration for Ancient Greek (and all other languages that have native script support), preferring to demonstrate nuanced detail in the actual script and in the IPA of the pronunciation sections. -Atelaes λάλει ἐμοί 19:46, 22 March 2014 (UTC)
I think the suggestion user:Gilgamesh gave at Wiktionary talk:About Ancient Greek#Accents in transliteration is pretty workable. He gives these examples: η=ē ή=ḗ ὴ=ḕ ῆ=ê ῃ=ēi ῄ=ēí ῂ=ēì ῇ=ēî, ηι/ηϊ=ēï. The circumflex accent is indicated with ^ but this implies vowel length, so the macron is omitted. Iota subscripts are simply treated as diphthongs, which is what they were originally. —CodeCat 21:01, 27 March 2014 (UTC)
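The eta examples listed above follow a regular pattern, which can be sketched in Python as below. This is a minimal illustration only, not the existing transliteration module: the function name is made up, it handles a single eta character, it ignores breathing marks, and it does not cover the ηι/ηϊ case.

```python
import unicodedata

ETA = "\u03b7"
ACUTE, GRAVE, CIRCUMFLEX, MACRON = "\u0301", "\u0300", "\u0302", "\u0304"
PERISPOMENI = "\u0342"    # the Greek circumflex as a combining mark
YPOGEGRAMMENI = "\u0345"  # iota subscript as a combining mark


def translit_eta(ch):
    """Transliterate one (possibly accented) eta following the scheme above."""
    decomposed = unicodedata.normalize("NFD", ch)
    if not decomposed.startswith(ETA):
        raise ValueError("this sketch only handles eta")
    marks = decomposed[1:]

    if PERISPOMENI in marks:
        accent = CIRCUMFLEX
    elif ACUTE in marks:
        accent = ACUTE
    elif GRAVE in marks:
        accent = GRAVE
    else:
        accent = ""

    if YPOGEGRAMMENI in marks:
        # Iota subscript is written as a diphthong; the accent sits on the i.
        result = "e" + MACRON + "i" + accent
    elif accent == CIRCUMFLEX:
        # The circumflex already implies length, so the macron is omitted.
        result = "e" + CIRCUMFLEX
    else:
        result = "e" + MACRON + accent
    return unicodedata.normalize("NFC", result)


for letter in "ηήὴῆῃῄῂῇ":
    print(letter, translit_eta(letter))
```

Decomposing first means the polytonic and monotonic encodings of ή are treated alike, which avoids having to list every precomposed character by hand.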
We should have Ancient Greek accents. --Anatoli (обсудить/вклад) 22:33, 27 March 2014 (UTC)


It would be great if someone helped with the anusvara (nasalisation symbol) problem in all Indic languages. It affects the pronunciation/transliteration of the vowel and m/n consonants, depending on what follows. For lack of a better way, it's always "ṁ", but it should be ã (or another vowel with nasalisation), vowel + "ṅ", vowel + "m" or vowel + "n". Such modules are Module:te-translit, Module:si-translit, Module:hi-translit, etc. The native symbols in any of these modules don't matter, it's the quality of the consonant or lack of it that matters. @CodeCat:, could you help? The nasalisation and its transliteration is described here - Wiktionary:Hindi_transliteration#Nasalisation but I can describe more and make test cases if you agree. Just need to fix one module, the rule is applicable for all. --Anatoli (обсудить/вклад) 22:33, 27 March 2014 (UTC)
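To make the request concrete, here is a rough Python sketch of the kind of post-processing being asked for, working on the transliterated string rather than the native script. The consonant classes are an assumption based on the usual homorganic-nasal pattern (the palatal and retroflex rows are not mentioned above), the function name is invented, and the real rules at Wiktionary:Hindi_transliteration#Nasalisation may well differ in detail.

```python
import unicodedata

ANUSVARA = "\u1e41"      # the placeholder "ṁ" produced by the current modules
COMBINING_TILDE = "\u0303"

# Assumed mapping: following consonant -> nasal to write instead of "ṁ".
HOMORGANIC_NASAL = {
    "k": "\u1e45", "g": "\u1e45",            # velar -> ṅ
    "c": "\u00f1", "j": "\u00f1",            # palatal -> ñ (assumption)
    "\u1e6d": "\u1e47", "\u1e0d": "\u1e47",  # retroflex ṭ, ḍ -> ṇ (assumption)
    "t": "n", "d": "n",                      # dental -> n
    "p": "m", "b": "m",                      # labial -> m
}


def resolve_anusvara(translit):
    """Replace each "ṁ" depending on the character that follows it."""
    text = unicodedata.normalize("NFC", translit)
    out = []
    for i, ch in enumerate(text):
        if ch != ANUSVARA:
            out.append(ch)
            continue
        following = text[i + 1] if i + 1 < len(text) else ""
        if following in HOMORGANIC_NASAL:
            out.append(HOMORGANIC_NASAL[following])
        else:
            # No stop follows: nasalise the preceding vowel instead.
            out.append(COMBINING_TILDE)
    return unicodedata.normalize("NFC", "".join(out))


print(resolve_anusvara("hi\u1e41d\u012b"))
print(resolve_anusvara("h\u0101\u1e41"))
```

Because it only looks at the transliterated output, the same function could in principle be shared by all the modules listed, which is the point made above about the native symbols not mattering.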

Category:Gerunds by language

As w:Gerund indicates, the term is applied to a wide variety of verb-like words that don't necessarily have much in common otherwise. In English, Dutch and German, it's synonymous with "verbal noun", while in certain Romance languages it indicates an adverb. Because the term "gerund" is so ambiguous, I don't know if it's a good idea to have this category. It would be better to put the entries into other categories, with a name that is more descriptive for the words in question. —CodeCat 21:10, 22 March 2014 (UTC)

You might be right. I was excited to add the boiler (that I didn't know existed) to Category:Livonian gerunds but now I have noun category at the bottom. Livonian gerunds are inessives of infinitive and don't decline for anything else (being able to take the inessive is nominal-like behavior but they should be able to take on other cases too if they were nominals but if they would they wouldn't be gerunds anymore.) Tangentially Latvian -ot are also strictly verbal/adverbial (as in not nominal). Neitrāls vārds (talk) 14:02, 25 March 2014 (UTC)

Various votes have started or ended

- -sche (discuss) 20:33, 23 March 2014 (UTC)

Easter Competition 2014

This is to announce the forthcoming Easter/Spring competition, which will be open to all users, and will take the form of a crossword.

SemperBlotto (talk) 19:26, 24 March 2014 (UTC)

As always, I will offer all technical support. --kc_kennylau (talk) 23:15, 24 March 2014 (UTC)
  • Only two competitors so far. Does nobody else have a desire to win a fabulous prize? SemperBlotto (talk) 22:17, 27 March 2014 (UTC)

Finnish accusative

There has been some disagreement between me and User:Hekaheka over the Finnish accusative case. In the past, our templates always showed at least two forms in the accusative singular box. One that had a form identical to the genitive singular, and one that had a form identical to the nominative singular. When I converted the templates to Lua, I removed the second one, because in my view this isn't an accusative case form at all, and the two aren't just interchangeable, they have separate uses. I wonder why the form that looks like the nominative is called an accusative at all. It's used primarily with imperatives, which have no explicit subject. So it's not a real accusative form in my opinion; it's just the nominative used in the role of imperative direct object. It's not uncommon for languages to use different cases in limited syntactic roles, than they would otherwise. In Slavic languages for example, negative verbs take an object in the genitive, but that doesn't mean that the genitive form should be included under "accusative". Likewise, languages like Icelandic or Latin may have a dative object, but that doesn't make the dative form accusative.

Hekaheka has argued that grammars and standard bodies have stated that there is no accusative case at all, and only a genitive. To me, this position is completely untenable, and this becomes clear when you look at the singular and plural forms of the nominative, accusative and genitive together:

  • nominative: talo, talot
  • accusative: talon, talot
  • genitive: talon, talojen

This clearly shows that there is no single case that aligns perfectly with accusative function, other than the accusative itself. If you eliminate the accusative as a case, then you end up with convoluted rules like "objects use the genitive in singular but nominative in plural", which doesn't help anyone.
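The comparison can be checked mechanically. The small Python sketch below uses only the three talo rows listed above; the function names are invented for the illustration, and it says nothing about the nominative-like form used with imperatives.

```python
# (singular, plural) forms of talo, as listed above.
FORMS = {
    "nominative": ("talo", "talot"),
    "accusative": ("talon", "talot"),
    "genitive": ("talon", "talojen"),
}


def cases_matching_accusative(forms):
    """Cases whose singular AND plural are both identical to the accusative."""
    accusative = forms["accusative"]
    return [
        case for case, pair in forms.items()
        if case != "accusative" and pair == accusative
    ]


def matches_by_number(forms):
    """For each number separately, which other cases share the accusative form."""
    result = {}
    for index, number in enumerate(("singular", "plural")):
        result[number] = [
            case for case, pair in forms.items()
            if case != "accusative" and pair[index] == forms["accusative"][index]
        ]
    return result


print(cases_matching_accusative(FORMS))
print(matches_by_number(FORMS))
```

The first function finds no case that lines up with the accusative in both numbers, while the second shows the split: the singular coincides with the genitive and the plural with the nominative.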

It should also be noted that the identity between accusative and genitive forms is a historical accident, and is a result of sound change rather than a real functional identity between the two cases. In Proto-Uralic, the accusative had the ending -m, while for the genitive it was -n. In the prehistory of Proto-Finnic, final -m became -n, causing the two cases to become identical in form. So historically, there definitely was an accusative case, and the modern Finnish accusative singular -n is a direct descendant of that; it has nothing to do with the genitive. (The accusative plural is identical to the nominative; I don't know whether that's also a historical accident, or something else, but that's not relevant here.)

Since Hekaheka and I just keep arguing on this, I'm asking for wider feedback. —CodeCat 20:09, 24 March 2014 (UTC)

Hekaheka has said that there are two mainstream theories: one, which you summarize and criticize above, is that there is no accusative in modern Finnish, except perhaps in certain pronouns. The other is that there are two accusative forms in modern Finnish. Historically, Wiktionary accepted the latter view. If Hekaheka is right that these are the only two mainstream theories, I think we should continue to follow one of them — presumably the one we've been following — because I don't think the arguments of one person who doesn't speak Finnish are more convincing than the views of Finnish linguists. - -sche (discuss) 20:43, 24 March 2014 (UTC)
But should we blindly follow those arguments, or should we evaluate it for ourselves? It seems that you're doing the former; you're not evaluating my arguments based on my background, which is bad linguistics for sure. —CodeCat 21:09, 24 March 2014 (UTC)
  • Another 2p from someone who doesn't know squat about Finnish, so take this for what you will.
I've been taught that, if a language has grammatical cases, and a certain set of case endings is used for a distinct grammatical role, then the grammatical case for that grammatical role exists, even if the case endings it uses are shared with other case endings.
Case in point (pun intended): German. Broadly speaking, the feminine singular nominative matches the feminine singular accusative matches the plural nominative matches the plural accusative. But we don't say that the plural nominative or accusative cases don't exist. Similarly, the feminine singular dative matches the masculine singular nominative. But we don't say that the masculine singular nominative doesn't exist. Etc., etc.
If the Finnish language has endings that function as a grammatical accusative case, i.e. a set of endings that are consistently applied when a noun is used as the grammatical object, then it follows that Finnish has a grammatical accusative case, even if those endings happen to be shared with other grammatical cases.
I think part of the confusion might stem from how the Finnish authors define “case”. I don't read Finnish, but following the thread here and on CodeCat's Talk page, it sounds to my ears as though the Finnish authors in the "accusative doesn't exist" camp have taken the view that “case” == “distinct endings”. Meanwhile, those authors in the "accusative does exist" camp have taken the view that “case” == “grammatical role”. My bias is towards the latter.
Either way, any in-depth treatment of Finnish grammar (perhaps WT:About Finnish) should at least mention that there are both interpretations, describe them briefly, and explain which interpretation Wiktionary entries adhere to. ‑‑ Eiríkr Útlendi │ Tala við mig 21:59, 24 March 2014 (UTC)
(E/C) I see this situation as being very similar to Russian. But even if you (like me) consider that Russian masculine nouns do not have an accusative case, which is instead supplemented by the nominative or genitive depending on animacy, it is not wrong to call the supplemented form an accusative. Also, history is completely irrelevant, otherwise Modern English has cases as well, only all the endings except for the genitive have merged. --WikiTiki89 22:07, 24 March 2014 (UTC)
  • Um, I was always taught that English does have cases. Where and how to use whom, etc. only makes sense from a case perspective. ‑‑ Eiríkr Útlendi │ Tala við mig 22:20, 24 March 2014 (UTC)
    English has a relic of cases in the personal pronouns, but English nouns certainly cannot be considered to have cases. --WikiTiki89 22:25, 24 March 2014 (UTC)
    In Finnish, like IE, adjectives agree with nouns in case, and that agreement is what is generally considered to be evidence for case-ness in Finnish. Some grammars cite a "prolative" case for Finnish, or several others, but they don't show adjective agreement, so they are not considered true cases. As for English, it has a two-way case distinction in the pronouns, but not in the nouns (I don't consider the possessives a case; they were adjectives historically). So English has cases, but as a closed set and they are not productive. —CodeCat 22:47, 24 March 2014 (UTC)
    I thought that the possessives are descended from the Old English genitive case ending -es. --WikiTiki89 22:51, 24 March 2014 (UTC)
    Of nouns, yes. But the pronoun possessives were adjectives (or determiners) and agreed in case and number with the noun they modified, like they still do in German and in the Romance and Slavic languages. At least the first- and second-person ones did, but in German the third-person possessives now also inflect, so maybe that happened sometime in English history too before cases disappeared altogether. —CodeCat 22:55, 24 March 2014 (UTC)
    Well maybe I wasn't clear, but I meant to refer specifically to nouns. --WikiTiki89 23:41, 24 March 2014 (UTC)

The point of view that I'm promoting is this (copied from en-Wikipedia article on accusative):

According to traditional Finnish grammars, the accusative is the case of a total object, while the case of a partial object is the partitive. The accusative is identical either to the nominative or the genitive, except for personal pronouns and the personal interrogative pronoun kuka/ken, which have a special accusative form ending in -t. For example, the accusative form of hän (he/she) is hänet, and the accusative form of kuka (or ken) is kenet.

As there are two singular accusative forms, our declension table should show both. If I have not misunderstood, CodeCat argues that only the genitive-looking form is a "true" accusative and therefore only it should be shown. --Hekaheka (talk) 22:22, 24 March 2014 (UTC)

Hekaheka also showed me this, which I found to be much more convincing evidence:
I know I'm walking on thin ice, but the thinking may depend on the fact that in case of personal pronouns (the only true accusative forms that everyone seems to agree on) the accusative is used as equivalent to both nominative and genitive accusatives. At a slave market one might say ostan orjan/ osta orja but if one uses hän instead of orja, it becomes ostan hänet/ osta hänet. The grammatical case must be the same both if the object is a noun or if it is a pronoun - ergo, there's a nominative accusative form. In the end, the existence of nominative-accusative is at least partially a question of convention, but this is anyway the convention that is widely agreed upon.
(my reply, copied from my talk page):
Ok, that is an argument that does make some sense to me at least. The fact that the nominative of a noun becomes the accusative of a pronoun shows that there is a functional connection between the two.
But then, if we include them both under "accusative" then people may think that they're equivalent and interchangeable. We do the same with alternative genitive and partitive plural forms, after all. So I propose changing the table a bit, so that the "accusative" line becomes two rows high, and have two sub-rows each showing the two possible types of accusative. What should we call those sub-rows? Is the imperative the only case where the second accusative (the one like the nominative) is used, or are there others?
CodeCat 22:42, 24 March 2014 (UTC)
  • I strongly feel that we should somehow differentiate the two forms. I do not know in what contexts each of them is used, but we can come up with names for those contexts and call them "X accusative" and "Y accusative". --WikiTiki89 22:46, 24 March 2014 (UTC)
As a native speaker of Finnish, I agree with everything Hekaheka has said on the subject. Thousands of Finnish declension tables are erratic now. Either put back the nominative in the accusative singular box, or remove all accusative boxes. User Jyril, who created the old declension tables, is quite competent too. CodeCat is welcome to invent new rules for Finnish grammar in the discussion rooms, but we cannot base the declension tables on them. --Makaokalani (talk) 11:59, 27 March 2014 (UTC)
It would take several pages to explain the rules for when to use the nominative or the genitive type. Native speakers go by ear, and foreigners never seem to learn them completely. You can call them "nominative type" and "genitive type" if that makes you happy. --Makaokalani (talk) 12:14, 27 March 2014 (UTC)
There don't seem to be any agreed-upon names for them. One paper I found on the subject labelled them the "n-accusative", "t-accusative" and "zero accusative". Since no words distinguish the t-accusative from any other type, we can ignore that and just label it "accusative" for those words. So should our tables name them "n-accusative" and "zero accusative"? —CodeCat 14:32, 27 March 2014 (UTC)
I think a description of their function is better than a description of how they're formed. --WikiTiki89 16:35, 27 March 2014 (UTC)
But what would concisely describe their function? The n-accusative is the "default", but the zero accusative isn't used in situations that form a clear pattern. It's used:
  • as the object of imperatives
  • as the object of passives
  • in some infinitive constructions
and probably in some other situations as well; a native Finnish speaker could probably elaborate. —CodeCat 17:48, 27 March 2014 (UTC)
You cannot explain their function in one word. Any person who knows basic Finnish will know that object forms are tricky, and won't look for advice in a declension table. Just call them something. Hekaheka suggests nominative-accusative and genitive-accusative. Zero-accusative and n-accusative are fine with me too. Anything to stop this argument and finally fix the declension tables. --Makaokalani (talk) 16:33, 28 March 2014 (UTC)

I agree with Makaokalani, let's move into action. CodeCat, forget what I wrote about use of nominative-accusative on your discussion page and check Accusative case#Finnish instead. I erred with some infinitive and participle forms, because normally I don't need to think about them. Sorry for that. Anyway, we must change the current way of displaying Finnish accusatives, as it does not conform with any mainstream linguistic theories. I would be totally happy with the system we had, showing genitive-accusative and nominative-accusative forms in singular and nominative-accusative in plural without any further explanations. Those who don't understand why it's that way, may study Finnish grammar from available sources in the net and elsewhere or post a question on the Feedback page. After all, this is not the only unexplained thing in the table. I bet most users unfamiliar with Finnish will have a problem with understanding the concept behind "instructive", just to name one. If further labeling is deemed necessary, I have given one proposal in the discussion we had on CodeCat's discussion page. Another option is to link the word "Accusative" in the declension table to the page akkusatiivi, which contains some discussion on accusative in Finnish and a link to Wikipedia. --Hekaheka (talk) 20:39, 14 April 2014 (UTC)

I've modified the table now. —CodeCat 21:48, 14 April 2014 (UTC)
Good, thank you. But I still have a complaint about the table headers. The terms "zero?" and "normal" are not an even pair (or even a pair, if you like). The term zero-accusative refers to zero ending and its counterpart is n-accusative in which "n" is not short for "normal" but a reference to the "-n" -ending. Second, this discussion is the first time I ever hear the terms zero-accusative and n-accusative. Nominatiiviakkusatiivi (nominative-accusative) and genetiiviakkusatiivi (genitive-accusative) on the other hand seem to be used by many writers and they are also used in both English and Finnish Wikipedia. In order not to confuse the users of various Wiki sources it would probably be recommendable to use consistent terminology across the Wiki projects, even if it's not perfect. Therefore, change "zero?" to "nom." or "nom.acc." and "normal" to "gen." or "gen.acc.", please. --Hekaheka (talk) 23:13, 14 April 2014 (UTC)
I'd prefer to avoid those terms as their names are not the clearest. "Genitive accusative" sounds like it's an accusative that is used together with a genitive or something like that. The terms "0-accusative" and "n-accusative" are used in the paper I mentioned, "The Finnish accusative: Long-distance case assignment under agreement" by Anne Vainikka and Pauli Brattico. I just switched it with "normal" because it is the normal default accusative case form, and because having just "zero" and "n" looks strange and doesn't say much. With the name "normal" it's clearer to readers that in most circumstances that's the accusative you should use. —CodeCat 23:25, 14 April 2014 (UTC)

French open vowels

French has two open vowel phonemes, /a/ and /ɑ/, but many speakers make little or no distinction between them, even though minimal pairs exist. As a result, whether the underlying phoneme is /a/ or /ɑ/, the pronunciation is often written with only /a/ for both; and when one encounters /a/, one can never be sure that it really is /a/.

To solve this issue, I think we should consider using /a/ where the underlying phoneme is not known or usage varies, and /æ/ where the underlying phoneme is /a/ indeed. That policy would not break any previously written entry as /a/ is confusing anyway.

What do you think of this proposition? – Pylade (talk) 20:59, 26 March 2014 (UTC)

I'm sorry, but I think that's a really bad idea. Even if we could do that consistently, it would "solve this issue" only for those readers who knew that that's what we were doing. For everyone else, it would just be more confusing or misleading (or else look simply erroneous). —RuakhTALK 06:41, 31 March 2014 (UTC)
That’s why this has to be a policy on which we collegially agree. Surely then we should link updated entries to a page describing the policy for French. But I can’t see how things could get more confusing than they are now; whereas /æ/, not being traditionally used for French, would only require a look at the policy to make sense. – Pylade (talk) 20:42, 31 March 2014 (UTC)
If even I — well acquainted with Wiktionary — and surely someone unacquainted with it — came across a pronunciation with æ, I'd assume it pronounced æ, not that it's pronounced some other way and I need to check a help page to see how. (I'm not talking about those fluent in French. They'd probably notice something amiss. But most enwikt users aren't fluent in French.)​—msh210 (talk) 23:05, 31 March 2014 (UTC)
The funny thing is this phoneme is actually pronounced closer to [æ] than [a]. – Pylade (talk) 14:16, 1 April 2014 (UTC)
Of course, that's true of both of them . . . —RuakhTALK 02:38, 2 April 2014 (UTC)

CheckUsers on Wiktionary

Hello. Wiktionary has been getting quite a few spambots recently, most of which are blocked by some lovely abusefilters. However, it would be very useful to be able to get CheckUser data for those accounts, which could be used to prevent spam elsewhere. All of the current CheckUsers here are inactive, so would it be possible to elect some new ones, or modify the local checkuser policy to allow stewards to check locally in cross-wiki cases, such as anti-spam? Regards, Ajraddatz (talk) 04:36, 27 March 2014 (UTC)

Yeah ... we really do need to de-checkuser the inactive checkusers, and elect new checkusers. And possibly also allow stewards to check locally in spam cases, although if we do the first two things, that will be less important. - -sche (discuss) 05:38, 27 March 2014 (UTC)
What exactly is a CheckUser (other than a Dan Polansky)? --WikiTiki89 05:46, 27 March 2014 (UTC)
You can see m:CheckUser policy for the global policy, or Wiktionary:CheckUsers for the local page. --Rschen7754 06:01, 27 March 2014 (UTC)
To expand on my earlier comment (aware that this is somewhat tangential)... allowing inactive users to retain admin rights, as en.Wikt often does, is one thing; it can even be useful, in that it allows the users to immediately resume blocking vandals and deleting vandalism if they do return for short spurts of activity. Allowing inactive users to retain checkuser rights is different. Checkusers are in a powerful position to invade other users' privacy.
I stumbled across an old thread some time ago in which someone had proposed requiring that people undergo checkuser checking before being granted things like adminship. An admin had replied that he would be wary of that, since he actually had to change accounts because someone had stalked him under his previous wiki account and in real life.
Accordingly, checkusers should be people who are very active, because they should be people who are trusted by the current community (not just the community of users who elected them and then themselves became inactive). They shouldn't be just minimally active (en.WP requires one edit per year to retain admin rights, I think), they should be so active that the current community knows them; otherwise, the community isn't in a position to evaluate their trustworthiness. Also, checkusers should be people who are very active so that they can respond to situations as they arise. It's not as useful for an inactive checkuser to return seven months from now and tell us who was running the spambots.
Also, the Meta Checkuser policy states: "On any wiki, there must be at least two users with CheckUser status, or none at all. This is so that they can mutually control and confirm their actions. In the case where only one CheckUser is left on a wiki (when the only other one retires, or is removed), the community must appoint a new CheckUser immediately (so that the number of CheckUsers is at least two)." [...] "Any user account with CheckUser status that is inactive for more than a year will have their CheckUser access removed."
Looking at Special:ListUsers/checkuser, we are violating the spirit of that, if not the letter:
  1. Versageek has made 8 edits since 2009, half of them in 2010. I don't know them, and I'd hazard a guess that a sizeable number of our current editors (who joined after 2009) don't know them, either. They may be a great person, but I don't know that.
  2. Rodasmith made 5 edits last year, and 2 in 2012.
  3. Connel MacKenzie made 4 edits last year, 4 edits in 2012 (all to react to a motion that they be desysopped for inactivity by accusing one of the makers of the motion of bad faith), and just one edit in 2011 (which is why there was a motion to desysop them for inactivity).
  4. TheDaveRoss was reasonably active last year.
I think we should de-checkuser the first three, and then (per the Meta policy) either de-checkuser Dave or else appoint a new checkuser. Who wants to draft the de-checkuser votes?
Incidentally, I thought the folks at meta had a vote and were going to automatically desysop any of our admins (and presumably also checkusers, etc) who were inactive for more than 2 years, on the basis that we were failing to do so ourselves. Was I mistaken? - -sche (discuss) 07:02, 27 March 2014 (UTC)
The policy is at m:Admin activity review; while we stewards will post the notices, if there is local consensus to leave the admin rights, we will not desysop them. --Rschen7754 07:14, 27 March 2014 (UTC)
Also, as stewards we are willing to look at the CheckUser log and give general statistics on how often the CheckUsers are using the tool, since you cannot view it yourselves, if you would find this information helpful. --Rschen7754 17:19, 27 March 2014 (UTC)
I suppose that would be helpful, yes. :) - -sche (discuss) 03:20, 31 March 2014 (UTC)

I pop by here periodically to see if anyone is looking for me, but most of my activity these days is on en.wiki. I used to do xwiki spambot checking here, but since I haven't been on IRC much I don't get the requests anymore. I have no objection to giving up my en.wikt checkuser bit if the community would prefer to elect checkusers who are more active on the project. --Versageek 04:26, 28 March 2014 (UTC)

  • Voting on changing our local checkusers will take a while. Is there any objection to letting stewards use the checkuser tool on this wiki for anti-spam work in the meantime, i.e. until such time as we have more active checkusers? Or would we have to have a vote in order to allow that, which wouldn't actually take any less time than electing new checkusers? lol. (Alternatively, @Ajraddatz/Rschen, did you take advantage of Versageek's appearance to have him do the checkusering you needed done?) - -sche (discuss) 03:20, 31 March 2014 (UTC)
I only notified Versageek about the discussion on enwiki, since they seemed to be active there. --Rschen7754 19:39, 31 March 2014 (UTC)


  1. Rodasmith last used the tool on 8 October 2010.
  2. Connel MacKenzie last used the tool on 20 May 2008.
  3. Versageek last used the tool on 20 December 2013.
  4. TheDaveRoss last used the tool on 21 October 2013.
  5. There were more checks in 2008-2010, by a large factor.
  6. Stewards have run emergency checks at times in 2012 and early 2013, but well before I became a steward, so I am unaware of the circumstances.
  7. There have been no checks run in 2014.
  8. Below are the month-by-month statistics in 2012 and 2013. Generally, fulfilling a request involves multiple log entries, which are what are counted below.
Month Versageek TheDaveRoss Stewards
January 2012 11 3
February 2012 3
March 2012
April 2012 2
May 2012 2
June 2012
July 2012 5
August 2012
September 2012
October 2012 1
November 2012
December 2012
January 2013 3
February 2013
March 2013 1 3
April 2013 2
May 2013 11
June 2013
July 2013 19
August 2013
September 2013
October 2013 2
November 2013
December 2013 2

Rschen7754 20:23, 31 March 2014 (UTC)

This is very informative, thank you! It seems two of the checkusers have been semi-active, while two have been totally inactive with the tool for years. I have drafted a vote to remove the two inactive users' checkuser bits: WT:Votes/cu-2014-04/De-checkusering inactive checkusers. Votes on electing new, fully active checkusers can come later. In the meantime, @Ajraddatz / @Rschen, if you still need checkuser data on the spambots that prompted this thread, ping Versageek again. - -sche (discuss) 08:14, 2 April 2014 (UTC)
Will do, thanks. Ajraddatz (talk) 01:41, 3 April 2014 (UTC)

Fix Greek nouns' inflection line

As a result of the last change on Template:el-noun, the second parameter is now corresponding to plural, which used to be the third one. We need to remove the empty old second parameter and make the plural visible again. If there is no objection, I'm going to run my bot with a fix like that:
(ur'\{\{el-noun\|(.*)\|\|(.*)\}\}', u'{{el-noun|\\1|\\2}}')
I've already made some tests. Is it OK? Does anyone see any potential problem? --flyax (talk) 13:52, 29 March 2014 (UTC)
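
For illustration, the replacement can be tried outside the bot with a few lines of Python 3. This is only a sketch: the sample wikitext strings, the `{{t|...}}` call and the stricter variant `STRICT` are assumptions made up for this example, not part of the proposed fix. It shows one case worth checking, namely that the greedy `(.*)` picks the last `||` on a line rather than the first.

```python
import re

# Pattern and replacement from the proposal, rewritten as Python 3 raw strings.
PATTERN = re.compile(r'\{\{el-noun\|(.*)\|\|(.*)\}\}')
REPLACEMENT = r'{{el-noun|\1|\2}}'

# Hypothetical stricter variant (an assumption, not the proposed fix):
# it refuses to cross pipes or braces, so it stays inside one template call.
STRICT = re.compile(r'\{\{el-noun\|([^|{}]*)\|\|([^{}]*)\}\}')


def fix_el_noun(wikitext: str) -> str:
    """Drop the empty second parameter so the plural becomes parameter 2."""
    return PATTERN.sub(REPLACEMENT, wikitext)


def fix_el_noun_strict(wikitext: str) -> str:
    """Same fix, but limited to a single, un-nested {{el-noun}} call."""
    return STRICT.sub(REPLACEMENT, wikitext)


if __name__ == "__main__":
    samples = [
        "{{el-noun|m||άνθρωποι}}",      # old format: empty second parameter
        "{{el-noun|f|γυναίκες}}",       # already in the new format
        "{{el-noun|m||x}} {{t|a||b}}",  # a second "||" later on the same line
    ]
    for s in samples:
        print(s, "->", fix_el_noun(s), "|", fix_el_noun_strict(s))
```

On the third sample the original pattern removes the `||` inside the later template and leaves the headword line unfixed, while the stricter variant fixes only the `{{el-noun}}` call. The stricter variant would in turn skip any call whose parameters contain braces, so neither is offered as a drop-in replacement.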

It's ok, I was already going to do this. —CodeCat 14:00, 29 March 2014 (UTC)

A Different Way To Look At Chinese?

I've been doing some thinking about Chinese, trying to sort out the implications of the different ways we've treated the Chinese lects. I can't say that I have a clear solution, but I have the inklings of what seems to me like a more rational conceptual framework. It may be incompatible with our current infrastructure, but here it is:

Let's assume that Chinese is a strictly written language, originally based on an earlier spoken lect, but that has developed since on its own independent of the spoken language(s). In this respect, it reminds me a lot of sign languages such as American Sign Language.

The "dialects" are independent, but strictly spoken languages which are translated into dialects of written Chinese when they are written. That makes written Chinese a sort of lingua franca that's used to communicate between speakers, though the example of the w:Code Talkers comes to mind, as well.

Following from that, Chinese would be considered a language, but so would Cantonese, Hakka, Mandarin, Min Nan, Wu, Xiang, etc (and/or some of their subdivisions and/or fellow members of their dialect groups, as well).

Some aspects would be complementary in distribution: pronunciation would be strictly for the spoken lects and orthography for the written one. Morphology and syntax, on the other hand, are partly tied in with the writing, but have dimensions that the writing simply doesn't address. Etymology of the spoken lects is quite different from that of the written, but there again, intertwined with it. Lexicon is also quite distinct, but there are regional terms in the writing and I'm sure the writing contributes to the spoken lects as well.

I'm not sure if this is developed enough to provide anything useful, but I thought I would present it and see what others think.

Comments? Chuck Entz (talk) 20:21, 29 March 2014 (UTC)

NB: A 'classical chinese' (WMF lang code zh-classical) written dialect dates from at least the 9th century ce, while an updated 'vernacular chinese' (WMF lang code zh) written dialect dates from at least the 20th century ce. The former is understood across a broader base of languages, the latter is understood by a larger population group. - Amgine/ t·e 17:24, 30 March 2014 (UTC)
Most modern written Chinese is basically the same across all Chinese varieties, e.g. standard written Cantonese only differs from standard Chinese (Mandarin) in style and word choices, pronunciation (when read out loud) and use of traditional Chinese (e.g. in Hong Kong) (the latter is not really a difference, since Mandarin is also written in traditional characters in Taiwan and Cantonese is written in simplified characters in Guangdong). Vernacular forms of dialects, when written down, differ more from standard Chinese, as there are very common words that are not used in Mandarin or have a different meaning. The specific words are very low in number, even if they are very common, everyday words. Some spoken dialects simply lack a written form, or their written form is not common or attestable.
Vote here for the Unified Chinese approach: Wiktionary:Votes/pl-2014-04/Unified Chinese, which will allow us to boost non-Mandarin content. --Anatoli (обсудить/вклад) 00:57, 31 March 2014 (UTC)

Whinge: AbuseFilter #24

This isn't suggesting a change or anything, just a wee venting.

Rule 24 seems, to me, to violate WT:NOT #7. Which is mostly annoying to me because a collaborator on a current project was unable to leave me in-line notes. - Amgine/ t·e 17:17, 30 March 2014 (UTC)

A lot of people have complained about this in the past. But I don't see how it violates WT:NOT #7, since admins can edit these pages and will delete any that violate this rule. --WikiTiki89 17:46, 30 March 2014 (UTC)
Hosting a page only a specific user (or an admin) may edit seems to me to be webhosting, and not wiki. Just a difference in interpretation. - Amgine/ t·e 17:59, 30 March 2014 (UTC)
I think it really depends on the content. If it's just vanity, then I agree. If it's lists of pages they want to edit or test versions of templates, then it's not just web hosting. The reason for this filter is that user pages seem to be a big target for personalized vandalism. So I think we can reduce the restriction to non-autoconfirmed users. --WikiTiki89 18:06, 30 March 2014 (UTC)
Like several other abuse filters, this one would benefit from the ability to check if users are in a certain group (e.g. autoconfirmed users) on any wiki, not just on this one. - -sche (discuss) 19:29, 30 March 2014 (UTC)
IMO, if this filter is applied not only to [[user:Username]] but also to [[user:Username/subpage]], then it should log but not prevent the edit, so as to allow, as Amgine says, collaboration on projects, which user subpages have often been used for.​—msh210 (talk) 23:09, 31 March 2014 (UTC)


I just noticed that Wikimania (London, 6-10 August 2014) accepts submissions only until tonight (2014-03-31).

Do any of you plan to attend, and do you think a Wiktionary session is possible (it would have to be proposed quickly)?

I believe the scholarships offered by the Wikimedia Foundation are closed, but it may be possible to ask your local chapter for help if you want to attend.

Otherwise, do you think it would be a good idea to plan a Wiktionary conference, "just for us", all languages together? Dakdada (talk) 14:42, 31 March 2014 (UTC)

I have no opinions regarding Wikimania as I cannot attend. However, I would love for Wiktionary to have an all language project meeting, and would like a discussion there regarding harmonizing interfaces and content across the project. - Amgine/ t·e 15:57, 31 March 2014 (UTC)

Automatizing {{rhymes}} with Lua, merging it into {{IPA}}, and reforms in pronunciation section

Automatizing {{rhymes}} looks feasible. The template that takes IPA pronunciation can generate the title of the corresponding rhymes page (at least for English, I don't know about rules of rhymes in other languages). Any opinions/objections about automatizing it?

On the other hand, the structure of pronunciation sections of Wiktionary looks quite stupid and needs some reform:

Note how things such as the word "Audio" and the accent labels ("UK", "US") and the symbols (ɔːtə(ɹ), ɒtə(ɹ)) are being repeated. Maybe it should be something like:

Also: /ˈwɔːtə(ɹ)/ (rhymes)

(Template:IPA is supposed to automatically create the wikilink)

Any opinions/objections about this change for linking to rhymes namespace? --Z 16:36, 31 March 2014 (UTC)

How would you automate it? —CodeCat 16:40, 31 March 2014 (UTC)
By finding the stress mark, skipping consonant symbols, and checking if the corresponding page for the remaining strings exists. --Z 16:49, 31 March 2014 (UTC)
What about languages that don't form rhymes in that way? —CodeCat 16:51, 31 March 2014 (UTC)
We can start from English, and add rhymes manually for languages with different rules until they are automatized, too. --Z 17:26, 31 March 2014 (UTC)
If the audios are moved to the same lines as the transcriptions, I prefer a tabular layout, otherwise they look like they are floating around.
As for automatising rhymes, it is a good idea, but I don’t think it should be enabled by default in the {{IPA}} template (not even for just English). — Ungoliant (falai) 16:54, 31 March 2014 (UTC)
Could you elaborate on the automatising part (for English)? --Z 17:35, 31 March 2014 (UTC)
I don’t think that the template {{IPA}} should automatically “rhymify” its parameters once the module is implemented. Either a new template should be created for rhymified IPA transcriptions, or a new parameter should be added to {{IPA}} to enable rhymification (i.e.: {{IPA|/ˈfuː/|rhyme=yes}}). — Ungoliant (falai) 17:45, 31 March 2014 (UTC)
Or {{IPA|/ˈfuː/|rhyme=en}}, to remain open. Dakdada (talk) 17:47, 31 March 2014 (UTC)
I see, but I meant why do you think it should not automatically rhymify? --Z 17:49, 31 March 2014 (UTC)
Maybe it should eventually, but I think it should first be made non-automatic so that we can test it. After several months of that, I think we could consider making it automatic. --WikiTiki89 18:03, 31 March 2014 (UTC)
It will rhymify IPAs whose author never intended for use as a rhyme appendix index, which is a problem because rhyme page names are much more standardised than IPA transcriptions. — Ungoliant (falai) 18:08, 31 March 2014 (UTC)
The main problem is that the English rhymes pages are currently normalized for one regional variant, so that there will be duplicate rhyme pages for different regional realizations of the same phoneme. This highlights another issue: do we want to create a rhyme page for every variant that anyone ever puts in an IPA template? Some people mark things like aspirated and unreleased stops, or represent an artificial-sounding spelling pronunciation, while others just don't understand IPA and make mistakes. Automation means we have to fix or delete a rhyme page every time we correct the IPA. Chuck Entz (talk) 18:30, 31 March 2014 (UTC)
Ok, automated linking to rhymes looks to be problematic. We can still merge {{rhymes}} into {{IPA}}: the input would be {{IPA|/ˈwɑtɚ/|rhymes=ɒtə(ɹ)}}, {{IPA|/ˈwɑtɚ/|rhymes=yes}} (semi-automatic), or {{IPA|/ˈw[[-ɒtə(ɹ)|ɑtɚ]]/}} (the template can make the page title using the tools that are already developed in Module:links) --Z 19:31, 31 March 2014 (UTC)
(Re Chuck Entz 18:30, 31 March 2014 (UTC).) We could do it only for IPA marked with /…/ and not for that marked with […]. (I don't know that it's a good idea anyway, though.)​—msh210 (talk) 23:12, 31 March 2014 (UTC)
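
The derivation described in this thread (find the stress mark, skip consonant symbols, check whether a page exists for what remains) can be sketched roughly as follows. This is an illustration in Python, not the Lua that a module would use. The vowel set is deliberately incomplete, the names `rhyme_key`, `rhyme_link` and `KNOWN_RHYMES` are hypothetical, and the existence check is simulated with a set rather than a real page lookup. Only /…/ transcriptions are handled, following the suggestion to leave […] alone.

```python
from typing import Optional, Set

# Illustrative, incomplete vowel inventory (an assumption for this sketch).
VOWELS = set("aeiouyɑɒæɐəɚɛɜɝɪɔʊʌ")

# Hypothetical stand-in for "does the rhymes page exist?".
KNOWN_RHYMES = {"ɔːtə(ɹ)", "ɒtə(ɹ)"}


def rhyme_key(ipa: str) -> Optional[str]:
    """Return the transcription from the stressed vowel onwards, or None."""
    # Only phonemic transcriptions written between slashes are considered.
    if not (ipa.startswith("/") and ipa.endswith("/")):
        return None
    body = ipa[1:-1]
    # Start after the primary stress mark; if there is none, find() returns -1
    # and scanning starts at the beginning of the word.
    start = body.find("ˈ") + 1
    # Skip consonant symbols until the first vowel symbol.
    for i in range(start, len(body)):
        if body[i] in VOWELS:
            return body[i:]
    return None


def rhyme_link(ipa: str, existing: Set[str]) -> Optional[str]:
    """Return the rhyme key only if a page for it is known to exist."""
    key = rhyme_key(ipa)
    return key if key in existing else None


if __name__ == "__main__":
    for ipa in ["/ˈwɔːtə(ɹ)/", "/ˈwɑtɚ/", "[ˈwɔɾɚ]"]:
        print(ipa, rhyme_key(ipa), rhyme_link(ipa, KNOWN_RHYMES))
```

With the sample data, /ˈwɔːtə(ɹ)/ resolves to an existing key, while /ˈwɑtɚ/ yields a key with no matching page. That is the regional-variant mismatch raised in the thread: the transcription is valid, but the rhymes pages are normalized for a different variant.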

Über-template with tabular output for pronunciation section

This is inspired by opinions from Vahagn Petrosyan and Ungoliant MMDCCLXIV.

We can create a single, Lua-powered template for the pronunciation section, which would generate a table similar to the one below. This template would be able to automatically generate IPA pronunciation for certain languages (for now, only ready for Armenian, Georgian, Polish, Standard Chinese (Mandarin), Ukrainian and to some extent for Persian and Ancient Greek), possibly rhymes (discussed above), and hyphenation (for American English, etc.) wherever possible. (These are technically feasible.)

Both output and input of the proposed template are more readable and the output doesn't have duplications that can be seen in the current format. It also takes and shows information more precisely (in the current format of our pronunciation sections, it is often unclear which, say, audio, or which homophone corresponds to which IPA pronunciation when there's more than one phonemic pronunciation for a given accent, which is the case for "US" here):

IPA enPR Audio Hyphenation Homophones
Australia /ˈwoːtə(ɹ)/ [ˈwoːɾə(ɹ)]
UK /ˈwɔːtə(ɹ)/
wa‧ter whatever!
US /ˈwɔtɚ/ [ˈwɔɾɚ] wôtər
/ˈwɑtɚ/ [ˈwɑɾɚ] wŏtər


input would be:
|a1= AU
|AU-IPA= /ˈwoːtə(ɹ)/ [ˈwoːɾə(ɹ)]

|a2= UK
|UK-IPA= /ˈwɔːtə(ɹ)/
|UK-audio= En-uk-water.ogg
|UK-hyphen= wa-ter
|UK-homo= whatever!

|a3= US
|US-IPA= /ˈwɔtɚ/ [ˈwɔɾɚ]
|US-enPR= wôtər
|US-audio= en-us-water.ogg
|US-IPA2= /ˈwɑtɚ/ [ˈwɑɾɚ]
|US-enPR2= wŏtər

Its equivalent in our current format is:

* {{a|Australia}} {{IPA|/ˈwoːtə(ɹ)/|[ˈwoːɾə(ɹ)]}}
* {{a|UK}} {{IPA|/ˈwɔːtə(ɹ)/}}
* {{a|US}} {{IPA|/ˈwɔtɚ/|/ˈwɑtɚ/|[ˈwɔɾɚ]|[ˈwɑɾɚ]}}, {{enPR|wôtər|wŏtər}}
* {{audio|En-uk-water.ogg|Audio (UK)|lang=en}}
* {{audio|en-us-water.ogg|Audio (US)|lang=en}}
* {{hyphenation|wa|ter|lang=en}} (UK, US)
* {{rhymes|ɔːtə(ɹ)|ɒtə(ɹ)}}

--Z 20:54, 31 March 2014 (UTC)

Whole-heartedly support. This is so much nicer than what we have! There are a few things to work out, though. Some of our languages use a diaphonemic approach to IPA. Dutch is one example; there is a single diaphonemic transcription in IPA for all "standard" Dutch varieties, and then differences are noted below that. —CodeCat 21:01, 31 March 2014 (UTC)
I can only think of two solutions, listed here. The 2nd one looks better, but harder to implement. --Z 14:52, 1 April 2014 (UTC)
Support with both hands and feet. --Vahag (talk) 21:05, 31 March 2014 (UTC)
Support. But can the homophones parameter be something other than homo? — Ungoliant (falai) 21:09, 31 March 2014 (UTC)
How about just hom? - -sche (discuss) 00:38, 1 April 2014 (UTC)
Support Provided there's ongoing support. --Anatoli (обсудить/вклад) 22:28, 31 March 2014 (UTC)
Would we really need the a1 and similar parameters? Can't Lua infer them from the other parameters' names?​—msh210 (talk) 23:21, 31 March 2014 (UTC)
Can we please move pronunciation below the definitions, then? As it is, people have long mentioned that they can't find our definitions. Putting this above the definitions would hide them even more. (Of course, putting pronunciation below the definitions requires a vote, whereas this doesn't.) Oppose this template, even though I like the idea and will support its use if the section is moved down. —msh210 (talk) 23:21, 31 March 2014 (UTC)
Indeed, that's my ultimate goal, to highlight important information by getting rid of or marginalizing not-so-important ones (X-SAMPA, rhymes, probably hyphenation). By packing data in a table, it will be much easier for a newcomer's eyes to skip the information. We can also make the table expandable, like translations. --Z 14:27, 1 April 2014 (UTC)
That's a very good point (that, even though the box catches the eye more, it's also easier to skip over and know where to continue looking).​—msh210 (talk) 23:43, 2 April 2014 (UTC)
  • Support the general idea, but I don't think linking the pronunciation straight to the rhymes is a good idea, for a few reasons. First, because it's not at all clear to the reader that it should link to rhymes, as it doesn't say "rhymes" anywhere in the box. Second, because this eliminates the possibility of adding helpful tooltips indicating what sounds each symbol is for. (I think this would be quite helpful, as many readers don't know IPA. I set up something like this for Wikipedia at w:Module:IPAc, but it never got used.) Third, because this way rhymes can't be shown at all if the full pronunciation details aren't available. In many cases, a user will be able to add that one word rhymes with another word just by using the rhymes adding tool, even if they don't know IPA and thus wouldn't be able to add the full IPA pronunciation. --Yair rand (talk) 20:28, 1 April 2014 (UTC)
    (Imported Module:IPAc. --Yair rand (talk) 02:08, 3 April 2014 (UTC))
Umm....I don't know if this is at all relevant, but I just created a new Ancient Greek pronunciation template, {{grc-pron}}, which can be seen in action at εὐδοκέω (eudokeō). If this template is to become the community standard, then I imagine the results from grc-pron could be piped into it. I also assume that the bits not relevant to a language, like rhymes aren't relevant to Ancient Greek, for example, could be turned off. Finally, I haven't finished the JS for this yet, but I was planning on making my new template collapse into one line by default, showing a general gist of the word's phonological evolution, which would be expandable to the current five lines of detail. I would strongly prefer to keep the ability to show the general user a fairly compact, simple pronunciation scheme, while allowing the more serious phonophile more detail. I feel fairly confident that most users don't give a rat shit about most of this info, and assaulting their above-the-line load-screen with it is unacceptable. Again, I don't know if this is relevant, and I do apologize if my grc blinders have caused me to insert myself somewhere I don't belong. -Atelaes λάλει ἐμοί 08:13, 3 April 2014 (UTC)


There are a few things that need to be sorted out, as the parameters given above are a nice ideal, but there are some practical problems.

  • Hyphenation is not actually a pronunciation thing, so it probably doesn't belong in the pronunciation section, let alone in this table. It's also redundant to list it for each pronunciation as it's going to be pronunciation-independent.
  • The suggested way of specifying parameters doesn't say anything about the order in which the columns should be shown. While you can specify parameters in a different order on calling the template, this information is lost in the translation to Lua or the template. So the module would have no way of knowing that IPA goes before audio, unless we hard-code it into the module itself. That, of course, means that we'd have to maintain a list of valid columns, and reject any that are not recognised. It's not ideal.
  • Consider also the changes that are in progress with Chinese entries. They also use a table structure but it expands. See for example User:Wyang/告白. It may not be the best thing to have one format for Chinese and another for all other languages. So we may want to consider a way to accommodate both.
  • What should go in the first column when there is only one pronunciation?

CodeCat 21:46, 31 March 2014 (UTC)

IMO the column names and their order should be hard-coded. There are only a few, and keeping the order consistent will be good for those who view multiple pages, with no drawback I can think of.​—msh210 (talk) 23:21, 31 March 2014 (UTC)
American hyphenation is based on pronunciation. British hyphenation is based on etymology. We could put the American hyphenations in the pronunciation section, and the British hyphenations under etymology (tongue-in-cheek). —Stephen (Talk) 04:10, 2 April 2014 (UTC)
I've written a module, Module:User:CodeCat/pronunciation, which can be seen (temporarily) on User:CodeCat/sandbox. It's a quick draft, but it does work more or less. The data is hard-coded for now; I want to make a separate module (named /templates) which handles calling from templates. That way, the current module is focused only on Lua, and doesn't need to concern itself with details of how it's invoked. —CodeCat 23:02, 31 March 2014 (UTC)

It looks good. The plain version is better, because the table lines and tone are just visual distraction. It would be more readable if every header and body cell were left-aligned.

If there is only one line and no accent, can the first column be omitted?

I agree that hyphenation doesn’t really belong here. There are, apparently, different American and British hyphenation rules, and probably rule sets according to specific style guides. Michael Z. 2014-03-31 23:15 z

If we're putting homophones in the template we should decide what it looks like if you have a lot: vertical list? expandable? DTLHS (talk) 23:49, 31 March 2014 (UTC)

Like many languages, English has a very large number of dialects. At present, people can add information on these dialects' pronunciations of words rather simply: they just copy a line of the current format and change the {{a}}, expanding
* {{a|GenAm}} {{IPA|/maɪt/}}
* {{a|Southern US}} {{IPA|/mɑːt/}}
to e.g.
* {{a|GenAm}} {{IPA|/maɪt/}}
* {{a|Southern US}} {{IPA|/mɑːt/}}
* {{a|Old Virginia accent}} {{IPA|/mæːt/}}
and {{a}} is flexible enough to handle this. Your proposed format seems to use dialects' names in the names of parameters ("UK-IPA=", "UK-audio=", etc). Does this mean we would have to anticipate and code into the template/module, in advance, every dialect of every language that the template covered, or people would not be able to add those dialects' pronunciations of words? - -sche (discuss) 01:29, 1 April 2014 (UTC)
Lua is capable of iterating over all parameters. So it's able to "find" parameters whose name is not known in advance. It does require that the parameters follow some predictable pattern so that the module knows which part of the name it should interpret as a dialect name. —CodeCat 01:55, 1 April 2014 (UTC)
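The pattern-based lookup described above can be sketched roughly as follows. This is Python rather than Lua, purely for illustration, and it is not the code in Module:User:CodeCat/pronunciation. It assumes parameter names of the form "dialect-column" (as in "UK-IPA"), and the column list, the function names and the audio file name are all made up for the example.

```python
# Hypothetical hard-coded column list; it also fixes the display order.
COLUMN_ORDER = ["IPA", "audio"]


def group_by_dialect(params):
    """Group parameters named '<dialect>-<column>' into {dialect: {column: value}}."""
    rows = {}
    for name, value in params.items():
        # Split on the LAST hyphen, so the dialect name itself may contain hyphens.
        dialect, sep, column = name.rpartition("-")
        if not sep or not dialect or column not in COLUMN_ORDER:
            raise ValueError(f"unrecognised parameter: {name!r}")
        rows.setdefault(dialect, {})[column] = value
    return rows


def ordered_cells(row):
    """Return one row's cells in the hard-coded column order (None where absent)."""
    return [row.get(column) for column in COLUMN_ORDER]


# The dialect names are not known in advance; only the pattern is.
params = {
    "UK-audio": "example-uk.ogg",  # made-up file name
    "UK-IPA": "/maɪt/",
    "Southern US-IPA": "/mɑːt/",
}
rows = group_by_dialect(params)
```

Because the column order lives in one list, the order in which an editor writes the parameters does not matter, and an unknown column name is rejected rather than silently ignored.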
Yes. But there is also a nice solution to avoid problems such as following restricted patterns and long names for parameters: (for 2nd row)
|2= US
|2-IPA= ...
|2-audio= ...
This way, we'd be able to put more than one dialect/variety in a single row by passing them to a numbered parameter, separated by a comma or something: |2=UK, US. We can also define subvarieties, which would be placed in the second column, by passing them to |n-n= where the n's are numbers, which is especially useful in Chinese entries.
Regarding columns, and whether they should be hard-coded, I think specific columns should be defined with a default order, but we should probably be allowed to override it by defining new columns and their order using special parameters, like |columnn=.
If the |n= parameters are not specified (usually when we have only one row), the first column should be omitted.
I think hyphenation belongs somewhere in the PoS section, near the headword line where we show the word, maybe floating on the right side, like this.
@Wyang: you may be interested in this discussion. --Z 14:09, 1 April 2014 (UTC)
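As a rough illustration of the numbered-parameter scheme proposed above, here is a Python sketch (not an existing module). It assumes that |n= holds a comma-separated list of varieties and |n-column= holds a cell for row n. The function names and the row layout are hypothetical, and the |n-n= subvariety parameters are deliberately left out.

```python
import re

# Matches "2" (row label) or "2-IPA" (cell); "2-1" subvarieties are not handled here.
PARAM_RE = re.compile(r"(\d+)(?:-([A-Za-z]\w*))?")


def build_rows(params):
    """Collect '|n=' labels and '|n-column=' values into a list of rows ordered by n."""
    rows = {}
    for name, value in params.items():
        match = PARAM_RE.fullmatch(name)
        if match is None:
            continue  # not part of the numbered scheme
        number, column = int(match.group(1)), match.group(2)
        row = rows.setdefault(number, {"varieties": [], "cells": {}})
        if column is None:
            row["varieties"] = [v.strip() for v in value.split(",")]
        else:
            row["cells"][column] = value
    return [rows[n] for n in sorted(rows)]


def show_first_column(rows):
    """The variety column is only shown if at least one row names a variety."""
    return any(row["varieties"] for row in rows)
```

With this layout, an entry that passes only |1-IPA= gets no variety column at all, which matches the suggestion that the first column be omitted when no |n= parameter is given.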
I support the idea of using a single template to cover all the English pronunciation information. In the case of English, it might be useful to pre-define some commonly used dialectal names, such as 'UK', 'RP', 'US', 'GenAm', 'Canada', 'Australia', 'Ireland', 'New Zealand' and 'South Africa', so that one can use
|UK-IPA= ...
without having to identify the variety in a separate parameter. The case of Chinese is a little different from English, as the internal divisions of Chinese are well-defined from centuries of Chinese dialectology (Mandarin, Cantonese, Wu, etc.), with each division having a prestige dialect (usually the dialect of the largest city), whereas English accents are less well-defined. Regional Chinese accents are associated with locations (List of varieties of Chinese, eg. Beijing, Guangzhou), while English accents tend to be characterised by region/area (eg. Southern US). Hence, I would prefer if {{zh-pron}} has parameters 'm=PINYIN', 'c=JYUTPING', 'w=ROMANISATION' for prestige dialects, and 'm-LOCATION_A=IPA', 'm-LOCATION_B=IPA' for other varieties. In contrast, the English template is probably better without a set of predefined divisions (as the example above |1= ... |2= ...), but with some predefined commonly used parameters, as said above. As for rhymes, I think they should be kept separate from the IPA template. Wyang (talk) 03:54, 2 April 2014 (UTC)

Changes to the default site typography coming soon

This week, the typography on Wikimedia sites will be updated for all readers and editors who use the default "Vector" skin. This change will involve new serif fonts for some headings, small tweaks to body content fonts, text size, text color, and spacing between elements. The schedule is:

  • April 1st: non-Wikipedia projects will see this change live
  • April 3rd: Wikipedias will see this change live

This change is very similar to the "Typography Update" Beta Feature that has been available on Wikimedia projects since November 2013. After several rounds of testing and with feedback from the community, this Beta Feature will be disabled and successful aspects enabled in the default site appearance. Users who are logged in may still choose to use another skin, or alter their personal CSS, if they prefer a different appearance. Local common CSS styles will also apply as normal, for issues with local styles and scripts that impact all users.

For more information:

-- Steven Walling (Product Manager) on behalf of the Wikimedia Foundation's User Experience Design team

Well, this is different... —CodeCat 18:41, 1 April 2014 (UTC)
What timezone is WMF? East Coast US? I want to know when April 1st is over because this looks hideous. Neitrāls vārds (talk) 19:28, 1 April 2014 (UTC)
Are you telling me this isn't just a subtly maddening April Fools' joke? Ultimateria (talk) 20:38, 1 April 2014 (UTC)
Neitrāls vārds, we're spread across timezones. :) The public deployment calendar has times in both UTC and Pacific, where the office is. And nope, Ultimateria, not a joke. If anyone is curious about how to go back to the old version for themselves, or why we did this, the FAQ and other materials have way more detail than I could provide here. Steven (WMF) (talk) 21:46, 3 April 2014 (UTC)

April 2014

Measure word

Continuation of Wiktionary:Beer_parlour/2013/November#Measure_word, See also: Talk:笔, Template_talk:cmn-new#New_PoS

Re: "measure word", "counter" and "classifier" - difference in headers. Is ===Classifier=== allowed by KassadBot? --Anatoli (обсудить/вклад) 06:51, 1 April 2014 (UTC)

Yes. The only one not allowed is "measure word". Chuck Entz (talk) 07:37, 1 April 2014 (UTC)
Thanks. If everyone is OK with it, I'm planning to make them all (counters) "classifiers", add to existing Category:Classifiers by language, since "measure words" are not allowed and "counters" are ambiguous and hated by some. --Anatoli (обсудить/вклад) 10:45, 1 April 2014 (UTC)
  • I'm curious -- hated by some refers to whom? And in what context -- in reference to Chinese, or Japanese, or Korean, or ?? I've only ever heard the Japanese 助数詞 (​josūshi) POS referred to in English as counter. Furthermore, these are only ever used in counting, as far as I can think at the moment, so calling them "classifiers" doesn't seem quite right either.
Why do we need to unify these terms? I fail to see any compelling reason for changing the grammatical terminology in use at Wiktionary, especially when that usage runs counter to long-established usage in the English-language literature. A beginning learner of one of these languages might come here to look things up and walk away confused because of this change.
Also, why should mention of a new POS for Chinese have any bearing on Japanese entries? This process seems to be a bit muddled. ‑‑ Eiríkr Útlendi │ Tala við mig 07:36, 3 April 2014 (UTC)
  • And as I discover the scope of the already-implemented changes (which in and of itself seems premature), I'm increasingly bothered. Category:Japanese classifiers (again, non-standard nomenclature for describing Japanese) has this descriptive paragraph as its header:

Japanese terms that classify nouns according to their meanings.

That's not correct. The 助数詞 (​josūshi) POS does not classify nouns according to their meanings. They are only ever used immediately after a number when counting things. That's why they are traditionally called counters. Moreover, the term counter is much more intuitively obvious with regard to its use than the overly broad classifier.
Could we perhaps put the brakes on extensive changes across a broad array of languages until more of a consensus has been established? ‑‑ Eiríkr Útlendi │ Tala við mig 07:52, 3 April 2014 (UTC)

For the record, I am deeply concerned that decisions about entry nomenclature and formatting that affect multiple languages, including Korean, Japanese, Vietnamese, and even Bengali, were apparently made at [[Template_talk:cmn-new#New_PoS]], somewhere I (and most likely other editors of these affected languages) would not even think to look. I understand that many of us are enthusiastic and keen to make progress, but forging ahead without clear consensus is ultimately disruptive and counterproductive. I urge all those involved to do more to communicate clearly, widely, and obviously. The Beer Parlor and other forums are intended for precisely this kind of broad communication. Please keep broad discussions, especially those that have such wide-ranging ramifications, on these forum pages. ‑‑ Eiríkr Útlendi │ Tala við mig 08:04, 3 April 2014 (UTC)

Input needed: This discussion needs further input in order to be successfully closed. Please take a look!

@Atitarev: Saying "If everyone is ok with it" and getting no response in a day or two is not the same as getting permission to make massive changes in languages that were not part of the discussion. This is absolutely unacceptable. STOP!!!!! Chuck Entz (talk) 13:43, 3 April 2014 (UTC)

Hello all. I don't understand this uproar and I'm sorry to see people upset. This topic is the continuation of the November 2013 topic. BP is the place everybody should read. What other place is there to get consensus? I only linked relevant other discussions here for reference. Shinji, a native Japanese speaker, agreed that "classifier" is the best term for the East Asian languages. Haplology only explicitly opposed "measure word", which is also illegal at Wiktionary. It is at least synonymous with "counter", and is definitely used in reference to Japanese along with "counter". It's easy to check. Japanese and Chinese also share a number of classifiers, even if the Japanese usage is narrower. It is, of course, desirable to have the same name for the same PoS across languages.
Well, I wrongly assumed there is no objection. I guess, I'll have to undo the change for Japanese, maybe Korean. Wyang helped me with this with his Wyangbot. I may have to ask him again because the list is big. Sorry and thank you. --Anatoli (обсудить/вклад) 20:26, 3 April 2014 (UTC)
  • @Atitarev:, thanks for posting. Some thoughts in response:
I was on WT hiatus in November, so I didn't see that at the time. More recently, once that thread was brought to my attention (just last week really), I skimmed through and saw mostly reference to Chinese, and some discussion of how to handle categories. I didn't see anything resembling a real consensus for changing Japanese entry structure or headers.
Takasugi-san's actual words were:

For most East Asian languages, classifier is the best term. Counters (助数詞) are classifiers used only after a numeral. In Mandarin, you say 一 and 那 while in Japanese you say 一 but you say just あの.

The phrase most East Asian languages may or may not include Japanese as Takasugi-san intends it. It's not clear here. His wording that counters are classifiers used only after a numeral to describe the Japanese POS makes the case that Japanese counters are a subset of classifiers in terms of how they function. Note his point that Mandarin simplified 那个, traditional 那個, has no direct Japanese counterpart that uses the character: this Mandarin term is much more limited in use in Japanese.
Moreover, while I certainly respect Takasugi-san's judgment regarding Japanese terms, inasmuch as he is a native speaker of Japanese, this issue affects English terminology used to describe and teach Japanese -- something that he is less likely to be familiar with than either Haplology or myself, among other native-English-speaking learners of Japanese.
Meanwhile, Haplology didn't exactly support this suggested change (bolding mine):

As mentioned above, a counter word in Japanese is a unique pos called 助数詞 in Japanese. I suppose that it would not be inaccurate to change the header to "measure word" but it would be unconventional. So far I've only heard the word "counter" used to name them.

That's not strong opposition, but it's certainly not a ringing endorsement either.
James Jiao made the statement that:

There are definitely inconsistencies across the East-Asian languages on WT. Japanese measure words are usually called counters (助数詞 josūshi) and an example of this can be found here: . As you can see a non-standard heading 'counter' is used here. It is a unique pos for these languages that serve the same purpose and there should be a unified heading to denote them.

However, this POS does not actually function the same way in each language, as demonstrated above, raising the question of whether this can really be called the same POS in each language. I also don't see a strong case for using the same English label for a POS that 1) might be functionally different in different languages, and thereby effectively viewable as a different POS in each language, and that 2) already has set English terminology that happens to differ for each described language, such that calling it something else may very well confuse and alienate users.
The November BP thread does mention categories as one possible driving consideration for this change. I suspect there may well be a behind-the-scenes technical solution that would allow for these apparently similar-but-different POSes to be grouped into a common parent category, without requiring any change to user-facing terminology.
I'm sorry to stand against you on this one, but I must voice my opposition to changing the term counter as used so far in Japanese entries. ‑‑ Eiríkr Útlendi │ Tala við mig 22:16, 3 April 2014 (UTC)
Thanks for answering, Eirikr. As I said, I will revert the change. I'll do it not because I think it was wrong. I acted in good faith and I did find confirmation that the term "classifier" is not contradictory and is used to refer to what is called the Japanese "counter". There is a lot of commonality between classifiers in all languages and that's the term used when comparing them and describing classifiers in general, not specific to any language. I did rush a bit, though. I should have made sure there is a real consensus. I should have personally contacted you (Haplology has been unavailable for quite some time now), since you haven't commented in the discussion. Again, I will undo the change because I respect your opinion, not because I think it was incorrect. Yes, linking the Category:Classifiers_by_language with Category:Counters_by_language somehow makes perfect sense and I could use some help in changing back to the original headings for Japanese (Wyangbot (talk • contribs) did it in a matter of minutes). --Anatoli (обсудить/вклад) 22:34, 3 April 2014 (UTC)

Are Japanese counters a subset of classifiers or are they simply an adaptive consequence of the classifier system in Japanese? The evolution of Japanese counters is clearly tied in with influence from Classical Chinese, considering the most frequently used Japanese counters are all of Sino-Japanese origin. Some people have argued that Japanese originally had no indigenous counter system, or only a poorly-developed, non-systematic one (more in Downing, "Numeral Classifier Systems: The Case of Japanese", Chapter 2). I agree that the Chinese classifier system is more versatile (Chinese classifier#Usage). But to me there doesn't seem to be essential differences between the two, just that the Japanese system is the nativised descendant of the basic structures of the Chinese classifier system (most notably, the 'noun-number-classifier' structure). Phrases like '犬三匹' or 'パン一斤' are reminiscent of how the classifier phrase is structured in Classical Chinese (Chinese classifier#Classifier phrases). Additional structures, like 'demonstrative-classifier-noun', were not found in Classical Chinese and were not borrowed, same as how the indigenous demonstrative system in Japanese was unreplaced (as Sino-Japanese counters tend to accompany nouns or numerals of Sino-Japanese origin).

Additionally, which name is more commonly used? Google Books search appears to suggest the opposite: Japanese counters grammar (2020), Japanese classifiers grammar (7680). Also, 'counters' may be polysemous: Japanese counters history vs Japanese classifiers history. I have undone the changes for now, although I'm in favour of the name 'classifier'. Wyang (talk) 23:09, 3 April 2014 (UTC)

  • Apologies for the delay, I've been schnockered by food poisoning on the tail end of a crappy cold. Joy.
Anyway, regarding polysemy, note that counter in many of the examples of usage in history contexts are indeed using a different meaning, or represent scannos:
  • ...on the Sino-Japanese War, which ranged from stories of their en- counters with anti-Japanese sentiments held by...
  • ...the Japanese scientists see their lab radiation specialists as trustworthy fellow scientists. The Americans demanded that radiation counters with ...
  • ...Sandra Wilson's comprehensive study of the Japanese response to the Manchurian Incident counters by ...
  • ...The right wing counters by arguing that most history and social studies textbooks ...
However, in grammatical contexts, counter is unambiguous.
Regarding commonality of usage, restricting a search to language learner contexts shows that counter is indeed more commonly used than classifier. Compare:
The term classifier might get more play in high-end academia, but that 1) seems to include comparisons across languages, and 2) is not common ground for the likely majority of Wiktionary users. ‑‑ Eiríkr Útlendi │ Tala við mig 04:24, 7 April 2014 (UTC)

Categorizing with the pipe

Correct discussion: #Sort-keys in Category:CJKV radicals.

At Wiktionary:Votes/2011-04/Representative entries it had been discussed that the default sorting order within a category should not be overwritten. IMHO it is a completely different thing to prevent each entry in a category from getting a headline that merely repeats the entry. Please look at Category:CJKV radicals, containing 275 one-character entries: all the characters are preceded by an additional headline of the character itself. This is rather disturbing than of any use, doubles the lines and makes it difficult to get a swift overview.

The headings are useful where several entries with the same leading letter should be bundled together, thereby creating partitions that allow an easier overview. They are totally useless in categories of single letters, where they just double each entry.

Compare it with the 41 entries in the Category:Kangxi radicals, where the headings are omitted but the sort order is not disturbed.

Rukhabot is willing to edit the 275 pages, changing the [[Category:CJKV radicals]] to [[Category:CJKV radicals| ]] to suppress the 275 useless headlines. He just wants to be sure that everyone on board agrees with this change. So I ask the community to decide that in this case pipes are not evil. sarang사랑 09:47, 1 April 2014 (UTC)
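For illustration only, the requested replacement could be sketched in Python like this. It is not the bot's actual code. It assumes the category link appears in exactly the unpiped form quoted above, and the function name is made up.

```python
import re

CATEGORY = "CJKV radicals"

# Match only the unpiped link, so pages that already carry a sort key are left alone.
UNPIPED_LINK = re.compile(r"\[\[Category:" + re.escape(CATEGORY) + r"\]\]")


def suppress_heading(wikitext):
    """Give the category link a single-space sort key, as requested above."""
    return UNPIPED_LINK.sub("[[Category:" + CATEGORY + "| ]]", wikitext)


page = "==Translingual==\n[[Category:CJKV radicals]]\n[[Category:Kangxi radicals]]"
fixed = suppress_heading(page)
```

Running it a second time changes nothing, since the piped form no longer matches the pattern.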

I would consider that vote superseded in any case. All categorisation through Module:utilities (and by extension through {{head}}, {{catlangname}} etc) adds automatically-generated language-specific sort keys to categories. —CodeCat 13:15, 1 April 2014 (UTC)
Please read the actual vote, rather than Sarang's description of it; (s)he has misunderstood what it was about, and has thereby led you astray. The vote certainly has not been superseded in the way that you suggest.
Or, alternatively — don't bother reading the vote, since it's also not related to the subject of the current discussion. I think I erred in asking Sarang to start this discussion, since (s)he apparently doesn't understand the reason I felt one was needed: (s)he seems to think I wanted a discussion in order to partially overturn the vote (s)he links to, whereas in fact I just wanted a discussion in order to approve a bot run for the change (s)he's requesting.
RuakhTALK 06:42, 7 April 2014 (UTC)
@Sarang:. "...This is rather disturbing than of any use". Radicals are the lowest level of Han characters, you can't go any lower. Well, if you sort radicals themselves, not Han characters containing them, you get radicals, the same way the letters of the English alphabet would have exactly 26 headings. The Chinese words, though, (Mandarin only, for the moment) at least PoS are sorted by numbered pinyin ("pint" value), previously traditional entries were sorted by "rs" values (by radicals). It would be great if someone made Mandarin categories sorted by "pint", without having to do it manually. --Anatoli (обсудить/вклад)

I've struck out the above because it consists solely of a series of confusions and misunderstandings, so cannot lead to any relevant conclusion. I have started a new discussion with correct information: #Sort-keys in Category:CJKV radicals. RuakhTALK 20:55, 7 April 2014 (UTC)


I propose that we replace all entries with the template {{magic}} which will automatically display all the desired content on any page it is transcluded on. I've been working on this template for a while. A test can be found at User:Wikitiki89/apple. --WikiTiki89 15:43, 1 April 2014 (UTC)

Magic, eh? I shall have a talk with the Grand Inquisitor about this, Mr. Wikitiki. — Ungoliant (falai) 16:12, 1 April 2014 (UTC)
Shall we replace the Main Page with it? Or are we only allowed one April Fool joke per year? SemperBlotto (talk) 16:28, 1 April 2014 (UTC)
This template's code is really hard to read. I think it should be converted to Lua. Keφr 16:30, 1 April 2014 (UTC)
Can it replace templates? If so I think we can save a lot of server space. Equinox 20:59, 1 April 2014 (UTC)

Edit notice for talk pages

I created an edit notice (MediaWiki:Editnotice-1) that is displayed whenever someone edits a mainspace talk page. That may reduce the number of people who post there expecting to get an answer, like on Wikipedia where every page usually has at least a few people watching it. —CodeCat 13:59, 3 April 2014 (UTC)

I would suggest pointing people to WT:Information desk instead, if not for the fact that the page is quite damn large (~550K of wiki markup), eclipsed only by WT:RFD (~600K). Time to archive, I guess…
Also, do we have any standardised styling for edit notices? Keφr 14:52, 3 April 2014 (UTC)

Voting rules for de-privving.

Normally, when closing a vote, we have a strong bias in favor of the status quo, and require much more than a majority in order to change it.

However, I think that, when voting to revoke a privilege from a user, the default presumption should be that it should be removed, rather than that it should be kept; a user should only have a privilege if there is still clear support for his/her having it.

There are a lot of details to sort out, but to start with: do y'all agree with this general principle?

RuakhTALK 05:54, 5 April 2014 (UTC)

Yes. — Ungoliant (falai) 05:57, 5 April 2014 (UTC)
I disagree. The default should always be the status quo. For example, if no one votes at all, how can that mean the vote passed? --WikiTiki89 06:08, 5 April 2014 (UTC)
I can't imagine how that would happen. Did the vote specify a minuscule time-range that didn't give anyone a chance to vote? Did it come in the middle of fifty identical votes in the hopes that one would be missed? Was the vote never actually added to Wiktionary:Votes? In any situation like that, we wouldn't consider the vote to be valid, and certainly no bureaucrat/steward/whatever would implement its supposed outcome. (But — yes, we'd still need the general concept of "quorum". We shouldn't desysop someone if there were only three voters. And in the special case of CheckUsers, where the original vote needs 25 "support" voters in order to grant the privilege, we obviously shouldn't require 25 "oppose" voters to scuttle a de-privving vote. This is the sort of thing I had in mind when I said there were details to sort out.) —RuakhTALK 17:56, 5 April 2014 (UTC)
Well, shouldn't the case of 0-0 be equivalent to no consensus? Thus shouldn't no consensus mean to preserve the status quo? --WikiTiki89 18:14, 5 April 2014 (UTC)
0-0 doesn't mean "no consensus", it means "no information on whether there's consensus". It's equivalent to there not having been a vote. (Unless a whole bunch of people abstained. In that case, it's true that there's no consensus for the editor to keep the priv — not a single voter was willing to step forward and say "this person should keep this priv" — and we should de-priv. But again, I can't imagine how that could happen.) —RuakhTALK 18:57, 5 April 2014 (UTC)
(After edit conflict,) I agree with Ruakh. 0-0 should not be equated with "no consensus". If there is a vote and some people express support, while a roughly equal number of others express opposition, then the community can be said to have reached no consensus on an issue; one effect of this is that it would be wise not to hold another vote on the same subject in the very near future, because it can be presumed that the second vote would reach the same tie or near-tie. In contrast, if there is a vote and no-one participates, then there wasn't really a vote at all; one effect of that is that holding another vote on the same subject in the very near future would be acceptable and possibly even desirable, because it could be hoped that the second vote might be participated-in and thus reach a result. - -sche (discuss) 19:04, 5 April 2014 (UTC)
Let me clarify a bit. I'm not talking about a situation in which no one knows about the vote, but a situation in which every voter chose not to vote (a.k.a. abstained). --WikiTiki89 19:27, 5 April 2014 (UTC)
I tend to agree. At least, keeping a privilege of a user who already has it should require a plain majority at reconfirmation, so there would be no status quo bias at reconfirmation. --Dan Polansky (talk) 09:37, 5 April 2014 (UTC)
Plain majority makes sense. With other votes we don't have a very clear specific cutoff, whereas for deprivving votes, given the rather negative and personal nature of the outcome, it would suck to have to decide which way to close a borderline vote. —RuakhTALK 17:56, 5 April 2014 (UTC)
I don't think a bare majority is appropriate for removing a user's privileges. The reason for someone's privileges being removed can be arbitrary and unrelated to Wiktionary's core objectives. Most actual cases of removing privileges seem to attract near unanimous support. The cases that do not seem to actually be situations that do not obviously warrant action. Sometimes subsequent behavior makes the situation clearer. DCDuring TALK 11:15, 5 April 2014 (UTC)
I do.​—msh210 (talk) 06:01, 6 April 2014 (UTC)

Keeping common misspellings

I have created Wiktionary:Votes/pl-2014-04/Keeping common misspellings.

Let us postpone the vote as much as discussion needs. --Dan Polansky (talk) 09:34, 5 April 2014 (UTC)

CFI and removal of minor reference

I want to remove from Wiktionary:CFI#Spellings the reference currently numbered 11, intending to refer to Wiktionary:Beer_parlour/2012/April#Deleting_.22his.22_in_CFI. The reference involved a minor change in wording and thus should not be referenced from CFI, IMHO. Now, it looks like the sentence "A person defending a disputed spelling should be prepared to provide references for support" is supported by the reference, which is not the case. Any objections? --Dan Polansky (talk) 10:26, 5 April 2014 (UTC)

I have no opinion on whether the reference is needed, but I did take the liberty of correcting the link in the meantime (I hope such a change doesn't require a discussion). --WikiTiki89 19:33, 5 April 2014 (UTC)
I agree that the reference should be removed. —RuakhTALK 20:04, 5 April 2014 (UTC)

New layout

So, I was curious, do we kind of wait out what the results of the vote on en.wiki will be? Neitrāls vārds (talk) 21:24, 5 April 2014 (UTC)

I got used to the new font after a couple of hours and don't have a problem with it. Of course, I didn't have a problem with the old font, either... I don't really care which font the site uses, so if you or other users do care, then suit yourselves. But note that you could just suit yourselves individually by changing your personal css. - -sche (discuss) 23:33, 5 April 2014 (UTC)
There are some problems with the new font. It cannot display many characters used in transliterations/proto-languages causing them to be displayed in a different font, making it look awkward. For example: *ʾarṣ́, *ʾarṣ́. And certain combining characters don't display properly at all: p̄. --WikiTiki89 23:42, 5 April 2014 (UTC)
I actually have a weird case of Schadenfreude in that this new font seems much more detrimental to Wikipedia than to "us" (since wikt. by design is meant to be very concise.) However, just because it's not wreaking havoc here as much as it is on Wikipedia doesn't mean that I want to be complacent about this.
Defining my own style is something I'm not OK with. See, for example, izā#Declension, that absolutely unnecessary heightening of the instrumental row is something I want to keep tabs on. Even before this I always double checked whenever I started a new batch in AWB how does it look in browser. I want to always see what a non-logged in user would see. Not to mention that globally changing the layout for users to then (en masse) change theirs locally reminds me of the parable of one group of workers digging trenches for the trenches to then be filled up by the other group. Counter-intuitive.
I don't think it's the end of the world but I would like to see the Arial layout back. Neitrāls vārds (talk) 00:11, 6 April 2014 (UTC)
The first column in the izā declension table has an inline style width:158px. This is bad on several counts, and the reason it wraps inappropriately. Michael Z. 2014-04-07 00:07 z
I do not think that having a fixed, non-inherited, non-percentage width for a column whose contents are never intended to be changed is a bad idea.
You are right about the (non-CSS) border= property (might be some "historical mistake" as I took the style in its entirety from another language as I thought it's perfect.) In diff it was, however, missing the already mentioned border= and I removed the styling information that didn't have a property (I don't know if that's some new shorthand) as it seems to already be getting that styling from MediaWiki:Common.css. Neitrāls vārds (talk) 01:54, 7 April 2014 (UTC)
User:Neitrāls vārds, on the web, the contents always change. Every machine has different fonts with different metrics, some of our style sheets change the font size, wiki content gets reused in dozens of websites and apps, etc. And of course, each term’s inflections are of different lengths. If you set column width in absolute units (px), the only thing you can be sure of is that it will fail somewhere. Using relative units like ems will at least resize the column along with the font. Just setting padding and letting the column fit the content will always avoid wrapping unless constrained by the container element. Michael Z. 2014-04-09 01:26 z
Em's are a good idea. Neitrāls vārds (talk) 02:50, 10 April 2014 (UTC)
User:Wikitiki89: looks quite normal to me. Does it use Liberation or Arimo on your machine? The latter is pretty much the same as the former, but with more characters. Combining character behaviour is also improved — though for example а̄ renders quite poorly for me. Keφr 05:50, 6 April 2014 (UTC)
I have been using Liberation for regular text before, so little change for me here. Personally, I can stand headers in a light serif font (Linux Libertine for me) — they look kind of awkward, but are bearable. I also dislike having a larger font and line height — I was quite used to what I had before; still, I could just live with it, zoom it out or revert it in my user stylesheet. But I want to put a fork in each eyesocket of the person who decided to put  div#content { color: #252525; } into the stylesheet, so that they have some idea how much pain it is to read long passages of text written in a dark-grayish font. Seriously, I went to MediaWiki.org to read something, and suddenly started asking myself "why is everything so blurry and hard to read?"
I switched to Monobook just to have sane typography back (MediaWiki:Gadget-VectorClassic.css can restore it too, but it flickers sometimes). Speaking of Monobook, I noticed some custom header styles that we never bothered to copy to MediaWiki:Vector.css even though Vector default header styles used to be identical to Monobook's. Do we need these? I would rather get rid of them. Keφr 05:50, 6 April 2014 (UTC)

A comparably minor thing (but would require several thousand edits, although I'm not even sure what type of edits, commenting out linebreaks?) Entries with pic dic have either 2 (or 3?) (for example, cauna) or 1 (Bosnija un Hercegovina) linebreaks before the first L3 heading. I edited many of these recently and I don't remember anything of the type, the L3 used to be right after the L2 horizontal ruler.

Also, if you're bored, here's a vision of what's to come: Meet Kim, she is a bird lover. :D Neitrāls vārds (talk) 02:11, 7 April 2014 (UTC)

Today in "How this phony, insincere accessibility update screws up layout": one of my reference templates wraps although with a layout that doesn't look like an ugly Wordpress blog it would stay on one line on just about any setup. The quotations thing is supposed to be one sentence per line 1) book its from, 2) the sentence 3) its English translation, the trailing 7 or so ISBN digits look just retarded it's short enough as it is am I supposed to delete the ISBN? And all of this so that some Portlandia hipsters could pretend to be busy to their employer later to be hit by a tsunami wave of "bugs" that they could then pretend to be "fixing" all the while remaining on WMF's payroll! And literally no one gives a fuck? Neitrāls vārds (talk) 23:13, 9 April 2014 (UTC)
Somebody needs a wiki holiday. Michael Z. 2014-04-10 00:19 z
I just hate being right on stuff like this. I've thought about what a miracle it is that wiki projects are "bland" and "neutral" enough to accommodate just about any culture on this planet, and by "culture" I mean actual opinions even though, I'm sure, Bay Area hipsters vision of "the outside world" probably doesn't involve anything other than dancing and singing kumbaya for days on end and growing organic, free-range, fair trade coffee beans. Like, wtf is this supposed to mean: File:How to Make Wikipedia Better - Wikimania 2013 - 61.jpg are they even aware of the fact that many projects ban (often) useless personal userboxes, or that personally I couldn't care less whether another editor has a vagina or a penis and testicles, moreover, that there is this place called "Europe" where there are countries where literally no one gives a fuck whether, for example, the President or the Prime minister is female (like, you would need to search with fire to find even a hint at a misogynist joke.)
I will go on a limb here and say that all of the above is nothing short of irrelevant, people do retarded things all the time (I know I do on occasion ;)) the point is that you are supposed to contain the damage! One user has on their page a nifty explanation of the stages of the development of a web project and the first one is "users? we don't need no stinking users!" So, I guess, that's that. The "share this on Twitter" buttons and even more pushing for "visual" editor (the useful tool that brings us such quality IP edits as replacing the whole page text with "penis" among many others! :D) is not yet here but I think I already know what the response to that is going to be. Neitrāls vārds (talk) 02:41, 10 April 2014 (UTC)
Maybe this will calm you down. --WikiTiki89 02:49, 10 April 2014 (UTC)
  • @Neitrāls vārds: I've been using <div style="float:right;">...</div> on the JA entries I work on, as a container for such right-margin material that suppresses such odd linebreak behavior. See カリフラワー (karifurawā, cauliflower), 寿司 (sushi) as examples.
(I also took the liberty of implementing similar changes to cauna and Bosnija un Hercegovina, but feel free to back those out if you don't like them.)
‑‑ Eiríkr Útlendi │ Tala við mig 00:57, 10 April 2014 (UTC)
That looks better (and is probably backward compatible). I do wonder why two float:right's fix this, because at least {{slim-wikipedia}} already has float right, or at least a class whose name would imply that. Now, I guess, that extra float:right needs to be stuck into both templates... Neitrāls vārds (talk) 02:50, 10 April 2014 (UTC)
  • FWIW, I think it works because the style="float:right;" is on a containing block element -- I don't think it would work so well with style="float:right;" just on each individual element. (In fact, I think the individual elements already all, or mostly, have that.) ‑‑ Eiríkr Útlendi │ Tala við mig 04:32, 10 April 2014 (UTC)

Listing Transwiki pages at RFD or RFDO

When {{rfd}} is added to a non-main-namespace page such as a category, the links in the text "nominated for deletion" and "(+)" take users to WT:RFDO, which has a header stating that it is indeed the place for requesting the deletion of non-main-namespace pages. But when the template is added to a Transwiki page like Transwiki:ISO 639:a, the links point to WT:RFD, which claims in its header to be for main-namespace pages only. Judging by {{rfd}}'s code, this behavior is intentional. Why? Should the behavior be changed (so that it points Transwiki pages to RFDO), or should Wiktionary:Request pages be updated to note that certain non-main namespaces are discussed at RFD? - -sche (discuss) 23:51, 5 April 2014 (UTC)

Theoretically, isn't everything in Transwiki supposed to be eventually deleted? --WikiTiki89 00:14, 6 April 2014 (UTC)
It's supposed to be deleted or moved to another namespace. But sometimes it's necessary to discuss which of those things to do to a particular page (e.g. Transwiki:ISO 639:a, which I'm not going to move to the Appendix namespace because I think it doesn't belong there, but which I don't want to delete without seeking others' views). Should that discussion be on RFD or RFDO? - -sche (discuss) 01:38, 6 April 2014 (UTC)
RFDO, if only for the reason that the other one (confusing pun intended) is overloaded enough already — RFD is ~600K characters long, while RFDO is only ~120K. Keφr 04:40, 6 April 2014 (UTC)
I effected that setting (that Transwiki: {{rfd}} defaults to RFDO), for the reason stated in my edit summary there: that most Transwiki pages are entries as opposed to appendix pages, indices, templates, or what-have-you. I still think it's wisest (for the same reason), but don't feel very strongly about it.​—msh210 (talk) 05:54, 6 April 2014 (UTC)
Hm? {{rfd}} in the Transwiki namespace currently defaults to RFD, not RFDO, see e.g. Transwiki:ISO 639:a — I put the discussion on RFDO manually; the template links to RFD. I had interpreted the code you refer to as the bit that was responsible for that behaviour (putting Transwiki pages on RFD instead of RFDO). If the code was designed to do the opposite, then the plot thickens... - -sche (discuss) 06:51, 6 April 2014 (UTC)
From the rest of msh210's comment, I think it's clear that he meant to write "RFD" rather than "RFDO" in his parenthetical aside. —RuakhTALK 07:13, 6 April 2014 (UTC)
So the thick does not plotten. --kc_kennylau (talk) 07:35, 6 April 2014 (UTC)
Yes: I meant RFD. Sorry for the confusion.​—msh210 (talk) 07:49, 6 April 2014 (UTC)

Wiktionary:About Serbo-Croatian

The top of the page said that there is no consensus for merging the languages. But that's clearly wrong, the current status quo of merging them definitely has consensus. So that should be changed. The page should probably explain the merging more, too, including our reasons for it and the discussions we had about it. —CodeCat 12:15, 6 April 2014 (UTC)

It should probably link to the vote as well. --WikiTiki89 17:34, 6 April 2014 (UTC)
...although the vote actually didn't pass by the criteria used at the time. (We just merged them anyway.) - -sche (discuss) 18:18, 6 April 2014 (UTC)
Hmm... Should we have another vote to make it official? --WikiTiki89 18:56, 6 April 2014 (UTC)
Personally, I would just let sleeping dogs lie. - -sche (discuss) 18:58, 6 April 2014 (UTC)
I would generally agree, but this particular sleeping dog's Norwegian cousin is already awake. If we went back on the SH vote, what's to prevent us from going back on the Norwegian vote? --WikiTiki89 19:02, 6 April 2014 (UTC)
I've had a go at rewriting the page. If anyone wants to add to it a directory of our many prior discussions about merging SC, WT:LANGTREAT has one already that could be copied. - -sche (discuss) 18:58, 6 April 2014 (UTC)

Sort-keys in Category:CJKV radicals.

Since each page-name in Category:CJKV radicals consists of a single character, each one has its own header, which is always identical to the page-name. The overall effect is massive redundancy.

Sarang (talkcontribs) therefore proposes that we categorize these pages in this category using a single space as the sort-key, so that the category has no headers. (That is: instead of [[Category:CJKV radicals]], the pages would have [[Category:CJKV radicals| ]].) He has asked me to write a bot that does this.

Does anyone object to my doing so?

RuakhTALK 18:32, 7 April 2014 (UTC)
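
For anyone curious what the edit amounts to, here is a minimal Python sketch of the text substitution only. It is not Ruakh's bot: the function name and the regular expression are assumptions made for illustration, and fetching or saving pages is left out entirely.

```python
import re

# Category named in the proposal; everything else here is illustrative.
CATEGORY = "CJKV radicals"

# Match the category link only when it carries no sort key yet,
# i.e. the category name is followed directly by the closing brackets.
_PLAIN_LINK = re.compile(
    r"\[\[\s*Category\s*:\s*" + re.escape(CATEGORY) + r"\s*\]\]"
)


def add_blank_sort_key(wikitext: str) -> str:
    """Give [[Category:CJKV radicals]] a single space as its sort key."""
    return _PLAIN_LINK.sub("[[Category:" + CATEGORY + "| ]]", wikitext)
```

Links that already have a sort key are left alone, so running the substitution twice on the same page should change nothing the second time.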

Good idea. --Anatoli (обсудить/вклад) 20:57, 7 April 2014 (UTC)
Now underway. —RuakhTALK 17:34, 13 April 2014 (UTC)
Now complete. —RuakhTALK 17:33, 14 April 2014 (UTC)

Thank you. Lot of work for you, רוח, esp. the explaining :-), but worth the effort. sarang사랑 11:10, 16 April 2014 (UTC)


Researchers have discovered an extremely critical crypto bug in the cryptographic software library an estimated two-thirds of Web servers use to identify themselves to end users and prevent the eavesdropping of passwords, banking credentials, and other sensitive data. —Stephen (Talk) 05:32, 8 April 2014 (UTC)

So? Keφr 11:06, 9 April 2014 (UTC)

Redlinked translations in search results

In the search results for redlinks, I like the way that it shows when a foreign word is a translation of an existing word in English. For example, the Finnish word selostus is not yet entered, but the search results say, near the top, that it is a translation of narrative, report, and account. Thank you to whoever did that because it really speeds things up for me. ~ heyzeuss 08:08, 8 April 2014 (UTC)

Should we use a serif font for IPA?

IPA is full of characters that may be hard to read for users, even those who don't have reading disabilities. It has been shown (I think) that serifs help readers "anchor on" to characters more readily, and aid in reading. So I wonder if we should be using a serif font for our IPA transcriptions. Wiktionary:IPA already uses serif fonts, and I think it looks clearer there. —CodeCat 21:31, 8 April 2014 (UTC)

  • Huh, must be a skin or CSS issue -- I still see sans serif. Using the default Vector skin with no custom CSS. ‑‑ Eiríkr Útlendi │ Tala við mig 21:34, 8 April 2014 (UTC)
    • The page specifies the font to use, Lucida Sans Unicode. Presumably that's the font you're seeing, and I don't have it so it's falling back to the default for me. I kind of like the default though. —CodeCat 21:56, 8 April 2014 (UTC)
      • Aha. Yep, I do have the Lucida fonts installed. Any idea what your default fallback font is? I'd be curious to see what the page looks like with that. ‑‑ Eiríkr Útlendi │ Tala við mig 22:01, 8 April 2014 (UTC)
  • I think that as long as we make sure that the font supports all the necessary IPA characters and they are all distinguishable, it doesn't matter much whether it has serifs or not. But serifs in general don't work well on computer screens at smaller sizes. --WikiTiki89 00:14, 9 April 2014 (UTC)
    • I think IPA is one case where we should definitely use a larger font. —CodeCat 00:23, 9 April 2014 (UTC)
User:Wikitiki89, that rule of thumb is from the olden days.[4] Now that we have HD monitors and high-density mobile displays, font hinting is less important or sometimes absent, and traditional font design more important. Conventional wisdom is that serif fonts are more readable in general, but some studies disagree.
User:CodeCat, There are 54 cases where we already use a larger font (in Common.css). Maybe we should just use a larger font. Michael Z. 2014-04-09 01:09 z
You'd be surprised how many low resolution monitors are out there. --WikiTiki89 01:18, 9 April 2014 (UTC)
I think our current font size is fine for normal Latin alphabet text. Increasing it on a case-by-case basis is fine, that doesn't mean we have to make it bigger altogether. —CodeCat 01:21, 9 April 2014 (UTC)
“Normal” Latin represents a minority of our Latin, much less other scripts, IPA, and transliterations. Basically, we are making 90% of the site’s text bigger in dribs and drabs. May as well make all text bigger and about six languages smaller then.
Varying font sizes leads to inconsistent line heights, font sizes, and harms readability. And looks like pants. Michael Z. 2014-04-10 00:23 z

Etymologies for backward spellings

We have standard templates for prefix, suffix, etc. Should we have one for words formed by spelling another word backwards? They are quite rare, but I've seen some electrical units of measure (yrneh, daraf, mho), the female name Nevaeh, the slang words riah and tink, and the fictional Erewhon (we have Erewhonian). Equinox 15:16, 9 April 2014 (UTC)

I don't think this is common enough to need a template. All templates do is save time when writing the same thing over and over. --WikiTiki89 15:22, 9 April 2014 (UTC)
They also create categories, so it would allow all such words to be viewed as a group. Equinox 15:23, 9 April 2014 (UTC)
They don't magically "create" categories. They just add the category to the page automatically, which can be done manually as well:
Backwards spelling of {{term/t|en|oof}}.[[Category:English backwards spellings]]
--WikiTiki89 15:31, 9 April 2014 (UTC)
We do etymologies for them as you will see in yob and yobbo. —Stephen (Talk) 04:49, 10 April 2014 (UTC)
  • I think there may be enough to justify a template. There may also be enough for etymology based on anagrams of other words Purplebackpack89 (Notes Taken) (Locker) 03:35, 14 April 2014 (UTC)
    • Hardly. I'd like to see a list of candidates first. Also, backwards spellings are a subset of anagrams. —CodeCat 03:37, 14 April 2014 (UTC)
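
A list of candidates could be approximated mechanically. The Python sketch below is only an illustration and assumes a plain list of page titles as input (the function name is made up): it pairs every word with its reversal when both are present and skips palindromes. It cannot tell which member of a pair is the original, and a hit is not evidence of an actual backwards-spelling etymology, so the output would still need checking by hand.

```python
def backward_spelling_candidates(words):
    """Return sorted (word, reversed word) pairs where both are in the list."""
    lowered = {w.lower() for w in words}
    pairs = set()
    for word in lowered:
        reversed_word = word[::-1]
        # Palindromes reverse to themselves and are not interesting here.
        if reversed_word != word and reversed_word in lowered:
            pairs.add(tuple(sorted((word, reversed_word))))
    return sorted(pairs)
```

For example, a list containing yob, boy, mho, ohm and level would yield the pairs (boy, yob) and (mho, ohm), and nothing for level.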

Why are Latin verbs defined in first person?

...rather than the infinitive? I know this is standard but I have no idea about the reasoning behind it. Equinox 20:11, 9 April 2014 (UTC)

You could also ask why do we use the infinitive for most other languages? --WikiTiki89 21:14, 9 April 2014 (UTC)
In English you can generally produce the person inflections from the infinitive (walk+s, walk+ing, walk+ed), so the infinitive "feels like" the base form. Is it totally arbitrary? Equinox 21:16, 9 April 2014 (UTC)
The infinitive is not the base form in most languages. For many Indo-European languages, the base form is the first- or third-person singular, or the second-person singular imperative. I think what matters for Latin is that it's the first of the four principal parts of a Latin verb. —CodeCat 21:18, 9 April 2014 (UTC)
But what makes it the "first" of the four principal parts? --WikiTiki89 21:26, 9 April 2014 (UTC)
Tradition, probably. —CodeCat 21:37, 9 April 2014 (UTC)
No, tradition explains why it still is, but not why it originally was. --WikiTiki89 21:39, 9 April 2014 (UTC)
Freund, Andrew, Lewis, and Short did Latin verbs that way. I don't know what earlier lexicographers of Latin did. Perhaps there are fewer homonyms in Latin if one uses the first person as the lemma. DCDuring TALK 22:18, 9 April 2014 (UTC)
The Romans themselves had some grammatical traditions, too. And presumably medieval scribes as well. What did they use? —CodeCat 22:23, 9 April 2014 (UTC)
Donatus's Ars Minor (4th century) seems to view the first person singular present as the lemma. —Mr. Granger (talkcontribs) 00:21, 11 April 2014 (UTC)
In fact, Lewis and Short defined Latin verbs using the English infinitive. And Equinox asks "Why are Latin verbs defined in first person". --Dan Polansky (talk) 08:13, 12 April 2014 (UTC)

Here is a past discussion on this:

--Dan Polansky (talk) 08:30, 12 April 2014 (UTC)

Personally I think defining it using the infinitive would be more useful, for etymologies in particular, but also because it matches how we categorise verbs by conjugation (Latin conjugations follow the infinitive). I don't think it's necessary enough to change what we have, though. —CodeCat 12:57, 12 April 2014 (UTC)

Previous and Next in ordinalbox and cardinalbox

There doesn't seem to be a standard for choosing the next number in an ordinalbox.

Most languages I've poked at seem to just give up once the question gets tricky.

For example, if the next number after 30 is 31, and there's not a separate word for 31, you get a dead link and the utility of the ordinalbox is greatly diminished.

My proposal is that these boxes always point to the previous and next numbers for which there is a word.

English would have all the numbers through 20, then 30, 40, ... 90. Then do we stop, or link next to "hundred" which is not quite a cardinal number?

In Latin, one would count ..., 11, 12, 18, 19, 20, 28, 29, 30, 38, 39, 40, ...

I will admit it's not a great solution, but I can't find a better one.

Any suggestions? Avjewe (talk) 17:17, 10 April 2014 (UTC)
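
To make the proposal concrete, here is a small Python sketch of the lookup it implies. The function name and the idea of keeping one set of "numbers that have a word" per language are assumptions for illustration; the real boxes are templates and would need their own implementation.

```python
from bisect import bisect_left, bisect_right


def neighbours(n, numbers_with_words):
    """Return (previous, next) numbers around n that have their own word.

    Either element is None when there is no such number on that side.
    """
    ordered = sorted(numbers_with_words)
    i = bisect_left(ordered, n)   # index of the first entry >= n
    j = bisect_right(ordered, n)  # index of the first entry > n
    previous = ordered[i - 1] if i > 0 else None
    following = ordered[j] if j < len(ordered) else None
    return previous, following


# English cardinals as described above: everything through 20, then the tens.
ENGLISH = set(range(0, 21)) | set(range(30, 100, 10))
```

With that set, the box for 20 would link back to 19 and forward to 30, and the box for 90 would have no next link unless "hundred" is added to the set.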

Can we delete the script code templates?

Many of these have already been deleted, as they weren't used by anything. We have Lua-based alternatives now like {{lang}}, and even the non-Lua-based {{script helper}}, which all script code templates already invoke. I've been working on converting out the remainder, but I want to make sure everyone is ok with it before I start deleting (formerly) widely used templates like {{Latn}}. —CodeCat 18:54, 11 April 2014 (UTC)

Maybe, but you've been breaking things in the process of orphaning them because you apparently didn't think everything through. That's not a good way to do things. Chuck Entz (talk) 23:13, 12 April 2014 (UTC)
What did I break? —CodeCat 23:43, 12 April 2014 (UTC)
Well, for one thing you ever so slightly broke {{grc-conj-aorist-blank-am}}. Mind you, it was an easy fix, and I definitely still support abandoning the old templates and converting to Lua, but you asked.  :-) -Atelaes λάλει ἐμοί 00:01, 13 April 2014 (UTC)
I see what you did, but I'm not sure how that fixed anything. Or rather, I don't see what was wrong with my original change. —CodeCat 00:09, 13 April 2014 (UTC)
Well, to be completely honest, I don't fully understand what happened myself, and perhaps I should have investigated the matter further, but I didn't. If you preview οἰδέω (oideō) with your version, you'll notice that, since transliterations weren't suppressed, they were produced, and produced rather oddly. The only thing I can think of is that, given multiple linked targets as the primary input into {{term/t}}, the auto-transliterate engine had some sort of spasm and included line breaks in its output for some reason. I imagine that simply suppressing the transliteration would have fixed it equally well, but I modeled my version after one of the similar templates. -Atelaes λάλει ἐμοί 00:19, 13 April 2014 (UTC)
I fixed it. The problem was really in {{grc-conj-aorist-1}}, which had a long string of unnamed parameters, but there were no comment tags around the line breaks. The wiki software doesn't strip whitespace (including line breaks) in unnamed parameters, only in named parameters. I added comment tags around the line breaks, and I also added in the names of the parameters. The latter wasn't necessary to fix the problem, but it helps with future maintainability, because I had a hard time figuring out just which of those parameters was number 105! —CodeCat 00:34, 13 April 2014 (UTC)
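
The whitespace behaviour described above can be modelled in a few lines of Python. This is a deliberately simplified toy, not MediaWiki's parser: it ignores nested templates and links, and the function names are invented for the example. It only shows why a line break survives in an unnamed parameter and why comment tags around the line break avoid the problem.

```python
import re


def strip_comments(text):
    """Remove <!-- ... --> comments, including ones that span line breaks."""
    return re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)


def parse_template_args(arg_text):
    """Split 'a|b|name=c' into a dict, mimicking the whitespace rule.

    Named parameters are stripped of surrounding whitespace;
    unnamed (positional) parameters are kept exactly as written.
    """
    args = {}
    position = 0
    for raw in arg_text.split("|"):
        if "=" in raw:
            name, value = raw.split("=", 1)
            args[name.strip()] = value.strip()
        else:
            position += 1
            args[str(position)] = raw
    return args
```

Calling `parse_template_args("a\n|b")` keeps the line break inside parameter 1, while `parse_template_args(strip_comments("a<!--\n-->|b"))` and `parse_template_args("1=a\n|2=b")` both give clean values.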
I've reverted your latest. The switch to comment-separated numbered parameters was a good one. It will definitely make understanding and maintaining the template easier. However, your switch to {{term/t}} is still not workable. For starters, it still breaks entries, ἄγω (agō) for example, possibly because we still have line-breaks in the other templates which feed into the blanks. Additionally, I don't want transliterations in there; we're already jamming a lot of info into a fairly compact space, and the transliterations just make it that much trickier to parse. Now, perhaps {{term/t}} with suppressed transliteration might work, but I suspect we'd have to do it for all the templates, so they'd look uniform. I don't feel like doing that kind of overhaul at the moment. -Atelaes λάλει ἐμοί 02:04, 13 April 2014 (UTC)

Internet Explorer 6 is dead

Microsoft stopped supporting MSIE 6 on Tuesday. I can’t believe I forgot to stop and dance a little jig on its fresh grave.

Microsoft is celebrating with a little treat. Michael Z. 2014-04-11 22:43 z

Windows 95 was better... --WikiTiki89 02:59, 12 April 2014 (UTC)
  • And I guess the implied question is whether we should drop support for it too? I would rather take a look at market share numbers before answering that. And even then I would be hesitant to answer "yes". We might have some users stuck with it. Keφr 04:49, 12 April 2014 (UTC)
    • We already don't support it in a lot of areas. We should drop support for it completely. It may be harsh to those few who still use it, but... IE 6 is harsh on anyone no matter who you are. And of course, anyone who still uses it nowadays should be used to things breaking, so it's not going to make much of a difference if we add Wiktionary to that long list. —CodeCat 13:00, 12 April 2014 (UTC)

Edit request of protected page

Could an admin please deal with the edit request at Template talk:ko-inline? I'm not sure what templates I should be using to call attention from admins so I thought I'd just post it here. Thanks. Wyang (talk) 22:01, 12 April 2014 (UTC)

Thanks, Anatoli. Wyang (talk) 00:17, 14 April 2014 (UTC)
No worries. --Anatoli (обсудить/вклад) 00:27, 14 April 2014 (UTC)

Automated Latin pronunciation

Please check for any bugs while I start automating all the Latin pronunciation using the module that I just built yesterday. (First module ever built woohoo :D) --kc_kennylau (talk) 03:06, 13 April 2014 (UTC)

Looks good, but is there anything in place to deal with majuscules? I'm not seeing any lowercasing. Also, it's utterly unfair that your language is almost IPA without any conversion. That is all. -Atelaes λάλει ἐμοί 03:32, 13 April 2014 (UTC)
Does it give the lax pronunciation of the short vowels? Short e, i, o, u should be /ɛ, ɪ, ɔ, ʊ/. Does it treat vowel + nasal sequences as nasalized vowels before fricatives and at the end of a word? dēfensum should be /deːˈfẽːsʊ̃/. Does it geminate intervocalic nonsyllabic i? māior should be /ˈmajjɔr/ (or /ˈmaɪjɔr/) and cuius should be /ˈkʊjjʊs/ (or /ˈkʊɪjʊs/). Does it generate Ecclesiastical pronunciations or only Classical pronunciations? —Aɴɢʀ (talk) 05:41, 13 April 2014 (UTC)
We don't actually know if short vowels were indeed lax, and as far as I know, it has been our practice on Wiktionary to only distinguish their length and not their quality. --WikiTiki89 14:24, 13 April 2014 (UTC)
The evidence from historical phonology is clear that they must have been; see my comments below dated 11:52, 13 April 2014. Since we indicate both length and quality differences like /iː/ vs. /ɪ/ in other languages such as English (RP) and German, there's no reason not to do so for Latin. —Aɴɢʀ (talk) 15:42, 13 April 2014 (UTC)
Now I'm hurt that you weren't utterly dissatisfied with {{grc-pron}}. -Atelaes λάλει ἐμοί 06:15, 13 April 2014 (UTC)
I wasn't? —Aɴɢʀ (talk) 07:15, 13 April 2014 (UTC)
You weren't. -Atelaes λάλει ἐμοί 07:39, 13 April 2014 (UTC)
Oh, the new one! I forgot all about that. I mean I really forgot all about it. I recently added pronunciation info to ἰτέα using the old template instead of the new one; I'll go change it now and see what I think. —Aɴɢʀ (talk) 07:56, 13 April 2014 (UTC)
Thanks for all this feedback. I hadn't started the automation yet since I hadn't got time, but I would like to ask one question: would the vowel still be nasalized if it's long? --kc_kennylau (talk) 10:57, 13 April 2014 (UTC) And I don't see any other page turning e,i,o,u to the IPA pronunciation, so should I break the convention? --kc_kennylau (talk) 11:01, 13 April 2014 (UTC) Why was the nasalized e long in your example? --kc_kennylau (talk) 11:08, 13 April 2014 (UTC)
Nasalized vowels before s and f are always long; our page says dēfensum but it should actually be dēfēnsum, and monstrum should actually say mōnstrum in the headword line. We don't seem to be very consistent about that. And I see that Appendix:Latin pronunciation doesn't transcribe the short vowels as lax, but w:Latin spelling and pronunciation#Vowels does (though inconsistently). A lot of people don't bother showing the distinction since traditional Latin grammar makes the length distinction primary, but the Vulgar Latin mergers of ĭ with ē and of ŭ with ō only make sense if ĭ and ŭ were /ɪ/ and /ʊ/ rather than /i/ and /u/, and the fact that ĕ and ŏ avoid that merger only makes sense if they were /ɛ/ and /ɔ/ rather than /e/ and /o/. We know the nasalized vowels were long because of the way they developed in Romance: īnsula and mēnsa develop exactly as if they were īsula and mēsa and not as if they were ĭsula and mĕsa. —Aɴɢʀ (talk) 11:52, 13 April 2014 (UTC)
You can't use Vulgar Latin as evidence of Classical Latin vowel quality. You have proven that at some point before Vulgar Latin, ĭ and ŭ were pronounced /ɪ/ and /ʊ/, but this point may have been after Classical Latin. We also don't know when the change from /ens/ to /ẽːs/ took place (or maybe we do, but you have not given any evidence for it). From what I have read, the fact that in Classical Latin the -um ending did not have an /m/ (or at least had a very light one) and maybe had a nasalized vowel, is only known because of the meter of such words in Classical poetry (i.e. the syllable merges with a following vowel). Using Vulgar Latin alone as evidence would not have been enough. --WikiTiki89 15:58, 13 April 2014 (UTC)
There isn't a time difference between Classical Latin and Vulgar Latin. When Caesar and Cicero were sitting around chatting with their friends over a glass of wine, they were speaking Vulgar Latin and were certainly pronouncing their short vowels lax and nasalizing their vowels before their fricatives. —Aɴɢʀ (talk) 16:21, 13 April 2014 (UTC)
Sorry, I was making the mistake Wikipedia mentions of confusing Vulgar Latin with Late Latin/Proto-Romance. w:Vulgar Latin#Sources lists four types of sources of information about Vulgar Latin, the first and third of which don't really apply to Classical Latin. I was under the impression that we know about the mergers of ĭ with ē and ŭ with ō from reconstructing Proto-Romance, but I may be wrong, which others of the four types of sources Wikipedia lists has evidence of this? --WikiTiki89 16:34, 13 April 2014 (UTC)
Off the top of my head, I don't know, but I'm sure many of the sources listed in the Wikipedia article could tell you. Incidentally, the first source (solecisms) does apply to Classical-era Latin; for example, the graffiti found at Pompeii is full of spelling mistakes that tell us a lot about how the colloquial language was pronounced, and AD 79 isn't "Late Latin", though it's somewhat later than C & C, of course. I don't know offhand whether this particular pronunciation is one of the things those inscriptions tell us, though. —Aɴɢʀ (talk) 16:50, 13 April 2014 (UTC)
Well until such evidence turns up on Wiktionary, I think we should continue transcribing short vowels as /a e i o u/. --WikiTiki89 17:00, 13 April 2014 (UTC)
I think the Romans themselves also marked the vowel length sometimes, so that's another way we can tell. And in case it wasn't clear, what appears to us as lengthening before nasal + fricative is actually the nasalisation process itself occurring. So, when the written form transitions from -ens- to -ēns-, that's actually an indication that the spoken form changed from [ens] to [ẽːs]. The length is compensatory for the loss of the nasal. —CodeCat 13:01, 13 April 2014 (UTC)
So the current pronunciation transcription in insula and mensa are wrong? --kc_kennylau (talk) 13:32, 13 April 2014 (UTC)
It depends on what you call "wrong". It's correct in a purely phonemic way, as the nasalisation was just allophonic. In the same way, the difference between [e] and [ɛ] was allophonic; the former appeared when long, the latter when short. And final -m also caused nasalisation, but also at an allophonic level. So it depends on how much phonetic detail we want to show. Most of our current pronunciations are purely phonemic and show no nasalisation or differences in vowel height. —CodeCat 13:52, 13 April 2014 (UTC)
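
For comparison, this is roughly what the narrower option would compute for the vowels. It is a Python sketch of the two rules proposed in this thread (lax short vowels; a long nasalised vowel in place of vowel + n before s or f), not the behaviour of the actual module. The function name is invented, and stress, syllable breaks, final -m, consonantal i and all consonant changes are deliberately left out.

```python
import unicodedata

# Macron vowels (ā ē ī ō ū) mapped to their plain base letters.
LONG = {"\u0101": "a", "\u0113": "e", "\u012b": "i", "\u014d": "o", "\u016b": "u"}
# Short vowels with the lax qualities proposed above; a stays a.
SHORT_LAX = {"a": "a", "e": "ɛ", "i": "ɪ", "o": "ɔ", "u": "ʊ"}
TILDE = "\u0303"   # combining tilde, marks nasalisation
LENGTH = "\u02d0"  # IPA length mark


def narrow_vowels(word):
    """Rewrite only the vowels of a macron-marked Latin word."""
    word = unicodedata.normalize("NFC", word.lower())
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        is_vowel = ch in LONG or ch in SHORT_LAX
        before_ns_or_nf = word[i + 1:i + 2] == "n" and word[i + 2:i + 3] in ("s", "f")
        if is_vowel and before_ns_or_nf:
            # Vowel + n before a fricative: long nasalised vowel, n dropped.
            out.append(LONG.get(ch, ch) + TILDE + LENGTH)
            i += 2
        elif ch in LONG:
            out.append(LONG[ch] + LENGTH)
            i += 1
        elif ch in SHORT_LAX:
            out.append(SHORT_LAX[ch])
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Under these assumptions mēnsa and mensa both come out with a long nasalised e and no n, which is the point about the headword macrons made above; a purely phonemic transcription would skip both rules and keep only the length marks.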
As for me, I fancy detailed phonetic transcriptions. There are more symbols I don't understand, so it's funnier! (btw: I've added the macrons in monstrum. I don't know if it's possible to get a list of all entries in which they are lacking?) --Fsojic (talk) 14:04, 13 April 2014 (UTC)
For what it's worth, I prefer phonemic transcriptions (for dead languages like Latin, I mean). I guess we could always give both though, as in IPA(key): /ˈho.moː/, [ˈhɔ.moː]. —Mr. Granger (talkcontribs) 15:43, 13 April 2014 (UTC)
I prefer a balance between broad and narrow transcriptions. To use an English example, a word like keeping ought IMO to be transcribed /ˈkiːpɪŋ/, neither the purely phonemic /ˈkiːpinɡ/ nor the narrowly phonetic [ˈkʰiːpɪ̃ŋ]. —Aɴɢʀ (talk) 15:49, 13 April 2014 (UTC)
How do you decide which traits you want to display? I mean, you could as well have /ˈkʰiːpiŋ/ as a middle ground option. --Fsojic (talk) 15:58, 13 April 2014 (UTC)
Good question. For languages like English and German where there's a strong history of phonetic transcriptions in dictionaries, it's easy to follow the example of others (though this changes over time -- nowadays most dictionaries and phoneticians transcribe English heat~hit as /hiːt/~/hɪt/ and German biete~bitte as /biːtə/~/bɪtə/ but 75 years ago or so that would have been /hiːt/~/hit/ and /biːtə/~/bitə/, but the actual pronunciations haven't changed). Latin unfortunately doesn't have that history, so we have to make it up as we go along, discussing amongst ourselves what we think the most important aspects to transcribe are. Which is what we're doing now :) ! —Aɴɢʀ (talk) 16:21, 13 April 2014 (UTC)
Right, but I don't see the point of choosing this middle-ground, partially accurate option, when we could just have a broad, phonemic transcription, and a narrow, phonetic one. If you include meaningless phonetic variations in a phonemic transcription, isn't it wrong? And in the case of English or German, why should we follow tradition, if our predecessors didn't motivate their choices? (I suppose they did, but I'm curious to know a bit more about it) --Fsojic (talk) 15:45, 15 April 2014 (UTC)
So what are we going to display? I'll start the automation as soon as I have time, and when we have a consensus, I will alter the code accordingly. --kc_kennylau (talk) 12:35, 15 April 2014 (UTC)
How would we know if an "i" in a word is a consonant or a vowel? Like I want to know the pronunciation of abiungo without looking at its etymology and stuff. --kc_kennylau (talk)
I don't think there is a way to tell. And I think that the current pronunciation of abiungo may be wrong, because it doesn't have /j/ like iungo does. —CodeCat 16:14, 15 April 2014 (UTC)
How can I know whether the "i" becomes a vowel when ab- and iungo combine? --kc_kennylau (talk) 16:18, 15 April 2014 (UTC)
It doesn't. abiungō is /ab.ˈjʊŋ.ɡoː/, while abiēs is /ˈa.bi.eːs/. You have to know the morphology of the words: abiungō has a prefix ab- and a root iungō, and abiēs is a simple stem. —Aɴɢʀ (talk) 18:33, 15 April 2014 (UTC)
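The point about morphology can be illustrated with a small Python sketch. It assumes the editor supplies the morpheme boundary by hand, using a hyphen (a notation invented for this example, not something the template accepts), and it respells a morpheme-initial i before a vowel as j. The name `respell` is hypothetical, and the rule is a simplification: some words begin with a syllabic i before a vowel.

```python
VOWELS = set("aeiouāēīōū")

def respell(segmented: str) -> str:
    """Turn 'ab-iungō' into 'abjungō'; leave unsegmented stems untouched."""
    fixed = []
    for part in segmented.split("-"):
        # Only an i that starts a morpheme and precedes a vowel becomes j.
        if len(part) > 1 and part[0] == "i" and part[1] in VOWELS:
            part = "j" + part[1:]
        fixed.append(part)
    return "".join(fixed)

print(respell("ab-iungō"))  # prefix ab- plus root iungō
print(respell("abiēs"))     # simple stem, no boundary marked
```

Without the boundary, the two words are indistinguishable to the code, which is the difficulty raised in the question.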
By the way, please tell the module to use ɡ (U+0261) rather than g (U+0067) in IPA transcriptions. —Aɴɢʀ (talk) 18:41, 15 April 2014 (UTC)
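The substitution requested here is a single character replacement. A minimal Python sketch (the function name `use_ipa_g` is made up for illustration, and it assumes the string passed in is already an IPA transcription rather than ordinary spelling):

```python
def use_ipa_g(ipa: str) -> str:
    """Replace ASCII g (U+0067) with the IPA letter ɡ (U+0261)."""
    return ipa.replace("\u0067", "\u0261")

print(use_ipa_g("ab.ˈjʊŋ.goː"))
```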
Would you happen to know the etymology of -ies by any chance? —CodeCat 18:38, 15 April 2014 (UTC)
No, sorry. Abiēs doesn't have it, though, anyway, since it's third declension, not fifth. —Aɴɢʀ (talk) 18:46, 15 April 2014 (UTC)
The way to force nonsyllabic i after a consonant is simply to use "j" in the parameter that we're already using to mark vowel length. So while {{la-pronunc|abiungō}} produces the wrong result, {{la-pronunc|abjungō}} produces the right one. However, I notice that it transcribes the n as /n/, but before g it should be /ŋ/. —Aɴɢʀ (talk) 07:54, 16 April 2014 (UTC)
Another problem: {{la-pronunc|suāvis}} and {{la-pronunc|svāvis}} both render as (Classical) IPA(key): /suˈaː.wis/, but it should be IPA(key): /ˈswaːwis/. —Aɴɢʀ (talk) 09:42, 16 April 2014 (UTC)
See fingo and tingo. I struggled before removing the ng rule from the module, but I think this is the consensus (or is it?). --kc_kennylau (talk) 09:44, 16 April 2014 (UTC)
I don't think svavis is a word. --kc_kennylau (talk) 09:46, 16 April 2014 (UTC)
But how do you know that it doesn't change to a vowel? Like is there any evidence or what? --kc_kennylau (talk) 09:49, 16 April 2014 (UTC)
In Latin, 'v' and 'u' were different forms of the same grapheme. As late as Shakespearean English, the two were positional variants. ('v' for word initial, 'u' for medial and final, regardless of phonetic value. Thus the line in Bacon: "vast void vniuerse".) So a literate Roman would be wondering why we even had two glyphs for it. In current practice, we use 'u' for the vowel, and 'v' for the consonantal use. In suavis, they're both consonantal. Basically, 'uV' = /wV/, 'uC' = /uC/; 'iV' = /jV/, 'iC' = /iC/. (A bit more complicated in the case of words like cuius, and I'm sure Angr and others can improve the algorithm.) --Catsidhe (verba, facta) 10:15, 16 April 2014 (UTC)
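Read literally, the rule of thumb in this comment amounts to the following Python sketch. The function name `naive_glides` is hypothetical, and the code is only the stated heuristic, not the module's algorithm: it treats u/v and i/j as one letter each and makes the letter a glide whenever a vowel follows.

```python
VOWELS = set("aeiouyāēīōūȳ")

def naive_glides(word: str) -> str:
    """Apply 'uV' = wV, 'uC' = uC, 'iV' = jV, 'iC' = iC letter by letter."""
    # Merge the modern u/v and i/j spellings back into single graphemes.
    letters = word.lower().replace("v", "u").replace("j", "i")
    out = []
    for pos, ch in enumerate(letters):
        nxt = letters[pos + 1] if pos + 1 < len(letters) else ""
        if ch in ("i", "u") and nxt in VOWELS:
            out.append("j" if ch == "i" else "w")
        else:
            out.append(ch)
    return "".join(out)

print(naive_glides("suāvis"))
print(naive_glides("iungo"))
print(naive_glides("abies"))  # over-applies: the i here is syllabic
```

The last call shows the heuristic turning the syllabic i of abies into a glide, so it cannot be used without morphological information.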
Unfortunately, it's more complicated than that. You can't always tell whether a u/v is syllabic or not, and there are even minimal pairs like seruit (he joined) vs. servit (he serves) and voluit (he wanted) vs. volvit (he rolls). That's probably why the u~v distinction is maintained in modern Latin writing, while the i~j distinction (which is much more predictable) is usually not made. As for /ŋ/, the module currently converts /n/ to /ŋ/ before /k/ but not before /kʷ/ and /ɡ/, which is silly. If fingō and tingō are transcribed with /n/, it's because someone was thinking of abstract phonemes rather than surface sounds. If that's the way we want to go, fine, but then to be consistent we also have to transcribe vincō with /n/. It makes no sense to use /ŋ/ in vincō but /n/ in fingō and relinquō. But I see that {{la-pronunc|suāvis}} now gives (Classical) IPA(key): /suˈaː.wis/ while {{la-pronunc|svāvis}} gives (Classical) IPA(key): /ˈswaː.wis/, so we can use the latter for words where u after s is /w/. This isn't predictable either (unless you know the morphology): while suāvis, suādeō, and suēscō all have /sw-/, the inflected forms of sūs all have /su-/, as do suus and its inflected forms. I don't even know whether suīle, suillus, and suīnus have /sw-/ or /su-/. One more thing: if we're going to use /i e o u/ for the short vowels rather than /ɪ ɛ ɔ ʊ/, then for consistency we should be using /y/ rather than /ʏ/ as well. It makes no sense to use /i e o u ʏ/. —Aɴɢʀ (talk) 14:38, 16 April 2014 (UTC)
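If the surface-sound option is chosen, treating /k/, /kʷ/ and /ɡ/ alike is a one-line rule. A minimal Python sketch, assuming the input is a syllabified IPA string; the name `assimilate_nasal` is invented here and this is not the module's code:

```python
import re

# /n/ before a velar stop, possibly across a syllable dot or stress mark.
# /kʷ/ begins with k, so it is covered; both ɡ (U+0261) and ASCII g match.
_N_BEFORE_VELAR = re.compile("n(?=[.ˈ]*[kɡg])")

def assimilate_nasal(ipa: str) -> str:
    """Render /n/ as ŋ before /k/, /kʷ/ and /ɡ/ alike."""
    return _N_BEFORE_VELAR.sub("ŋ", ipa)

for form in ("ˈwin.koː", "ˈfin.ɡoː", "re.ˈlin.kʷoː"):
    print(assimilate_nasal(form))
```

For the purely phonemic option, the same consistency is obtained by not applying this step to any of the three words.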
Another question. Accentuation. I don't know if this edit is correct. --kc_kennylau (talk) 14:43, 16 April 2014 (UTC)
I think so, yes. —Aɴɢʀ (talk) 14:54, 16 April 2014 (UTC)

Reconstructed languages as cognates in etymologies

I've noticed that occasionally, reconstructed terms appear in the list of cognates of some entry. In particular, I've seen Proto-Slavic reconstructions appear as cognates, but there are a few with Proto-Germanic cognates too. I'm wondering whether this is ok. On one side, it seems a bit wrong to include unattested terms as cognates. But on the other side, it's very convenient if we can list the reconstructed ancestor as a stand-in for its descendants. Slavic words often don't differ too much from their Proto-Slavic origin, so showing a Proto-Slavic cognate is probably a concise way to indicate that all Slavic words are cognates, without having to list them all or having to choose which to list. —CodeCat 22:24, 14 April 2014 (UTC)

I guess I don't have any specific arguments for or against, but I like to cite proto-Germanic terms as cognates, when we have an entry for them. -Atelaes λάλει ἐμοί 04:29, 15 April 2014 (UTC)
Reconstructions are abstractions; my guess is that they mean very little to anyone who hasn't studied the history of their language groups. I'm not against using them, but they should be accompanied by a descendant or two to ground them in reality. The problem is, of course, that nobody wants their language to be left out. If you list Swedish as a cognate and leave it alone for a minute, suddenly Danish, Norwegian, Faroese and Icelandic are right there with it. Chuck Entz (talk) 06:51, 15 April 2014 (UTC)
I list reconstructed terms as cognates too, though I prefer to list an attested form from Old Church Slavonic instead of Proto-Slavic, or from Gothic or Old English instead of Proto-Germanic, if those words are attested. But if they're not, then using the proto-form is a good shorthand for all the attested modern forms. I would be more inclined to do it for proto-languages whose reconstructions are pretty solid (like Slavic and Germanic) than for those whose reconstructions are themselves kind of shaky (like Celtic and Italic). —Aɴɢʀ (talk) 15:17, 16 April 2014 (UTC)
I don't think reconstructions for Celtic or Italic are any less solid, because the sound changes are still well-known and understood for the most part. What is lacking in those branches is a wide array of languages to use as a base for reconstructions, but that doesn't have to be a problem. In some cases, even just one descendant is enough to solidly reconstruct a term, if it can be regularly derived from a known PIE reconstruction that is itself solid. The principle is that you know two end points, and the rules that derive one from the other are known as well, so the intermediate proto-language is a matter of tying the two ends together in the middle. —CodeCat 15:31, 16 April 2014 (UTC)
There are Proto-Celtic reconstructions that are very solid, but the percentage of them that are very solid is much lower than in Proto-Slavic. Reconstructing PSl. is fairly easy not just because of the large number of attested languages but because of the relatively few and relatively transparent sound changes that have happened between the proto-language and the attested descendants. Hell, the number of changes you have to make to PSl. to get OCS can be counted on one hand. That means PSl. has an enormous, very securely reconstructed vocabulary – even for words with no known cognates outside Slavic. We could never do for Proto-Celtic what we've done at Appendix:List of Proto-Slavic nouns (and its subpages), Appendix:List of Proto-Slavic adjectives, Appendix:List of Proto-Slavic verbs, and so on; the lists would either be full of question marks or would have only a handful of entries each. —Aɴɢʀ (talk) 16:11, 16 April 2014 (UTC)
That's more a matter of time depth though. Of course the further you need to go back from the attested languages, the more the changes are going to compound, and the more things will end up being in an unexplained "dark era". But Germanic certainly has plenty of questions as well, it's not as secure as you think. The general idea is clear, but there are still many specific matters like the class 3 weak verbs, or the curious mixing of masculine and neuter a-stem cognates, that are not at all settled yet. —CodeCat 16:36, 16 April 2014 (UTC)