Wiktionary:Beer parlour/2014/February

Citations page vs Quotation

When to put quotes in the main namespace and when to put them in the citations namespace? --kc_kennylau (talk) 07:40, 1 February 2014 (UTC)[reply]

I usually put them in the main namespace if there are only a few (say, up to three). Any more than that I put in the Citations: namespace. —Aɴɢʀ (talk) 07:45, 1 February 2014 (UTC)[reply]

Thanks. So they are redundant, I assume from your answer? --kc_kennylau (talk) 08:19, 1 February 2014 (UTC)[reply]

Well, not in cases where there are fewer than 3 quotes and the Citations: tab is still a red link. But there certainly are cases where a quote is found both in the main namespace and in the Citations: tab. Then there's the redundancy within the main namespace between putting the quotes directly under the sense and putting them in a ===Citations=== header. —Aɴɢʀ (talk) 08:26, 1 February 2014 (UTC)[reply]

I see. Thank you. --kc_kennylau (talk) 08:28, 1 February 2014 (UTC)[reply]

For the record, the correct header to use in the mainspace is ====Quotations====, and it is usually level 4 (per WT:"). - -sche (discuss) 22:17, 1 February 2014 (UTC)[reply]

I think they should have separate and distinct uses. Quotations should be a sample from what is available on the citations page, and should be used as usage examples. —CodeCat 14:11, 1 February 2014 (UTC)

Welsh plurals or Welsh noun forms?

I've noticed that Welsh plurals are currently split into a plurals category and a noun forms category. Which one is the correct one?

I'm asking because I think the noun form category would be better split into categories for plurals, mutated nouns, and mutated plurals. This would allow more specialised templates than the current cy-noun-form template which does not allow many details to be specified. For instance, for the plural 'cŵn' (dogs), cy-noun-form generates:

cŵn

I think that it would be more helpful if, for plurals, it could generate something like:

cŵn m pl

EdwardH (talk) 20:26, 1 February 2014 (UTC)[reply]

Mutation is the same regardless of what part of speech or inflection a word is, isn't it? Then a single Category:Welsh mutated forms might be preferable. —CodeCat 20:30, 1 February 2014 (UTC)

The issues of categories and templates are separate: multiple templates can add the same category, and the same template can produce different categories. Categories are a tool for finding entries that have something in common, not for classification. Would people need to look specifically for mutated nouns and mutated plurals? Chuck Entz (talk) 20:59, 1 February 2014 (UTC)[reply]

No, I doubt they would. In which case, it would probably be better to follow CodeCat's suggestion and create a single mutated form category. EdwardH (talk) 09:31, 2 February 2014 (UTC)[reply]

Arabic compound diacritics

Does everyone else see what I see: حَمَّدَ (ḥammada, “ḥammada”)? (See محمد.) What I see is shadda-kasra above the first letter, and fatha above each of the next two letters. However, that would be incorrect (and impossible), and it is not what is actually written. What is actually written is this: fatha above the first and last letters, and shadda-fatha above the middle letter. For some reason, the shadda, although correctly typed, appears to be shifted to the previous letter. —Stephen ^(Talk) 09:47, 1 February 2014 (UTC)[reply]

I see it normally. But I have in the past encountered problems where Hebrew diacritics would appear on adjacent letters on Macintosh computers and iPhones. --Wiki Tiki 89 09:40, 2 February 2014 (UTC)[reply]

I’m using Firefox in Windows XP-Pro. I have never seen this happen before. It’s weird. —Stephen ^(Talk) 09:56, 2 February 2014 (UTC)[reply]

It's strange that you see it on the first letter (by which I assume you mean the rightmost letter). Since the shadda is encoded after the fatha, it would make more sense if it were displayed on the next letter rather than the previous one. Can you possibly post a screenshot so we can see more details about the problem? --Wiki Tiki 89 10:12, 2 February 2014 (UTC)[reply]

I see fatha above the first and last letters, and shadda-fatha above the middle letter, but I do recall there having been problems in the past with Hebrew and Arabic diacritics displaying in the wrong order, on the wrong letters, etc. (One such problem is discussed here.) - -sche (discuss) 10:03, 2 February 2014 (UTC)[reply]

I see it normally too. --Z 10:21, 2 February 2014 (UTC)[reply]

SCREENSHOT: File:Hammada.PNG, by Stephen

What I see on my screen is different from what it shows on the screenshot provided by Stephen. ~~I don't see what is on the screenshot.~~ What I see is such that the 1st and the 3rd letters have something resembling "/" used as a diacritic, whereas the 2nd letter has "/" and a small "w" below it used as a diacritic, approximately speaking. My letter counting is from the left. --Dan Polansky (talk) 10:47, 2 February 2014 (UTC) Made myself less ambiguous. --Dan Polansky (talk) 11:08, 2 February 2014 (UTC)
So you are experiencing the same problem as Stephen. By the way, it's a big picture; it's better to simply link to it. --Z 10:53, 2 February 2014 (UTC)

It appears that everybody sees it correctly except me. (I would simply link to the image, but I don’t know how. If you know how, create the link.) —Stephen ^(Talk) 10:56, 2 February 2014 (UTC)[reply]

I thought the screenshot was by Dan Polansky, as you didn't sign under the image, and I didn't read his message completely. You can link to an image by adding ":" before the image's title. --Z 11:06, 2 February 2014 (UTC)

My display is different from the screenshot, too. On my display the little seagull-looking thing is over the middle letter. —Aɴɢʀ (talk) 11:28, 2 February 2014 (UTC)[reply]

The seagull over the middle letter is the correct view. I haven’t changed anything, except for receiving all of the Firefox updates, so maybe this sudden problem has something to do with my most recent Firefox version. Hopefully it will be corrected in the next version. —Stephen ^(Talk) 11:36, 2 February 2014 (UTC)[reply]

Recently User:Widsith raised an issue on my talk page; he possibly has a similar problem (I say possibly because I'm still not sure whether the problem was that he was not aware of a particular rule about Arabic diacritics, or that he was seeing the diacritics incorrectly). --Z 11:51, 2 February 2014 (UTC)

I'm using Firefox 26.0; isn't that the most recent version? I'm running it on Windows 7. —Aɴɢʀ (talk) 11:55, 2 February 2014 (UTC)[reply]

Another potential problem is that we recently changed the default font for Arabic in MediaWiki:Common.css. It is now a webfont, meaning that it is downloaded when you load the page if you don't have it installed. So the problem could be that there is a bug in this font (Iranian Sans) when used on Windows XP. We tested it on Windows 7 and Mac OS, but not on Windows XP. But that is pure speculation at this point. A way to test this theory is if Stephen could copy and paste the Arabic text into Microsoft Word (or any other word processor) and see if the problem is replicated there, trying various fonts including Iranian Sans. --Wiki Tiki 89 21:37, 2 February 2014 (UTC)[reply]

I also have XP-Pro, but in Firefox it appears correctly (ḥammada). However in Chrome it shows up as fatha, shadda-kasra, fatha (ḥammida). Weird. --Dijan (talk) 22:52, 2 February 2014 (UTC)[reply]

Then it could be that a bug was just now introduced into the font file and if you use a browser that you haven't used in a while or update your usual browser, it will re-download the font and get the buggy version. But that's also just speculation. --Wiki Tiki 89 23:04, 2 February 2014 (UTC)[reply]

I personally hate Iranian Sans anyway. It appears very small and it's hard for me to read any text in Arabic script unless I increase the size of the text. --Dijan (talk) 23:08, 2 February 2014 (UTC)

When I copy and paste حَمَّدَ into MS Word, the error disappears. (Except that the shadda-fatha over the middle letter gets reversed, but that is an old problem and one that does not seem to have a solution.) I don’t have Iranian Sans installed, but I have a number of Arabic fonts, including Tahoma, Traditional Arabic, Arial Unicode MS, Andalus, Arabic Transparent, Microsoft Sans Serif, Simplified Arabic, and all of my installed fonts give the same result (fatha-shadda over the middle letter, which looks like shadda-kasra). Dijan’s error is just the old problem of reversed diacritics due to normalization, and fatha-shadda (the reversed order) has the appearance of shadda-kasra. It’s the same thing that I see in MS Word. —Stephen (Talk) 23:09, 2 February 2014 (UTC)

Do you see the problem here (with Arial font): حَمَّدَ? --Wiki Tiki 89 23:21, 2 February 2014 (UTC)[reply]

I don't see the problem when using Firefox with the Arial font. However, in Chrome, I have to increase the text size to 250% in order to see the fatha over the shadda. Otherwise, it just appears without the second fatha in Chrome. --Dijan (talk) 23:28, 2 February 2014 (UTC)[reply]

حَمَّدَ looks perfect to me. —Stephen ^(Talk) 00:27, 3 February 2014 (UTC)[reply]

Looks correct on my Mac in Safari, Chrome, and Firefox. In my Safari edit field, in Opera, and in Safari/iOS 5, the seagull is ducking below the arrow, but still over the rolling wave in the middle. —Michael Z. 2014-02-03 02:14 z

So we've narrowed it down to some obscure font issue. --Wiki Tiki 89 05:08, 3 February 2014 (UTC)[reply]

Oops, my mistake. It looks fine where it has been pasted into this discussion. But the red link at the beginning of this topic, and all of the instances in the entry محمد have the seagull ducking below the arrow instead of flying over it.

I can fix it by removing Arial Unicode MS from the font stack. But the next font, Code2000 looks pretty bad, with large overlapping diacritics, and should be removed too. (The Iranian Sans webfont doesn’t seem to be working for me, and Traditional Arabic is absent on my Mac.)

Yes, it only works with no added fonts on my Mac. The Helvetica Neue font applied by the Typography Refresh doesn’t have the Arabic letters, but whatever font my Safari/Mac is falling back to is better than our additions. —Michael Z. 2014-02-04 19:44 z

Update: the above applied only in Safari/Mac. Firefox/Mac and Chrome/Mac don’t fix the diacritic display in the default font. —Michael Z. 2014-02-04 19:56 z

I’m not sure I understand what you mean by "arrow" when you say "the seagull ducking below the arrow instead of flying over it". If by arrow you mean the short, usually slanted, hyphen-like diacritic known as the fatha, the correct appearance has the fatha positioned ABOVE the seagull. Unfortunately, since normalization has reversed the order of the compound diacritics (from shadda-fatha to fatha-shadda), most Arabic fonts do not understand it and display it with the seagull on top (which makes it look exactly like shadda-kasra instead), and some fonts simply overlap them. The kasra must always be beneath the seagull, and the fatha must always be above the seagull. Same with double-fatha and double-kasra. —Stephen (Talk) 23:52, 4 February 2014 (UTC)
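The reordering described above can be reproduced outside the browser. The following is a minimal sketch using Python's standard `unicodedata` module; it assumes the word was typed as shadda followed by fatha, as Stephen describes, and the helper name `mark_names` is made up for illustration. Unicode assigns the fatha a lower canonical combining class than the shadda, so normalization stores the fatha first regardless of typing order.

```python
import unicodedata

FATHA = "\u064E"   # ARABIC FATHA
SHADDA = "\u0651"  # ARABIC SHADDA

# hah + fatha, meem + shadda + fatha, dal + fatha
# (assumption: the shadda was typed before the fatha on the middle letter)
typed = "\u062D" + FATHA + "\u0645" + SHADDA + FATHA + "\u062F" + FATHA

# NFC applies canonical ordering, which sorts adjacent combining marks
# by their canonical combining class.
normalized = unicodedata.normalize("NFC", typed)


def mark_names(text):
    """Return the Unicode names of the combining marks in text, in stored order."""
    return [unicodedata.name(ch) for ch in text if unicodedata.combining(ch)]


# Combining classes of the two marks; the lower one is stored first.
print(unicodedata.combining(FATHA), unicodedata.combining(SHADDA))
print(mark_names(typed))
print(mark_names(normalized))
```

Whether the reordered sequence is then drawn correctly is up to the font and the rendering engine, which is consistent with the different results reported in this thread.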

From my experience, most Arabic fonts handle it fine, but many still do not. --Wiki Tiki 89 00:01, 5 February 2014 (UTC)[reply]

None of my Arabic fonts handle it correctly. —Stephen ^(Talk) 00:09, 5 February 2014 (UTC)[reply]

The image here is what I see in Safari/Mac (also similar in Firefox & Chrome). If the word is rendered correctly in the etymology and headword, then this is good news (but Code2000 should probably be removed from the font stack for .Arab). Does Arabic properly have boldface in any languages? That headword would look less ugly if we set it to font-weight: normal. —Michael Z. 2014-02-05 04:43 z

Yes, the Safari/Mac screen shot is correct. Arabic has both bold and regular fonts, but should not be italicized. —Stephen ^(Talk) 04:59, 5 February 2014 (UTC)[reply]

hamzat al-waṣl (ٱ) and Allah ligature (لله)

I had no problems seeing the characters above, but I have issues combining hamzat al-waṣl (ٱ) and the Allah ligature (لله). The letter ٱ is normally unnecessary on individual words with an elidable alif (not pronounced at all if it follows a word ending in a vowel), but occasionally it's important to show that it's actually dropped (elided), e.g. in the sentence هُوَ ٱلْمُعَلِّمُ (huwa l-muʿallim(u)), huwa + al-muʿallim(u)=huwa l-muʿallim(u). The symbol ٱ doesn't seem to work with the ligature لله. If I write them together: ٱلله, the ligature loses the shadda and alif hanjariyya (dagger alif) and looks simply like lām+lām+hāʾ. The name عَبْدِ اللهِ (ʿabdu llāh(i)) has such an elision, but adding the diacritic to demonstrate it spoils the display of neighboring characters. It seems the ligature can only display on its own لله (on some systems) or with a simple alif الله (allāh) (on most systems) with no hamza and no waṣla. Is this expected? Should this ligature display diacritics in any position when three Arabic letters (lām+lām+hāʾ) are joined together? --Anatoli (обсудить/вклад) 12:56, 5 February 2014 (UTC)

I can now only see الله (allāh) with a shadda (seagull) and a dagger alif (short vertical stroke) when it follows a simple alif ا. The diacritics should definitely show in لله (li-llāh(i)) (they don't on my home PC). --Anatoli (обсудить/вклад) 13:01, 5 February 2014 (UTC)

It's expected behavior. Currently it doesn't show in any of these words on my current system either, but as far as I remember, those diacritics (shadda and alif) are shown only if you write alif + lam + lam + ha'. If you add any extra diacritic on any of these letters, the diacritics of the ligature disappear. In other words: if you'd like to see that word in fully vocalized form and also with hamzat al-wasl, you should write it like عَبْدُ ٱللّٰه --Z 15:48, 5 February 2014 (UTC)

Thank you! I don't know what you did but I can see full diacritics now, even with a final kasra عَبْدُ ٱللّٰهِ. Could you describe what you did, please? I have updated Abdullah and عَبْدُ ٱللّٰهِ (ʕabdu llāhi) now. --Anatoli ^{(обсудить}/^вклад) 22:29, 5 February 2014 (UTC)[reply]

NP, I simply wrote hamzat al-wasl + lam + lam + shadda + dagger alif + ha' --Z 08:59, 6 February 2014 (UTC)[reply]
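The sequence Z describes can be spelled out code point by code point. This is a small sketch in Python; the code-point values are the standard Unicode assignments for the characters Z names, and the variable names are illustrative only. It only builds and inspects the string; how it is displayed still depends on the font, as the rest of this thread shows.

```python
import unicodedata

# hamzat al-wasl + lam + lam + shadda + dagger alif + ha', in typing order
CODE_POINTS = [0x0671, 0x0644, 0x0644, 0x0651, 0x0670, 0x0647]

word = "".join(chr(cp) for cp in CODE_POINTS)

# List each code point with its official Unicode character name.
for ch in word:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Shadda has a lower combining class than the dagger alif, so NFC
# normalization leaves this particular order untouched.
print(unicodedata.normalize("NFC", word) == word)
```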

I see. There's a new problem, though. On iPad, عبد الله is seen with two rows of diacritics: shadda-dagger alif and shadda-dagger alif, one generated automatically and one manual. I guess you can't win. --Anatoli (обсудить/вклад) 09:43, 6 February 2014 (UTC)

How to link in inflection tables

Is there any established preference for how to link to terms in inflection tables? At {{de-decl-noun-n}}, kc_kennylau (talk • contribs) and I are going back and forth between using bare links (i.e. [[...]]; his preference) and using {{l-self}} (my preference). I don't want to keep edit-warring about it, especially if there isn't a consensus that my way is the preferred way. I thought there was, though. —Aɴɢʀ (talk) 14:38, 2 February 2014 (UTC)[reply]

Refer to the other declension tables, i.e. {{de-decl-noun-m}} and {{de-decl-noun-f}}. They detect whether the form is going to be different from the page name, and add a link if it is different. Otherwise, they use bare links. Notice how your edit causes {{l-self|de|{{PAGENAME}}}} which is, well, yeah... --kc_kennylau (talk) 14:41, 2 February 2014 (UTC)

I don't encourage bare links (because they don't mark the text with the language), but I don't think it matters whether you use {{l}} or {{l-self}}. I'm not sure why there would be a problem with {{l-self|de|{{PAGENAME}}}} though. —CodeCat 14:45, 2 February 2014 (UTC)

Me neither. Kenny, can you be more specific than "well, yeah" about your problem with {{l-self|de|{{PAGENAME}}}}? —Aɴɢʀ (talk) 14:52, 2 February 2014 (UTC)[reply]

It is virtually useless trying to detect whether a link, that by definition is linking to itself, is linking to itself or not. --kc_kennylau (talk) 16:40, 2 February 2014 (UTC)[reply]

Isn’t it good to be, like, really, really sure? —Michael Z. 2014-02-02 18:53 z

I still don't understand what you mean. {{l-self}} doesn't know in advance that its parameter is the same as the page name. That's why it checks. —CodeCat 13:22, 3 February 2014 (UTC)

The code literally contains {{l-self|de|{{PAGENAME}}}} which is redundant and kinda defeats the purpose of calling the template. Why don't you just bold it instead. --kc_kennylau (talk) 15:03, 3 February 2014 (UTC)[reply]

Bolding isn't the same as what linking templates do. But in any case, I don't see {{l-self|de|{{PAGENAME}}}} anywhere in {{de-decl-noun-n}}. —CodeCat 15:21, 3 February 2014 (UTC)

Because I removed all of them. NB: An edit war was going on between me and him, if you see the history. --kc_kennylau (talk) 15:30, 3 February 2014 (UTC)

I think you were both right. Angr was right to use {{l-self}}, but you were right to not put it in {{de-decl-noun-n}}. I think it should go in {{de-decl-noun}}. —CodeCat 15:33, 3 February 2014 (UTC)

I don't think it's necessary to add extra links because the m, f and n templates already add a link if the form is not identical to the page name. Please look at their source code. --kc_kennylau (talk) 15:54, 3 February 2014 (UTC)

But they add bare links rather than German-specific links. That means (1) clicking on the link won't necessarily take you to the German section if the page has more than one language on it, (2) your browser doesn't know the word is in German (which can make a difference to blind people with screen readers, for example), and (3) those of us who have set our preferences to show links to nonexistent language sections in orange don't see orange links but rather blue links. —Aɴɢʀ (talk) 19:56, 3 February 2014 (UTC)[reply]
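To make the difference between the two approaches concrete, here is a rough sketch in Python of the behaviour being argued over. It is not the actual code of {{l-self}} or of the declension templates: the function names and the exact HTML are assumptions made for illustration. It only models the three points raised in this thread, namely the check against the page name, the link to the language section, and the language tag.

```python
def bare_link(term):
    """Model of a bare [[term]] link: no section anchor, no language tag."""
    return f'<a href="/wiki/{term}">{term}</a>'


def self_aware_link(lang_code, lang_name, term, page_name):
    """Hypothetical model of a language-aware link that checks for self-links.

    If the term is the current page, show bold text instead of a link;
    otherwise link to the language section and tag the text with the language.
    """
    if term == page_name:
        return f'<strong lang="{lang_code}">{term}</strong>'
    return f'<a href="/wiki/{term}#{lang_name}" lang="{lang_code}">{term}</a>'


# On the page "Haus", the nominative singular is the page itself,
# while the genitive singular "Hauses" is a different page.
print(self_aware_link("de", "German", "Haus", "Haus"))
print(self_aware_link("de", "German", "Hauses", "Haus"))
print(bare_link("Hauses"))
```

In this model, passing the page name itself as the term simply takes the first branch, which is the case kc_kennylau calls redundant and CodeCat considers harmless.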

How would I process special characters in Python?

Discussion moved to Wiktionary:Grease pit/2014/February.

Ugh. We're now punitively proscriptive about additions to our most-general, entry-level community communications channel. - Amgine/^t·e 17:20, 2 February 2014 (UTC)[reply]

Punitively? This looks to me more like moving a technical discussion to the technical forum, where it's more likely to be seen by people who might be able to answer the question. ‑‑ Eiríkr Útlendi │ Tala við mig 18:08, 3 February 2014 (UTC)[reply]

Proposal for minor change in the functionality of Template:etyl

I propose that {{etyl|xx}} should function exactly the same way as {{etyl|xx|-}}, rather than categorizing as "English terms derived from Language X". Then we can deprecate {{etyl|xx|-}} in favor of {{etyl|xx}}. I think that on average {{etyl|xx|-}} is used more times per page than any other use of the template, and removing the need for the "|-" will make it much easier to type up cognate lists in etymology sections. --Wiki Tiki 89 03:46, 4 February 2014 (UTC)[reply]

I agree with making this change. — I.S.M.E.T.A. 18:32, 7 February 2014 (UTC)[reply]

The change makes sense considered in isolation, but it would be very hard to find and fix all the missing instances of [Category:English terms derived from <language>] after the change is made. I think we would need to do that before the change, while we can still distinguish them by the missing "-", so that it can be done by bot and not manually, entry by entry.

Also, because the old way and the new way have all the same inputs, but with different interpretations, automatically tagging incorrect uses of the template without the parameter to mean English would seem to me to be impossible. How do we tell the difference between someone deciding not to categorize a derivation and someone erroneously assuming that they're categorizing it as English? Chuck Entz (talk) 19:03, 7 February 2014 (UTC)[reply]

Well, we can deprecate {{etyl|xx}} for a while, possibly having it display an error, until people stop using it. --Wiki Tiki 89 19:12, 7 February 2014 (UTC)
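Chuck Entz's objection can be illustrated with a toy model of the categorization rule. The sketch below is in Python, not template code; the function names and the small `LANGUAGE_NAMES` table are hypothetical, and only the behaviour described in this thread is modelled (a missing second parameter currently means English, and "-" means no category).

```python
# Illustrative subset of language codes; not taken from the real template.
LANGUAGE_NAMES = {"en": "English", "fr": "French", "la": "Latin"}


def etyl_current(source, target="en"):
    """Current rule: a missing second parameter categorizes as English."""
    if target == "-":
        return None
    return f"{LANGUAGE_NAMES[target]} terms derived from {LANGUAGE_NAMES[source]}"


def etyl_proposed(source, target="-"):
    """Proposed rule: a missing second parameter means no category."""
    if target == "-":
        return None
    return f"{LANGUAGE_NAMES[target]} terms derived from {LANGUAGE_NAMES[source]}"


# The same call, {{etyl|la}}, under both rules:
print(etyl_current("la"))   # categorized as English under the current rule
print(etyl_proposed("la"))  # no category under the proposed rule

# {{etyl|la|-}} and {{etyl|la|fr}} behave the same under both rules.
print(etyl_current("la", "-"), etyl_proposed("la", "-"))
print(etyl_current("la", "fr") == etyl_proposed("la", "fr"))
```

Because the one-parameter call is valid input under both rules but means different things, nothing in the call itself reveals which meaning the editor intended once the switch has been made.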

accents and qualifiers

Should we change all instances of {{a|adjective}} to {{qualifier|adjective}} in all pages here? --kc_kennylau (talk) 07:14, 4 February 2014 (UTC) Fixed grammar. --07:15, 4 February 2014 (UTC)[reply]

As well as here, here, here, here and here? --kc_kennylau (talk) 07:17, 4 February 2014 (UTC)[reply]

I'd be in favor of such a change. Or maybe {{sense|adjective}}? —Aɴɢʀ (talk) 13:30, 4 February 2014 (UTC)[reply]

Absolutely, except for Standard. Those must be fixed one by one as Standard may be labelling pronunciation in the standard accent (it would be a better practice to use the name of the standard accent though). — Ungoliant ^(falai) 13:39, 4 February 2014 (UTC)[reply]

@Angr I don't think {{sense|adjective}} is appropriate, since IMHO sense is meaning and adjective isn't a meaning, just IMHO. --kc_kennylau (talk) 14:42, 4 February 2014 (UTC)[reply]

elevated / lofty style

What context label can I use to mark the elevated style or lofty style in a definition?

This is interesting for me, because in Russian Wiktionary there is the context label "высок." ("высокий" or "высокий стиль") and I want to find the corresponding context label in English Wiktionary. -- Andrew Krizhanovsky (talk) 10:07, 4 February 2014 (UTC)[reply]

German Wiktionary has de:Vorlage:geh., standing for gehoben, as well, which comes to the same thing. I sometimes use {{context|formal}} for this, but I'm not 100% sure "formal" is really identical to "elevated style". Of course you can always use {{context|elevated}}, but that doesn't categorize. —Aɴɢʀ (talk) 13:29, 4 February 2014 (UTC)[reply]

We also have {{context|literary}}, which categorizes. But, given that our user base skews heavily toward those with graduate degrees, maybe 'lofty' is normal. DCDuring TALK 14:58, 4 February 2014 (UTC)[reply]

In my experience, in practice, {{cx|literary}} is the en.Wikt equivalent of de.Wikt's gehoben; {{cx|formal}} might also be appropriate; and literary is even more elevated than something which is merely formal. In theory (i.e. according to the definitions provided by Category:English literary terms and Category:English formal terms), any words that are rarely used in speech are literary, by which criterion a lot of textspeak (e.g. 404 for "I don't know") might qualify as literary! - -sche (discuss) 18:33, 4 February 2014 (UTC)

<snort> "Fear and Loathing in Las Vegas" is literary, too. And other than the dated drug jargon/slang usage I doubt one can find many rare terms within it. - Amgine/^t·e 18:57, 4 February 2014 (UTC)[reply]

I agree that {{context|literary}} is essentially the equivalent of Russian "высокий". They don't imply exactly the same thing, but they are close enough. --Wiki Tiki 89 19:13, 4 February 2014 (UTC)[reply]

Thank you very much for your answers.

There is the translation книжный = "literary", "bookish" (O.S. Ahmanova, "The dictionary of linguistics terms" (in Russian), 1969, page 198). But "книжный" is another context label in the Russian Wiktionary. Therefore, I cannot use "literary", because it is an equivalent of "книжный". OK, I will use {{context|elevated}} as the translation of "высокий стиль". -- Andrew Krizhanovsky (talk) 11:15, 7 February 2014 (UTC)

"bookish" is not exactly the same as "literary", at least the way I see it. --Wiki Tiki 89 14:15, 7 February 2014 (UTC)[reply]

Please don't! — especially since (I suspect) the vast majority of readers of en.wikt won't know what it means.—msh210℠ (talk) 16:08, 11 February 2014 (UTC)[reply]

What is ru.Wikt’s equivalent of “formal”? —Michael Z. 2014-02-07 15:19 z

Seriously, why the heck would you create a non-standard label “elevated” for высокий, when the synonym formal is used in every English-language dictionary? —Michael Z. 2014-02-09 05:51 z

Because "formal" is not a correct translation of "высокий". --Wiki Tiki 89 06:08, 9 February 2014 (UTC)[reply]

Thanks. What is ru.Wikt’s equivalent of “formal”? —Michael Z. 2014-02-10 15:45 z

You can see a list of equivalents of context labels in the Russian Wiktionary and the English Wiktionary: ru:Участник:AKA MBG/Статистика:Пометы.
See the second table: "Labels added by hand". This correspondence was created by my research group last year.

The table "Labels found by parser" was created by the wikokit parser, and the data was extracted from the Russian Wiktionary. During this year we will extract context labels and create the same tables for the English Wiktionary.

P.S. You see in this table that "formal" corresponds to "офиц." (официальный).

P.P.S. You are welcome to write your comments and ideas on the discussion page. Of course, we will change our equivalents if you propose better ones. -- Andrew Krizhanovsky (talk) 05:31, 12 February 2014 (UTC)

Russian label abbreviations are here: ru:Викисловарь:Условные_сокращения. "высок." is something like "stilted". I don't agree with the explanation "высокое", it's more like "высокопарное". --Anatoli ^{(обсудить}/^вклад) 05:43, 12 February 2014 (UTC)[reply]

I remembered an exact English equivalent of "высокий": "high(er)-register". --Wiki Tiki 89 01:03, 13 February 2014 (UTC)[reply]

Thank you, Wikitiki89! -- Andrew Krizhanovsky (talk) 10:30, 11 March 2014 (UTC)[reply]

Chinese character entries in Translingual sections

Discussion moved from Wiktionary talk:Per-language pages proposal#Chinese character entries.

What is called "Translingual" at present essentially refers to "Chinese", e.g. 代, 學, 任, including glyph etymology, definitions, derived characters (the character is written like that, and various characters are derived from it, because of homophony or near-homophony in Chinese). This bit should be got rid of completely, and Mandarin should be renamed "Chinese" and moved up to the top, in which glyph etymology/word etymology/derivations/semantic development etc. are explained (e.g. 斗). Wyang (talk) 00:48, 7 February 2014 (UTC)

The reason it is "Translingual" is because it also applies to non-Chinese languages, such as Japanese, Korean, Vietnamese, and pretty much any other East Asian language. --Wiki Tiki 89 00:56, 7 February 2014 (UTC)[reply]

The semantic development is also language-specific. There are plenty of examples of characters that have different meanings in different Chinese languages, or that only exist in some of them. —CodeCat 00:59, 7 February 2014 (UTC)

The way I see it, the Translingual entry is the entry for the character. The characters have meaning independent of the language, due to the pictographic/logographic origin of the script, and that meaning should be listed as the meaning of the character. Any further semantic developments should be listed under the relevant language. --Wiki Tiki 89 01:04, 7 February 2014 (UTC)[reply]

The characters did not have meaning independent of the language, otherwise why would people create them originally? WRT further semantic developments - in what way is the definition of 代 in Korean, Japanese or Vietnamese different from Chinese? Wyang (talk) 01:10, 7 February 2014 (UTC)[reply]

Exactly, it's not. But sometimes it is. That's exactly what I mean by independent of language: the character has essentially the same meaning in all of those languages. --Wiki Tiki 89 01:13, 7 February 2014 (UTC)[reply]

See below. Wyang (talk) 01:28, 7 February 2014 (UTC)[reply]

Yes, but they are only a small proportion out of the hundreds of thousands of characters that exist, and a context label could easily accommodate that. There are plenty of examples of English words that have different meanings in different dialects, or that only exist in some dialects too. Wyang (talk) 01:06, 7 February 2014 (UTC)[reply]

@Wyang, I'm not arguing against a "Chinese" header. I'm arguing for a "Translingual" character entry. I think that the "Chinese" and "Translingual" sections could co-exist. --Wiki Tiki 89 01:09, 7 February 2014 (UTC)[reply]

I was replying to User:CodeCat. I don't think they should coexist. Even if they do, "Translingual" should not have etymology, definition (because they are the same as in Chinese), or derived characters. It should only contain coding information. Wyang (talk) 01:28, 7 February 2014 (UTC)[reply]

I agree with this. The translingual section shouldn't be used to give meanings, those should go in the section for each language. Yes, even if it means the same in every language. land means the same thing in many Germanic languages too, but we don't skimp out and put that under "Translingual" either. I don't see why this should be any different. —CodeCat 01:33, 7 February 2014 (UTC)

In alphabetic languages, the letters don't have meaning. In pictographic languages, the glyphs have the same meaning regardless of the morphology of the language that uses them. --Wiki Tiki 89 01:38, 7 February 2014 (UTC)[reply]

Except that's not always true. No two languages using Chinese characters use the exact same set of characters, with the exact same meaning. There are differences. Furthermore, among the Chinese languages, characters are also tied to historical words. Words with different origins get a different character. That means that when a word falls out of use in one Chinese language, the character that was used to write it goes with it, and it's replaced by another synonymous (!) character. But in another Chinese language, the older word might still be in use and the alternative word could mean something else, or it could be that word that disappeared instead. Thus characters can be very much "dialectal" and have language-specific usage and meanings. For some examples, see w:Written Cantonese#Cantonese words. It's often said that Chinese is written the same regardless of language, but I'm pretty sure that written Cantonese would not be perfectly understood by a Mandarin speaker, because of these differences. —CodeCat 01:47, 7 February 2014 (UTC)

Most of the examples in the link above are archaisms, which are perfectly understandable for someone unfamiliar. Using the example of "to eat" there, it can be easily explained in 食 if there is a Chinese header, using

# {{cx|Cantonese|Hakka|Min Nan|Min Dong|Min Bei}} to [[eat]]

or something similar. And the alternative, 吃, would have

# {{cx|Mandarin|Jin|Wu|Gan|Xiang}} to [[eat]]

. Wyang (talk) 02:30, 7 February 2014 (UTC)[reply]

@Wikitiki89: Because all the other languages borrowed the fossilised form from one language. They have the same meaning as the Chinese language which originally used these glyphs to write the language. The meanings are basically dead outside the fossilised borrowings, eg. no Vietnamese person would use đại to mean "replace, replacement" or "era, generation", and none of the readings in 弋#Vietnamese would be understood as "catch, arrest" or "shoot with bow". Wyang (talk) 01:52, 7 February 2014 (UTC)[reply]

Which is not true. Glyph-wise 代 derived from 弋, not because they were near-homophonic in Japanese at any stage or level (Kun, On). Wyang (talk) 01:02, 7 February 2014 (UTC)[reply]

I don't see how that's relevant. Etymology is etymology. The point is the glyph has the same basic meaning in Japanese as it does in Chinese. --Wiki Tiki 89 01:07, 7 February 2014 (UTC)[reply]

There's also the etymology of the individual characters, which presumably is specific to the character glyph itself, regardless of the semantics or sound shifts of each language. Some have argued that this belongs under a Chinese heading, but that information is applicable (or at least potentially interesting and linguistically useful) to all languages that use that character. — This unsigned comment was added by Eirikr (talk • contribs) at 01:21, 7 February 2014.
Exactly. Now I think we should move the majority of this discussion to somewhere more relevant. Only the beginning belongs here. --Wiki Tiki 89 01:23, 7 February 2014 (UTC)[reply]

That etymology is not translingual if it can not be said to be true for all languages then. The character has essentially the same meaning, as in Chinese, in all of those languages, since those languages kept all the meanings in Sinoxenic compounds. It is in Chinese that the character evolved semantically, and produced those meanings, and the other languages merely borrowed the fossilised characters and compounds. It is not a translingual occurrence that the character was coined, given those meanings, and was used to derive other characters. It was in Chinese that the character was coined, and was assigned the correspondence with a native Chinese word, which evolved semantically through time, acquiring various meanings, and the character was used to write other characters in Chinese. Wyang (talk) 01:28, 7 February 2014 (UTC)[reply]

@Wyang. No one disputes that Han characters were created by the Chinese, but character and word definitions have diverged, they are also different parts of speech across languages (e.g. 的 is also a suffix in Japanese and Korean but it's never a suffix in Chinese) and they are shared. Translingual definitions can often be copied into Mandarin to give Mandarin definitions (many Mandarin single character entries still lack simple definitions!) but Translingual (or something with a new name as CodeCat suggested) should stay. Good Chinese and Japanese dictionaries have separate entries for characters and words. Note that translingual sections don't have noun, verb, other parts of speech. The new header, rather than Translingual, may have something to permanently remind readers that these characters are Chinese, such as Han character (漢字/汉字) if it's your concern, but existing subheaders should probably be renamed and standardised (kanji, hanja, etc.). --Anatoli ^{(обсудить}/^вклад) 01:51, 7 February 2014 (UTC)[reply]

The fossilised meanings are also no concern. Language-specific entries could focus on semantics, grammar, pronunciation, not stroke order or generic meaning. --Anatoli ^{(обсудить}/^вклад) 02:02, 7 February 2014 (UTC)[reply]

There is no point in repeating the definitions in Translingual (or whatever the name is) if it is going to be nearly 100% identical with Chinese. There are differences in the PoS across languages, but again I don't agree with the idea of PoS for isolating languages, especially for characters which almost always cover a broad range of PoS in a single sense. Anyway the particle-ilisation of 的 is a phenomenon which happened in Chinese, which used 的 to represent an increasingly common possessive and adjectival particle at that time, and Japanese/Korean borrowed this to systematise their Sinoxenic borrowings, converting nouns into something similar to an adjective. Wyang (talk) 02:07, 7 February 2014 (UTC)[reply]

As I showed above, there is no such thing as "Chinese" when it comes to meanings of characters. Each Chinese language can and sometimes does assign different meanings to the same character. If we put some meanings under Translingual and others under Mandarin/Cantonese/Min Nan etc, that is just going to confuse people because then they won't know which one is right, or what's more specific etc. —CodeCa t 02:11, 7 February 2014 (UTC)[reply]

See above. "There is no such thing as 'Chinese' when it comes to meanings of characters" - well, what are these character dictionaries about then? Hanyu Da Zidian, Zhonghua Da Zidian, Kangxi Dictionary. Wyang (talk) 02:26, 7 February 2014 (UTC)[reply]

If we have a generic meaning of 食 "eat", "eating" under "Han character", without the pronunciation section but stroke orders, etymology and the exact modern usage under Mandarin Chinese section (and other languages) (the character is now hardly used as a verb in Mandarin), then it would be easier to see how individual CJKV languages use it today. Lumping everything under Mandarin/Chinese would be a mess. --Anatoli ^{(обсудить}/^вклад) 02:24, 7 February 2014 (UTC)[reply]

No, it won't be. A context label suffices. See the example above. Wyang (talk) 02:26, 7 February 2014 (UTC)[reply]

I agree with Wyang and have thought the same for a long time. Kaixinguo (talk) 12:17, 2 March 2014 (UTC)[reply]

ISO 639-3 updates[edit]

SIL has posted a summary of ISO 639-3 code changes made for change requests from 2012. Usually enwikt implements changes that aren't controversial (often for codes that we have no entries in). It's also a good time to debate the changes that are controversial or non-trivial. Check to see if your favorite unappreciated language has been affected! --Bequw → τ 15:10, 7 February 2014 (UTC)[reply]

Looks like we can change Old Lithuanian from bat-olt to olt. --Wiki Tiki 89 15:17, 7 February 2014 (UTC)[reply]

All of the changes to codes look good, except that we should adopt ygs as "Yolngu Sign Language" rather than "Yolŋu Sign Language". And we can note that gev goes by "Viya" in addition to "Eviya". I'll examine the changes to language names later. - -sche (discuss) 18:42, 7 February 2014 (UTC)[reply]

extended notes

The retirements of mhh, emo, lmm, ggm and particularly yuu all look like good moves.
We should adopt ygs as "Yolngu Sign Language" rather than "Yolŋu Sign Language", per WT:LANG (specifically the bits about avoiding special characters and using attested names; "Yolngu Sign Language" is attested, "Yolŋu Sign Language" isn't).
gev goes by both "Viya" and "Eviya". I'm not sure, but "Viya" may be the better name, as it may be the language's autonym while "Eviya" is the ethnonym.
It's great to see Old Lithuanian getting a code (olt).
xaj is a might-be-distinct dialect of the apparently extinct language ama, but whatevs, we can follow the ISO in splitting xaj off from ama. (Can always merge them back again later if necessary.)
gmg is apparently a newly discovered language, which is cool.
xis is the only code I have reservations about. "Kisan" is an ambiguous name, also used by other lects (though that is surmountable, particularly because neither we nor the ISO seem to have granted codes to any of the other lects under that name), and WP says that Kisan and the other major dialect of Kurukh, "Oraon [...] have 73% intelligibility between them".
- -sche (discuss) 18:42, 7 February 2014 (UTC)[reply]
PS, they're said to be planning to retire the code for the spurious (non-existent) 'Yiddish Sign Language' next year.

I have updated the modules to remove, split, and add all the codes the ISO removed, split, and added, respectively. I have also switched all our uses of bat-olt to olt and removed 'bat-olt' from the exceptional code module. Thanks, Bequw, for alerting us to the updates. - -sche (discuss) 04:51, 10 February 2014 (UTC)[reply]

A proposal to treat Old Latin as a separate language from the other chronolects of Latin[edit]

In a short discussion in Module talk:labels#Old Latin data, CodeCat and I agreed that it would be a good idea to treat Old Latin (la-old) as a separate language from Latin (la) on the English Wiktionary. w:Old Latin "presents some of the major differences" between Old Latin and Classical Latin. The corpus of Old Latin dates from the fragments of the Carmen Saliare (700 BC) to the conventional boundary date between it and its Classical descendant language (75 BC). Saliently, much of the earlier Old Latin corpus was unintelligible by the writers of the Classical period; thus, according to the standard linguistic criterion of mutual unintelligibility for differentiating languages, this arguably renders Old Latin and ≥Classical Latin different Abstandsprachen. Practically speaking, there are more orthographic, phonological, and morphological differences between Old Latin and Classical Latin than there are between Classical Latin and the later Latin chronolects; that Old-Latin–specific information can't easily be shoehorned into (Classical) Latin entries without unduly obscuring the information about the language's Classical form, which, I believe, is what most people who look up Latin words will be after. For all these reasons, I propose that we treat Old Latin (loosely defined as everything in the Latin continuum prior to 75 BC, with the ISO code la-old) as a separate language from Latin (loosely redefined to exclude everything in the Latin continuum prior to 75 BC, retaining the ISO code la) on the English Wiktionary. — I.S.M.E.T.A. 19:25, 7 February 2014 (UTC)[reply]

Sounds good to me. —Aɴɢʀ (talk) 20:17, 7 February 2014 (UTC)[reply]

The proposal to treat Old Latin as a separate language sounds good to me, too; in the past, User:Metaknowledge explained the dramatic differences between Old and newer Latin to me.
I do have a technical question, though: I don't mean to nitpick, but is "la-old" an ISO code? If it isn't, but we still want to upgrade "Old Latin" from "etymology-only language" to "regular / L2-having language", shouldn't we create an exceptional code the way we usually do (as documented on WT:LANG and demonstrated in Module:languages/datax), which is by using the family code, a hyphen, and three letters that approximate the language's name (so: "itc-ola")? - -sche (discuss) 20:49, 7 February 2014 (UTC)[reply]

You're right. However, may I suggest that we use itc-lao instead (by analogy with fro for Old French, for example)? — I.S.M.E.T.A. 21:28, 7 February 2014 (UTC)[reply]

Sure. :) - -sche (discuss) 21:40, 7 February 2014 (UTC)[reply]

I prefer itc-ola. All of our exceptional codes for old languages are named that way, and some in ISO are too (like the new olt, and existing odt, osx, ofs, orv). —CodeCa t 22:13, 7 February 2014 (UTC)[reply]

Hm, that's also a good point. Alright, we're back to itc-ola. - -sche (discuss) 22:29, 7 February 2014 (UTC)[reply]
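
The naming convention described in this thread (family code, a hyphen, and three letters approximating the language's name) can be sketched roughly as below. This is only an illustration in Python, not the actual code in Module:languages; the helper name `make_exceptional_code` and its validation rules are assumptions made for the example.

```python
import re


def make_exceptional_code(family_code: str, abbreviation: str) -> str:
    """Build an exceptional language code such as 'itc-ola'.

    Hypothetical helper: joins a family code and a three-letter
    abbreviation with a hyphen, following the convention described
    in the discussion. The length checks are assumptions.
    """
    if not re.fullmatch(r"[a-z]{2,3}", family_code):
        raise ValueError(f"unexpected family code: {family_code!r}")
    if not re.fullmatch(r"[a-z]{3}", abbreviation):
        raise ValueError(f"abbreviation must be three lowercase letters: {abbreviation!r}")
    return f"{family_code}-{abbreviation}"
```

With the Italic family code `itc` and the abbreviation `ola`, this yields the code the thread settled on, `itc-ola`.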

Based on the discussion that -sche linked to, I don't think we should split off Old Latin at 75 BC. It seems that if anything it should be the "Primitive Latin" that Metaknowledge talks about that should be split off. --Wiki Tiki 89 22:45, 7 February 2014 (UTC)[reply]

I think instead of using a particular date, we can use a particular progression of sound changes. That would simplify things for us, because we can limit how many alternative forms and inflections we need to handle. Obvious choices would be s > r, weakening of unstressed vowels, monophthongisation, loss of final -d, and maybe even the stress shift itself (if we can somehow find out when it happened). (Speaking about codes I noticed we have roa-ptg for Old Portuguese. Maybe we should change that to roa-opt?) —CodeCa t 00:01, 8 February 2014 (UTC)[reply]

This looks more sensible to me. Otherwise, we would have to create an Old Latin section for every word that occurs in Terence's works (for example), which wouldn't make much sense. A sentence such as non dici potest quam cupida eram huc redeundi, abeundi a milite vosque hic videndi, antiqua ut consuetudine agitarem inter vos libere convivium doesn't present "striking differences" from Ciceronian Latin. --Fsojic (talk) 00:20, 8 February 2014 (UTC)[reply]

Is Cicero's Latin what is normally used as the "base" for classical Latin? If so, then anything less than 100 years before his writing probably is similar enough to our idea of "Latin" to call it that. It's more practical to look at spelling, that's why I think sound changes give a better cutoff, because of how they influence spelling. w:Senatus consultum de Bacchanalibus from 186 BC is a good example of a text I would consider "old" but not "really old". It has the weakening of unstressed vowels, and probably the classical stress pattern as well, but it still has diphthongs which make quite a lot of words look rather different, final -d in the ablative, DV rather than B, GN rather than N, and a few other things. —CodeCa t 00:41, 8 February 2014 (UTC)[reply]

Yes, Cicero and Caesar are the base for it. --Fsojic (talk) 13:00, 8 February 2014 (UTC)[reply]

I've made the change. I don't know if I caught all the cases that used "OL.", so there might be a module error or two. We'll need to keep an eye out. A lot of links to Old Latin terms use "la" as the code too, so we need to look out for those and update them. —CodeCa t 00:38, 14 February 2014 (UTC)[reply]

rare terms[edit]

(I think this has already been discussed before, but I couldn't find where)

I did this edit, but I'm not satisfied with the categorization it brings about. This is not a "term with a rare sense" (which would imply it had several senses), it's only a "rare term". --Fsojic (talk) 23:54, 7 February 2014 (UTC)[reply]

A rare term is a term, all of whose senses are rare. Therefore, a "rare term" is also a "term with a rare sense". There isn't much use for a category of terms that only have rare senses. And it would also be very difficult to implement context labels in a way that would support that. --Wiki Tiki 89 00:02, 8 February 2014 (UTC)[reply]

From a technical perspective it'd be simple to have {{cx|rare term}} vs {{cx|rare sense}}. But getting people to use the labels correctly could indeed be difficult, in part because some people would probably continue to use {{cx|rare}} for both cases even if we deprecated it or made it a redirect to only one of the two. - -sche (discuss) 03:44, 8 February 2014 (UTC)[reply]

You're right that this has been discussed before, and some users have taken the same view as you, that "terms with rare senses" misleadingly implies that some senses are not rare. Others have taken the view that it doesn't explicitly say any senses are not rare, and so is more palatable than the former category ("rare terms", which nevertheless contained common terms with rare senses). Here are a few past discussions: WT:RFM#Category:English_terms_with_obsolete_senses, WT:Grease pit/2011/June#Bad_category_name_generated_by_template_obsolete, WT:Beer parlour/2011/June#English_terms_with_obsolete_senses.2C_etc.. In general, the way we handle rare/obsolete/etc terms, terms with rare/obsolete senses, and rare/obsolete forms is a mess, because we mix all of those things incompletely. For example, some entries use {{cx|obsolete|lang=en}} {{alternative form of|foo|lang=en}} while others use {{obsolete form of|foobar|lang=en}} — and that's not even necessarily wrong, because some have argued that there is a meaningful difference between the two. - -sche (discuss) 03:44, 8 February 2014 (UTC)[reply]

`{{head}}`'s links to e.g. Wiktionary:Hebrew transliteration[edit]

Currently, {{head}} prefaces the headword's transliteration with a bullet symbol linking to e.g. Wiktionary:Hebrew transliteration. I think I understand the motivation here, but I think the result is not quite right, for four reasons:

Pages in the Wiktionary: (Project:) namespace aren't part of our actual dictionary content: they're supposed to be for editors, not readers. If we want to document our transliteration schemes for readers, then we should do so in the Appendix: namespace.
I've looked through Wiktionary:Hebrew transliteration and several others, and in all cases, it's clear that they were written for use by editors who know the original script and want to write a transliteration. That means that they're categorically not useful to put between the original script and the transliteration when both are present. (If anything, that information would be useful when the transliteration is missing, if we want to prompt readers to become editors and add it.)
I can't think of what would be given in reader-oriented documentation of a transliteration scheme. I suppose it would give information about how to go from a transliteration to a pronunciation, but in that case it seems like {{head}} is one of the least useful places for the link, since the headword is already likely to have an associated ===Pronunciation=== section. (I don't feel strongly about this part, though.)
It's really not obvious that the bullet symbol is a link; and even if someone notices the color and correctly infers that it's a link, they're not likely to guess what it's a link to, since it doesn't seem closely tied to the transliteration. (The transliteration is wrapped in parentheses, and the bullet is outside those parentheses, so the link is actually closer to the original script than to the transliteration.)

I think it might be best to just remove the link, but an alternative would be to (1) change it to point to an appendix rather than a project page and (2) move it after the transliteration, inside the parentheses, as something like <sup>''[[Appendix:Hebrew transliteration|help]]''</sup>.

—Ruakh_TALK 01:39, 9 February 2014 (UTC)[reply]

After thinking about it, you do have a point. But I have to point out that not everything on Wiktionary is only for "readers". In fact, the idea of a wiki is that all "readers" are also potential editors. The "[edit]" links, for example, also do nothing to help readers. On the other hand, the link does not help a reader add a missing transliteration since it does not appear when the transliteration is missing. And I agree that it wouldn't be the end of the world if we simply got rid of the link altogether. --Wiki Tiki 89 06:18, 9 February 2014 (UTC)[reply]

Re: "not everything on Wiktionary is only for 'readers'": Absolutely; and I addressed that in my second bullet-point. But given the context of the link, I don't think its intent can have been anything like that, anyway. —Ruakh_TALK 06:48, 9 February 2014 (UTC)[reply]

If we standardize transliteration-appendix pagenames (as we do About Languagename pagenames for example), then {{head}} can check for existence and link to such.—msh210℠ (talk) 18:08, 9 February 2014 (UTC)[reply]

Yup, that's what it currently does, just with the Wiktionary namespace (checking for e.g. Wiktionary:Hebrew transliteration, which redirects to Wiktionary:About Hebrew#Romanizations) instead of the Appendix namespace. But. —Ruakh_TALK 18:30, 9 February 2014 (UTC)[reply]

Come to think of it, your bullet point 4, above, Ran, is a very good point. Perhaps if this is kept it should be as <small>[[appendix:Language transliteration|translit.:]]</small> fú (or some such) within the transliteration parens.—msh210℠ (talk) 18:37, 9 February 2014 (UTC)[reply]

Request for permission to merge[edit]

Discussion moved from WT:RFDO. --kc_kennylau (talk) 12:14, 9 February 2014 (UTC)[reply]

I would like to request for permission to merge the following templates to {{de-decl-noun-n}}:

If permitted, I will do the orphaning and merging by myself. --kc_kennylau (talk) 17:24, 7 February 2014 (UTC)[reply]

Bot permission[edit]

I'd like to request permission to run User:Asturianbot legally. He is to do the same thing that User:Asturbot did. Luckily, KassadBot sorts the page into the correct alphabetical order. --Back on the list (talk) 12:30, 10 February 2014 (UTC)[reply]

@Back on the list I am aware of the statement on the userpage saying that it has done testing on the page lladrar, but the page history of lladrar shows no evidence of Asturianbot making any changes to it. --kc_kennylau (talk) 12:58, 10 February 2014 (UTC)[reply]

It created entries for the inflected forms of lladrar. — Ungoliant ^(falai) 14:06, 10 February 2014 (UTC)[reply]

When is English not English?[edit]

I've noticed that "Gorsedd" is described as an English word, and its etymology is shown as "from Welsh"; similarly "satyagraha" is categorised as "English, derived from Sanskrit", although it seems to me that Gorsedd is simply a Welsh word denoting a Welsh institution, and Satyagraha is a Sanskrit word denoting a Hindu concept.

What is the reasoning behind this? Any language can borrow and eventually absorb foreign words, but if a word becomes English as soon as it is used in an English sentence, then every word in the world could be categorised as English!

For heaven's sake, we English-speakers are insular enough already!— This unsigned comment was added by Hoffoholi (talk • contribs).

WT:Criteria for inclusion describes how we decide what words are part of the language. Short version: a word has to be used three times. —Michael Z. 2014-02-10 23:13 z

Good question. English has a mechanism for indicating a word as foreign: italics. Thus, if a well-edited book (which italicizes foreign words) uses a word without italics, it's using that word as English. As Michael Z. notes, we use those citations to determine whether the word should be listed as English in the dictionary. On the other hand, on Usenet, for example, where (at least traditionally) italics aren't used, it's hard to show whether a word would be in italics were the medium conducive to it, so citations from Usenet that show use of a traditionally foreign word in English are not useful in determining whether the word is English or not.—msh210℠ (talk) 16:13, 11 February 2014 (UTC)[reply]

Also, italics are used for other things, like emphasis, so if a word is printed in italics, it isn't always clear whether the author italicized it for emphasis or because he considers it foreign. —Aɴɢʀ (talk) 18:20, 11 February 2014 (UTC)[reply]

Indeed, I remember a Latinate (but ==English==) term being RFVed a year or two ago, which was found to be used in several old books, which italicized it — but also italicized month names like "July" and other English words! - -sche (discuss) 22:14, 15 February 2014 (UTC)[reply]

Even when italics are used, it's a use of the word nonetheless. Very often, people use italics when they don't find the word in their favourite dictionary. There cannot be any other criterion for inclusion for a given language than the use (not only the mention) in the language. Lmaltier (talk) 21:34, 15 February 2014 (UTC)[reply]

Note that italics are not (and cannot be) used in speech, and so I don't think they are a reliable factor in determining Englishness. --Wiki Tiki 89 22:16, 15 February 2014 (UTC)[reply]

Well, italics are an indication that a word is “not fully naturalized” in English. This includes nonce borrowings, but also many words that have a long English history, and are clearly within our criteria for English. If a word is usually italicized, we should include a usage note about this. Some print dictionaries will italicize the headwords of such entries (including the COD and CanOD – perhaps we should do so).

There are other characteristics of foreignisms, like indeterminate spelling, usually being accompanied by a gloss, etc. But there is no clear proof of acceptance in the language, except maybe a sufficient frequency of use. —Michael Z. 2014-02-16 02:05 z

Indeterminate spelling doesn't mean much either. The word "Hannukah" can be spelled a million different ways, but there is only one accepted pronunciation. A word like "varenyky", on the other hand, likely has many variant pronunciations, and I would be more inclined to call it unnaturalized. --Wiki Tiki 89 02:30, 16 February 2014 (UTC)[reply]

Proposal for Template:ttbc to accept unrecognized language names[edit]

Inspired by recent edits to [[ostrich]], I think {{ttbc}} should accept unrecognized language names. Not all users know all of our language name conventions and I think we should allow translations to be added such as "{{ttbc|Sami}}: [[struhcca]]" when the specific Sami lect is unknown. It would presumably categorize separately to something like Category:Translations to be checked (language unknown). --Wiki Tiki 89 13:46, 11 February 2014 (UTC)[reply]

It originally did work. I tracked down the problem to Template:langrev/Sami, which User:Kc_kennylau created a few days ago with the contents "smi". Because "smi" is a family code rather than a language code, it fails. I wonder why he created that subpage if he clearly didn't understand what it did. —CodeCa t 13:54, 11 February 2014 (UTC)[reply]

In that case, I think that if there is any sort of error in determining the language, then it should treat it as an unknown language. --Wiki Tiki 89 14:25, 11 February 2014 (UTC)[reply]
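
The fallback suggested here can be sketched as follows. This is a hedged illustration in Python rather than the real template logic: the function `ttbc_category`, the lookup tables passed to it, and the per-language category pattern are all assumptions; only the "language unknown" category name comes from the proposal above.

```python
def ttbc_category(language_name, name_to_code, language_codes):
    """Pick a checktrans category for a language name.

    Hypothetical sketch: `name_to_code` maps names to codes (as the
    langrev subpages do), and `language_codes` is the set of codes that
    are real languages rather than families. Any lookup failure falls
    back to the "language unknown" category.
    """
    code = name_to_code.get(language_name)
    if code is None or code not in language_codes:
        # Unrecognized name, or a non-language code such as the family code "smi".
        return "Translations to be checked (language unknown)"
    return f"Translations to be checked ({language_name})"
```

Under these assumptions, a name such as "Sami" that resolves only to the family code `smi` lands in the unknown-language category instead of causing an error.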

There are plans to replace {{ttbc}} with something else anyway, and then that won't be a problem anymore. Right now we're just waiting for someone to update WT:EDIT so that it supports the new format. There's a discussion about it on the GP. —CodeCa t 14:44, 11 February 2014 (UTC)[reply]

I've been commenting out most of the Sami translations, since they're the result of indiscriminate copying from another website. Chuck Entz (talk) 14:38, 11 February 2014 (UTC)[reply]

Commenting out is a bad idea. Things that are commented out will never be looked at again. There is no reason not to just move them to the checktrans section. --Wiki Tiki 89 14:49, 11 February 2014 (UTC)[reply]

Template to format quotations[edit]

Currently we only have {{usex}} to format quotations and usage examples alike. User:Angr said that there might be objections against using {{usex}} for quotations, but I'm not sure why exactly. Either way, we don't really have a template that can format quotations properly the way WT:QUOTE describes it. So I think that we should add parameters to {{usex}} that allow us to show sourcing information. We could also make a dedicated template like {{quote}}, {{quotation}} or something similar, but I'm not sure what the benefit of having a separate template would be over extending an existing one. —CodeCa t 20:58, 11 February 2014 (UTC)[reply]

What's wrong with {{usex}} the way it is now? --Wiki Tiki 89 21:01, 11 February 2014 (UTC)[reply]

We have at least 3 different templates to format quotations ({{Q}}, {{quote-book}}, {{reference-book}}). The benefit is, having separate templates allows us to easily distinguish between quotations that we made up from quotations that actually occur in the wild. That's not to say that both templates couldn't use the same back end- but they should absolutely be kept separate in entries. DTLHS (talk) 21:02, 11 February 2014 (UTC)[reply]

Yes, but they are mainly for formatting the citation line of the quotation, not the quotation itself. In fact {{quote-book}} is very bad at formatting the quotation itself for any language that doesn't use the Latin script. --Wiki Tiki 89 21:04, 11 February 2014 (UTC)[reply]

Right, and I'm all for anyone that feels that they can unify the multiple templates we already have into something better. DTLHS (talk) 21:08, 11 February 2014 (UTC)[reply]

Why are quotations hidden?[edit]

Currently we have a script that auto-hides quotations in a collapsible box, but I don't really know the reasoning behind this. I consider quotations to be a type of usage example; one taken from existing citations instead of made up on the spot. So I don't think quotations should be hidden, just like we don't hide usage examples that aren't quotations. If the size of the text is a concern, then we could decide to hide only that part of the quotation that is not also part of the usage example. Then again, usage examples the way they're produced by {{usex}} can become quite long too; they can have text, transliteration, translation... that's 3 lines already. —CodeCa t 20:58, 11 February 2014 (UTC)[reply]

However long a usex can be, a quotation can be even longer. --Wiki Tiki 89 21:00, 11 February 2014 (UTC)[reply]

Then why not just choose a shorter one? —CodeCa t 21:12, 11 February 2014 (UTC)[reply]

Because sometimes we don't have a large number of attestations to pick and choose from, and sometimes you need some context to demonstrate the meaning you're attesting. I try to use whole sentences where I can, which (especially with 19C sources) doesn't help with brevity any. --Catsidhe ^{(verba, facta)} 21:18, 11 February 2014 (UTC)[reply]

There isn't a requirement to have verbatim-citable attestations in the entry, though. If there are no attestations that are well-suited to being used as a usage example, you could shorten it or do something else with it to make it fit better into the entry. —CodeCa t 21:22, 11 February 2014 (UTC)[reply]

I think lengthier quotations are often necessary to demonstrate the meaning of the word. --Wiki Tiki 89 21:27, 11 February 2014 (UTC)[reply]

Maybe, but then the same would apply to a usage example that isn't a quotation from somewhere else, right? So the problem is not specific to quotations alone, and we need to consider why quotations are hidden but other usage examples are not. They should really be collapsed based on length, not based on source. —CodeCa t 21:32, 11 February 2014 (UTC)[reply]

I am very much a fan of consistent behavior. So either have quotations / usexes always hidden, or have them never hidden. Changing the display state depending on length sounds like a recipe for confusion and frustration. ‑‑ Eiríkr Útlendi │ Tala við mig 21:44, 11 February 2014 (UTC)[reply]

@CodeCat: This seems so obvious. Usexes are brief and usually serve to orient the user at least as well as the definitions. Quotations have all the overhead of sources, dates, titles, etc. and may be present only for purposes of attestation. Putting them on the Citations pages just makes them a little harder to get to and runs the risk of losing the connection with specific senses for polysemic terms. As anyone who cites entries knows, we do not live in a world in which there are citations good for all usage example purposes and for all attestation purposes, especially not for every sense of every polysemic term. DCDuring TALK 21:49, 11 February 2014 (UTC)[reply]

Usage examples don't have to be brief, because quotations are an example of usage examples that are not always brief. And like Wikitiki said, sometimes you need a lengthy example to show the meaning, whether it's quoted or made up is not relevant then. I don't understand your point about Citations pages. Is "being hard to get to" a reason for making our entries less accessible to the average user? Usage examples (and therefore also quotations) should serve the purpose of illustrating usage to normal users who want to know what the word means and how to use it. They're not there to quench the curiosity of the few users who are more interested in the documented historical record, because that's a minor use case that isn't really relevant to knowing what the word means. Documenting usage is what citations pages are for, and have always been for. If they're not good enough for their intended purpose, then fix them where they are, rather than having the problem spill over into the entries themselves where they affect a lot more of our users. Keep citations on the citation pages, keep usage examples with the senses. —CodeCa t 21:58, 11 February 2014 (UTC)[reply]

I guess I fail to see what the harm is here. We have usexes to give the reader an at-a-glance feel for the word, how it's used, what it means, etc. We have quotations to demonstrate real usage, track the word's journey through time, etc. Quotations are simply bigger, and of less interest to the general user, but at the same time utterly indispensable to someone who really wants to drill in. It seems to me that the only argument you've offered in support of the assertion that anything's wrong is that it irks you in some vague way, which seems to me as insufficient impetus to make a major change to our presentation. -Atelaes λάλει ἐμοί 22:14, 11 February 2014 (UTC)[reply]

Leave quotations hidden, user examples unhidden. Quotations are too long and less useful to most users. --Anatoli ^{(обсудить}/^вклад) 22:49, 11 February 2014 (UTC)[reply]

@CodeCat, usage examples can be intentionally constructed in a way that conveys the meaning with less context. Quotations are more likely to be longer, because their primary purpose usually was not to demonstrate the meaning of the given term. --Wiki Tiki 89 00:40, 12 February 2014 (UTC)[reply]

I am in favour of hiding both the citations and usexes, because entries can get overly long like this. --Vahag (talk) 05:25, 12 February 2014 (UTC)[reply]

Oh, you may consider using {{der-top|example of using ձի}} or similar. --Anatoli ^{(обсудить}/^вклад) 05:37, 12 February 2014 (UTC)[reply]

No, that's not standard or allowed. --Vahag (talk) 05:43, 12 February 2014 (UTC)[reply]

Maybe a separate template could be used for a large number of examples? Up to 15 is probably OK without hiding. --Anatoli ^{(обсудить}/^вклад) 05:47, 12 February 2014 (UTC)[reply]

It is possible to make a template/script that automatically hides lines when their number exceeds a given threshold, for example displaying only the first 5 out of 20 by default. Dakdada (talk) 09:38, 12 February 2014 (UTC)[reply]
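To make the threshold idea concrete, here is a minimal sketch in Python. The function name `split_visible` and the default of 5 are assumptions made for illustration; an on-wiki version would more likely be a JavaScript gadget or a template, and nothing here reflects an existing Wiktionary script.

```python
def split_visible(lines, threshold=5):
    """Split lines into (visible, hidden) so that only the first
    `threshold` lines are displayed by default."""
    if len(lines) <= threshold:
        # Short lists are shown in full; nothing is hidden.
        return list(lines), []
    return list(lines[:threshold]), list(lines[threshold:])


# Hypothetical input: 20 usage examples under one sense.
examples = [f"usage example {i}" for i in range(1, 21)]
visible, hidden = split_visible(examples)
print(len(visible), "shown,", len(hidden), "hidden")
```

The threshold stays a parameter so that the number shown remains an editorial choice rather than something fixed in the script.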

Notwithstanding CodeCat's apparent desire to conflate contributor-authored usage examples and published quotations by eliminating the distinct names of usage examples and quotations (or citations), it is very useful to shun her idiolect and maintain the names and the distinction made in Wiktionary discussions and in the current use of the word "quotations" in the control to allow or suppress display of quotations. The distinction is particularly meaningful for Wiktionary because "quotations" count for attestation and "usage examples" do not.

@Darkdadaah: I don't think we actually have any instances of more than half a dozen usage examples. Certain English function words have so many. It is an editorial decision as to how many such usage examples should be displayed. There may sometimes be too many. Assuming that we are designing Wiktionary for actual normal human users, there is much to be said for keeping human input in the design of entries and not allowing bot-implemented edits to supersede it. DCDuring TALK 13:55, 12 February 2014 (UTC)[reply]

Should quotations be normalized?

The quotations rules are too strict, IMHO, especially for foreign languages. The photographic image in quotations of the original text + transliteration may even be misleading for some language learners with less than native/advanced knowledge of a language, e.g. Russian words with letter "ё" are spelled with "е", Arabic text may lack hamza or "ي" written as "ى", word stresses are missing, the original orthography may be imperfect, dated or even wrong. --Anatoli ^{(обсудить}/^вклад) 22:49, 11 February 2014 (UTC)[reply]

Actually, I think exposing users to the way things are actually written is much more helpful than helping them read it. --Wiki Tiki 89 00:40, 12 February 2014 (UTC)[reply]

If that were the case, we wouldn't need transliteration (manual or automated), pronunciation sections, word stresses and other things used in foreign language entries. IMO, your comment just confirms that quotations are not there to help with the language but provide attestation that a term is/was actually used. --Anatoli ^{(обсудить}/^вклад) 00:55, 12 February 2014 (UTC)[reply]

No, we do need pronunciation sections because one reason to look up a word is to find out how it's pronounced. Quotations are not there for pronunciation, but as examples of real usage. Showing real unenhanced examples of language helps users learn to read the language. --Wiki Tiki 89 01:19, 12 February 2014 (UTC)[reply]

@Anatoli, re "the original orthography may be imperfect, dated or even wrong": de.Wikt routinely normalises quotations for this reason. I don't necessarily agree with that practice, but I recognise that even en.Wikt normalises quotations quite a bit (e.g., we don't reproduce books' use of different font colours, and we don't often reproduce their use of different font sizes or fonts (Fraktur vs Antiqua), and if they are Latin-script, we don't reproduce their use of obsolete ligatures). - -sche (discuss) 01:42, 12 February 2014 (UTC)[reply]

We should omit purely stylistic typographic expression, but include anything with lexicographical or semantic significance. Modern use of italics, for example, often carries meaning. I don’t think I’d agree with a blanket ban on historic fonts or glyphs. —Michael Z. 2014-02-12 04:08 z

Another point is, Wiktionary is a dictionary, not a legal firm or an archaeological company. It's about words and grammar, not facts. Of course, learners need to be exposed to the real life situations but that's not the purpose of dictionaries. --Anatoli ^{(обсудить}/^вклад) 01:58, 12 February 2014 (UTC)[reply]

That's a valid point. But I think that we should avoid non-trivial interpreting of quotations if no real problem is being solved. And I don't think that unvocalized text is a "problem". --Wiki Tiki 89 02:05, 12 February 2014 (UTC)[reply]

I see no reason in reproducing obvious typos, for example, especially in modern languages. Unvocalised text is not a problem (hamza and dotted yāʾ are not part of vocalisation but a more standard and strict way of writing), so is Russian letter "ё". Adding Arabic diacritics is cumbersome but adding word stresses and standardising spellings a bit is not a big issue, really. The same text may appear in a reprint/example for children/foreigners with word stresses and vocalisation. --Anatoli ^{(обсудить}/^вклад) 02:11, 12 February 2014 (UTC)[reply]

Wiktionary’s quotations are there to demonstrate the original author’s usage. Not some amateur lexicographer’s improved version of it. If you use a quotation with a typo, then make an editorial note. —Michael Z. 2014-02-12 04:08 z

Thanks - I thought I should ping you on this as someone who opposes any changes to citations. I'll give you an example. The entry оне́ (oné) demonstrates the usage of the archaic Russian pronoun (they, feminine plural), which is now они́ (oní) (they, both genders) in all modern reprints.

Original (A. Pushkin): ...и завидуютъ онѣ государевой женѣ (...i zavidujut oně gosudarevoj ženě)
Modern reprint: ...и завидуют оне государевой жене (...i zavidujut one gosudarevoj žene)
"Improved":...и зави́дуют оне́ госуда́ревой жене́ (...i zavídujut oné gosudárevoj žené)

Providing stresses doesn't change anything and "оне́" is the same as "оне", they demonstrate that оне́ (oné) rhymes with жене́ (žené) (dative of жена́ (žená)). Because of the rhyming and the word stress the word cannot be replaced with modern они (oni), as it was done with other occasions of оне (one). Word stresses (and other phonetic markings) are especially critical for poems. Long foreign language native citations are difficult and not so helpful without any small normalisation like this. As I said, it's not a legal document but a dictionary, it's about words, not facts. The so-called "improved" or "amateurish" version may appear in children's books or books designed for foreigners (Wiktionary is also designed for people of all ages and language levels). Finding such reprints using Google is not easy but I have seen many of them.--Anatoli ^{(обсудить}/^вклад) 04:40, 12 February 2014 (UTC)[reply]

I think I have mentioned before that for poetry it's more ok to add stress marks. Adding stress marks to poetry is much easier, much more useful, and much more likely to actually be attestable with stress marks. For prose, I think adding stress marks is completely ridiculous. --Wiki Tiki 89 17:28, 12 February 2014 (UTC)[reply]

I don't see how poetry is different from prose when users need to be able to read a quotation. Are vocalised quotations from Torah or Qur'an any different from unvocalised? Stress marks, vowel points, Japanese furigana don't add any meanings or change the style of the original, they are simply there to help to read words correctly. There's nothing ridiculous about stress marks in the prose, you should probably see Russian books designed for children and foreign learners. I see stress marks in the Slavic languages, vowel points in abjad languages, Japanese furigana, Mandarin pinyin (used in various editions) are used interchangeably to help to pronounce words correctly. If I find two versions of Harry Potter in Japanese (with and without furigana), I'll post here. --Anatoli ^{(обсудить}/^вклад) 22:32, 12 February 2014 (UTC)[reply]

But children's books usually don't contain ordinary prose. They're usually filled with simple rhymes like На́ша Та́ня гро́мко пла́чет: / Урони́ла в ре́чку мя́чик., and even then don't always have stress marks (for example, here's a page from the same Букварь that I grew up with). --Wiki Tiki 89 23:33, 12 February 2014 (UTC)[reply]

It's a recent story book, not an ABC-book, which I have :) (it only has 20 pages or so). Old and some new букварь's all use stress marks and diaeresis; they don't have to. I'm just saying they may. Not just schoolbooks but stories and novels. --Anatoli ^{(обсудить}/^вклад) 02:10, 13 February 2014 (UTC)[reply]

Here's an example of a textbook for foreigners: [1] --Anatoli ^{(обсудить}/^вклад) 02:45, 13 February 2014 (UTC)[reply]

I confused it with this, which (I think) is what I had as a kid. Of course it's been ages since I've seen it. --Wiki Tiki 89 06:27, 13 February 2014 (UTC)[reply]

But more to the point, it's not always possible to tell what the intended stress was in words that can be stressed in more than one way. In poetry, it is easy to tell where the stress is (even if the stress is technically in the wrong place) because of the meter and the rhyme. In prose, there are absolutely no hints to the stress. So of course if the original author included stress marks, such as in your textbook for foreigners example, then there is no problem. But if the original author did not include stress marks, then for all you know he could have intended "документы" to be mispronounced as "доку́менты". --Wiki Tiki 89 06:33, 13 February 2014 (UTC)[reply]

See my comments below about the quotes from Qur'an. My point is, despite what WT:QUOTE says about "cafe" vs "café", providing stress marks, vowel points or ruby (Japanese, Mandarin) doesn't change the original style or meaning. It only serves to help in reading unknown texts. Words having multiple accents, like "до́гово́р", "апо́стро́ф", etc. can be marked with dual stresses or just by deciding on the most common/standard or intended stress (if it was pronounced) or, if really in doubt left without any stress. I'm not suggesting to make stress marks mandatory but "За сове́ты Чи́чиков благодари́л, говоря́, что при слу́чае не преми́нет и́ми воспо́льзоваться, а от конво́я отказа́лся реши́тельно, говоря́, что он соверше́нно не ну́жен, что ку́пленные им крестья́не отме́нно сми́рного хара́ктера, чу́вствуют са́ми доброво́льное расположе́ние к переселе́нию и что бу́нта ни в како́м слу́чае между́ ни́ми быть не мо́жет." is more helpful linguistically than "За советы Чичиков благодарил, говоря, что при случае не преминет ими воспользоваться, а от конвоя отказался решительно, говоря, что он совершенно не нужен, что купленные им крестьяне отменно смирного характера, чувствуют сами добровольное расположение к переселению и что бунта ни в каком случае между ними быть не может." and doesn't violate the original text in any way (apart from being normalised with the modern orthography). (N. Gogol, Dead Souls, 1842) --Anatoli ^{(обсудить}/^вклад) 07:00, 13 February 2014 (UTC)[reply]

Quotations also demonstrate usage. And I’m not sure what you mean by style, but altering the text as you advocate changes the style of writing and typography. Quoting means not editing.

Have readers been asking us to “help” them read quotations?

If you are unhappy with WT:QUOTE, please propose some changes. —Michael Z. 2014-02-13 18:12 z

So a quotation from an un-cited, un-dated (20th-century?) reprint, misleads our readers into thinking that the 1831(?) usage of Pushkin’s publisher was “оне́” and not “онѣ”. Is this not an example of how to fail as a descriptive dictionary and as a historical dictionary? —Michael Z. 2014-02-12 18:04 z

I understand what you mean but the reality is that Russian language reform happened nearly 100 years ago. No-one uses old spellings and obsolete letters but the old literature, including Pushkin's "Сказка о царе Салтане" is available for today's readers. If it is important to quote the pre-reform spelling, it's a different story (e.g. to demonstrate original spelling rules, or obsolete letters, if they are available). Besides, the quote is for "оне́", not "онѣ́". Confucius's works are available in simplified Chinese and would be quoted in a simplified form on a simplified Chinese entry, even if simplified was not used in his time. I'm sure it's a similar situation with many languages. --Anatoli ^{(обсудить}/^вклад) 22:32, 12 February 2014 (UTC)[reply]

How is that a reason to change quotations or include misleading citations? —Michael Z. 2014-02-12 22:44 z

What do you mean by "misleading"? Which reason do you mean? "и завидуют оне государевой жене" in books. Or search "如杀无道"... (simplified)/ "如殺無道" ... (traditional) in Google books (+ Confucius/孔子). --Anatoli ^{(обсудить}/^вклад) 22:58, 12 February 2014 (UTC)[reply]

The so-called quotation in оне implies that Pushkin wrote “оне́”. But in fact he wrote “онѣ”, the Bolsheviks melted down the ѣ types decades later, so some book printed “оне”, and then a Wiktionary editor enhanced the quotation by indicating stress to yield “оне́”. Here is a quotation whose purpose is to demonstrate usage, showing 21st-century schoolbook style, with no date, and the title of an 1831 publication. Changing quotations and leaving out accurate citations lacks integrity and is poor academic practice. It is also against our guidelines.

WT:QUOTE says

“The date corresponds with original authorship, the time that the citation was put into the exact words quoted.” – Please don’t omit dates in citations. Please don’t quote a 2000s children’s edition and cite it as if it were published in the 1830s. (If someone updated the orthography, then it is a new edition, not a reprint.)
“Reproducing the spelling is important” – please don’t change 1830s quotations to 1960s spellings. Don’t change ѣ to е, or е to ё.
“The presence or absence of diacritics, and which diacritic(s) are used is important” – please don’t add schoolbook stress marks to quotations. If it is important to indicate stress, add them to the transliteration, or indicate stress in editorial notes.

—Michael Z. 2014-02-12 23:21 z

We do the same thing in English with Shakespeare quotes, we use the modern normalized spellings rather than the original. --Wiki Tiki 89 23:33, 12 February 2014 (UTC)[reply]

Says who? I see Shakespeare’s “star-cross'd” quoted in star-crossed.

We should strive to quote and cite accurately and in detail. As a historical dictionary, we should be useful to anyone studying Shakespeare or Pushkin, or the history of English or Russian usage. —Michael Z. 2014-02-13 00:17 z

That's not a counterexample. If you've ever looked at older printings of Shakespeare (old as in hundreds of years), you would see things like (as printed here):

Gregorie, of my word Ile carrie no coales.

2 No, for if you doo, you ſhould be a Collier.

1 If I be in choler, Ile draw.

2 Euer while you liue, drawe your necke out of the collar.

I added emphasis to the spellings that would today be normalized, not including trivial differences like u/v and s/ſ. --Wiki Tiki 89 00:44, 13 February 2014 (UTC)[reply]

I must agree with Michael to some extent that a quotation should be a faithful quote -- no editing, no spelling fixes.

That said, to provide another (hopefully useful) point on the graph, when I dig up old Japanese for quotation purposes, I try to find the original rendering. For quotes from the w:Man'yōshū, for instance, this is entirely in kanji, using different kanji from most modern usage, and with no kana at all. I then make use of extra lines to give 1) a transliteration into kana, 2) a transliteration into the Latin alphabet, and 3) a translation into modern English. See [[鶴#Noun]] or [[牙#Noun]] for two such examples.

For quotations in other languages, if desirable, I would support the addition of an extra line just below the quote itself to provide a modernized rendering of the quote. And if the quote in question is in a non-Latin script, I would also strongly recommend that we add a transliteration into the Latin alphabet, if at all possible.

With regard to WikiTiki's point, I think what Michael objects to is wikieditors altering quotes. Shakespearean terms can be found in published works in many different forms, so choosing one such published example should suffice. What I would certainly object to would be finding a quote and then altering it yourself as you type it in here to change spelling or diacritics, etc. If needed for clarity, add the altered version after the quotation, but not as the quotation. ‑‑ Eiríkr Útlendi │ Tala við mig 01:04, 13 February 2014 (UTC)[reply]

I agree, but I think you are wrong about Michael's point. The Russian examples that Anatoli gave are the exact equivalent of my example of Shakespeare. The modern orthography has been used in almost all reprints of older texts since the Russian Revolution, and the citation was of a reprint. --Wiki Tiki 89 01:10, 13 February 2014 (UTC)[reply]

Perhaps I got muddled? I was responding (I think) to the mention of added diacritics to show stress, which seems to have been done by a wikieditor. That should (in my view) go on a second line, and not be given as the quotation itself. ‑‑ Eiríkr Útlendi │ Tala við mig 01:36, 13 February 2014 (UTC)[reply]

@Eirikr I wonder what you think of furigana in Japanese quotations? Do you really believe that furigana or ruby would somehow modify the original meaning or style? Would a quotation like this really break any rules of quotation?

恐(おそ)れ入(い)ります

osore irimasu

I beg your pardon, I'm very much obliged

(I'm using usex templates for sample quotations.) The published books can definitely go both ways, with or without (usually without). Russian word stresses, Arabic/Hebrew diacritics are nothing but a means to help reading. They don't change the style or meaning of the original. Should I quote Qur'an? بسم الله الرحمن الرحيم and بِسْمِ ٱللهِ ٱلرَّحْمٰنِ ٱلرَّحِيمِ (bi-smi llāhi r-raḥmāni r-raḥīmi) is available with or without diacritics.

I find original Shakespeare impossible to read. Russian hasn't changed so badly but most people don't understand archaic letters. It doesn't matter now who made the language reform, Communists or Nazis; the reform went ahead. Russian is not a dead language, so words said by someone in the past are now available and more common in reprints and new editions with the currently standard spelling, letters and glyphs. --Anatoli ^{(обсудить}/^вклад) 02:10, 13 February 2014 (UTC)[reply]

A quotation of Shakespeare in modern orthography, without a real citation, is a usage example (of modern Shakespeare), not a quotation. It has its place, but this doesn’t contribute to our role as a historical dictionary.

If our reader is to learn about Shakespeare’s English and how it has changed over four and a half centuries, then we must provide accurately dated, verbatim quotations from various periods. —Michael Z. 2014-02-13 18:30 z

So you would be in favor of changing the quote at star-crossed to "From forth the fatall loynes of theſe two foes, / A paire of ſtarre-crost Louers tooke their life:"? --Wiki Tiki 89 18:37, 13 February 2014 (UTC)[reply]

I would be in favour of having quotes on every citation page showing a term’s introduction, development, and the breadth of its meaning and usage. Of course I would love to see its first use and a Shakespeare quotation from every century.

Specifically, I don’t think a citation with 21st-century spellings dated “1592” belongs in a historical dictionary. —Michael Z. 2014-02-14 04:16 z

To complicate matters further, the spellings I gave above are not Shakespeare's original spellings, but normalizations made by the publishers at the time. --Wiki Tiki 89 04:25, 14 February 2014 (UTC)[reply]

Well, most published work has also been edited. Although citing different editions could show the range of spellings once acceptable that are now more standardized. —Michael Z. 2014-02-15 00:49 z

@Atitarev:

Sorry I missed that earlier. Furigana as-is in usexes, I'm fine with -- usexes can often be created by wikieditors, and don't purport to be sourced from any particular published text. For quotations, I really don't think we should be putting furigana right in the quote. See again my preferred format for providing quotation readings as illustrated at [[鶴#Noun]] or [[牙#Noun]].

@Atitarev, Wikitiki89, and others:

To reiterate (and hopefully clarify), I feel quite strongly that the main line of the quotation should give the text as-is from the published work. If the published work has typos, misspellings, non-standard orthography, missing vowel points, etc., that should be faithfully reproduced in the main line of the quotation. Since such oddities may indeed be difficult for users to read, I also feel quite strongly that we (wikieditors) should have the option of also including a second line just beneath the quotation, wherein we can give reading assistance -- kana, diacritics, modern spellings, stress markers, what have you. And if the quotation is in a non-Latin script, we should then give a transliteration into the Latin alphabet, followed finally by the translation.

If an editor doesn't like the format of a quotation (in terms of missing diacritics, obsolete spellings, etc.), they have the option of finding a published work that provides the quotation without such shortcomings -- so long as we are sourcing from a published work (and appropriately referencing / pointing to that work), I think we're fine. For example, if we can find a published Qur'an that includes the vowels and other diacritics, we can choose to use that version, so long as we reference it. Quoting the Qur'an with vowels and diacritics, but referencing a version that doesn't have those, sounds to me like an error.

I hope that makes my position clearer? ‑‑ Eiríkr Útlendi │ Tala við mig 18:59, 13 February 2014 (UTC)[reply]
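The layered layout described above (verbatim main line, optional reading aid, transliteration, translation) can be sketched as a small data structure. This is only an illustration: the class `Quotation` and its field names are hypothetical and do not correspond to any existing Wiktionary template. The sample text is the Pushkin line quoted earlier in this thread.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Quotation:
    """Hypothetical model: the main line is always the published text as-is."""
    original: str                          # verbatim from the cited edition
    reading_aid: Optional[str] = None      # editor-added stresses, kana, etc.
    transliteration: Optional[str] = None  # Latin-script rendering
    translation: Optional[str] = None      # modern English translation

    def lines(self) -> List[str]:
        # Fixed order: original first, editorial help afterwards.
        parts = [self.original, self.reading_aid,
                 self.transliteration, self.translation]
        return [p for p in parts if p]


q = Quotation(
    original="...и завидуютъ онѣ государевой женѣ",
    reading_aid="...и зави́дуют оне́ госуда́ревой жене́",
    transliteration="...i zavidujut oně gosudarevoj ženě",
)
for line in q.lines():
    print(line)
```

Keeping the reading aid in its own field means the quoted line is never altered, while stress marks or modern spellings are still available to readers who want them.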

for fun

Just for fun, how many languages can you pick out by ear?: http://greatlanguagegame.com/ —Stephen ^(Talk) 03:25, 13 February 2014 (UTC)[reply]

Cool! I just played it several times and my highest score was 850. --Wiki Tiki 89 07:26, 13 February 2014 (UTC)[reply]

I scored 700. The only languages I mixed up were either closely related (Czech/Serbian, Urdu/Panjabi) or share areal features (Tamil/Gujarati). —Aɴɢʀ (talk) 10:29, 15 February 2014 (UTC)[reply]

Czech/Serbian wasn't a problem for me, but Macedonian/Serbian was. But what confused me most were the Indian, African, and the non-CJKV East Asian languages. --Wiki Tiki 89 10:38, 15 February 2014 (UTC)[reply]

I admit I only guessed Tigrinya correctly because I heard the speaker say something that sounded like Afrika and there were no other African languages suggested. —Aɴɢʀ (talk) 11:41, 15 February 2014 (UTC)[reply]

Bot vote

Hi all. I'd appreciate some discussion about the vote from my Asturian inflectobot. I'm the only one who's touched the page so far! --Back on the list (talk) 10:29, 14 February 2014 (UTC)[reply]

We need a good Asturian inflection bot, but there does not seem to be much confidence in your encoding logic. I can’t write bots myself, so I can only rely on what others think. —Stephen ^(Talk) 22:54, 15 February 2014 (UTC)[reply]

It doesn't matter too much anyway. There are other ways to mass-add the conjugated forms. At the moment, there are thousands upon thousands of semi-automatically-created Asturian conjugated form entries. The result is essentially the same, just it's slower and less fun. --Back on the list (talk) 16:02, 21 February 2014 (UTC)[reply]

Request edit for Module:labels/data

Please remove these lines:

labels["in singular"] = "in the singular"
labels["in dual"] = "in the dual"
labels["in plural"] = "in the plural"

Please replace them with these lines:

labels["in the singular"] = {display = "in the [[singular]]"}
aliases["in singular"] = "in the singular"
labels["in the dual"] = {display = "in the [[dual]]"}
aliases["in dual"] = "in the dual"
labels["in the plural"] = {display = "in the [[plural]]"}
aliases["in plural"] = "in the plural"

--kc_kennylau (talk) 08:14, 16 February 2014 (UTC)[reply]
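For readers unfamiliar with the label/alias split, here is a rough Python model of the effect of the requested change. It is a sketch under assumptions: the real data is the Lua module quoted above, and the lookup function `resolve_label` is hypothetical, not the module's actual code.

```python
# Python stand-ins for the requested Lua tables (illustrative only).
labels = {
    "in the singular": {"display": "in the [[singular]]"},
    "in the dual": {"display": "in the [[dual]]"},
    "in the plural": {"display": "in the [[plural]]"},
}
aliases = {
    "in singular": "in the singular",
    "in dual": "in the dual",
    "in plural": "in the plural",
}


def resolve_label(name):
    """Hypothetical lookup: follow an alias, then return the display text."""
    canonical = aliases.get(name, name)
    entry = labels.get(canonical)
    # Unknown labels fall back to their own text.
    return entry["display"] if entry else canonical


print(resolve_label("in plural"))
print(resolve_label("in the dual"))
```

Under this model the short and long forms of a label produce the same linked display text, which appears to be the purpose of the request.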

Done. Keφr 10:24, 22 February 2014 (UTC)[reply]

@Kephir Thanks. --kc_kennylau (talk) 11:04, 22 February 2014 (UTC)[reply]

Arabic script for w:Dungan language?

I ran into Category:Dungan terms needing native script in Special:WantedCategories, and started to create it. Then I noticed that both members were already in Cyrillic, which we treat as the only script for the language. Further digging turned up the fact that it was originally written in a Perso-Arabic-based script referred to by its Chinese name of w:Xiao'erjing, but that the Soviet Union required speakers to switch to Cyrillic (with a Roman alphabet in between). The {{rfscript|sc=Arab}} was added by User:Hippietrail, so it was a serious request, not the work of some POV-pushing islamist IP.

I think we should add it as a secondary script, but the question is: does our infrastructure associated with the Arab script code cover Dungan Xiao'erjing (apparently truly alphabetic) adequately, or do we need to create a separate script code? Chuck Entz (talk) 00:42, 17 February 2014 (UTC)[reply]

Our aim is to include all words in all languages, but also from all of their stages. So before and after every spelling reform, writing system change, etc. We did already have at least one or two Dungan terms in Arabic script before I added these requests. Let me see if I can find them ... — hippietrail (talk) 03:29, 17 February 2014 (UTC)[reply]

— hippietrail (talk) 03:32, 17 February 2014 (UTC)[reply]

Suggestions for Chinese tone contour categories

I'm going to start adding categories for "tone contours" for multi-character Chinese terms. I'd like to hear some opinions from other contributors.

By tone contours I mean grouping together all two-character words that have rising tone for the first character and falling tone for the second character, etc.

The obvious name for the categories is "Chinese tone contours 2-4" etc. But maybe there are better ideas.

Tone symbols could be used but that would only make it trickier to enter and use and the tone numbers are widely used and understood, and the method will work for other tone languages that use tone numbers.

The second issue is the neutral tone. Would "5" or "0" be preferred? I always liked "0" better, but "5" seems more popular.

A third issue is whether to group or separate traditional and simplified. I prefer to have just one category and the trad. and simp. forms should sort next to each other anyway. But for that would we need to include explicit sort information?

The worst issue though might be tone sandhi. The point of these categories is to help learners with tones. Most learners can reproduce correct tones in isolated syllables but even many fluent foreign speakers of Chinese end up with very flat tones in continuous speech. So I have been told both by native speaking friends and advanced learners.

So it seems that to make the category most useful for this purpose we should categorize by tones after sandhi. But to fit in with how we and other dictionaries generally treat tone sandhi we should categorize by tones before sandhi.

Maybe a compromise is to categorize by tones before sandhi and explain the process in the cat. boiler and link to any other categories which would end up with the same tone contour after sandhi.

So let's hear other people's thoughts. — hippietrail (talk) 01:53, 18 February 2014 (UTC)[reply]
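To illustrate the before/after-sandhi question, here is a minimal Python sketch. It assumes space-separated numbered pinyin as input, and both function names (`tone_contour`, `after_sandhi`) are made up for this example. The sandhi rule is deliberately simplified to "a third tone directly before another third tone becomes a second tone"; real third-tone sandhi in longer words depends on how the syllables group.

```python
import re


def tone_contour(pinyin, neutral="5"):
    """'ni3 hao3' -> '3-3'. Syllables ending in 0 or 5, or with no tone
    digit at all, are treated as neutral and written with `neutral`."""
    tones = []
    for syllable in pinyin.split():
        match = re.search(r"[0-5]$", syllable)
        digit = match.group() if match else neutral
        tones.append(neutral if digit in ("0", "5") else digit)
    return "-".join(tones)


def after_sandhi(contour):
    """Simplified third-tone sandhi: 3 before 3 becomes 2."""
    tones = contour.split("-")
    for i in range(len(tones) - 1):
        if tones[i] == "3" and tones[i + 1] == "3":
            tones[i] = "2"
    return "-".join(tones)


for word in ["ni3 hao3", "xue2 xiao4", "peng2 you5"]:
    before = tone_contour(word)
    print(word, "->", before, "(after sandhi:", after_sandhi(before) + ")")
```

With this sketch "ni3 hao3" is 3-3 before sandhi and 2-3 after, so a category page for 3-3 could link to 2-3 as the compromise suggests. The `neutral` parameter leaves the "5" versus "0" choice open.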

I think the information about what the pre- and post-sandhi tones are is best covered in the entries themselves. Categories would be most useful to link those that have the same pattern of change in the tones: 33->23 or whatever. Chuck Entz (talk) 02:42, 18 February 2014 (UTC)[reply]

I don't like this idea. Wiktionary is a dictionary, not a language textbook. There is no Category:English words with counterintuitive pronunciations, Category:English pronunciations containing θ, or Category:English words with dark l. Wyang (talk) 08:37, 18 February 2014 (UTC)[reply]

Yes but you don't like many established features of Wiktionary. Wiktionary is a dictionary. It's a dictionary with lots more stuff in it than in some other dictionaries. This would be a type of index or appendix and Chinese dictionaries already have several kinds of indices and appendices that English dictionaries don't use. I could easily list three categories I personally don't like too. But that would be a separate discussion. And like any feature I don't care for I don't use those and don't contribute to those. Simple.

By the way I don't think any of your examples are bad ideas for categories. Certainly no worse than some categories I've seen and probably useful to some people. — hippietrail (talk) 09:58, 18 February 2014 (UTC)[reply]

The fact that I don't like many of the current practices doesn't mean my viewpoint should be viewed as more likely to be ridiculous than other people's. How much benefit will be gained by language learners from the existence of these categories, considering less than 5% of the current entries actually have audio samples? Wouldn't it be immeasurably easier for learners to resort to actual audio material elsewhere to get a feel for the native intonations than to look at these categories trying to figure out which tonal contour group a word belongs to? Who is going to be populating these categories with the 30000+ currently existing entries and who will be maintaining them, and how? Do these apply to disyllabic combinations only or do they apply to other multisyllabic combinations too (>4*5*5=100 possibilities for trisyllabic, >4*5*5*5=500 possibilities for tetrasyllabic)? Is the categorisation "Words with the tonal contour 4-1-3-2" meaningful? What about other Chinese varieties, eg. Min Nan (>8*8*8=512 trisyllabic for Taiwanese), or Cantonese (>9*9=81 disyllabic, >9*9*9=729 trisyllabic), especially ones with drastically different mechanisms of tone sandhi from Beijing Mandarin (eg. Min Nan, Shanghainese)? I don't mean to be picky but here are just some complications associated with such categorisations, which are the reasons I disliked this idea. Wyang (talk) 10:22, 18 February 2014 (UTC)[reply]
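The counts in the post above can be checked with a few lines of Python. This sketch assumes the figures come from multiplying the number of tones available in the first syllable by the number available in each later syllable (for Mandarin, 4 full tones first, then 4 plus neutral); that reading, and the helper name `contour_count`, are assumptions rather than anything stated in the post, which gives the figures as lower bounds.

```python
def contour_count(first_tones, other_tones, syllables):
    """Number of tone sequences: first_tones choices for the first
    syllable, other_tones choices for each remaining syllable."""
    return first_tones * other_tones ** (syllables - 1)


print("Mandarin trisyllabic:", contour_count(4, 5, 3))
print("Mandarin tetrasyllabic:", contour_count(4, 5, 4))
print("Min Nan trisyllabic:", contour_count(8, 8, 3))
print("Cantonese disyllabic:", contour_count(9, 9, 2))
print("Cantonese trisyllabic:", contour_count(9, 9, 3))
```

These are theoretical maxima for each length; how many of the sequences are attested in actual entries is a separate question.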

I suspect you never had to learn Chinese as an adult so you can only guess at what is easy or difficult or useful. You're not the target audience so don't need to be annoyed with a feature you would never need or use.

Quoting what's missing only tells us what can be improved. We don't have enough audio. It's not directly relevant. One thing is that only native or very good speakers can contribute audio whereas anybody with some resources can contribute to the suggested categories. It could perhaps be that because native speakers wouldn't benefit from audio that they don't feel compelled to put the effort into creating them. I can point at any number of fluent nonnative speakers with very poor tones who had audio materials but didn't get a feel for the native intonations in polysyllabic words. Because the materials don't abstract the contours.

The fact that you imagine learners looking at categories trying to make guesses shows how hard it is for a native speaker who has native intuition of tones to imagine the difficult learning process of a foreign learner. This is not what they'd do at all. They would look at the category, find some terms they recognize or think are useful, then ask somebody to say each of them to try to tune their ear to their shared aspect, the tone contours.

Can you point to a resource elsewhere which groups audio resources by tone contours? I agree that would be very useful indeed. I haven't been able to find one.

Open source is populated by people with an itch to scratch. People who are interested in populating the categories populate them, just like everything else on Wiktionary was created. You don't have to do any of it.

It would apply to any Mandarin terms of more than one syllable. Calculating numbers of combinations and permutations doesn't illustrate the futility of a category any more than you could do the same to illustrate the futility of attempting any kind of open source / data project. You could've used such an argument to illustrate that nobody would create Wiktionary or Wikipedia. It would've been wrong. There don't need to be categories for every possible sequence of tones because not all possible sequences are in use. Just as we don't have entries for pinyin syllables which are not in use.

If anybody wants to create similar categories for another Sinitic language or another tonal language I would support them. If nobody wants to create them, nobody will create them. There is no problem.

Another aspect of the supposed combinatorics problem is that it would be quite an easy project for a bot, if it were to capture anybody's attention. — hippietrail (talk) 12:04, 18 February 2014 (UTC)[reply]
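
As a rough illustration of why this could be an easy bot task, here is a minimal Python sketch that groups terms by tone contour. It assumes each term comes with tone-marked pinyin whose syllables are already separated by spaces; the function names and the "3-1" key format are hypothetical and not an existing Wiktionary convention. Only contours that actually occur in the input get a group, which matches the point that unused tone sequences need no category.

```python
import unicodedata
from collections import defaultdict

# Combining diacritics used by pinyin tone marks, mapped to tone numbers.
TONE_MARKS = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}


def tone_of(syllable):
    """Return the tone number of one pinyin syllable (5 = neutral/unmarked)."""
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            return TONE_MARKS[ch]
    return 5


def contour(pinyin):
    """Turn space-separated pinyin such as 'huǒ chē' into a key such as '3-1'."""
    return "-".join(str(tone_of(s)) for s in pinyin.split())


def group_by_contour(entries):
    """Group (term, pinyin) pairs by contour; only attested contours appear."""
    groups = defaultdict(list)
    for term, pinyin in entries:
        groups[contour(pinyin)].append(term)
    return dict(groups)


sample = [("你好", "nǐ hǎo"), ("火车", "huǒ chē"), ("老师", "lǎo shī")]
print(group_by_contour(sample))
```

Note that this groups by citation tones only; it does not model tone sandhi, which a real list might want to show separately.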

Instead of categories, why not create appendices? A link of the appropriate appendix could be added to the actual entry under the See also header. While I like categorization, unfortunately, categories do not show up in the mobile view. A bot could monitor the appendices to make sure that the appendix items and the actual entries are in sync. --Panda10 (talk) 13:40, 18 February 2014 (UTC)[reply]

Yes, I intended to fall back to indices if there was substantial objection to categories. Categories are generally better though because they do some of the work for you and work in a uniform manner.

It could be a good policy to not allow new categories that have not first been implemented and approved as indices though. And perhaps not even allow new public indices until they have first been implemented and approved as private indices.

As for category support lacking in the mobile app, that is disappointing to hear. But forcing people not to implement features because implementations are lacking is a bit like refusing to buy a smartphone because they don't have good Wiktionary apps. The correct approach is to use the better features and push to have implementations support them. See if there's a bug report to watch, or if not then file a bug report.

Since it's a rainy day today I'll look at making some sample private indices using my personal vocabulary study lists as subpages of my user page. — hippietrail (talk) 07:16, 19 February 2014 (UTC)[reply]

Here's my first sketch of what such a list could look like:
User:Hippietrail/Mandarin tone contours

— hippietrail (talk) 08:10, 19 February 2014 (UTC)[reply]

The following tables are excerpted from the article I wrote a while ago on Chinese Wikipedia about the Tianjin dialect, which is spoken ~100 km east of Beijing and is characterised by tonal values and tone sandhi patterns distinctly different from those of Beijing Mandarin. The Beijing dialect is much simpler than this. I think the amount of information contained within these tables (at most, with in-table audios and examples in Table 2) is optimal for appendix content for tone sandhi patterns on Wiktionary, if it is to exist.

Tone sandhi patterns of disyllabic words in the Tianjin dialect
_{1st syllable}＼^{2nd syllable}	Dark level (21) （L）	Bright level (45) （H）	Rising (24) （R）	Departing (53) （D）
Dark level (21) （L）	RL (24 21) 观音开车	中华金银	天主生产	金库希望
Bright level (45) （H）	桃花回家	红糖长城	鞋底良好	罗汉鞋店
Rising (24) （R）	火车紧张	LH (21 45) 主人找钱	HR (45 24) 选举总理	LD (21 53) 手段讲话
Departing (53) （D）	HL (45 21) 汽车送书	汽油问题	怕死市长	LD (21 53) 世界运动

Tone sandhi patterns of trisyllabic words in the Tianjin dialect
_{First two syllables}＼^{3rd syllable}	Dark level (21) （L）	Bright level (45) （H）	Rising (24) （R）	Departing (53) （D）
LL (21 21)	LRL (21 24 21)	RLH (24 21 45)	RLR (24 21 24)	RLD (24 21 53)
LH (21 45)	LHL (21 45 21)	LHH (21 45 45)	LHR (21 45 24)	LHD (21 45 53)
LR (21 24)	LRL (21 24 21)	LLH (21 21 45)	LHR (21 45 24)	LLD (21 21 53)
LD (21 53)	LHL (21 45 21)	LDH (21 53 45)	LDR (21 53 24)	RLD (24 21 53)
HL (45 21)	HRL (45 24 21)	HLH (45 21 45)	HLR (45 21 24)	HLD (45 21 53)
HH (45 45)	HHL (45 45 21)	HHH (45 45 45)	HHR (45 45 24)	HHD (45 45 53)
HR (45 24)	HRL (45 24 21)	HLH (45 21 45)	HHR (45 45 24)	HLD (45 21 53)
HD (45 53)	HHL (45 45 21)	HDH (45 53 45)	HDR (45 53 24)	HLD (45 21 53)
RL (24 21)	HRL (45 24 21)	RLH (24 21 45)	RLR (24 21 24)	RLD (24 21 53)
RH (24 45)	LHL (21 45 21)	LHH (21 45 45)	LHR (21 45 24)	LHD (21 45 53)
RR (24 24)	HRL (45 24 21)	HRH (45 24 45)	HHR (45 45 24)	HLD (45 21 53)
RD (24 53)	LHL (21 45 21)	LDH (21 53 45)	LDR (21 53 24)	RLD (24 21 53)
DL (53 21)	DRL (53 24 21)	HLH (45 21 45)	HLR (45 21 24)	HLD (45 21 53)
DH (53 45)	DHL (53 45 21)	DHH (53 45 45)	DHR (53 45 24)	DHD (53 45 53)
DR (53 24)	DRL (53 24 21)	DLH (53 21 45)	DHR (53 45 24)	DLD (53 21 53)
DD (53 53)	LHL (21 45 21)	LDH (21 53 45)	LDR (21 53 24)	HLD (45 21 53)

Wyang (talk) 12:49, 19 February 2014 (UTC)[reply]
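
For anyone who wants to experiment with the disyllabic table, the following Python sketch encodes it as a lookup. It transcribes only the cells that carry an explicit pattern in the first table as quoted above, and it assumes that cells with no pattern listed keep their citation tones; the names used here are hypothetical.

```python
# Tone letters and their values, as given in the table headers.
TONE_VALUES = {"L": "21", "H": "45", "R": "24", "D": "53"}

# Cells of the disyllabic table that list an explicit sandhi pattern.
DISYLLABIC_SANDHI = {
    ("L", "L"): "RL",
    ("R", "H"): "LH",
    ("R", "R"): "HR",
    ("R", "D"): "LD",
    ("D", "L"): "HL",
    ("D", "D"): "LD",
}


def surface_tones(first, second):
    """Return the surface pattern for two citation tones, e.g. 'L', 'L' -> 'RL'."""
    if first not in TONE_VALUES or second not in TONE_VALUES:
        raise ValueError("tones must be one of L, H, R, D")
    # Assumption: combinations without a listed pattern are unchanged.
    return DISYLLABIC_SANDHI.get((first, second), first + second)


def tone_values(pattern):
    """Spell a pattern out in tone values, e.g. 'RL' -> '24 21'."""
    return " ".join(TONE_VALUES[t] for t in pattern)


print(surface_tones("L", "L"), tone_values(surface_tones("L", "L")))
```

The trisyllabic table could be stored the same way, keyed on the first two tones plus the third.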

Pre-Roman substratum language code nargery[edit]

Following up on a question I posed last year: should und-ibe ("pre-Roman (Iberia)") and und-bal ("pre-Roman (Balkans)") be moved from Module:languages/datax to Module:etymology language/data? Module:languages is for languages that are allowed entries (whether in the main namespace or in appendices); Module:etymology language is for lects that are only mentioned in etymologies and are not allowed entries. Given that both und-ibe and und-bal potentially represent multiple unrelated languages, would it ever be appropriate to have entries/appendices in either? If so, then I guess they should stay in Module:languages; if not, then it seems like they should be moved to Module:etymology language. "Pre-Greek", which seems conceptually related, is in Module:etymology language. - -sche (discuss) 06:06, 18 February 2014 (UTC)[reply]

Support. Incidentally, is it possible to make these codes categorise as pre-Roman (foo) but only display pre-Roman? — Ungoliant ^(falai) 23:19, 24 February 2014 (UTC)[reply]

I've moved the codes.
I don't think there is a way, within the current framework, to make pages display one name and categorise using another. Besides, so long as we consider "pre-Roman (Iberia)" and "pre-Roman (Balkans)" separate enough to warrant separate codes and categories, I think that it's appropriate that they also display distinct names. - -sche (discuss) 07:06, 1 March 2014 (UTC)[reply]

Semi-related. Has the question of a general pre-Ide. category ever been brought up (i.e., not paleo-Balkan or pre-Italic but a general category usable for the whole of Europe)? I think I searched for this a while ago but didn't come up with any discussions. At present there are a couple of entries with a (referenced) proposed pre-Ide. etymology (zaķis is one that I recall). I could potentially have one Livonian entry with a detailed reference ultimately tracing it to a possible pre-Ide. substratum (via Curonian though) and I'd love to have it classified as such.

Pre-Ide. is notorious for attracting pseudo-science but since such a category would have maybe a couple dozen items at most I think it would be pretty easy to check it for offenders and ax them on sight. If this hasn't been discussed in the past ad infinitum, ad nauseam (which didn't seem to be the case) perhaps it would be better to create a new section in BP? Neitrāls vārds (talk) 23:08, 13 March 2014 (UTC)[reply]

I use the qfa-sub (substrate) code in such cases, e.g. in մուր (mur). --Vahag (talk) 07:02, 14 March 2014 (UTC)[reply]

Thanks, what I needed! The only problem – the w: link directs to a redirect to w:Stratum (linguistics) which has zero relevance, it should link to w:Pre-Indo-European languages, could this be changed? Neitrāls vārds (talk) 04:48, 15 March 2014 (UTC)[reply]

No, qfa-sub is not just for Pre-Indo-European languages. It is the generic code for any substrate language. --Vahag (talk) 07:37, 15 March 2014 (UTC)[reply]

But what is there besides ine? Uralic is the other well represented family and I have yet to see any concrete speculations of "pre-Uralic" etymologies (they always stop at "...source is not clear.") Maybe I could inquire with User:-sche on the possibility of introducing a general pre-ine code. My Livonian word is via Baltic so as I said wouldn't even touch "pre-Uralic" (which seems nonexistent) it supposedly has cognates in Romanian so I could use pre-Balkan but that would be mildly retarded because Eastern Baltic is not Balkans, lol. Neitrāls vārds (talk) 04:27, 16 March 2014 (UTC)[reply]

Substrate languages occur world-wide, and can potentially be detected anywhere the historical phonotactics have been worked out. In general, it's not a good idea to make categorical assumptions based on the extremely fragmentary nature of our information on most of the languages of the world. How well do you know Sino-Tibetan or Afro-Asiatic or Niger-Congo or Pama-Nyungan or Uto-Aztecan or Na-Dené or Algic or Northwest Caucasian, or even Turkic? Can you guarantee that none of them will ever have substrates detected that are unidentifiable to family? Chuck Entz (talk) 06:05, 16 March 2014 (UTC)[reply]

I'm not sure what you're getting at. I don't think anyone is proposing to get rid of the generic "qfa-sub". Rather, Neitrāls vārds wants an additional, more specific code for "pre-Indo-European", on the model of "pre-Roman (Iberia)". I suppose such a code would be as useful as pregrc (sic) and und-ibe and und-bal. As I commented in a previous thread, we should perhaps develop a naming scheme for these codes, though; you'll notice how variant the codes currently are. Perhaps "pre-grc", "pre-ine", "pre-rib" (pre-Roman (Iberia)) and pre-rbk (pre-Roman (Balkans))? Or perhaps three-part codes are in order, "sub-pre-grc", "sub-pre-ine", "sub-pre-ibe", "sub-pre-bal"? - -sche (discuss) 06:36, 16 March 2014 (UTC)[reply]

I was (over-)reacting to the first sentence, not the rest of it. Chuck Entz (talk) 07:03, 16 March 2014 (UTC)[reply]

(reset indent) Lol at the last comment :D (well, the "what is there besides ine" was intended to be semi-facetious as I'm myself mostly interested in Uralic.) Yes, -sche, that is exactly what I'm thinking – pre-ine as a "hypernym" to the already existing pre-Italic and pre-Balkan. Would be great if that could be added. Maybe as a safety measure an extra sub- (as in sub-pre-ine) could, indeed, be added (as things like "pre-German" can sometimes be encountered.) Neitrāls vārds (talk) 21:06, 16 March 2014 (UTC)[reply]

Preventative block for User:DCDuring[edit]

Half an hour ago I noticed I got blocked out of the blue, with no stated reason. I doubt DCDuring would really do something like that, so I'm thinking that the account may have been compromised and I've issued a preventative block of a day so that we can figure out what's going on. —CodeCa t 01:06, 19 February 2014 (UTC)[reply]

Although that is odd, it should be mentioned that his immediately previous edit was on your talk page, saying "you are dead wrong" (re Ossining). Equinox ◑ 01:17, 19 February 2014 (UTC)[reply]

I saw that too, but I didn't really know what to make of it. Did he block me for reverting him? That alone would be bad, but I actually reverted my own revert. So was it out of spite? —CodeCa t 01:23, 19 February 2014 (UTC)[reply]

Dunno. Try an e-mail! Equinox ◑ 01:45, 19 February 2014 (UTC)[reply]

Ok, I sent an email asking him to shed some light on it. —CodeCa t 01:51, 19 February 2014 (UTC)[reply]

He replied, saying "You reverted my reversion of SB's reversion on entry for Ossining for no stated reason." I reverted DCDuring in the first place because I thought it was a simple mistake; sometimes people mis-click. But then I saw comments on User:SemperBlotto's talk page and realised it was intentional, so I undid my own revert. DCDuring blocked me half an hour after that. So apparently that's enough for a block? I'm not seeing it, it seems petty to me. I've removed the block in any case. —CodeCa t 03:30, 19 February 2014 (UTC)[reply]

I am finding it really hard to be interested in this issue but you have both been around for a zillion years and I am sure you can sort it out on talk pages. Block schmock. <3 Equinox ◑ 04:06, 19 February 2014 (UTC)[reply]

God, people are way too quick on the trigger finger around here! Purplebackpack89 ^{(Notes Taken) (Locker)} 05:09, 19 February 2014 (UTC)[reply]

Universal Language Selector will be enabled by default again on this wiki by 21 February 2014[edit]

On January 21 2014 the MediaWiki extension Universal Language Selector (ULS) was disabled on this wiki. A new preference was added for logged-in users to turn on ULS. This was done to prevent slow loading of pages due to ULS webfonts, a behaviour that had been observed by the Wikimedia Technical Operations team on some wikis.

We are now ready to enable ULS again. The temporary preference to enable ULS will be removed. A new checkbox has been added to the Language Panel to enable/disable font delivery. This will be unchecked by default for this wiki, but can be selected at any time by the users to enable webfonts. This is an interim solution while we improve the feature of webfonts delivery.

You can read the announcement and the development plan for more information. Apologies for writing this message only in English. Thank you. Runa

Codes the ISO has split or merged (first batch)[edit]

In 2012 and 2013, the ISO retired several codes by merging them into other codes or splitting them up. Thirty of these retirements appear to have escaped our notice. Here are the first fifteen, plus my thoughts on them; I'll post the rest another day. If you know a reason we should or shouldn't follow the ISO in a particular case, please comment! - -sche (discuss) 06:09, 20 February 2014 (UTC)[reply]

merging the Tanudan Kalingas[edit]

The ISO had granted separate codes to "Upper Tanudan Kalinga" (kgh) and "Lower Tanudan Kalinga" (kml), but in 2012 they merged them into kml as "Tanudan Kalinga" and retired the code kgh. I suggest we follow suit. (Kalinga is a dialect continuum; we could consider merging even more Kalingas later.) - -sche (discuss) 06:09, 20 February 2014 (UTC)[reply]

Done. - -sche (discuss) 04:59, 1 March 2014 (UTC)[reply]

merging the Wemales[edit]

The ISO had granted separate codes to "South Wemale" (tlw) and "North Wemale" (weo), but in 2012 they merged them into weo as "Wemale"; they retired the code tlw. I suggest we follow suit. Side note, does anyone know how "Wemale" is pronounced? - -sche (discuss) 06:09, 20 February 2014 (UTC)[reply]

Done. - -sche (discuss) 19:07, 25 February 2014 (UTC)[reply]

splitting Garawa and Wanyi[edit]

In 2012, the ISO retired gbc (the code they had used for this language), splitting it into wrk for Garawa proper (which they call by the less common name "Garrwa", and which also goes by "Karawa") and wny for Wanyi (also spelt "Wanji", "Waanyi"). Garawa and Wanyi are closely related, but enough scholarly literature distinguishes them that I think we should follow the ISO in splitting them. - -sche (discuss) 06:09, 20 February 2014 (UTC)[reply]

Done. I've gone ahead and made the split, because there is actually some Wanyi content that I've wanted to add. - -sche (discuss) 23:10, 21 February 2014 (UTC)[reply]

splitting Kadu and Kanan[edit]

In 2012, the ISO retired kdv, the code that had been used for the Kado/Kadu variety of Sak, and split it into zkd (Kadu proper) and zkn ("Kanan"). I can find no information about Kanan. One might say "well, let's assume the ISO know what they were doing", but compare Aghu Tharrnggala / Gugu Mini below! - -sche (discuss) 06:09, 20 February 2014 (UTC)[reply]

Not split at this time. WT:LANGTREAT updated accordingly. - -sche (discuss) 22:52, 8 March 2014 (UTC)[reply]

splitting Paku and Mobwa Karen[edit]

In 2012, the ISO retired kpp (the code they had used for Paku Karen), assigning it the new code jkp at the same time as they split off jkm, "Mobwa Karen". Both Paku and Mobwa are dialects of S'gaw Karen, as is wea ("Wewaw"). - -sche (discuss) 06:09, 20 February 2014 (UTC)[reply]

Not split at this time. WT:LANGTREAT updated accordingly. - -sche (discuss) 22:52, 8 March 2014 (UTC)[reply]

splitting Mudburra and Karranga[edit]

In 2013, the ISO retired Mudburra's code mwd, splitting it into dmw ("Mudburra" proper, also spelt "Mudbura") and xrq ("Karranga", "Karrangpurru"). Quoth WP, "McConvell suspects Karrangpurru was a dialect of Mudburra because people said it was similar. However, it is undocumented and thus formally unclassifiable." If that was the basis for incorporating it into Mudburra, I suppose we should follow the ISO in making the split. We're not going to have any content in Karranga either way: to quote Mark Harvey, "there is no linguistic material directly on Karranga". - -sche (discuss) 06:09, 20 February 2014 (UTC)[reply]

Done. - -sche (discuss) 04:30, 1 March 2014 (UTC)[reply]

splitting Jiwarli and Thiin[edit]

In 2013, the ISO retired djl, which had been Jiwarli's code, and split it into dze ("Jiwarli" proper — they used the markedly rarer spelling "Djiwarli"; it also goes by "Tjiwarli") and iin (Thiin). Djiwarli and Thiin are closely related; they and two other lects are sometimes considered to form the dialect continuum Mantharta. WP says the varieties (all extinct) "were distinct but largely mutually intelligible". We could either make the split, or not, or go as far as to not only keep Thiin and Jiwarli unified but also unify Mantharta's other two dialects, dhr (Dhargari) and wri (Warriyangga). - -sche (discuss) 06:09, 20 February 2014 (UTC)[reply]

Not split at this time. WT:LANGTREAT updated accordingly. - -sche (discuss) 18:01, 23 March 2014 (UTC)[reply]

Aghu Tharrnggala et al[edit]

In 2013, the ISO retired ggr (the code they had used for this lect) and split it into gtu (Aghu Tharrnggala proper — they use the less common spelling "Aghu Tharnggala"), ggm (Gugu Mini), and ikr (Ikarranggal). They retired ggm a year later upon realizing that it was not a specific language but rather a cover term for various languages. I can't find evidence of Ikarranggal, either. I suggest we go along with the recoding of Aghu Tharrnggala as gtu (there being no reason to continue using a retired code when an up-to-date one exists for the same language). We already deleted ggm. ikr was already added to Module:languages; I suppose we could let it be. - -sche (discuss) 06:09, 20 February 2014 (UTC)[reply]

Done. - -sche (discuss) 07:15, 1 March 2014 (UTC)[reply]

merging things into Rakhine[edit]

In 2012, the ISO merged "Yangbye" (ybd) and "Chaungtha" (ccq) into Rakhine (rki). I propose we follow suit. Rakhine, also called Arakanese, is sometimes considered a dialect of Burmese. We can discuss whether or not to merge rmz (Rakhine's other major dialect) into rki, or even rki into Burmese, at a later date. - -sche (discuss) 06:09, 20 February 2014 (UTC)[reply]

Done. - -sche (discuss) 22:11, 24 February 2014 (UTC)[reply]

splitting Yendang and Yotti[edit]

In 2012, the ISO retired yen, the code which had been used for this language; they split it into ynq for Yendang proper and yot for Yotti. - -sche (discuss) 06:09, 20 February 2014 (UTC)[reply]

Not split at this time. WT:LANGTREAT updated accordingly. - -sche (discuss) 18:16, 23 March 2014 (UTC)[reply]

splitting Yir-Yoront and Yirrk-Mel / Yirrk-Thangalkl[edit]

In 2013, the ISO retired Yir-Yoront's code yiy and gave it the new code yyr at the same time as they split off yrm for "Yirrk-Mel". Ethnologue and the ISO are rather "splittist" when it comes to Australian languages, and the Yir-Yoront Lexicon speaks of it and Yirrk-Mel as having been merely "sister dialects"... but Yir-Yoront and Yirrk-Mel did have somewhat different phonological inventories... I don't really have an opinion on whether to split them or not. Note for whoever updates the language codes+names: "Yirrk-Mel" is also called "Yirrk-Thangalkl", "Yir Thangedl", "Yirr-Thangell"; "Yir-Yoront" is also spelt "Yir Yoront", "Yirr-Yoront", "Yirr-Yorront". - -sche (discuss) 06:09, 20 February 2014 (UTC)[reply]

Not split at this time. WT:LANGTREAT updated accordingly. - -sche (discuss) 18:16, 23 March 2014 (UTC)[reply]

merging Baagandji[edit]

The ISO had granted separate codes to bjd (which they called by the uncommon name "Bandjigali") and drl (which they called by the placename "Darling"). In 2012, they merged them into drl, which they now call "Paakantyi", although its most common name seems to be "Baagandji". (It also goes by "Baagandji".) I suggest we follow the ISO in merging bjd into drl, and call the end result "Baagandji". - -sche (discuss) 06:09, 20 February 2014 (UTC)[reply]

Done. - -sche (discuss) 06:18, 24 February 2014 (UTC)[reply]

merging Atamanu / Yalahatan[edit]

The ISO had granted separate codes to two dialects of Atamanu, named for the villages where the dialects were spoken: hrr for the variety spoken in Horuru / Haruru, and jal for the one spoken in Yalahatan. In 2012, they merged hrr into jal. Ethnologue says there were only "slight dialect differences reported between the 2 villages". (It also offers the curious comment that "the name Atamanu is not currently known".) I suggest we follow the ISO in merging hrr into jal. - -sche (discuss) 06:09, 20 February 2014 (UTC)[reply]

Done. - -sche (discuss) 04:52, 1 March 2014 (UTC)[reply]

merging Ibilo into Okpamheri[edit]

In 2012, the ISO merged Ibilo (ibi), "an Okpamheri dialect [...] spoken at the northern foot of the hills on which Oloma and Emhalhe (Somorika) are spoken" (quoth Ben Elugbe, writing in Current Approaches to African Linguistics, volume 6, 1983, →ISBN), into Okpamheri (opa). I see no reason not to follow suit. - -sche (discuss) 06:09, 20 February 2014 (UTC)[reply]

Done. - -sche (discuss) 22:07, 24 February 2014 (UTC)[reply]
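
The outcomes recorded in this batch can be summarised as a small lookup table. The Python sketch below is only an illustration built from the decisions stated in the subsections above; the variable and function names are hypothetical, and this is not how Module:languages stores its data.

```python
# Retired codes that were folded into (or recoded as) a single current code.
REPLACED = {
    "kgh": "kml",  # Upper Tanudan Kalinga -> Tanudan Kalinga
    "tlw": "weo",  # South Wemale -> Wemale
    "ybd": "rki",  # Yangbye -> Rakhine
    "ccq": "rki",  # Chaungtha -> Rakhine
    "bjd": "drl",  # Bandjigali -> Baagandji
    "hrr": "jal",  # Horuru -> Yalahatan
    "ibi": "opa",  # Ibilo -> Okpamheri
    "ggr": "gtu",  # Aghu Tharrnggala recoded
}

# Retired codes that were split into several current codes.
SPLIT = {
    "gbc": ("wrk", "wny"),  # Garawa, Wanyi
    "mwd": ("dmw", "xrq"),  # Mudburra, Karranga
}

# Codes the ISO retired but which were "not split at this time" here.
KEPT = {"kdv", "kpp", "djl", "yen", "yiy"}


def resolve(code):
    """Return the tuple of codes to use in place of a given code."""
    if code in REPLACED:
        return (REPLACED[code],)
    if code in SPLIT:
        return SPLIT[code]
    return (code,)  # kept codes and unaffected codes pass through unchanged


print(resolve("kgh"), resolve("gbc"), resolve("kdv"))
```

A split cannot be resolved automatically for existing entries, since someone still has to decide which of the new codes each entry belongs to.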

Amendment to the Terms of Use[edit]

Hello all,

Please join a discussion about a proposed amendment to the Wikimedia Terms of Use regarding undisclosed paid editing; we encourage you to voice your thoughts there. Please translate this statement if you can, and we welcome you to translate the proposed amendment and introduction. Please see the discussion on Meta Wiki for more information. Thank you! Slaporte (WMF) 22:00, 21 February 2014 (UTC)[reply]

Phase out the synonyms of `{{a|GenAm}}` and `{{a|RP}}` (namely `{{a|US}}` and `{{a|UK}}`)[edit]

For a long time, we have tolerated {{a|US}} as a synonym of {{a|GenAm}} and {{a|UK}} as a synonym of {{a|RP}}. This discussion in the Tea Room is the latest in a long line of discussions that have made clear how misleading this is: the labels "US" and "UK" seem to cover the entireties of those countries, yet we use them only for very specific accents. I propose that we have a bot replace all instances of {{a|US}} with {{a|GenAm}}, and all instances of {{a|UK}} with {{a|RP}}. Alternatively, if we wanted to check all uses of "US" and "UK" to ensure they were not being used in ways other than as synonyms of "GenAm" and "RP" (out of our 3.5 million pages, I did see one once that used both "US" and "GenAm"), we could simply change it so that "US" displayed "GenAm" and "UK" displayed "RP". Then, at our leisure, we could go through all the occurrences of those templates, checking them by hand; furthermore, they would still exist as redirects for editors to use, without misleading readers by displaying text that implied they covered all of the US and UK, respectively. Thoughts? - -sche (discuss) 21:34, 24 February 2014 (UTC)[reply]

I've been doing this manually for months whenever I see {{a|UK}} and {{a|US}}, so obviously I support getting a bot to do it. —Aɴɢʀ (talk) 22:00, 24 February 2014 (UTC)[reply]

To me it looks like yet another way of driving away normal users and making it a dictionary of little use to any but the kind of folks who contribute here. I would have thought that the right direction is precisely the opposite, with links and footnotes to explain the relationship of the instantly understandable and 'accurate enough' "US" and "UK" labels to the technically precise and very impressive looking 'General American' and 'Received Pronunciation', which those who are not students of language don't instantly understand. That a casual user will follow links goes against our experience and against human nature. To assume that they will or should is fully in accord with our practice and with human nature, but nonetheless wrong. DCDuring TALK 22:59, 24 February 2014 (UTC)[reply]

Is our mission to be "accurate" or "accurate enough"? —CodeCa t 23:06, 24 February 2014 (UTC)[reply]

after edit conflict:

With the right notes and descriptions, we can be both easy to use (DCDuring's "accurate enough") and linguistically academically precise ("accurate").

I'm considerably more up on labels and WT than casual users, and I must confess that I too am somewhat put off by the {{a|RP}} label. “UK”, especially in juxtaposition with “US”, is clearly the United Kingdom, but “RP” leaves me confused and uncertain until I click through. If we're expecting users to click through to find stuff (an assumption that is often flawed, as any longer observation of WT:Feedback will confirm), then why not put the extended explanation that "In this context, the label 'UK' refers to the received pronunciation common in Britain," etc., on the linked page? I'm not a fan of obtuse and opaque labels. ‑‑ Eiríkr Útlendi │ Tala við mig 23:19, 24 February 2014 (UTC)[reply]

From my perspective, "UK" is a more obtuse label than "RP", because contrary to your notion, RP is not common in the UK. Someone noted in the Tea Room the statistic from Peter Trudgill that it's used by only 3% of the population. The fact that all (or, see below, almost all) of our entries use "UK" to gloss RP pronunciations is thus obtuse ("intellectually dim-witted", "indirect or circuitous") — neither "accurate" nor "accurate enough". It needs to change regardless of whether our goal is to stop misleading our readers the way we currently do, or start misleading them in some new way (like by using "UK" to gloss some never-used "conflation pronunciation"/"abstraction" of Cockney plus RP plus Scottish plus Estuary plus Northern Irish plus Welsh, as proposed below). - -sche (discuss) 19:00, 25 February 2014 (UTC)[reply]

What about just using the full labels, then? It occurs to me to wonder why we're torturing ourselves by coming up with such obtuse initialisms. “RP” in a pronunciation section leaves me scratching my head, but “Received Pronunciation” is much more clear; even better, “Received Pronunciation accent (England)” with a link to w:Received Pronunciation. Likewise with “GA” or “GenAm” versus just writing out “General American accent” with a link to w:General American. WT:NOT #4: Wiktionary is not paper. Since we don't have to worry about ink or page space, why don't we just write out the full labels? ‑‑ Eiríkr Útlendi │ Tala við mig 19:40, 25 February 2014 (UTC)[reply]

(United States of America) IPA: /tuː/
(United Kingdom of Great Britain and Northern Ireland) IPA: /tuː/
(Commonwealth of Australia) IPA: /tuː/
(Republic of Ireland) IPA: /tuː/

Abbreviations are not just for saving ink. —Michael Z. 2014-02-26 21:30 z

I support spelling out "RP" as "Received Pronunciation" and "GenAm" as "General American". (I think "General American accent" would be unnecessarily wordy, and "Received Pronunciation accent" sounds like it has tautologitis, a condition related to PNS syndrome.) All it would take, I think, is one edit to Template:accent:RP and another to Template:accent:GenAm. (PS, note that "GA" is not used; the use of labels like "NY" means it would be too ambiguous, potentially meaning "Georgian accent" rather than "GenAm". See the discussions in the Whatlinkshere of Template:accent:GA if you're really bored.) - -sche (discuss) 19:35, 26 February 2014 (UTC)[reply]

Oppose. I have no problem with adding {{a|RP}} and {{a|GenAm}} in addition to {{a|UK}} and {{a|US}}, but I see these as all meaning different things and oppose the removal of pronunciations tagged as {{a|UK}} and {{a|US}}. My ideal pronunciation sections are as follows:
cannot:
- (UK) IPA^(key): /ˈkænɒt/
  - (Received Pronunciation) IPA^(key): [ˈkanɒt]
  - […]
- (US) IPA^(key): /kɪ(n)ˈnɒt/
  - (General American) IPA^(key): [kɪ(n)ˈnɑt]
  - (New England) IPA^(key): [kɪ(n)ˈnɒt]
  - (NYC) IPA^(key): [kɪ(n)ˈnɒt]
  - […]
father:
- (UK, US) IPA^(key): /ˈfɑːðɚ/
  - (Received Pronunciation) IPA^(key): [ˈfɑːðə]
  - (General American) IPA^(key): [ˈfɑðɚ]
  - (New England) IPA^(key): [ˈfaðə], [ˈfaðɚ]
  - (NYC) IPA^(key): [ˈfɑðə], [ˈfɑðɚ]
  - […]
Obviously, there will ideally also be other dialects such as Australian and South African included. --Wiki Tiki 89 23:18, 24 February 2014 (UTC)[reply]
~~What WikiTiki said.—msh210℠ (talk) 23:28, 24 February 2014 (UTC)~~ More precisely: I agree with WikiTiki that we should have as many dialects as possible, with a "general" American and British listed atop the other American and British (respectively) accents where such can be determined. I don't know, however, what delimiters we should use.—msh210℠ (talk) 02:39, 25 February 2014 (UTC)[reply]
I'm not sure about the delimiters either. In a past discussion, someone once suggested using //double slashes// for diaphonemic transcriptions and /single slashes/ for ordinary phonemic transcriptions. --Wiki Tiki 89 02:46, 25 February 2014 (UTC)[reply]

I think GenAm and RP should continue to be given in /broad/ transcription rather than [narrow] and your US and GenAm transcriptions of cannot seem to be wrong; our entry and all the other dictionaries I've checked have the first vowel as /æ/. But as far as I can tell, it would require us to do what I propose — systematically clear out all the current uses of "US" and "UK" — because it would not seem to be possible or consistent (and I would oppose) to start using "US" and "UK" to mean something other than "GenAm" and "RP" while all current uses of them are still as synonyms of "GenAm" and "RP". - -sche (discuss) 23:59, 24 February 2014 (UTC)[reply]
Re pronunciation of cannot: In my experience it is more common in the US to put the stress on the second syllable, thereby reducing the first vowel. Stress on the first syllable is less common and I didn't think it was worth including here (it is listed at the entry itself). --Wiki Tiki 89 00:16, 25 February 2014 (UTC)[reply]
If the first syllable is reduced, wouldn't it just be a schwa? [ɪ] is commonly an allophone of /ə/, but then it shouldn't be in broad transcriptions (since it doesn't contrast with a schwa)... - -sche (discuss) 01:38, 25 February 2014 (UTC)[reply]
There's a lot more to it than that. Since I have not actually done research on this, I will give examples from my idiolect. In my idiolect (and I believe this to be common in America and perhaps worldwide), the velars /k/ and /ɡ/ are palatalized by front vowels (everything from /iː/ to /æ/, and even by /aʊ/, which is realized closer to [æʊ], but not by /aɪ/). In my idiolect, the /kə/ in caboose (/kəˈbuːs/) is contrasted with the /kɪ/ in kibbutz (/kɪˈbuːts/) and this distinction is very noticeable due to the palatalization of the /k/ in kibbutz and the lack of palatalization in caboose. Since the unreduced vowel after /k/ in cannot is /æ/, which palatalizes, it is reduced to the palatalizing /ɪ/ rather than the unpalatalizing /ə/. --Wiki Tiki 89 01:52, 25 February 2014 (UTC)[reply]

Re 'all current uses of them are still as synonyms of "GenAm" and "RP"': I don't think this is true. Maybe the majority, but definitely not all. --Wiki Tiki 89 00:16, 25 February 2014 (UTC)[reply]
Given that "US" = "GenAm" has been the case since long before I started editing, I imagine that fewer entries use "US" to mean something other than "GenAm" than use {{head|foo|adjective}} under a ===Noun=== header. Which is to say, there are certainly a few of our 3.5 million entries that do, but they're not standard (and there's no way to know if the way they do use "US" is the way you want to start using it). - -sche (discuss) 02:08, 25 February 2014 (UTC)[reply]
Maybe so, but I know I've seen and even added ones that are not. But I agree that either way, we should not do this by bot without checking the pronunciations in some way. --Wiki Tiki 89 02:15, 25 February 2014 (UTC)[reply]
Whatever bot changes the current uses of "US" and "UK" (because it would not be feasible to do except by bot) could be programmed to list and then ignore pages that used both "US" and "GenAm", or "UK" and "RP". - -sche (discuss) 18:44, 25 February 2014 (UTC)[reply]
I don't think it's common at all for them to both be used. In the cases that US does not refer to GenAm, it is because the pronunciation is more generalized, such as by using symbols such as /ɒ/, /ɑː/, /iː/, /ə(ɹ)/, etc. --Wiki Tiki 89 19:29, 25 February 2014 (UTC)[reply]
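
For illustration only, the "list and then ignore" check suggested above could be sketched roughly as follows in Python. This is not an existing bot: the function names are made up, and it assumes accent labels are written in wikitext as `{{a|US}}` or `{{a|RP|UK}}`, which would need to be checked against real entries.

```python
import re

# Assumption: accent labels appear as {{a|US}} or {{a|RP|UK}} in the page wikitext.
ACCENT_RE = re.compile(r"\{\{a\|([^{}]*)\}\}")

# Label pairs that must not both occur on a page the bot edits.
CONFLICTS = [("US", "GenAm"), ("UK", "RP")]


def accent_labels(wikitext):
    """Return the set of accent labels used anywhere in the page text."""
    labels = set()
    for match in ACCENT_RE.finditer(wikitext):
        labels.update(part.strip() for part in match.group(1).split("|"))
    return labels


def split_pages(pages):
    """Split {title: wikitext} into (titles to edit, titles to list and skip)."""
    to_edit, to_skip = [], []
    for title, text in pages.items():
        labels = accent_labels(text)
        if any(a in labels and b in labels for a, b in CONFLICTS):
            to_skip.append(title)  # uses both labels: list for manual review
        elif labels & {"US", "UK"}:
            to_edit.append(title)  # uses only the broad label: safe to relabel
    return to_edit, to_skip
```

Pages that use neither "US" nor "UK" end up in neither list, so the bot would leave them alone.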

But when there is more than one phonemic representation within a country it will start to get messy and redundant:
footer:
- (UK) IPA^(key): /ˈfʊtə/, /ˈfʊtəɹ/, /ˈfutəɹ/
  - (Received Pronunciation, Southeast England, Northern England, Wales) IPA^(key): [ˈfʊtə]
  - (Southwest England) IPA^(key): [ˈfʊtɚ]
  - (Scotland) IPA^(key): [ˈfutɚ]
  - (Northern Ireland) IPA^(key): [ˈfut̪ɚ]
- (Republic of Ireland) IPA^(key): /ˈfʊtəɹ/, /ˈfutəɹ/
  - (Cavan, Ulster, Monaghan) IPA^(key): [ˈfut̪ɚ]
  - (Leinster, Connacht, Munster) IPA^(key): [ˈfʊt̪əɹ]

or

hot dog:
- (US) IPA^(key): /ˈhɑtˌdɔɡ/, /ˈhɑtˌdɑɡ/, /ˈhɒtˌdɒɡ/
  - (General American) IPA^(key): [ˈhɑtˌdɔɡ]
  - (cot–caught merger) IPA^(key): [ˈhɑtˌdɑɡ]
  - (Eastern New England, Western Pennsylvania) IPA^(key): [ˈhɒtˌdɒɡ]
- (Canada) IPA^(key): /ˈhɑtˌdɑɡ/, [ˈhɑtˌdɑɡ]

It's been suggested before that our pronunciation sections for English terms show in the first instance just RP and GenAm and that everything else gets put in a collapsible box. I think that's a good idea. —Aɴɢʀ (talk) 01:05, 25 February 2014 (UTC)[reply]
Why not use a diaphonemic approach? List the diaphonemic realisation(s), then RP and GA, and then any others. That way, someone who understands the diaphonemes knows how to derive their own dialect from this. We do it this way for Dutch already (which is also a language with two more or less "standard" pronunciations), and probably many other languages as well. —CodeCa t 01:14, 25 February 2014 (UTC)[reply]
Because that is essentially what I'm suggesting except that I didn't use the word "diaphoneme". --Wiki Tiki 89 01:18, 25 February 2014 (UTC)[reply]
@Angr, Your example right there is exactly what I want to avoid. (And as a side note, hotdog is pronounced [ˈhʌtˌdɒɡ] in New England.) So here's how I would correct it:
footer:
- (UK, US, Ireland) IPA^(key): /ˈfʊtɚ/
  - (Received Pronunciation, Southeast England, Northern England, Wales) IPA^(key): [ˈfʊtə]
  - (Southwest England) IPA^(key): [ˈfʊtɚ]
  - (Scotland) IPA^(key): [ˈfutɚ]
  - (Cavan, Ulster, Monaghan, Northern Ireland) IPA^(key): [ˈfut̪ɚ]
  - (Leinster, Connacht, Munster) IPA^(key): [ˈfʊt̪əɹ]

hot dog:
- (US) IPA^(key): /ˈhɒtˌdɒɡ/, /ˈhɒtˌdɔːɡ/
  - (General American) IPA^(key): [ˈhɑtˌdɔɡ]
  - (General American, cot–caught merger) IPA^(key): [ˈhɑtˌdɑɡ]
  - (Western Pennsylvania) IPA^(key): [ˈhɒtˌdɒɡ]
  - (Eastern New England) IPA^(key): [ˈhʌtˌdɒɡ]
- (Canada) IPA^(key): /ˈhɑtˌdɑɡ/, [ˈhɑtˌdɑɡ]
Theoretically, I think we should group Canada with US, but then we would lose the simplicity of a simple "US" tag in favor of something like "North America" (because "NA" would not be understood by the
But your method of avoiding the problem is to provide incorrect broad transcriptions. IPA^(key): /ˈfʊtɚ/ is not the correct broad transcription for most of England and all of Scotland, nor are IPA^(key): /ˈhɒtˌdɒɡ/, /ˈhɒtˌdɔːɡ/ the correct broad transcriptions for most of North America. Being wrong is simply too high a price to pay for being easy. —Aɴɢʀ (talk) 01:29, 25 February 2014 (UTC)[reply]
That's because you're missing the point. If you take any particular dialect individually, the pandialectal transcription is most likely gonna be wrong. The pandialectal (or diaphonemic) transcription is the bigger picture. It gives the overall phonemic structure of the word that is common to all dialects, even if those dialects merge or shift some of the vowels. --Wiki Tiki 89 01:36, 25 February 2014 (UTC)[reply]
... so, in other words, what Angr said: the pan-dialectal transcription is wrong and not an accurate representation of how all or most of English's dialects actually pronounce things. - -sche (discuss) 01:54, 25 February 2014 (UTC)[reply]
It's not "wrong", it's just more general and abstract. You have to stop thinking of it as a direct phonetic transcription (which it wouldn't be anyway because we don't represent allophones). --Wiki Tiki 89 02:01, 25 February 2014 (UTC)[reply]
Phonetic transcriptions attempt to transcribe the actual sound speakers articulate. Phonemic transcriptions abstract from that and describe only the features that contrast within a particular speaker's dialect. So each dialect has its own set of phonemes. Diaphonemic transcriptions go further still, and try to describe correspondences between the phonemes of speakers across many dialects. So for example a diaphonemic description of English would acknowledge that /əʊ/, /aʊ/, /ɛʊ/ and so on are all different manifestations of the same underlying diaphonemic unit, depending on dialect. Diaphonemic transcriptions do not describe sounds, therefore, but correspondence sets, and are not all that different from correspondence sets used in historical linguistics; in a sense, they are a synchronic proto-language. Diaphonemic transcription sometimes runs into problems when it's not just the realisations of phonemes that change, but also the different phonemic contrasts (the many splits/mergers in English for example). In cases like that, usually the contrast is maintained in the transcription if it's made in at least some of the dialects. For English, that would mean that if we include Scottish English as part of the English diaphonemic system, then /x/ is an English diaphoneme separate from /k/, and there will also be contrasts among vowels that the majority of English speakers don't distinguish. Diaphonemic systems become progressively more abstract as you include more dialects, because by necessity that means digging further back into the history of the language to the point of departure between various dialects. For the British-vs-American split, that still lies in the modern period, but for many dialects within Britain itself, the splits may be Middle or even Old English in origin. —CodeCa t 17:24, 25 February 2014 (UTC)[reply]
That means it'd be worse than unhelpful — it'd actually be harmful to the project. In the words of DCDuring, it "seems like pure contributor-community self-indulgence". It would only serve to mislead and confuse our readers. - -sche (discuss) 18:44, 25 February 2014 (UTC)[reply]

If we take up more than an inch of vertical screen space, which parts of the pronunciation would be best hidden so that folks can get to the definitions, which is what those few users who still come by say they want? Do we have any idea who actually wants pronunciations and how many of them can read IPA?

This seems like pure contributor-community self-indulgence. DCDuring TALK 01:33, 25 February 2014 (UTC)[reply]

Perhaps the nested transcriptions could be hidden the same way we hide quotations, leaving only the top-level ones by default. --Wiki Tiki 89 01:36, 25 February 2014 (UTC)[reply]

RP ≠ UK. Consider ally, pronounced /ˈælaɪ/ in RP and /ˈalʌɪ/ in UK. — I.S.M.E.T.A. 14:54, 25 February 2014 (UTC)[reply]

Your point is right, but your evidence is wrong. Those are two different transcriptions of the same pronunciation from different editions of the OED. --Wiki Tiki 89 17:04, 25 February 2014 (UTC)[reply]

@Wikitiki89, CodeCat Having first read the word diaphoneme here, used by the two of you, I did some research and created the aforelinked entry. Could you check it for accuracy, please? — I.S.M.E.T.A. 23:49, 25 February 2014 (UTC)[reply]

Could we not be both clear and precise by using simple terms and providing our specific meaning in a link? Most dictionaries do this for usage labels and no one ever minds. If we absolutely must be literalist in our labelling, then add a supplementary abbreviation:

(British [RP])
(North American [GenAm])

206.45.27.120

Entries for Japanese verb forms?

It seems like there's some demand for entries for Japanese verb forms. Somewhere down the years, I wound up with the impression that verb forms for Japanese were not to be added. More recently, I've been over WT:AJA and WT:ELE and I don't see any policy text stating that we shouldn't create these. Moreover, we do have entries for verb forms for numerous other languages.

Would any other editors have strong feelings in opposition to the creation of verb form entries for Japanese? If so, why? ‑‑ Eiríkr Útlendi │ Tala við mig 22:49, 24 February 2014 (UTC)[reply]

I think it is probably about creating verb forms for agglutinative languages in general. Wyang (talk) 01:17, 25 February 2014 (UTC)[reply]

Personally, I support creating entries for a few basic verb forms, as given in the tables. I suppose it can be difficult to draw the line, but that would be for the Japanese editors to decide. As a beginner, I initially found it difficult to guess the lemma form correctly, and although I already find that I am not in need of this, I can certainly see how it would have been useful. —Μετάknowledge^{discuss/deeds} 08:32, 25 February 2014 (UTC)[reply]

I strongly support the addition of Japanese verb forms. My Japanese is poor, so I'm not aware of how many verb forms any given verb can have. That said, I offer a Latin verb for comparison: Consider scīscō. It has 104 non-participial conjugated forms and four participles; its present active participle, future active participle, perfect passive participle, and future passive participle themselves inflect, the p.a.p. having 24 declined forms and the other three participles 36 declined forms each. That means that the one verb scīscō has 236 verb forms or forms of verb forms, all of which get entries; of course, not all of those forms are heteromorphic, so fewer than 236 pages are affected by it, but you get some idea by this of the extent of Latin conjugation. Even if Japanese verbs end up with a comparable number of conjugated forms, I still don't see what the problem is. — I.S.M.E.T.A. 17:00, 23 March 2014 (UTC)[reply]
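
The total of 236 quoted above can be checked with a few lines of Python. This assumes the four participles' own citation forms are already counted among their declined forms, which is the reading under which the figures add up.

```python
# Counts quoted in the comment above for Latin scīscō.
finite_forms = 104               # non-participial conjugated forms
present_active_participle = 24   # declined forms of the present active participle
other_participles = 3 * 36       # three remaining participles, 36 declined forms each

total = finite_forms + present_active_participle + other_participles
print(total)  # 236
```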

Entries for Japanese adjective forms?

If the community is accepting of verb form entries, we might need a new header template.

Regular verb forms take a ===Verb=== header, followed by a template such as {{ja-verb form}}, and then the definition line containing the inflected verb form, a description of how it is inflected, and a link to the lemma.

Japanese also has a class of term generally described as an adjective in English, or more specifically as an i-adjective for English-speaking learners of Japanese, or as a 形容詞 (keiyōshi) in Japanese. These terms can be used as predicates, and do inflect for aspect.

As such, when creating an entry for a Japanese adjective form, how should we proceed? Do we use the ===Adjective=== header, and then create some new {{ja-adj form}} template for the second line, to avoid erroneous categorization? ‑‑ Eiríkr Útlendi │ Tala við mig 18:44, 19 March 2014 (UTC)[reply]

@Eirikr: I think your proposals are bang-on. Presumably, you'd only want lemmata in Category:Japanese verbs and Category:Japanese adjectives, so it would be a very good idea to have separate Category:Japanese verb forms and Category:Japanese adjective forms categories as well. — I.S.M.E.T.A. 17:00, 23 March 2014 (UTC)[reply]

AWB application

I would like to apply for permission to use AWB on the English Wiktionary. I will be using it for two purposes if the application is successful:

1. Template:Pinyin-IPA. Check all main namespace pages using this template, and remove syllable delimitations, since the template now accepts unsyllabified input.
2. Obsoleting Template:zh-hanzi and Template:Hani-forms, since they have been superseded by Template:zh-hanzi-box.

Thanks, Wyang (talk) 02:51, 25 February 2014 (UTC)[reply]

Actually my bot could do the second one, but still support. --kc_kennylau (talk) 12:48, 25 February 2014 (UTC)[reply]

If no-one objects or beats me to it, I'll add you to the check page in a day or two. (Ping me if I forget to.) - -sche (discuss) 04:20, 26 February 2014 (UTC)[reply]

@Wyang I have added you to the check page; you may now use AWB. Cheers, - -sche (discuss) 04:33, 28 February 2014 (UTC)[reply]

@-sche Thanks heaps. I just tried about 50 edits, and they are alright. There are still >3000 pages remaining for task #2, and >30000 pages remaining for task #1. Would it be possible to get some sort of special flag which allows the page to be automatically saved and not flood RC? I will be using Wyangbot (talk • contribs). Thanks, Wyang (talk) 00:51, 4 March 2014 (UTC)[reply]

If the tasks are fully automatable, then yes, it makes more sense to do them by bot than by AWB. You can draft a vote like this one to request a bot flag. There are instructions at the top of WT:V to guide you. The flag is not what causes the page to be saved automatically, though; you must write and run a script to do that... or, supposedly, there is a way to turn on an 'automatic' mode in AWB, but I don't know what it is. Maybe someone can enlighten the both of us! - -sche (discuss) 02:29, 4 March 2014 (UTC)[reply]

Thanks, I have drafted a vote here. Wyang (talk) 03:43, 4 March 2014 (UTC)[reply]

Request AWB for my bot

I would like to apply for permission to use AWB with my bot Kennybot (talk • contribs). --kc_kennylau (talk) 10:16, 26 February 2014 (UTC)[reply]

Given that you already have AWB privileges on your user account, and a bot flag on your bot account, I don't imagine any objection to this request. If no-one objects or beats me to it, I'll add you to the check page in a day or two. - -sche (discuss) 19:13, 26 February 2014 (UTC)[reply]

Your bot now has AWB privileges. I added it to the "Users" section because I assume you will be using AWB in "regular (manual-review) mode" rather than "fully automatic mode". If you in fact intended to use AWB in full auto mode, let me know. - -sche (discuss) 04:38, 28 February 2014 (UTC)[reply]

Different Spellings of Ladino

Discussion moved from Category talk:Ladino headword-line templates#Different Spellings of Ladino.

Hi, I'm working with Ladino Wikipedia and Wiktionary and I also want to contribute to the Ladino words in the English Wiktionary. A lot of words are misspelled, e.g. sefardí is not Ladino; it's Castilian. In Judaeo-Spanish we say sefaradí (sefaradi in Aki Yerushalayim and Turkish spelling and sefaradhí in the Multidialectal spelling).

Today, Judaeo-Spanish (Ladino) is spoken in 36 countries (although in 16 of them there are fewer than 100 speakers). Historically, until the mid-19th century, it was always written in the Rashi script, which cannot currently be used here due to technical difficulties. However, with the rise of nationalism, and for other reasons as well, most people started writing in the Latin alphabet. The conventions they use depend more or less on the country they live in and on the type of schools they attended (such as the Alliance Israélite Universelle). Thus today there are around 20 different orthographical norms used for this language (17 according to some and 22 according to others)!

Not all of these 20 are very common. For example, the Greek and Arabic scripts, the German orthographical conventions (including ẞ; used only by very old Sephardim of Hamburg descent), Dutch spelling (Jews of Curaçao), Portuguese spelling (Jews of Brazil), etc. are not very widespread, though they are still in use. If we were to write each word in each kind of spelling, it would be far too much. However, reducing them to just two, Latin and Hebrew, would not do justice to the other widely used spellings.

The most common/important 6 spelling systems are as follows (alphabetically ordered - and the possible letter to use in the template):

Aki Yerushalayim (a - Autoridad Nasyonala del Ladino)
French (f - Vidas Largas)
Hebrew (h)
Multidialectal (m - Ortoǵrafía Unida)
Rashi (r)
Turkish (t)

The next 6 important spelling systems are as follows (not necessarily to be used here, again alphabetically):

Cyrillic (c)
Italian (i)
Jaquetía (j)
Nehama (n)
Old Spanish (os)
Spanish (s - Arias Montano)

These twelve spelling systems are the ones we could most plausibly use to find citations. However, I suggest we use the most common 6, and as we can't use Rashi for now, let's just use the 5 (a, f, h, m, t) instead of the 2 (l, h). Giving four alternative spellings in the same row doesn't seem logical to me, though, so I suggest we give them under the heading Alternative forms.

If you guys can help me change the template accordingly, that would be very nice. WikiTiki, maybe you could help me out, or direct me towards the right people who know how to do this?

One more thing, this parameter should be optional, because not every word is spelled in a myriad forms:

papel (m=a=f=t) and פאפיל (h) → We could mark papel (m) and פאפיל (h)
parâ (m), para (a=f=t) and פארה (h) → We could mark parâ (m), para (a) and פארה (h)
justo (m), djusto (a=f), custo (t) and גֿוסטו (h) → We could mark justo (m), djusto (a), custo (t) and גֿוסטו (h)
muchacho (m=a), mutchatcho (f), muçaço (t) and מוגֿאגֿו (h) etc.
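
As a rough sketch of how the optional parameter could behave, the Python below collapses identical spellings and labels each distinct form with the first system code that uses it. The function name and the priority order m, a, f, t, h are assumptions chosen to reproduce the markings in the examples above; this is not an existing template or module.

```python
def distinct_spellings(spellings, order=("m", "a", "f", "t", "h")):
    """Group system codes by spelling; the first code in `order` labels each form."""
    groups = {}
    for code in order:
        form = spellings.get(code)
        if form is None:
            continue  # the parameter is optional, so a missing system is skipped
        groups.setdefault(form, []).append(code)
    return [(form, codes) for form, codes in groups.items()]


# Example with the Latin-script spellings of "justo" listed above.
justo = {"m": "justo", "a": "djusto", "f": "djusto", "t": "custo"}
for form, codes in distinct_spellings(justo):
    print(form, "(" + codes[0] + ")")
```

With this ordering, para would be marked (a) and papel (m), as in the list above.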

Thank you in advance,

Friendly --Universal Life (talk) 10:38, 27 February 2014 (UTC)[reply]

I agree with everything you said except what you said about Rashi script. We consider Rashi script and Hebrew script to be the same script. Rashi script is just a different font. It will never have separate Unicode code points. Therefore, I see no reason why we can't have entries in the Hebrew/Rashi script. If a Rashi-script font ever becomes available with the proper licenses, we will be able to integrate it into Wiktionary. But until then there is nothing wrong with using the square-script fonts that we use for Hebrew and Yiddish. --Wiki Tiki 89 19:14, 27 February 2014 (UTC)[reply]

I feel like this discussion got lost in the midst of the others. Can anyone offer any input? --Wiki Tiki 89 17:50, 1 March 2014 (UTC)[reply]

I think the orthography used by the extant regulatory body (bodies?) of Ladino should be used for lemmata, except for terms not citable with it. Naturally, any citable spelling should have an entry as an alternative form. — Ungoliant ^(falai) 18:15, 1 March 2014 (UTC)[reply]

French Verb Usage With Prepositions

Currently the meaning of French verbs is listed with a note about whether the meaning applies in the transitive, intransitive, impersonal, reflexive case etc. I have found that this information is often insufficient for using the verb correctly in a sentence, particularly in the intransitive case. In French, intransitive verbs often change meaning depending on which preposition is used, à or de, or only one preposition can be used with that particular verb. Whether a verb takes à or de is often arbitrary and cannot always be worked out from context. Regardless of whether the meaning changes, whether a verb takes à or de drastically changes that verb's usage in sentences where objects are replaced with pronouns.

Some examples:

parler à - to talk to someone

parler de - to talk about something

arriver à - to go to somewhere

arriver de - to come from somewhere

penser à - to imagine

penser de - to have an opinion about

jouer à - to play a game or sport

jouer de - to play an instrument

être - to be

être à - to belong to

rire - to laugh

rire de - to laugh at

I propose that entries for French verbs should contain more information about the prepositions that go with them. This information would be extremely helpful as it provides clarity about when the verb takes on particular meanings and is vital for understanding how to use the verb in a sentence. Other online dictionaries such as Oxford Dictionary and Word Reference do contain this type of information. Some verbs (very few) already contain this type of information but they seem to be the exception rather than the rule, e.g. faire.

The current parler entry looks like this:

(intransitive) To speak or talk.
Il ne s'est mis à parler qu'à l'âge de quatre ans.

Ils parlèrent plusieurs heures avant d'aller se coucher.
(transitive) to be able to communicate in a language; to speak
Elle parle couramment français. - She speaks French fluently

I imagine it would need to change to something like this:

(intransitive) To speak or talk.
Il ne s'est mis à parler qu'à l'âge de quatre ans.

Ils parlèrent plusieurs heures avant d'aller se coucher.
(intransitive, ~ à) To speak or talk to someone.
(intransitive, ~ de) To speak or talk about something.
(transitive) to be able to communicate in a language; to speak
Elle parle couramment français. - She speaks French fluently

— This unsigned comment was added by Spuzzdawg (talk • contribs) at 20:29, 27 February 2014.

Yes, we do that in some entries (see débattre). We should make a point of doing it everywhere it's necessary. --Wiki Tiki 89 20:34, 27 February 2014 (UTC)[reply]

The {{context}} template isn't really designed for that, though. And I'm not sure if this is the best way to show it, either. After all, context labels give specific contexts in which the word has a certain sense. But in the phrase parler de, "parler" does not mean "speak about" when it is followed by "de", it's the combination "parler de" as a whole that has that meaning. —CodeCa t 20:38, 27 February 2014 (UTC)[reply]

I am trying to show something similar in Hungarian entries, although the Hungarian language uses suffixes instead of prepositions. See the verb tartozik for the format. It would be good to come up with a format that other languages could use, too. --Panda10 (talk) 21:02, 27 February 2014 (UTC)[reply]

Many languages have things like this. In English, you have "talk about" and "talk to", Dutch has "praten over" and "praten tegen". The preposition to be used is often unpredictable and should be idiomatic by our standards. In many older Indo-European languages, cases fulfilled this role as well, like in Gothic 𐌱𐌹𐌳𐌾𐌰𐌽 (bidjan, “to ask”), which took an accusative object for the person being asked, and a genitive object for the thing desired. Modern Finnish still uses cases like this, thanks to its elaborate declension system. So if we come up with a consistent way of indicating this, we should also include case usage. —CodeCa t 21:15, 27 February 2014 (UTC)[reply]

I'm a complete newbie at wiktionary editing, so I'm not particularly familiar with the purposes of certain templates. I don't really have any strong opinions about what mechanism would best convey this information, just that this information needs to be conveyed. --Spuzzdawg (talk) 08:22, 28 February 2014 (UTC)[reply]

There are a couple of options with or without templates. No matter which option you go with, it would be useful to create categories for French verbs using a specific preposition. e.g. Category:French verbs with preposition de. Possible formatting options:

Create a separate entry for the French parler de similar to the English talk about. Add parler de to parler under the ====Derived terms==== section.
Or keep one entry for parler. Add the French preposition after the corresponding English preposition: To speak or talk (about something de).
A template such as {{fr-prep}} would be useful to display the de part because it keeps the formatting the same and it can automatically categorize the entry. --Panda10 (talk) 15:25, 28 February 2014 (UTC)[reply]

The first approach would not work very well for all languages. Take pożyczyć:

With a direct object (thing) in accusative case and an indirect object (person) in dative, it means "to lend";
With a direct object (thing) in accusative case and an indirect object (person) in genitive case and preceded by od, it means "to borrow";
With a direct object in genitive case and an indirect object (person) in dative it means "to wish" (though this is rarely used).

One would probably like to keep "to lend" and "to borrow" meanings together, but with this approach there would be two headwords on pożyczyć for "to lend" and "to wish", and pożyczyć od for "to borrow". (I conjecture similar issues for other Slavic languages. Анатоли? Ivan? Vahagn? hell, Dan Polansky even? Damn, we have lots of Slavicists.) It does not suit English well either — does "I talked to him about the situation" use "talk about" or "talk to"? Also, preposition stranding is something we probably should do away with.

I would suggest creating a new template to be placed at the end of definition lines. Say, {{+obj|pl||acc|object being borrowed}} {{+obj|pl|od|gen|person being borrowed from}} would render as something like [+ [accusative]: object being borrowed] [+ od [genitive]: person being borrowed from]. Or maybe a few templates even, each for a different grammar: {{+cobj}} (inflection only, e.g. Finnish), {{+prepobj}} (preposition only, e.g. French), {{+prepcobj}} (preposition and inflection, e.g. Slavic languages), {{+postpobj}} (postposition only, e.g. Japanese). Keφr 17:14, 28 February 2014 (UTC)[reply]
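
To make the proposed rendering concrete, here is a minimal Python sketch of the output format described above. It is not an actual Wiktionary template or module: the function name `render_obj` and the small table of case abbreviations are assumptions for illustration, and the language code is accepted only to mirror the proposed template call.

```python
# Hypothetical expansion of case abbreviations; a real module would need a fuller table.
CASE_NAMES = {"acc": "accusative", "gen": "genitive", "dat": "dative"}


def render_obj(lang, preposition, case, gloss):
    """Render one object annotation in the bracketed style proposed above.

    `lang` mirrors the first template parameter; this sketch does not use it.
    """
    parts = ["+"]
    if preposition:
        parts.append(preposition)
    if case:
        parts.append("[" + CASE_NAMES.get(case, case) + "]")
    return "[" + " ".join(parts) + ": " + gloss + "]"


# The two calls from the pożyczyć example.
print(render_obj("pl", "", "acc", "object being borrowed"))
print(render_obj("pl", "od", "gen", "person being borrowed from"))
```

Leaving the preposition or the case empty covers the inflection-only and preposition-only situations with the same function, which is roughly the trade-off between separate templates and optional parameters discussed in the replies.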

I was just about to suggest something very similar. Only I don't see why we need separate templates, when optional parameters can handle all that. This will also be useful in defining prepositions that have different meanings when used with different cases. --Wiki Tiki 89 17:18, 28 February 2014 (UTC)[reply]

I do like this idea because it's flexible enough to handle a wide variety of situations. It also prevents us from having to stuff it all into the context label, where it doesn't really belong. Concerning prepositions, we probably shouldn't hard-code too much. Finnish has many postpositions, and languages may have circumpositions as well. Word order might also be different for different languages, so that in an SOV language the verb comes last in the phrase. For example Dutch has: over[prep] (object) heen[adv-prep] komen[verb] "to recover (emotionally) from", which contains the circumposition overheen and the verb in final position. —CodeCa t 17:47, 28 February 2014 (UTC)[reply]

I intended the separate templates to be there mostly for the sake of convenience (to make markup terser, and to save the tedium of specifying empty arguments for languages that never need them). The whole set of templates may be implemented by a single procedure in a single module (with args in the #invoke frame telling it how to handle .args in the parent frame). Keφr 21:23, 28 February 2014 (UTC)[reply]

That's what optional named parameters are for. --Wiki Tiki 89 23:12, 28 February 2014 (UTC)[reply]

These are more verbose, though. But whatever. Anyone willing to implement this, this form or another? Keφr 10:04, 2 March 2014 (UTC)[reply]

In parler de and parler à, the sense of parler is exactly the same. Therefore, there should be a single definition, but a usage note can be added about prepositions to be used in different cases. There should be different definitions only when the meaning is actually different. When, for a single meaning, several different words are used in English according to the case, I suggest that a Translations section would be appropriate (this is unusual here, but may be very useful). It would be the same case as stale, which, for a single sense, is translated differently in French for stale water, stale butter, stale news, etc. Lmaltier (talk) 19:05, 25 March 2014 (UTC)[reply]

I've not studied linguistics, so I'm unfamiliar with the technical definitions of things; however, how much does a meaning have to change before its 'sense' is considered to have changed? Is this even relevant for a definition? To me, 'talking to someone' and 'talking about something' are different. Granted that they are both about talking, one indicates the recipient of the action while the other relates to the content of the action. In a similar example, arriver à (to go to) / arriver de (to come from), the actions are similar but their directions are reversed. For penser à (to imagine) / penser de (to have an opinion about), the meanings are completely different. Perhaps technically their 'senses' haven't changed, but this is exactly the kind of information a user is looking for. Is the purpose of a dictionary to be linguistically purist or practically useful? As a student of French, I use a dictionary to find out the meanings of words and how to use them. The preposition a meaning requires is fundamental to the use of that language. Wiktionary currently displays this information very rarely and in a non-standardized way. I don't think it appropriate to hide such fundamental information away in the Translation table. The translation table should be about translating from one language to another, not locating fundamental information about a word in its own language.--Spuzzdawg (talk) 01:25, 11 May 2014 (UTC)[reply]

Keep in mind that to imagine may also be stated in some dialects of English as to think up. "I thought of a solution we can try." And to think about is a way of stating an opinion - e.g. "I think that in these phrases the sense of penser is unchanged, while the use gives context." - Amgine/^t·e 16:51, 11 May 2014 (UTC)[reply]

Perhaps the senses of parler à/de are not that dissimilar. There are other, more clear-cut cases where the sense changes completely with the preposition used:

manquer à - to miss someone

manquer de - to neglect (to do s.t.)

venir à - to happen to

venir de - to have just (done s.t.)

Regardless of how much the sense changes, the dictionary user still needs to know the information or they can't use the language. --Spuzzdawg (talk) 07:45, 25 May 2014 (UTC)[reply]

Call for project ideas: funding is available for community experiments

I apologize if this message is not in your language. Please help translate it.

Do you have an idea for a project that could improve your community? Individual Engagement Grants from the Wikimedia Foundation help support individuals and small teams to organize experiments for 6 months. You can get funding to try out your idea for online community organizing, outreach, tool-building, or research to help make Wiktionary better. In March, we’re looking for new project proposals.

Examples of past Individual Engagement Grant projects:

Organizing social media for Chinese Wikipedia ($350 for materials)
Improving gadgets for Visual Editor ($4500 for developers)
Coordinating access to reliable sources for Wikipedians ($7500 for project management, consultants and materials)
Building community and strategy for Wikisource (€10000 for organizing and travel)

Proposals are due by 31 March 2014. There are a number of ways to get involved!

Hope to have your participation,

--Siko Bouterse, Head of Individual Engagement Grants, Wikimedia Foundation 19:44, 28 February 2014 (UTC)[reply]

Languages with difficult scripts

Discussion moved to Wiktionary:Beer parlour/2014/March.