Wiktionary:Beer parlour/2018/March


March LexiSession: mathematics

This month, we suggest focusing on the words used to talk about mathematics. Yes, it's because of Pi Day, the 14th of March. As a starting point, you can have a look at Thesaurus:mathematics, and there are still plenty of domains to explore and to structure. Let's figure it out.

By the way, for those who do not know LexiSession yet, it is a collaborative transwiktionary experiment. You're invited to participate however you like and to suggest next month's topic. The idea is to look at other communities' improvements on the selected topic to improve our own pages. It has already brought in new collaborators who contributed for the first time on a suggested topic. If you participate, please let us know here or on Meta, so we can keep track of the evolution of LexiSession. I hope there will be some people interested this month, and if you can spread it to another Wiktionary, you are welcome to do so. Ideally, LexiSession should be a booster for every Wiktionary on the same agenda, to give us more insight into the ways our colleagues work in the other projects.

By the way, it is the twentieth edition of LexiSession! Noé 10:56, 1 March 2018 (UTC)

I did some Spanish entries and translations for some math terms. --Otra cuenta105 (talk) 11:04, 1 March 2018 (UTC)
Time to crack open that Mongolian stats intro book. Crom daba (talk) 11:17, 1 March 2018 (UTC)
There are LOTS of mathematical terms listed in Requests for definitions in English entries. Maybe this is a good excuse for someone to have a crack at them! Kiwima (talk) 02:34, 8 March 2018 (UTC)

Unifying the display of romanisations in links and headwords: italicise romanisations by default

This follows from last month's topic Wiktionary:Beer parlour/2018/February#Inconsistent and confusing romanisation formats given by various templates and modules. I would like to propose that we make the display of romanisations in {{l}}, {{m}} and {{head}} consistent by italicising romanisations in an entry by default.

  • Rationale: consistency, clarity, and professionality.
  • Examples: Russian русский (russkij), Hindi युद्ध (yuddh), where italicised and unitalicised romanisations appear alternately in the entries.

Wyang (talk) 12:18, 1 March 2018 (UTC)

Support --Per utramque cavernam (talk) 12:29, 1 March 2018 (UTC)
Support. I really have to clean up that Hindi entry. —AryamanA (मुझसे बात करेंयोगदान) 13:42, 1 March 2018 (UTC)
Support. I've never understood why these display differently. ‑‑ Eiríkr Útlendi │Tala við mig 18:02, 1 March 2018 (UTC)
Abstain – I understand the reasoning behind the current format, but don't mind it being changed. — Eru·tuon 23:05, 1 March 2018 (UTC)
Abstain – Per Erutuon. Crom daba (talk) 00:06, 2 March 2018 (UTC)
Caveated support: In line with {{lang}} over on Wikipedia. The implementation of a transcription param would be a prerequisite, because I wouldn't want to see 𐫁𐫏𐫇𐫡‎ (bywr /bēwar/) --Victar (talk) 03:29, 2 March 2018 (UTC)
Support, but keep romanizations in inflection tables that are on separate lines (as at e.g. алъдьи (alŭdĭi)) unitalicized. — Vorziblix (talk · contribs) 08:56, 4 March 2018 (UTC)

Speaking of transliterations, has anyone ever entertained the idea of switching them on and off at will (with a button or sumthin')? --Per utramque cavernam (talk) 22:23, 2 March 2018 (UTC)

@Per utramque cavernam:, you can, just use |tr=-. --Victar (talk) 22:26, 2 March 2018 (UTC)
@Victar: I'm thinking of a gadget that would let the user choose which scripts he wants to see transliterated, and which ones he doesn't.
While I personally need translits for Devanagari (everywhere), I don't need them for Greek (anywhere); but others will be in the opposite situation.
Thus, it would be convenient to be able to hide them on the entire website, without actually hardcoding |tr=-. --Per utramque cavernam (talk) 22:44, 2 March 2018 (UTC)
@Per utramque cavernam: That can be easily done with some custom JS in your preferences. Or for a quick fix, just add something like span[lang="el-Latn"] { display: none } to your custom CSS file. --Victar (talk) 23:12, 2 March 2018 (UTC)
@Victar That leaves empty parentheses and commas everywhere, nice idea though. Crom daba (talk) 23:59, 2 March 2018 (UTC)
@Crom daba, that's why I said it should be done in JS. I thought you knew programming. --Victar (talk) 00:08, 3 March 2018 (UTC)
You can use the below. It's going to remove |pos= too, which could be fixed, that's all from me.
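var elements = document.querySelectorAll('[lang="el-Latn"]');

// For each Greek romanisation, strip every parenthesised group from its parent.
[].forEach.call(elements, function (element) {
        var parent = element.parentElement;
        if (!parent) return; // detached by an earlier innerHTML replacement
        parent.innerHTML = parent.innerHTML.replace(/ *\([^)]*\) */g, "");
});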
--Victar (talk) 02:38, 3 March 2018 (UTC)
That's cool, although maybe we should rework our modules so that css can do it. Crom daba (talk) 15:19, 3 March 2018 (UTC)
@Crom daba, no better time to learn some actual coding yourself. --Victar (talk) 17:56, 3 March 2018 (UTC)
@Victar I have prior JavaScript experience (although I was never fluent, and I wouldn't be able to write the above code without some heavy SO consultation); I just figured it would be more elegant to solve it with CSS. Crom daba (talk) 18:31, 3 March 2018 (UTC)
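A narrower variant of the snippet above can be sketched as a plain string transform, so that only parentheses wrapping a single romanisation span are dropped and other parenthesised text (such as a gloss) survives. The function names and the assumed markup shape (plain parentheses directly around one span whose lang attribute ends in -Latn) are illustrative assumptions, not the site's actual output.

```javascript
// Sketch only: the markup shape handled here is an assumption.
// Assumed shape: optional space, "(", one <span lang="xx-Latn">...</span>, ")".

// Escape regex metacharacters so the language code is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Remove parenthesised romanisations for one language code from an HTML string.
// Parentheses holding anything other than a single romanisation span are kept.
function hideRomanisations(html, langCode) {
  var lang = escapeRegExp(langCode + "-Latn");
  var pattern = new RegExp(
    ' ?\\(<span[^>]*\\blang="' + lang + '"[^>]*>[^<]*</span>\\)',
    "g"
  );
  return html.replace(pattern, "");
}

// Hypothetical sample: Greek and Hindi links, plus an unrelated gloss.
var sample =
  '<span lang="el">λόγος</span> (<span lang="el-Latn" class="tr">lógos</span>) ' +
  '<span lang="hi">युद्ध</span> (<span lang="hi-Latn" class="tr">yuddh</span>) (noun)';

console.log(hideRomanisations(sample, "el"));
```

Called with "el", this drops the Greek romanisation and its parentheses while leaving the Hindi romanisation and the trailing gloss in place. If the real markup wraps the parentheses in their own elements, the pattern would need adjusting.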
Oppose. Let's unify on non-italics. I don't see why the romanization should be in italics; the romanization is no more a mention than the term romanized. Better go for a simple typography. --Dan Polansky (talk) 21:53, 16 March 2018 (UTC)
@Dan, I'm confused -- what do mentions have to do with it? ‑‑ Eiríkr Útlendi │Tala við mig 23:36, 16 March 2018 (UTC)
In the referenced Beer parlour discussion, someone said "I believe the notion is that for scripts which we don't italicize in mentions (Russian, Greek, etc.), we italicize the romanization to show the distinction between the mention and non-mention formats."
That's the only argument in support of italics that I could find at the time of my post.
Now, below, someone said "Italicising romanisations of non-Latin-script foreign terms is a standard and internationally used practice in reference works and in academia": That is for romanizations in the middle of the sentence, right? Like below, we have an example: "The Arabic tāʾ marbūṭa is rendered a not ah." There, "tāʾ marbūṭa" is a mention. There, you might italicize a Czech term as well, where Czech uses roman letters by default.
In "Hindi युद्ध (yuddh)", I don't see a need to italicize "yuddh". I admit that the Jaschke Dictionary for Tibetan does italicize romanizations, but there, they are followed by English text, whereas in the uses of {{m}} and {{l}}, they are not followed by English text.
I looked in русский for visual inspection. The headword line in русский currently says "ру́сский • (rússkij)" without italics, and it looks just fine; italics would not help it in any way.
What would make sense to me is italicizing romanizations in {{m}}, but not in {{head}}, {{l}} and {{t}}; this is because {{m}} can be used in the middle of the text and it italicizes roman script in general, whereas {{l}} does not italicize roman script in general.
As for the rationale "consistency, clarity, and professionality" and as for legibility, italicizing in {{m}} but not in {{head}}, {{l}} and {{t}} would be consistent with what we do for roman script; as for clarity, I do not see how italics is more clear; as for professionality, I might admit that italics could be more in keeping with what other publications do, but doing something different when well justified is not necessarily unprofessional in any bad sense; as for legibility, there is no doubt in my mind that italics is less legible, especially on computer screens. --Dan Polansky (talk) 07:50, 17 March 2018 (UTC)
On a procedural note, I created Wiktionary:Votes/2018-03/Showing romanizations in italics by default to ensure maximum audience. --Dan Polansky (talk) 08:05, 17 March 2018 (UTC)
Oppose: I'm changing my vote to Dan's side. I don't think italics brings anything but less legibility. --Victar (talk) 22:40, 16 March 2018 (UTC)
@Victar: Quite the contrary, actually.
Italicising romanisations of non-Latin-script foreign terms is a standard and internationally used practice in reference works and in academia. Take a look at Wehr and Steingass for Arabic, Oxford Dictionary for Hindi, Steingass Dictionary for Persian, Monier-Williams Sanskrit Dictionary, Jaschke Dictionary for Tibetan, The Chicago Manual of Style, The Oxford Style Manual, ... and even Wikipedia (Iran).
The International Journal of Middle East Studies guidelines become like this without italics:
The Arabic tāʾ marbūṭa is rendered a not ah. In Persian it is ih. In Arabic iḍāfa constructions, it is rendered at: for example, thawrat 14 tammūz. The Persian izafat is rendered -i: for example, vilāyat-i faqīh. []
and this with italics:
The Arabic tāʾ marbūṭa is rendered a not ah. In Persian it is ih. In Arabic iḍāfa constructions, it is rendered at: for example, thawrat 14 tammūz. The Persian izafat is rendered -i: for example, vilāyat-i faqīh. []
The basic meaning of italics is that “this Latin-script word is not English”, and legibility is precisely its advantage. Wyang (talk) 00:17, 17 March 2018 (UTC)
@Wyang:, I don't think that's a comparable usage. In those examples, the foreign terms are only distinguished by being in italic. We however are enclosing them already in parentheses. And yes, I do think italics are less legible than normal text, in that it's harder to read, especially with diacritics and special characters. --Victar (talk) 01:00, 17 March 2018 (UTC)
@Victar: Parentheses, quotation marks or not, the academic practice is that romanisations are italicised by default in running text, whenever they assume an auxiliary function to the script forms. We can find plenty of people writing 여름 yelum, 여름 (yelum), or yelum 여름, but hardly anyone writing 여름 (yelum) in proper works. Also see Citation Guidelines for Chinese-language Materials, § 2.3 In Parentheses. Such practice makes it easier for readers to parse the text and identify romanisations, and simply ignore them if the readers already know the script or language. 서울 (Seoul, Seoul) is much easier to parse than 서울 (Seoul, Seoul). To me, the current links on Reconstruction:Proto-Iranian/θanǰáyati are impossible to read. It's hard to make out which is which, and eyes become strained after reading a few lines, while the italic version makes the romanisations stand out aesthetically.
There isn't really a reduced legibility of italic Latin text. We routinely italicise natively Latin-script terms in {{m}}, and some Latin-script languages are no less diacritics-heavy. For example, Vietnamese:
ủ trái cây bằng đất đèn cho mau chín. Thường mấy trái cây non, bọn buôn nó hay giú khí đá cho mau chín, nên ăn mấy trái cây ấy có ra gì đâu.
There is not really a legibility difference between the non-italic and italic sentences above. Our readers seem to read these italic letters with diacritics just fine too. Wyang (talk) 02:41, 17 March 2018 (UTC)
@Wyang, all that you just expressed are preferences and opinions. I simply disagree with them. --Victar (talk) 02:45, 17 March 2018 (UTC)
@Victar It's not preferences and opinions. It's what typically happens in academia and lexicography and the rationales behind it. You can surely disagree, but it's just unfortunate that votes can happen with complete disregard for what other way more established dictionaries adopt as standard practice, dismissed as professional preferences. Wyang (talk) 02:55, 17 March 2018 (UTC)
@Wyang, again, that's simply not true; you're citing a standard for foreign terms within running English text. We don't have a need for it as we already use parentheses, so your argument is one of stylistic preference, not of functionality. --Victar (talk) 03:07, 17 March 2018 (UTC)
@Victar Please have a closer look at the dictionary links I have given above. You are missing the point: this isn't a discussion on whether there is a functional need to italicise the romanisations. The point is: is there a stylistic need to do so? The answer is yes, on the grounds that:
This is the standard practice in academia and lexicography. Note that the “standard practice” is not that dictionaries italicise romanisations following headwords; the practice is:

Parentheses, quotation marks or not, romanisations are italicised by default in text, whenever they assume an auxiliary function to the script forms.

When we flip through Wehr, Steingass, Oxford, Monier-Williams, Jaschke, etc., they are full of examples of romanisations after headwords, in parentheses, in quotation marks, on their own, whatever, but all romanisations are italic, regardless of the environment the romanisations are found in after the headwords. On a quick glance, there were 10 romanisations in parentheses on the Wehr page I gave before alone. This simultaneous consistency of formatting romanisations as italic in reference works is what we have failed to appreciate so far in our infrastructure. And, this standard practice in reference works is supported by good rationales which are very relevant to us too. Wyang (talk) 03:50, 17 March 2018 (UTC)
@Wyang, don't be patronizing. It doesn't help your argument. I saw your links and understand the issue. I disagree and my points are above. I don't want to get into it with you any further. --Victar (talk) 04:10, 17 March 2018 (UTC)
@Victar: Sorry if you found it patronizing. The discussion is directed towards the arguments, and none towards the people. Your points were: (1) that reference works adopt italicity out of functional necessity and that such necessity is nonexistent when romanisations are enclosed in parentheses, which is incorrect as the italicity was a universal stylistic preference in lexicography as shown above, independent of text environment; (2) that italicity brings about a reduced legibility, also unsupported by our routine italicisation of Latin-script terms, many of which are no less diacritics-heavy. This is not a personal preference vote; it is a site-wide style format change which must be carefully deliberated on, and I unfortunately did not find the arguments above sufficiently reasoning-robust to balance the overwhelming evidence of a standard practice of italicisation in reference works. Wyang (talk) 04:56, 17 March 2018 (UTC)
Support --Anatoli T. (обсудить/вклад) 02:57, 17 March 2018 (UTC)

Passed and discussion closed. This and the February discussion have been around for sufficiently long, and as Korn said in the Japanese discussion below, those with interest have already expressed their opinions. No substantial opposing argumentation was put forth, compared to what we have as the literature and lexicographic evidence for unifying it as italicised. This affects and interests some people much more than others who may barely view and manage non-Latin-script entries. This isn't a popular polling station or a place for musings, but rather a “think tank” where arguments for and against should be proposed and evaluated in the presence of each other. Wyang (talk) 09:08, 17 March 2018 (UTC)

FWIW I support this. I think efforts (by the same one stick in the mud as usual) to insist on further bureaucracy can be dismissed. - -sche (discuss) 15:32, 17 March 2018 (UTC)
This has not been brought to a wide audience; the reasoning presented initially was nearly absent ("consistency, clarity, and professionality") and free of substantiation. This is why Wiktionary:Votes/2018-03/Showing romanizations in italics by default is the proper venue. The only way the vote can fail to pass is if there actually is not a consensus. --Dan Polansky (talk) 15:36, 17 March 2018 (UTC)
And on votes sometimes being evil and such, let the reader read the top of this Beer parlour "discussion". There is no discussion; there are blank and nearly blank votes on the supporting side. A real discussion started only when I posted my oppose, and Victar changed his vote. --Dan Polansky (talk) 15:41, 17 March 2018 (UTC)
To no one's surprise, I agree with Dan. I can't speak for each personally, but the support votes look more like "whatever" votes. I was also "whatever", but I thought more into it and ended up disagreeing. I think with a vote, people will put more thought into both sides of the argument, especially since only one side was initially given. Also, @-sche, no need for personal attacks. --Victar (talk) 16:03, 17 March 2018 (UTC)
As for: "arguments for and against should be proposed and evaluated in the presence of each other": Yes, let's. If I evaluate the arguments, I find in favor of my arguments. If I did not, I would change my vote above, right? Now what? That does not work. There is no mechanism of evaluation of arguments. The best we have is our venerable votes-cum-discussions. --Dan Polansky (talk) 16:22, 17 March 2018 (UTC)

I think the changes made towards italicized romanizations should be reverted immediately pending the completion of the vote. I find it completely inappropriate that @Wyang moved forward with this change. @Dan Polansky --Victar (talk) 03:59, 18 March 2018 (UTC)

I agree. This is a longstanding format that most of us have become very used to and should not be changed without a vote. --WikiTiki89 15:01, 19 March 2018 (UTC)
Not just gotten used to, but it nullifies important distinctions in Hittite and merges {{l}} and {{m}} for many languages. You were already desysopped for making changes without consensus and wheel warring. Please undo this change immediately before it escalates any further. --Victar (talk) 15:36, 19 March 2018 (UTC)
Now that I've found where the change was made, I've reverted it. --WikiTiki89 15:45, 19 March 2018 (UTC)
Thanks, @Wikitiki89. --Victar (talk) 15:52, 19 March 2018 (UTC)

Where tf were you before, when the discussions were ongoing? Bye-bye. Wyang (talk) 19:15, 19 March 2018 (UTC)

Middle Assamese

I have added a code inc-mas for Middle Assamese per Talk:ভাল. If anyone has any objections, please put them here. There are lots of cites on Google Books. —AryamanA (मुझसे बात करेंयोगदान) 18:40, 1 March 2018 (UTC)

@AryamanA, I think the usual format for that would be inc-asm (Assamese Middle), cf. frm (French Middle), goh (German Old High). --Victar (talk) 03:21, 2 March 2018 (UTC)
Pinging @-sche. --Victar (talk) 03:31, 2 March 2018 (UTC)
@Victar: But what about inc-ohi (Old Hindi), inc-ogu (Old Gujarati)? —AryamanA (मुझसे बात करेंयोगदान) 04:10, 2 March 2018 (UTC)
roa-opt (Old Portuguese) —AryamanA (मुझसे बात करेंयोगदान) 04:11, 2 March 2018 (UTC)
@AryamanA: Right, all not ISO codes, but yes, it seems wiki sub-codes are reversed, so inc-mas is appropriate. +1 --Victar (talk) 04:26, 2 March 2018 (UTC)
Yes, in my experience when we make our own codes, we tend to have the code approximate the name with the words in the same order (so, "inc-mas"); the ISO's "backwards" order may be a product of internally preferring names like "German, Old High" for sorting reasons and/or preferring codes that sort "nearby" ("fr", "frm"), or just a result of their not always approximating language names as well as they could (they couldn't use "mfr" for "Middle French" because they already use it for "Marrithiyel", despite that word not having an "f" in it). - -sche (discuss) 04:56, 2 March 2018 (UTC)
ISO codes are often quite inconsistent, e.g. owl for Old Welsh but wlm for Welsh, Middle; neither of which uses the native name Cymraeg the way cy for Modern Welsh does. —Mahāgaja (formerly Angr) · talk 15:41, 3 March 2018 (UTC)

Proposed change to Japanese entry format - using kana as the main entry form

Continuing the discussion from last month at Wiktionary:Beer_parlour/2018/February#Related:_Status_of_hiragana_entries.

In the process of cleaning up after some anon edits, I reworked the hiragana entry at うまい (umai) to show an example of what it might look like if we were to use the kana entries to store the main content, rather than the current practice of using kana entries only as soft redirects to the kanji spellings. The うまい (umai) entry is a bit of a simpler example, as this term only has one etymology. I think it can still help to illustrate how we might lay things out, and how we might show how different kanji spellings are applied to different senses of the same lemma.

For those interested in Japanese entries here, please read the linked thread above, have a look at the うまい (umai) entry, compare to 上手い (umai) and 旨い (umai) as (currently less comprehensive) examples of the conventional kanji-focused format, and discuss here as appropriate.

TIA, ‑‑ Eiríkr Útlendi │Tala við mig 21:21, 1 March 2018 (UTC)

Support. I think this is a step long overdue, but it also means there will be a lot of work... Suggestions: the kanji forms on def lines need to be made more conspicuous, cf. the 【】 notation in JA dictionaries. Maybe an additional template is indicated. Also I think the kanji forms can take even less information, provided we incorporate content into the kana entries, including the conj table, which can be extended to display multiple kanji forms in one cell. Wyang (talk) 22:01, 1 March 2018 (UTC)
I should have included this earlier -- @Wyang, please have a look at 巧い (umai) as an example of a kanji spelling entry as a soft-redirect to the fuller kana entry.
I agree that some formatting, and probably different (new?) templates, may well be called for. ‑‑ Eiríkr Útlendi │Tala við mig 00:12, 2 March 2018 (UTC)
I'd like to see more complex examples, and what sort of templates could improve them. As Wyang says, this is going to be a very big job — Chinese unification was also a big job, so I know it can be done, but more planning is necessary first. —Μετάknowledgediscuss/deeds 02:59, 2 March 2018 (UTC)

My thoughts (mostly related to practical usability):

  1. ja.wt has definitions on the kanji entry for Sinoxenic words; for example: ja:意味 (imi). Is this something we should consider doing?
    1. Do users want to manually go to 辞典 from じてん?
    2. See also ja:いみ (imi), which has definitions for native words and redirects to Sinoxenic words.
  2. Paper dictionaries use kana as the main entry because they are sorted alphabetically and have multiple words on one page.
    • Other online dictionaries don't have these problems because they are more database-like.
  3. How do we indicate the rarity of a kanji spelling?

Personally I think that the most common spelling should be used, solely because of usability (which admittedly can be unsightly), but kana entries make a lot of sense for native words. —suzukaze (tc) 03:48, 2 March 2018 (UTC)

I thought this proposal only affects native Japanese words, and that Sinoxenic words would be kept at their kanji spellings (?) Wyang (talk) 00:31, 4 March 2018 (UTC)
As initially conceived, I hadn't fully considered Sinoxenic terms. In light of the discussion above, I agree with the coalescing consensus that Japanese Sinoxenic terms (those deriving originally from Chinese borrowings) should use the kanji spellings for the lemmata, with the kana spellings serving as soft redirects -- much as the current status quo. Meanwhile, native Japanese terms and fully-nativized borrowings (such as たばこ (tabako), which is old enough that it has multiple broadly accepted kanji spellings) would have lemmata content moved to the kana spellings, with the kanji spellings serving as soft redirects -- the opposite of the current status quo. ‑‑ Eiríkr Útlendi │Tala við mig 09:08, 4 March 2018 (UTC)
What about words like 故郷 (こきょう, kokyō) and 故郷 (ふるさと, furusato)? —suzukaze (tc) 01:43, 5 March 2018 (UTC)
You bring up a good point, that Japanese is sometimes variable enough in the kanji spellings, but consistent in the kana, that it might make sense to use kana for lemma entries even for Sinoxenic terms.
As a counterargument, Daijirin lists at least 26 different kanji spellings for the kana sequence かんせい (kansei). If we were to use かんせい as the lemma, the entry would be quite horrifically huge. This is not true for every Sinoxenic reading, but it's common enough that we need to consider the ramifications. ‑‑ Eiríkr Útlendi │Tala við mig 20:48, 13 March 2018 (UTC)
Well, the entry looks useful, but the information beneath the definitions needs more collapsing. After all, definitions are the primary function of this site, everything else is secondary data, so ease-of-access to the defs should be our primary concern. Korn [kʰũːɘ̃n] (talk) 09:16, 4 March 2018 (UTC)
@Korn -- do you mean うまい (umai)? There was a thread somewhere about making usexes auto-collapsing, similar to the current behavior of quotes. I think it was Wiktionary:Beer_parlour/2018/March#Hiding_usexes. ‑‑ Eiríkr Útlendi │Tala við mig 18:55, 14 March 2018 (UTC)
Support for now. I think we'll be able to iron out problems. —suzukaze (tc) 01:43, 5 March 2018 (UTC)
Neither way is perfect, in my opinion.
  1. A kana or mixed spelling may be more common for Sino-Japanese spellings as well, especially for complex or rare characters.
  2. If we choose kana forms for lemmas, then it would make sense to do this for Sino-Japanese terms as well. By Sino-Japanese, I mean all terms using on'yomi readings, not necessarily just terms borrowed from any form of Chinese.
  3. I prefer the status quo, but we should take care to avoid duplication of content. Perhaps native verbs and adjectives should only have inflections in the kana entries. --Anatoli T. (обсудить/вклад) 02:36, 5 March 2018 (UTC)
As another example, I recently reworked the あばく (abaku) entry. This spelling has three different etymologies by current research, all of which seem to be at least loosely related. The terms have three different kanji spellings, and one etymology for which no kanji spelling is (yet?) attested. ‑‑ Eiríkr Útlendi │Tala við mig 20:48, 13 March 2018 (UTC)
I hope this discussion doesn't die down. I think this layout is logical, and will prove to be much superior in the long run. The kanas were designed specifically for this reason (i.e. to record wago more accurately), and this makes the hiragana forms the most suitable lemma forms, out of all the possible variant forms of a word. Wyang (talk) 10:39, 14 March 2018 (UTC)
If the discussion dies down, it likely means everything was said by everyone who cares. Then it's time to be bold and just do the stuff that was agreed on. Korn [kʰũːɘ̃n] (talk) 10:59, 14 March 2018 (UTC)

Middle Japanese again

We still have 6 Middle Japanese entries without any code for that language- can we decide what should be done? DTLHS (talk) 02:43, 2 March 2018 (UTC)

Forgive me, I can't remember how to find those. Are they in a category? ‑‑ Eiríkr Útlendi │Tala við mig 01:02, 3 March 2018 (UTC)
Sorry, here they are: かめ, かへる, かへす, かはる, かはす, かふ DTLHS (talk) 01:07, 3 March 2018 (UTC)

Book Pahlavi in Unicode

Man, am I the only one that's pissed that Book Pahlavi hasn't been added to Unicode yet? Why the heck hasn't this proposal gone forward?! --Victar (talk) 03:15, 2 March 2018 (UTC)

It seems like Unicode blogged about working on it two days ago: [1] suzukaze (tc) 03:17, 2 March 2018 (UTC)
Woot! Thanks for sharing, @Suzukaze-c. I hope they move quickly on it. --Victar (talk) 03:24, 2 March 2018 (UTC)
lol, that's great! I am annoyed at having Latin script Middle Persian Entries too. —AryamanA (मुझसे बात करेंयोगदान) 04:14, 2 March 2018 (UTC)
More frustrating for me is that we already have Manichean Unicode but no good fonts to support it. Crom daba (talk) 10:39, 2 March 2018 (UTC)
I'm still waiting for Tocharian. —Mahāgaja (formerly Angr) · talk 11:24, 2 March 2018 (UTC)
@Crom daba:, I've been using this one, which I ripped from a Unicode proposal PDF. --Victar (talk) 16:08, 2 March 2018 (UTC)
Wow, thanks! I've been periodically checking this page, but they don't list whatever font this is. Crom daba (talk) 16:15, 2 March 2018 (UTC)
@Crom daba: Yeah, it's not publicly available yet. I ripped it from https://unicode.org/charts/PDF/U10AC0.pdf. FYI: @AryamanA, Vahagn Petrosyan --Victar (talk) 16:39, 2 March 2018 (UTC)


Why is this under English lemmas? ---> Tooironic (talk) 15:55, 2 March 2018 (UTC)

Fixed. Equinox 16:00, 2 March 2018 (UTC)

Middle Persian language codes

Now that we have Unicode Manichaean, and are soon getting Unicode Book Pahlavi, I think it imperative that we rehash the conversation on whether the two should still be split into separate language codes, one for Pahlavi pal, and the other for Manichaean xmn. My arguments for unifying them under one code are as follows:

  1. Book Pahlavi and Manichaean are scripts (and a religion), not languages.
  2. The general pronunciation of Pahlavi and Manichaean is mostly identical, far less distinct than Old Avestan and Younger Avestan or Vedic Sanskrit and Classical Sanskrit.
  3. Unnecessary category division, i.e. Category:Ancient_Greek_terms_borrowed_from_Manichaean_Middle_Persian.

Pinging @AryamanA, माधवपंडित, Vahagn Petrosyan, -sche, ZxxZxxZ. --Victar (talk) 17:52, 3 March 2018 (UTC)

Support Crom daba (talk) 18:00, 3 March 2018 (UTC)
Support, but we should tag the variety inside the page using {{lb}} or something in the headword line. --Vahag (talk) 20:02, 3 March 2018 (UTC)
Agreed. --Victar (talk) 20:11, 3 March 2018 (UTC)
Support -- माधवपंडित (talk) 03:07, 4 March 2018 (UTC)
Support, but as Vahag said. —AryamanA (मुझसे बात करेंयोगदान) 04:10, 4 March 2018 (UTC)

@-sche, do you have any thoughts before moving forward on this? --Victar (talk) 14:52, 14 March 2018 (UTC)

Support; based on previous discussion it does seem like we're dealing with dialects and not separate languages, especially because it sounds like the two varieties have differences within themselves (temporally, regionally or otherwise), not just between each other. ISO/SIL also split some other languages by script, e.g. Luwian. In that case, we just picked one of the codes to use for both script varieties; we could do that here; it would have the advantage that we'd be using a shorter code and a recognized (ISO) one, but the disadvantage that it might be confusing for people to see content from the second lect under the first lect's code. - -sche (discuss) 18:00, 14 March 2018 (UTC)
Support. *i̯óh₁n̥C[5] 19:28, 14 March 2018 (UTC)
@-sche, why not run a bot to replace {{(.+)|xmn|(.*)}} and lang=xmn with pal? --Victar (talk) 21:33, 15 March 2018 (UTC)
That (or, the general concept of replacing the language code "xmn" with "pal", and changing the L2 headers at the same time) would work. In fact, it looks like we're dealing with so few entries that it would be feasible for me to do it with AutoWikiBrowser. Unless anyone has objections, I should have time to do that later. - -sche (discuss) 21:52, 15 March 2018 (UTC)
Does anything more complex need to be done to maintain the functionality of Module:Mani-translit beyond adding it to the data for "pal"? - -sche (discuss) 22:06, 15 March 2018 (UTC)
@-sche, nope, same functionality. If you're going to run some more bot conversions, adding |sc=Mani to those xmn entries would be awesome. --Victar (talk) 01:40, 16 March 2018 (UTC)
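The substitution discussed above could be sketched roughly as follows. A bare | in a regular expression means alternation, so the pipes are escaped here. The function name replaceLangCode is hypothetical, and the choice to touch any positional parameter that is exactly the old code is an assumption for illustration. The sketch also assumes plain codes such as xmn or pal with no regex metacharacters, and it leaves the L2 headers for a separate step.

```javascript
// Sketch only: swap one language code for another in template calls.
// Assumes `from` and `to` are plain codes (letters, digits, hyphens).
function replaceLangCode(wikitext, from, to) {
  // A positional parameter that is exactly the code: |xmn| or |xmn}}
  var positional = new RegExp("\\|" + from + "(?=\\||\\}\\})", "g");
  // A named parameter: |lang=xmn followed by | or }}
  var named = new RegExp(
    "(\\|\\s*lang\\s*=\\s*)" + from + "(?=\\s*(?:\\||\\}\\}))",
    "g"
  );
  return wikitext
    .replace(named, function (match, prefix) { return prefix + to; })
    .replace(positional, "|" + to);
}

// Hypothetical wikitext; the last template must stay unchanged.
var before = "{{m|xmn|bywr}} {{head|lang=xmn|noun}} {{der|en|xmn|-}} {{m|pal|xmnx}}";
console.log(replaceLangCode(before, "xmn", "pal"));
```

Anchoring on the following | or }} keeps longer parameter values that merely begin with the code from being rewritten. A real run would still want a manual review of each diff.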

A fair number of entries have module errors because they still use xmn. — Eru·tuon 19:23, 17 March 2018 (UTC)

I've fixed all the ones I could, but take a look at Reconstruction:Proto-Indo-European/n̥- and Reconstruction:Proto-Indo-European/speḱ-. - -sche (discuss) 22:58, 17 March 2018 (UTC)

Spelling pronunciation

I've created a simple etymology template for marking (historical) spelling pronunciations. Is this okay with everyone? Crom daba (talk) 20:09, 3 March 2018 (UTC)

P.S. Once again I can't remember where I'm supposed to put category information other than Module:category tree/poscatboiler/data/terms by etymology to make {{autocat}} work.

Wiktionary:Beer_parlour/2014/January#spelling_pronunciations --Per utramque cavernam (talk) 20:11, 3 March 2018 (UTC)
I suspected there was a hidden Chesterton fence here, but I figured this was the best way to find it.
This is basically my use case: хязгаар#Mongolian, is this valid or not? Crom daba (talk) 20:36, 3 March 2018 (UTC)
Is it not a pronunciation spelling? From what I (think I) understand, the letter г was added to reflect the pronunciation more accurately. A spelling pronunciation is the reverse process: altering the pronunciation and matching it to the spelling (pronouncing salmon /ˈsælmən/, for example). --Per utramque cavernam (talk) 21:30, 3 March 2018 (UTC)
Yes, a spelling pronunciation is a pronunciation that's been altered because of how the word is spelled, like /ˈsælmən/ for salmon and a whole lot of other examples. A pronunciation spelling, on the other hand, is a spelling that's been altered because of how the word is pronounced, like Enya for Eithne or (presumably) show for shew. —Mahāgaja (formerly Angr) · talk 21:59, 3 March 2018 (UTC)
I guess I should try to write more clearly. Classical Mongolian (g) stands for two different Proto-Mongolic phonemes (it's generally full of homography), and г (g) was added (to the [Khalkha] pronunciation, which is more faithfully reflected in Cyrillic orthography) as a misreading of (g) as *g instead of *x (actually as a mixture of both). Crom daba (talk) 12:20, 4 March 2018 (UTC)
I'm still not sure what's going on. Is the word pronounced the way it is etymologically expected to be pronounced, but "misspelled" (from an etymological point of view)? Or is it spelled the way it's etymologically expected to be spelled, but "mispronounced" (from an etymological point of view)? Or both, or neither? —Mahāgaja (formerly Angr) · talk 15:24, 4 March 2018 (UTC)
It is mispronounced. ᠬᠢᠵᠠᠭᠠᠷ (qiǰaɣar) renders Proto-Mongolic *kïjaxar, which regularly goes to *kïjaar -> *xïjaar -> xyajaar -> hyadzaar and then (or in some intermediate steps) g was inserted because the spelling is ambiguous between *kïjaxar and **kïjagar.
When I asked whether this use was valid, I meant whether everyone is fine with there being a template that would do the thing I did here (link to appendix + categorize), not whether this is an instance of a spelling pronunciation (I already know it is). Crom daba (talk) 00:00, 5 March 2018 (UTC)



Hi! From my own counting, it seems that unquestioningness is the 5,500,000th page to be created here. Congratulations! Pamputt (talk) 09:26, 4 March 2018 (UTC)

For what it's worth as confirmation, I arrive at the same conclusion. (I counted back through the recent changes list of new entries as of when there were 5,500,059 entries per Special:Statistics, and again when there were 5,500,062 (double checking), to find the 5,500,000th new entry.) Congratulations to Equinox! - -sche (discuss) 10:00, 4 March 2018 (UTC)
Wiktionary:Milestones has been updated accordingly. SemperBlotto (talk) 10:09, 5 March 2018 (UTC)

Use of † in taxonomic entries

In biology, † is sometimes placed before a taxonomic name to indicate that it is extinct. Some of our Translingual taxonomic entries, although only a small portion, use this notation, e.g. at Smilodon in the headword line as well as elsewhere in the entry. Whether or not a taxon is extinct is not lexical information; indeed, any species could go extinct without its definition, etymology, gender, hyponyms, or other lexical metadata changing. Extinction status is purely encyclopaedic, and belongs at Wikipedia. It is also unexplained in entries, which may confuse readers. As a result, it would probably be best to remove it from our entries. —Μετάknowledgediscuss/deeds 20:33, 4 March 2018 (UTC)

I'm not opposed to noting that a taxon is extinct, maybe in the definition ("an extinct species of..."), or it arguably is the sort of semi-lexical information we record in other cases by using {{lb|en|historical}}. (It does seem like a lot of effort to maintain.) But given that the cross/dagger also sometimes means the word or sense is obsolete — I think Chinese entries use it this way — and given that we have the space to spell things out, it's probably best to avoid using just the symbol. - -sche (discuss) 20:54, 4 March 2018 (UTC)
I don't work with these entries, but I agree with Metaknowledge. --Per utramque cavernam (talk) 14:03, 6 March 2018 (UTC)
I don't understand how the word encyclopedic applies to the dagger. Even less can I understand how the fact of something being extinct or extant or endangered isn't information of considerable interest to someone looking at a taxonomic entry, as much as whether something is "endangered" or "red" or "large", or a bird. Have we really lost touch with normal dictionary users to that extent? As to the changeability of the status of something sub specie aeternitatis, the same applies to the very words we use to gloss other words.
As to it not being explained, the same applies to m, f, n, and plenty of terms which we use in labels, category names, etc, and even in definiens. We could link to [[†]] or to WT:GLOSS, approaches we have taken with a few of these others. DCDuring (talk) 15:25, 6 March 2018 (UTC)
@DCDuring: It looks like you've "lost touch with normal dictionary users" if you think that they will know what the daggers mean and not be confused by their only occasional usage here. Gender is explained, if you merely hover your mouse over the letter. I agree with -sche that it is often appropriate to use "extinct" in the definition line, although not always (it probably isn't particularly useful for dinosaurs, for example). We can make those kinds of decisions without reliance on daggers. —Μετάknowledgediscuss/deeds 17:56, 6 March 2018 (UTC)
I am glad you support the use of hover notices. That might be a desirable alternative to a link, though a link to a sense id gloss can include more information, eg context and is more accessible for us technically challenged contributors.
I was actually surprised that you seemed to object to semantic content in your original complaint.
The dagger is just the kind of orthographic element that we have lavishly honored in showing ligatures and obsolete English characters in entries, especially in alternative forms and in citations. DCDuring (talk) 20:05, 6 March 2018 (UTC)
@DCDuring: Parts of your response are unintelligible to me, particularly the final sentence. I see no relationship between documenting all words that meet our criteria regardless of what characters they use and what we choose to put in our entries. Can you make a clear statement about what you want to do with this issue? To be abundantly clear, I want to remove all daggers and ensure that the word "extinct" is on definition lines where it is deemed useful. —Μετάknowledgediscuss/deeds 20:27, 6 March 2018 (UTC)
Oppose DCDuring (talk) 19:44, 7 March 2018 (UTC)
  • I hereby propose that someone, not me, create a means of linking a † to the appropriate English definition at [[†]] and that the daggers be permitted on the inflection line of taxonomic names and wherever else the taxonomic name of an extinct species may appear. DCDuring (talk) 19:44, 7 March 2018 (UTC)
The dagger is the ordinary means of indicating in text that a species is extinct (as an aside, Wikipedia makes heavy use of it as well), and I see no reason to prohibit it. However, I'm also not opposed to replacing it with the word "extinct." Andrew Sheedy (talk) 11:18, 12 March 2018 (UTC)
I intend to use the dagger as a means of locating entries that would benefit from the addition of {{R:Fossilworks}}. DCDuring (talk) 11:53, 12 March 2018 (UTC)

Hiding usexes

I assume it is OK to hide longer usexes in the same manner as quotations. The only thing that worries me is the click-on "quotations" heading is slightly misleading when it's a usex. I did an example at rekke. DonnanZ (talk) 11:42, 5 March 2018 (UTC)

I think we should change it to examples ▼, and hide all 'usage examples' and 'quotations' by default. Wyang (talk) 11:44, 5 March 2018 (UTC)
That would be better, or perhaps "usage examples and quotations" (perhaps a little long). As long as {{ux}} / {{usex}} or {{quote}} are used where appropriate. DonnanZ (talk) 13:50, 5 March 2018 (UTC)
Well, they're not, so this discussion seems a little pointless. DTLHS (talk) 15:53, 5 March 2018 (UTC)
It's not necessarily pointless if users are made aware that entries can be updated. DonnanZ (talk) 16:23, 5 March 2018 (UTC)
  • I don’t like the idea very much (after all, if a usex is so long that it’s a good idea to hide it, it’s almost always a better idea to use a shorter one instead). But if you do, please keep using {{ux}} so that parsers can have a chance at knowing it is not a quotation. — Ungoliant (falai) 16:33, 5 March 2018 (UTC)
In the example I gave above it was a sentence from Wikipedia which is I believe not allowable as a quote, so I treated it as a usex. DonnanZ (talk) 17:20, 5 March 2018 (UTC)
I think sentences from Wikipedia should be treated as quotes. They don't count toward attestation for CFI purposes because they're not durably archived, but they're still quotes and ought to be properly attributed. —Mahāgaja (formerly Angr) · talk 20:16, 5 March 2018 (UTC)
Oh right. So if a word is attestable for CFI, e.g. with dictionary references included, there's no problem with quotes from Wiktionary? DonnanZ (talk) 21:05, 5 March 2018 (UTC)
God no. Please don't start adding quotes from ourselves to pages- what an awful idea. DTLHS (talk) 21:18, 5 March 2018 (UTC)
We invent usexes so we can just shorten them as needed. Equinox 19:38, 5 March 2018 (UTC)
Y-yes, in this case it made sense to include the whole sentence. DonnanZ (talk) 20:18, 5 March 2018 (UTC)
Personally I like having more space between definitions- it makes the page easier to read for me. DTLHS (talk) 20:25, 5 March 2018 (UTC)

Category:Quotation templates to be cleaned

There's over 8,000 of them, but what exactly is needed: {{quote}} templates, or is it something else that needs attention? DonnanZ (talk) 13:42, 5 March 2018 (UTC)

They might not need to be cleaned at all, the only problem with them is that they are using a generic {{quote-text}} instead of a specific variant ({{quote-book}} usually). If we are OK with some quotes being generic, you only need to clean the category out of the template. - TheDaveRoss 20:07, 5 March 2018 (UTC)
@TheDaveRoss: Thanks. That was indeed the case in the entry I was looking at. That's one off the list, only 8,621 to go. DonnanZ (talk) 20:44, 5 March 2018 (UTC)
@TheDaveRoss There would appear to be a problem with these, getting them to register in the category for quotes. I tried adding "en" to the quote at glowing, but that isn't the solution, there must be something else that should be done, a rewrite? DonnanZ (talk) 14:25, 9 March 2018 (UTC)
@Donnanz What category for quotes? - TheDaveRoss 15:22, 9 March 2018 (UTC)
@TheDaveRoss: Category:English terms with quotations, where all of these should go. DonnanZ (talk) 15:31, 9 March 2018 (UTC)
@Donnanz: That is from {{quote}}, not the {{quote-book}} family. If you want all of those quotes to go into that category you will have to add the category to {{quote-meta}} or each of the family of templates. - TheDaveRoss 15:46, 9 March 2018 (UTC)
@TheDaveRoss: I'm not allowed to edit that template, even if I could make head or tail of it. DonnanZ (talk) 16:02, 9 March 2018 (UTC)

Moving snowclones back to the mainspace

I think that snowclones (i.e., the ones listed in Appendix:English snowclones) should be included as dictionary entries in the main namespace, as long as they follow CFI's rules on attestation and idiomaticity. The rationale is that it is far less convenient for a dictionary reader to have to look in the appendix for this than it is to look in the mainspace. They are just like any other idioms and are just as lexical; it's just harder to fit general semantic variables like someone or something into them. I think things like X is the new Y or to X or not to X are perfectly decent entry-material. So, let's make it easier for the readers and move these exceptional idioms to the mainspace. (P.S.: I remember now that a while back I created ride the ... train. This was before I knew about the appendix page or about what a "snowclone" is. I just modeled the "..." after the already existing phrasebook entry I am ... year(s) old, which may need to be changed a bit too. That name looks a little funky.) PseudoSkull (talk) 04:28, 6 March 2018 (UTC)

Support. --Daniel Carrero (talk) 08:40, 6 March 2018 (UTC)
Oppose having them in the main space. We can create redirects to an appendix. --Per utramque cavernam (talk) 12:05, 6 March 2018 (UTC)
Oppose per Per utramque cavernam. DCDuring (talk) 12:56, 6 March 2018 (UTC)
Oppose Equinox 12:57, 6 March 2018 (UTC)
Support Crom daba (talk) 13:52, 6 March 2018 (UTC)
Oppose --WikiTiki89 16:18, 6 March 2018 (UTC)
Comment: @Per utramque cavernam, DCDuring, Equinox, Wikitiki89 If I may kindly ask, can any of you do me the favor to explain (further) why you oppose this idea? PseudoSkull (talk) 16:41, 6 March 2018 (UTC)
Oppose --Victar (talk) 23:03, 6 March 2018 (UTC)
Oppose. Some, but by no means all, are already in mainspace. These constructions are not necessarily phrases suitable for inclusion in the body of a dictionary. bd2412 T 03:00, 7 March 2018 (UTC)
Important note: I think I forgot to mention that I only propose to include idiomatic snowclones. Things like I know X better than you'll ever know X, which can be deduced from its parts, should not be included (if it's even attested in the first place). ride the X train, which cannot be deduced from its parts, should. PseudoSkull (talk) 04:49, 7 March 2018 (UTC)
See also Category:English snowclones. That category is important for our discussion.
I was thinking the same thing when I voted "support", though I failed to say it. Yes, only idiomatic ones. In my opinion, phrases like Appendix:Snowclones/X with a capital Y should be included in the mainspace, because you can't deduce the meaning from the sum of its parts.
The entry awesome with a capital A was deleted in 2011 (RFD: Talk:awesome with a capital A) because it's a snowclone.
But technically, the CFI doesn't currently offer a snowclone caveat, so it seems that in theory all attestable variations of "X with a capital Y" could be created as entries (jerk with a capital J, snowclone with a capital S, dictionary with a capital D...), they can't be deduced from their parts.
In my opinion, having a snowclone entry (X with a capital Y) as opposed to entries for all variations of a snowclone is better, because it covers all possibilities.
The entry name could be X with a capital Y or ... with a capital ..., but it's tempting to create a title like just with a capital for snowclones that have no variables in the middle, just at the extremities. Or to be even more minimalistic, just add a new sense at capital to explain the "X with a capital Y".
I gave a few ideas to be discussed, but my preference is for having a mainspace snowclone title like this: X with a capital Y.
Yes, some other snowclones are just common SOP phrases, like Appendix:Snowclones/X and Y and Z, oh my!. They don't merit mainspace entries whatsoever. --Daniel Carrero (talk) 08:50, 7 March 2018 (UTC)


I am trying to get a vote going to amend CFI to expressly allow retronyms. I think retronyms are interesting to people who study language. Quite often they will have a current meaning which is transparently equal to 'sum of parts' and that is why I feel they are deserving of special protection. As an example, analogue clock merits inclusion in my view even though it is really just analogue + clock. See also: Category:English retronyms. John Cross (talk) 06:33, 6 March 2018 (UTC)

How interesting are retronyms to normal people who use dictionaries? —This unsigned comment was added by DCDuring (talk • contribs).
A large number of words we have will be uninteresting to most normal people. I am interested in creating entries that a small proportion of our users find interesting. John Cross (talk) 21:09, 6 March 2018 (UTC)
I feel as though this should at least be qualified, i.e. allow retronyms unless... What are the worst types of SoP etc. that this rule would permit? Equinox 13:00, 6 March 2018 (UTC)
This would permit the likes of paper book and mechanical mouse. I don't think these are useful or belong in a dictionary. —Μετάknowledgediscuss/deeds 21:28, 6 March 2018 (UTC)
And even biological mouse, perhaps even mammalian mouse. ←₰-→ Lingo Bingo Dingo (talk) 13:05, 16 March 2018 (UTC)
I also don't think this is a useful concept for us. Retronyms are just one example of general disambiguating techniques. Just like in England, what Americans call football is called American football, and who in England is called "the Queen", in America is called "the Queen of England". When these terms are idiomatic, we include them. When they are SOP, we don't. Retronyms are no different. --WikiTiki89 22:37, 6 March 2018 (UTC)
The only thing that gives me pause is that something that wasn't SoP originally might become SoP over time. I wish I could think of an example; I'm sure this came up on here before. But suppose that a new phrase Adj+N is coined, and gets in all the dictionaries, and then Adj comes to be used more generally, with other nouns, significantly later: it seems wrong to delete the original Adj+N when it was the predecessor. Equinox 22:41, 6 March 2018 (UTC)
We do have a rule that if a word was once idiomatic but is now SOP, it is to be included. --WikiTiki89 22:46, 6 March 2018 (UTC)
If, as an example, it can be shown that hammerhead shark was used first and hammerhead came later as a shortened form, then our policies appear to allow both to be included even if hammerhead shark = hammerhead + shark. I want the same in reverse - sort of - if it can be shown that compass is the original term and magnetic compass comes second to distinguish from steady-state compasses then ... I would still want to be able to include magnetic compass even if magnetic compass = magnetic + compass. -- John Cross (talk) 06:18, 7 March 2018 (UTC) (edited John Cross (talk) 06:29, 7 March 2018 (UTC))
I disagree. All that would need to be done is to have enough definitions at compass that cover all historical usage. There's no need to have magnetic compass, unless it is shown to have its own specialized meaning. --WikiTiki89 15:31, 7 March 2018 (UTC)
I agree with Wikitiki. (And the test referred to above is WT:JIFFY, for anyone who didn't already know.) - -sche (discuss) 16:07, 7 March 2018 (UTC)

News from French Wiktionary

February issue of Wiktionary Actualités just came out in English!

A snowy issue of Actualités just fell on Wiktionary with not-so-chilled news and stats, surrounded by three articles: Wiktionarian allies, a dictionary that went through some trouble we know as well, and a speaking orca! As usual, there are also some changes in Wiktionary projects and recommendations of videos about languages and linguistics (including some in English!).

This issue was written by seven people and was translated for you by Pamputt. This translation may be improved by readers (wiki-spirit), as it was last month by Xbony2 (thanks a lot!). We still receive zero money for this publication and your comments are welcome. You can also receive a notice on your talk page if you want. Noé 13:57, 6 March 2018 (UTC)

I like that Bahubali got a mention, haha. —AryamanA (मुझसे बात करेंयोगदान) 15:33, 6 March 2018 (UTC)

Romance words and Medieval Latin

I just thought about something. Can we truly say that a Romance word can be inherited from Medieval Latin (or Ecclesiastical)? So far I've been doing that at times, for instance when a certain word is found mostly in Medieval or Ecclesiastical Latin, but underwent all the normal changes into the descendant language, and it is a common, popular word. Or when the meaning matches the Medieval Latin sense of a word more than the Classical one (like coxa, "thigh" in Medieval Latin and the Romance languages but "hip" in Classical). By all indicators these should be inherited terms.

But the problem is, the way I see Medieval Latin defined on Wikipedia for example, was as this kind of artificially preserved language that was no longer popularly spoken, but used in things like administration, writing, church, etc. By the time the Middle Ages came along, the Romance languages/vernaculars had already begun diverging from the spoken Vulgar Latin. So does it really make sense to say they're inherited from this register or form of Latin (the way the Wiktionary templates work now allows inherited to be used on Romance terms with any form of Latin, including New Latin even!)? How are we going to define "Medieval Latin" for Wiktionary's purposes? Is it possible that what we're really looking at for those terms is rather (inherited) descent from a parallel Late or Vulgar Latin term that was more or less the same in form (or at least meaning) as the attested Medieval Latin one? There are certainly many cases of obvious borrowings from Medieval Latin (and some Medieval Latin words even crafted or coined based on existing Romance, like Old French, words), but I'm talking about apparent inherited ones. Like how do we handle coxa for example? Maybe it's best to put that sense as Late Latin instead? Word dewd544 (talk) 16:00, 7 March 2018 (UTC)

If the meaning "thigh" is what the Romance languages have, it's probably best to call that sense Vulgar Latin. Same with focus (fire) rather than "fireplace". —Mahāgaja (formerly Angr) · talk 16:48, 7 March 2018 (UTC)
Indeed, there are many cases where a sense is shared between Vulgar Latin and Mediaeval Latin because the latter has borrowed it from a Romance language that inherited it from the former. —Μετάknowledgediscuss/deeds 18:25, 7 March 2018 (UTC)
Ok, that works for me. But it will admittedly require a bit of backtracking and redoing of some etymologies in which I've put "inherited" from Medieval Latin (because at the time I didn't know how else to handle it; I used to incorrectly treat Medieval Latin in these contexts as essentially being Vulgar Latin in the Early Middle Ages, which is more accurately what we'd be looking at for inherited terms). I assume the same goes for Ecclesiastical Latin? Like all the religious related words like presbyter, episcopus, pascha, abbas, monachus, basilica, baptizo, blasphemo, etc.? And here's another issue: say if we have a Latin word that is listed in its entry as Medieval Latin, but in the Romance descendant's etymology we use Vulgar Latin instead. I imagine it would still be linked to the main entry that is described as "Medieval Latin" (without an asterisk), since making a separate VL. reconstruction page for each instance would be ridiculous. That's just reserved for terms that were unattested in any written form of Latin. Word dewd544 (talk) 17:16, 8 March 2018 (UTC)

sports ticker and score card abbreviations

Over in [Requests for verification:English] someone has tagged a number of sports ticker abbreviations, such as UTA, LAL, WIM, etc. They are clearly a thing, but are they a thing we want in Wiktionary? The RFV process doesn't work well for them, because sports tickers are not durably archived, but they seem pretty standard in their own subculture. Given the number of these, it seems like something we could use a policy decision on. Kiwima (talk) 01:50, 8 March 2018 (UTC)

First version of Lexicographical Data will be released in April

I come bearing a message from WikiData.

After several years of discussing it, and one year of development and discussion with the communities, the development team will deploy the first version of lexicographical data on Wikidata in April 2018.

A new namespace and several new datatypes will be created in order to model words and phrases in many languages. Editors will be able to describe words in Wikidata, and in the future, to query this information, and to reuse it inside and outside the Wikimedia movement.

If you’re curious to discover what these new data structures will look like, you can have a look at the data model. It suggests a technical structure, but the editors will remain free to model and organize data as they prefer, with the usual open discussions and community processes that we apply on Wikidata. The documentation will be improved step by step, with the different releases and the help of the community.

Please note that the version that will be deployed in April is a first version, that will be improved in the future, thanks to your tests, comments and suggestions. Some features may be missing, some bugs may occur. We can already tell you that the following features will be included in the first version:

  • Add, edit and delete Lexemes, Forms, statements, qualifiers, references
  • Link from an Item or a Lexeme to an Item or a Lexeme
  • Basic search feature

And the following features will not be included in the first version, but are planned for the future:

  • RDF support (which means: the ability to query it with query.wikidata.org)
  • Senses will not be included in the first version, to give you all some time to get properties, processes, etc in place for Lexemes and Forms
  • Entity suggestion and better search features
  • Merge Lexemes

You can have a look at a more detailed features list. After the first deployment, we will start a discussion with all of you about what are the most important features for you, so we know which ones you would like us to work on next.

Thanks to the people who already showed support and curiosity about lexicographical data on Wikidata. We hope that when it is deployed, you will test it, experiment with the languages you know, and give us some feedback to improve the tools in the future.

While waiting for the release, here’s what you can do:

  • Improve the list of tools with ideas of tools that could be built on the top of lexicographical data
  • Add your ideas of cool queries you’d like to do with words and phrases in the future
  • Have a look at the project page and especially the talk page, where people are already asking questions and discussing how to model data and other topics
  • If you’re involved in a Wiktionary community, discuss with them and answer any questions they might have about Wikidata. You can also register as ambassador for your community.

Last but not least, we kindly ask you not to plan any mass import from any source for the moment. There are several reasons for that. First of all, as mentioned above, the release will be a first version and we need to observe how our system reacts to manual edits before we start considering automatic ones. The system may not be ready for massive imports at the beginning. The second reason is legal. Lexicographical data in Wikidata will be released under CC0, and it is the responsibility of each editor to make sure that the data they add is compatible with CC0. For more information, you can have a look at the advice of the WMF Legal team. Finally, we strongly encourage you to talk with the communities before considering any import from the Wiktionaries. Wiktionary editors have put a lot of effort over the years into building definitions, and we should be respectful of this work and discuss with them to find common solutions for working on lexicographical data and enjoying its use together.

If you have any question or idea, feel free to write on Wikidata:Wikidata talk:Lexicographical data. Further discussion is also ongoing at Wikidata:Wikidata:Project chat#First version of Lexicographical Data will be released in April. Cheers! bd2412 T 02:59, 8 March 2018 (UTC)

As an offshoot, but really unrelated to the Wikidata effort, I wonder how much content on en.wiktionary would become CC0 based on a small number of contributors ex post facto releasing their work in such a manner. There are many entries which have only been touched by a single primary author and then a number of bots for formatting; I don't know whether the bots even count as authors. If someone has the full edit history downloaded, I imagine it would be possible to do some modeling and determine how many entries here would be CC0 if the top 10, 20, 50, etc. editors were willing to transfer the license. If we wanted to get fancy, we could remove from the edit history all reversions (that is, any intermediate edits between two equivalent versions of the same page), or perhaps consider entries section by section. While I am not a fan of the process that Wikidata seems to favor when interacting with other projects, I would love to be able to back our project with more structured data. I think this would open myriad doors for improvements in presentation and usefulness. - TheDaveRoss 13:02, 8 March 2018 (UTC)
I would actually be hard-pressed to imagine a Wiktionary editor asserting any kind of copyright in their contributions. bd2412 T 21:22, 8 March 2018 (UTC)
It seems like they're treating us in the same way that we treat other dictionaries- as a source of information that can't be copied directly but can be paraphrased and used as a source. DTLHS (talk) 21:27, 8 March 2018 (UTC)
Except that unlike a print dictionary, we're an active community that they could collaborate with if they chose to do so. —Μετάknowledgediscuss/deeds 21:30, 8 March 2018 (UTC)
Lots of frustrated voices over at the project chat discussion (now several pages long). Anyway, I encourage you to participate in the technical discussion happening right now: the project is entering a crucial early phase where important decisions are made. I'm curious when and why the project "rebranding" from "Wikidata: Structured data for Wiktionary" to "Wikidata: Structured Lexicographical Data" happened; it was mentioned a few times in the discussion (and, as pointed out, changed the tone of the collaboration). – Jberkel 12:13, 12 March 2018 (UTC)
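
The modeling floated earlier in this thread (how many entries would be covered if the top 10, 20, 50, etc. editors relicensed their work) could be sketched along these lines. This is a rough illustration only: the function name `cc0_share` and the input shape (page title mapped to a list of editor names) are assumptions, bots are simply ignored as authors, which is exactly the open question raised above, and no reversions are filtered out.

```python
from collections import Counter


def cc0_share(histories, bots, top_n):
    """Fraction of pages whose human editors all belong to the top_n editors.

    histories: dict mapping page title -> list of usernames, one per edit
    (hypothetical input shape). bots: set of usernames treated as non-authors.
    """
    # Count edits per human editor across all pages.
    edits = Counter(
        user for users in histories.values() for user in users if user not in bots
    )
    top = {user for user, _ in edits.most_common(top_n)}
    covered = 0
    for users in histories.values():
        authors = {user for user in users if user not in bots}
        # A page counts only if it has human authors and all are in the top set.
        if authors and authors <= top:
            covered += 1
    return covered / len(histories) if histories else 0.0


sample = {
    "a": ["Alice", "Bot1"],
    "b": ["Alice", "Bob"],
    "c": ["Bob"],
    "d": ["Carol"],
    "e": ["Alice"],
}
print(cc0_share(sample, {"Bot1"}, 1))  # → 0.4
```

Ranking editors by raw edit count is just one possible choice; ranking by the number of pages an editor is sole author of might give a different curve.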

Category for Insurance terminology

Is there a particular process for agreeing and implementing a new category for English terms? I'd like to create and populate Category:en:Insurance for terminology used within the insurance industry (a subcategory of Category:en:Finance seems most appropriate), and add automated categorisation via labels using Module:labels/data/topical. I'm holding back from "being bold" to make sure I don't step on any toes. -Stelio (talk) 10:37, 8 March 2018 (UTC)

Be bold. - TheDaveRoss 12:56, 8 March 2018 (UTC)
Done -Stelio (talk) 14:58, 8 March 2018 (UTC)
Some of the items in the category (eg, economic) seem not to have any distinct insurance sense. Unless they do, the category seems misleading. This is not unique to this topical category, but it might be well to address the problem now before populating the category recklessly. I think the problem is associated with the presence of hard categorization and the absence of {{label|insurance}} categorization. An "incategory" and "insource" Cirrus search should quickly identify the possible problems. DCDuring (talk) 16:09, 8 March 2018 (UTC)
The category may be too new for the Cirrus search term "incategory:en:Insurance" to work. The "insource" term won't be allowed to run unless it is restricted to run over a readily identified, "not-too-big" subset of Wiktionary entries. DCDuring (talk) 16:18, 8 March 2018 (UTC)
Thank you very much for the review, @DCDuring. Yes, I had two difficulties here:
  1. Words with definitions that are more generic than a specific insurance sense. For example term, definition 8, is "Duration of a set length...". That's the insurance definition: the term of an insurance policy is the amount of time from its inception to latest expected termination. But I shouldn't label that definition with "insurance" because it applies in wider circumstances too. Would an indented insurance definition be appropriate (8.1)?
  2. Avoiding SOP terms. For example "economic assumption" is a modelling assumption (assumption is on my list to update with that sense) that relates to economic factors. That feels SOP to me, but the economic/demographic split is sufficiently important in the insurance world to merit categorising those terms. Perhaps an additional definition of economic with a sense that is labelled as "insurance" and "of an assumption", then?
I'm definitely keen to get this right and conform to established site norms, so I value this feedback. -Stelio (talk) 16:41, 8 March 2018 (UTC)
If the insurance sense of "term" is in fact covered by an existing, broader sense of "term", then I wouldn't add a subsense. (To give an extreme example: insurance documents also use "the", but it's the same "the" as everyone else also uses, so there's no need for an insurance-specific sense.) If the insurance sense is significantly different, then a subsense is merited. Whether or not terms that seem important to insurance but aren't specific/limited to it (like "economic" and maybe "term") should be categorized is less clear. As DCDuring suggests, it's an unclearness that plagues our category structure in general, and despite giving it thought, I don't know what to advise you. Many other categories do include terms that seem related/important without being limited to the category's named context. - -sche (discuss) 18:06, 8 March 2018 (UTC)
The problem is that a topic-specific glossary is useful because it contains ONLY the relevant sense and usage of polysemic words like term, policy, and economic. As a comprehensive (and historical) dictionary we, by definition, try to include all definitions. Someone only interested in the insurance use of a term can get lost in our entries for such terms. We could have appendices (eg, Appendix:Glossary of terms used in insurance [term again!!!]) that contained links to the specific definitions using {{senseid}}. This would serve as a specialized portal for passive users as well as contributors who had specialized topical interests. DCDuring (talk) 21:30, 8 March 2018 (UTC)
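
The "incategory" plus "insource" search DCDuring describes earlier in this thread could be sketched roughly as follows. This is only an illustration in Python: the helper names `build_audit_search` and `search_url`, the regular expression, and the assumption that topical labels are written as `{{lb|en|...}}` or `{{label|en|...}}` are hypothetical and not taken from the thread. Only the `incategory:` and `insource:/.../` search keywords come from the discussion. The idea is to list pages that sit in the category but whose wikitext carries no insurance label, i.e. likely cases of hard categorisation.

```python
from urllib.parse import urlencode


def build_audit_search(category, label, lang="en"):
    """Return a search string for pages in `category` whose wikitext lacks
    a {{lb|lang|...label...}} or {{label|lang|...label...}} call.

    The regex is an assumption about how labels are written. Braces and
    pipes are escaped because they are regex metacharacters.
    """
    label_regex = r"\{\{(lb|label)\|%s\|[^}]*%s" % (lang, label)
    # incategory: narrows the page set first, so the regex only has to
    # run over a small subset of entries.
    return 'incategory:"%s" -insource:/%s/' % (category, label_regex)


def search_url(query, host="en.wiktionary.org"):
    """Build an index.php search URL carrying the query string."""
    return "https://%s/w/index.php?%s" % (host, urlencode({"search": query}))


query = build_audit_search("en:Insurance", "insurance")
print(query)
print(search_url(query))
```

Because the regex is case-sensitive and only looks for one label spelling, any hits would still need a manual check before a category is removed or a label added.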

Automatic transliteration of Biblical Hebrew[edit]

Would it be possible and desirable to implement automatic transliteration of the etymology-only language Biblical Hebrew (hbo) when it's fully pointed? That way, {{der|en|hbo|אָמֵן}} would automatically provide the transliteration ʾāmēn, but {{der|en|he|אָמֵן}} (and of course both {{der|en|he|אמן}} and {{der|en|hbo|אמן}}) would still require manual transliteration. Would that be technically possible, and if so, would other people find it a good idea? —Mahāgaja (formerly Angr) · talk 13:15, 8 March 2018 (UTC)

There are still a lot of complications that need to be solved. We already have an experimental module Module:he-translit, which is able to transliterate about 90% of words (which is not at all good enough for automatic transliterations), but without stress marks. The next step would be to implement support for stress marking, but this would also require adding stress marks or cantillation marks to Hebrew text that needs to be transliterated. We decided a while ago not to allow stress marks or cantillation marks in Hebrew text due to poor font support, which has improved a little but not enough over the years. Additionally, we would need to start strictly using the Unicode HEBREW POINT QAMATS QATAN (U+05C7) instead of the regular qamats mark whenever it represents a short-o, and we would need a way to mark the distinction between sheva na and sheva nach, which currently do not have separate Unicode codepoints. Once all that is done, however, it should work equally well for Biblical Hebrew and Modern Hebrew. And then there is also a minor issue that it would be impossible to distinguish between abbreviations (which should be transliterated letter-for-letter) and Hebrew numerals (which should be transliterated as "Arabic" numerals). So in short, it's not possible yet until we can solve some of those problems. --WikiTiki89 17:19, 8 March 2018 (UTC)
Regarding the shevas, it was pointed out that Michael Everson is a wiki user, w:User talk:Evertype; if we can point (no pun intended) to texts where the shva na and shva nach are used contrastively, we could ask him about proposing a new Unicode codepoint. - -sche (discuss) 18:22, 8 March 2018 (UTC)
@-sche: Here are a couple of examples:
Both of these examples also differentiate qamats qatan from qamats gadol. --WikiTiki89 19:39, 8 March 2018 (UTC)
@Wikitiki89 Thanks. I'm writing to him now. I see that some references say the shvas are no longer normally pronounced differently in modern Hebrew; if so, which one are they pronounced as? If Unicode, instead of adding new codepoints for both, were to desire to assume that the existing shva codepoint could be taken to be one of them (with only one new codepoint added, for the other one, for those texts which distinguish it), which shva should be the "default" shva and which one should get a new codepoint? (I will see if Michael thinks it would be better to propose two new codepoints or just one.) - -sche (discuss) 20:18, 8 March 2018 (UTC)
@-sche: Regarding your question about Modern Hebrew pronunciation: Generally, the old distinction of the shvas has disappeared, but there is a new distinction between null and /e/, depending on the phonological environment or morphology, with some environments having free variation between them. Regarding your question about codepoints, I definitely don't think we need two new codepoints and I would say that simply for graphical reasons the shva nach should share the current codepoint, and the shva na should get the new codepoint, because generally when they are distinguished, the shva nach has a normal or maybe slightly reduced size, while the shva na is clearly enlarged and/or bolded. --WikiTiki89 20:40, 8 March 2018 (UTC)
OK, I left a message, with your informative explanation of how differently they are displayed. Hopefully he can either make the proposal or advise us on making it. - -sche (discuss) 21:28, 8 March 2018 (UTC)
If the two are sometimes distinguished in writing, then by all means they should have separate code points. But isn't the distinction always clear from environment anyway? Are there any words where you can't tell whether a schwa is na or nach just from its environment? —Mahāgaja (formerly Angr) · talk 11:53, 9 March 2018 (UTC)
No, the distinction is not always clear from the environment, otherwise we wouldn't have this problem. --WikiTiki89 20:11, 9 March 2018 (UTC)
Alright, Michael Everson sent me an e-mail explaining that the next step is that we need a point person who is willing to use their real name, and it would be helpful but not obligatory (because you can always just check back in here if questions come up) if it was someone with some knowledge of these characters and/or of Hebrew script generally, to e-mail him and another gentleman. If someone here is willing to be that person, I will send you the contact information. - -sche (discuss) 02:29, 11 March 2018 (UTC)
I'm willing to send the message and use my real name. I do not however have knowledge of Hebrew script (beyond the parallels it shares with Arabic script). So feel free to use me if a more suitable candidate does not arise. -Stelio (talk) 11:44, 12 March 2018 (UTC) I should probably ping you too, @-sche, in this response. -Stelio (talk) 12:16, 12 March 2018 (UTC)
Since I know Michael Everson IRL I'm willing to use my real name too. I do have a fair knowledge of the Hebrew script, although until this thread I never knew that the two schwas were sometimes distinguished in writing, so maybe my knowledge of the Hebrew script is insufficient. —Mahāgaja (formerly Angr) · talk 13:10, 12 March 2018 (UTC)
@Mahagaja: Don't beat yourself up over that. It's a recent phenomenon that is still limited to very specific religious publications, the same ones that also make a distinction between the two qamatses. If you look carefully at the second example I linked to above, you'll notice they even distinguish between the two types of dageshes. --WikiTiki89 15:01, 12 March 2018 (UTC)
@Wikitiki89: Yes, I see that now; it also distinguishes the two types of qamats. Do we want to request a new code for dagesh forte while we're at it? —Mahāgaja (formerly Angr) · talk 15:10, 12 March 2018 (UTC)
@Mahagaja: Maybe. It's less common, because it's much more straightforward to distinguish them (in 99.9% of cases). But I guess since it does exist, it might deserve a code. --WikiTiki89 15:16, 12 March 2018 (UTC)
OK, I've passed the contact info on to Mahagaja (by e-mail). Hopefully we get us some shiny new codepoints! - -sche (discuss) 00:47, 13 March 2018 (UTC)
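
As a rough illustration of the codepoint issues WikiTiki89 raises in this thread, here is a minimal Python sketch using only the standard `unicodedata` module. The function names `has_vowel_points` and `ambiguous_marks` are hypothetical, and treating "contains a vowel point" as a stand-in for "fully pointed" is a simplifying assumption. This is not how Module:he-translit works. It only shows that plain qamats (U+05B8) and sheva (U+05B0) cannot be disambiguated from the codepoint alone, whereas qamats qatan (U+05C7) can.

```python
import unicodedata

QAMATS = "\u05B8"        # HEBREW POINT QAMATS (regular qamats)
QAMATS_QATAN = "\u05C7"  # HEBREW POINT QAMATS QATAN (short o)
SHEVA = "\u05B0"         # HEBREW POINT SHEVA (na and nach share this codepoint)


def has_vowel_points(text):
    """True if the text contains any Hebrew vowel point
    (U+05B0..U+05BB, or U+05C7 qamats qatan)."""
    return any("\u05B0" <= ch <= "\u05BB" or ch == QAMATS_QATAN for ch in text)


def ambiguous_marks(text):
    """List (index, Unicode name) for marks whose reading cannot be decided
    from the codepoint alone: plain qamats and sheva."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in (QAMATS, SHEVA)
    ]


amen_pointed = "\u05D0\u05B8\u05DE\u05B5\u05DF"  # alef, qamats, mem, tsere, final nun
amen_plain = "\u05D0\u05DE\u05DF"                # the same word without points

print(has_vowel_points(amen_pointed))   # True
print(has_vowel_points(amen_plain))     # False
print(ambiguous_marks(amen_pointed))    # [(1, 'HEBREW POINT QAMATS')]
```

A check like this could at most flag words that still need manual transliteration. It does nothing for stress, abbreviations, or numerals, which the thread lists as separate open problems.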

"Proverb" PoS isn't a PoS[edit]

PseudoSkull pointed out that "proverb" is not in fact a part of speech. Please see "Proverb" POS at Wiktionary. We should presumably convert these PoS into "phrase", and possibly tag proverb as a gloss. Thoughts? Equinox 09:32, 10 March 2018 (UTC)

It isn't, but phrase isn't either. Both are accepted per WT:EL#Part of speech. - 21:24, 12 March 2018 (UTC)


I'm getting "Lua error in Module:headword/templates at line 103: attempt to index field 'falt' (a nil value)". What's this? ---> Tooironic (talk) 02:32, 12 March 2018 (UTC)

Wiki Indaba 2018[edit]

Presentation for Wiki Indaba 2018.

Hi, Benoît Prieur and I will be in Tunis from 16 to 18 March to attend the WikiIndaba conference 2018. We will go there to present the Wiktionary (especially the French version) and hope to encourage some people (everybody?) to contribute to the Wiktionary in languages from Africa that are currently underrepresented on Wiktionaries (Arabic, Berber, Fula, ...). A conference and a workshop have been accepted. The goal of the presentation is to try to show the interest of the Wiktionary (mainly the French one) for the development and the visibility of languages spoken in Africa.

I have posted a first version of the presentation. I would really appreciate it if one of you could correct the English on the slides before Thursday so that I can take it into account. I can provide the odp file if needed. I would also be happy if you have comments about what we wrote. Thanks in advance. Pamputt (talk) 06:51, 12 March 2018 (UTC)

I really appreciate that you're doing this! Obviously, Francophones will have you as a point of contact, but for people who are more comfortable with English, I would be happy to mentor people who want to contribute in African languages to en.wiktionary. —Μετάknowledgediscuss/deeds 07:17, 12 March 2018 (UTC)
@Pamputt: In general, the slides are well written. :) On page 7, perhaps "allow understanding by all audiences" would be better than "allow understanding for all audiences". On page 8, it's not clear what "words with a confidential use" would be, or how Wiktionary would know a secret use; perhaps you mean "restricted (to certain contexts/jargons)" or "literary"...? On page 9, instead of "built languages", it's more natural to say "constructed languages" or "artificial languages". On page 11, the language is fine, it's just hard to read the yellow font "Austronesian: Malagasy" is in. On page 12, "Adding a new entry... (compare to..." sounds more natural than "Add a new entry... (to compare to..." IMO. And "limited knowledge" wouldn't normally take an article (a) AFAIK. In "Contributing help to learn its own language or rediscover it", it's unclear what "its" and "it" refer to... maybe "Language communities contributing helps them maintain and deepen knowledge of their own languages"? And one would normally speak of interest from linguists or of something being of interest to linguists. On page 15, "native speaker" is more natural and simpler (for any non-native speakers to understand) than "locutor". But again, on the whole, well-written; it sounds like an interesting and informative presentation! :) - -sche (discuss) 07:41, 12 March 2018 (UTC)
@Metaknowledge thanks for your message. Sure, I will give your name if some English speaker needs help to contribute here. Pamputt (talk) 22:36, 12 March 2018 (UTC)
@-sche thank you very much for your corrections. I took them into account within the new version of the presentation. If you have more comments, do not hesitate to write them. :D Pamputt (talk) 22:36, 12 March 2018 (UTC)


I'm wondering what the best way is to add Pazend, the Middle Persian variant of Avestan, which contains an extra character (𐬮) and several unique ligatures. Would this be inappropriate?

m["pal-Avst"] = {
        canonicalName = "Pazend",
        characters = m["Avst"].characters,
        direction = "rtl",
        parent = "Avestan",
}

--Victar (talk) 19:25, 12 March 2018 (UTC)

What's the reason it needs to be a separate script? --WikiTiki89 21:02, 12 March 2018 (UTC)
It wouldn't be inconsistent with there being so many language-specific versions of Arabic script. But if what's desired is different fonts for Pazend as opposed to Avestan proper, that can be achieved without a separate script code, using the CSS selector .Avst:lang(pal) (or additional selectors if other languages were also written in Pazend). — Eru·tuon 21:16, 12 March 2018 (UTC)
It's not a difference of font, but rather the addition of a character and various ligatures, so the Unicode characters employed for spelling a word would be different. Incidentally, I'll need to create a modified typing-aids module dataset. It would also be nice to call up the name in templates, ex. {{desc|pal|𐬯𐬞𐬁𐬵|sc=pal-Avst|sclb=1|tr=spāh}} and populate categories specifically for pal-Avst script requests. --Victar (talk) 21:30, 12 March 2018 (UTC)
If it doesn't need a different font, that doesn't sound like the kind of thing that needs a new script code; I mean, different languages use different subsets of Latin letters while still using just either "Latn" or "Latinx". Just make sure "Avst" covers all the characters that are used. (I wonder if we need as many Arabic script codes as we have, to Erutuon's point...) - -sche (discuss) 22:02, 12 March 2018 (UTC)
@Victar: Huh, how are the ligatures rendered if not by a separate font? — Eru·tuon 22:11, 12 March 2018 (UTC)
@Erutuon: I'm not sure how it exactly works, but e.g. Noto Sans Devanagari has different styles for Hindi and Marathi. It's one font. —AryamanA (मुझसे बात करेंयोगदान) 22:23, 12 March 2018 (UTC)
@-sche: There are stylistic differences that could be better represented in a different font, but I'm not aware of any font though specifically made for Pazend. I think the best comparison is Latn to Latf. Note that Fraktur uses the same range as Latin. @Erutuon: What's recommended is actually a silly system of using U+200C between letters to prevent ligatures, but your idea of a separate font would be much better. I wonder how the Unicode Avestan font compares to the Google Noto font. I'll have to check that out. --Victar (talk) 22:31, 12 March 2018 (UTC)
I looked into it and there are actually only two free Unicode Avestan fonts: one, Noto Sans Avestan, uses ligatures, and the other, Ahuramzda, does not.
Ligature Ahuramzda Noto Sans Avestan
š + a 𐬱 + 𐬀 𐬱𐬀 𐬱 + 𐬀 𐬱𐬀
š + c 𐬱 + 𐬗 𐬱𐬗 𐬱 + 𐬗 𐬱𐬗
š + t 𐬱 + 𐬙 𐬱𐬙 𐬱 + 𐬙 𐬱𐬙
a + h 𐬀 + 𐬵 𐬀𐬵 𐬀 + 𐬵 𐬀𐬵
So, we could use Ahuramzda for Avestan (Old and Younger) and Noto Sans Avestan for Pazend. The disadvantage of this is that their stylistic differences aren't along historical lines, and the Avestan language does employ some ligatures which aren't in Ahuramzda. I could make a variant of Noto Sans Avestan with the Pazend ligatures removed, but I don't know the legality of that, nor do I know if we could host that variant as a webfont. --Victar (talk) 04:40, 13 March 2018 (UTC)
Could one of these CSS properties disable Noto ligatures? —suzukaze (tc) 04:45, 13 March 2018 (UTC)
@Suzukaze-c: HAH! I didn't even know that existed! It does indeed work: 𐬀𐬵. No IE support, but no surprise there. I'll have to look inside Noto Sans to see if they're using distinct suffixes for those Pazend ligatures. --Victar (talk) 05:02, 13 March 2018 (UTC)
 :D —suzukaze (tc) 05:20, 13 March 2018 (UTC)
It looks like those are the only ligatures supported by Avestan Unicode anyway. So, perhaps as a first step, I recommend that .Avst be set to font-family:"Noto Sans Avestan", "Ahuramzda";font-variant-ligatures: none and .Avst:lang(pal) set to font-variant-ligatures: normal. @Erutuon, -sche, would that work for you? --Victar (talk) 05:32, 13 March 2018 (UTC)
I'm not against adding a separate script code for it if it'll have a benefit. For most scripts, the benefit is in the form of a subscript needing a different font to display correctly (e.g. display the special non-Latn letters of Latinx at all) or accurately (e.g. when rendering Latf differently from Latn). For this, if it's not the font that's different so much as the presence or absence of ligatures, I don't know what's better. In the table above, all the words display in the same font and are identically ligatured, probably because I don't have both fonts installed yet. Does anyone know if a separate script code would result in correct display for a greater number of users than the lang(pal) approach? - -sche (discuss) 15:59, 13 March 2018 (UTC)
@Erutuon, you seem most familiar with this. Are there any advantages/disadvantages to either approach? --Victar (talk) 14:55, 14 March 2018 (UTC)
@Victar: Which approaches do you mean? Using a dedicated script code versus a combination of a script code and language code? — Eru·tuon 20:03, 14 March 2018 (UTC)
@Erutuon, yes, I believe that's what @-sche is asking, .pal-Avst over .Avst:lang(pal), essentially. --Victar (talk) 20:11, 14 March 2018 (UTC)
I believe having sub-scripts is a holdover from before CSS supported language tags. As long as we're confident that enough of a proportion of users are using browsers that support language tags, then we can start switching away from sub-scripts. --WikiTiki89 20:15, 14 March 2018 (UTC)
The only benefit I can see for a language-script script code is if multiple languages will be using the same combination of CSS properties and you want to shorten the list of CSS selectors. Otherwise, both do the same thing equally well. — Eru·tuon 20:18, 14 March 2018 (UTC)
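
For comparison with the CSS route discussed above, the U+200C workaround Victar mentions (putting a zero-width non-joiner between letters to stop a font from forming ligatures) might look like the Python sketch below. The function names are hypothetical, and the sketch assumes that Avestan letters occupy U+10B00 to U+10B35 and that "š" in the table above corresponds to U+10B31. It says nothing about how any particular font actually handles the joiner.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER


def is_avestan_letter(ch):
    """Assumed letter range of the Unicode Avestan block: U+10B00..U+10B35."""
    return 0x10B00 <= ord(ch) <= 0x10B35


def suppress_ligatures(text):
    """Insert ZWNJ between every pair of adjacent Avestan letters."""
    out = []
    prev = ""
    for ch in text:
        if prev and is_avestan_letter(prev) and is_avestan_letter(ch):
            out.append(ZWNJ)
        out.append(ch)
        prev = ch
    return "".join(out)


sha = "\U00010B31\U00010B00"  # the "š + a" pair, written as escapes
print([hex(ord(c)) for c in suppress_ligatures(sha)])
# ['0x10b31', '0x200c', '0x10b00']
```

The obvious cost is that the stored spelling of every word changes, which is one reason the thread leans toward `font-variant-ligatures` in the stylesheet instead.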

To approach this from a different angle, is having a different font and native name enough to warrant a subscript? If not, what makes Latf and the various Arab variants different? Why does it even matter? --Victar (talk) 06:31, 13 March 2018 (UTC)

Latf is displayed in very different fonts from Latn. Some of the Arab subscripts may need different fonts to display closer to how they're written as far as letter-shapes (letters having dots, or entirely different shapes from in standard Arabic) or line-slanting, but I do wonder if we need so many. - -sche (discuss) 16:01, 13 March 2018 (UTC)

Generalizing of Japanese infrastructure[edit]

Currently we have templates like {{ja-def}}, {{ja-pos}}, and {{ja-r}}, but nothing for other Japonic languages (except for forks of {{ja-def}}). We should consider making these templates usable for other Japonic languages as well.

See also Module_talk:ja-kanji-readings#Okinawan, Template_talk:ja-readings#Separate_template_for_Okinawan, and diff.

(Notifying Eirikr, Wyang, TAKASUGI Shinji, Nibiko, Atitarev, Dine2016, Poketalker, Cnilep, Britannic124, Fumiko Take, Dine2016): and @Erutuon. —suzukaze (tc) 20:12, 13 March 2018 (UTC)

I'd definitely be supportive of that.
There are some Japonics with weird phonetics and weird spellings, like the Miyako dialect of Ryukyuan. In addition, there's at least one non-Japonic that uses katakana (Ainu) which could also benefit from this effort. ‑‑ Eiríkr Útlendi │Tala við mig 20:41, 13 March 2018 (UTC)

I support the move in principle, but probably don't have much to contribute technically. I was going to mention Ainu, but Eirikr has that well in hand. Cnilep (talk) 00:41, 15 March 2018 (UTC)

Old West and Old East Norse[edit]

I would like to add a language code for Old West and Old East Norse respectively. Just like with New and Medieval Latin. I can’t seem to find the right module though. Can someone add the two? I suggest non-oen (Old East Norse) and non-own (Old West Norse). I will use this in the etymology templates. Jonteemil (talk) 16:28, 15 March 2018 (UTC)

Sounds like what you want is etymology-only language codes. You can add those at Module:etymology languages/data. --WikiTiki89 16:38, 15 March 2018 (UTC)
For Old East Norse we do already have Old Danish and Old Swedish. —Mahāgaja (formerly Angr) · talk 20:56, 15 March 2018 (UTC)

Is the war against the unified Serbo-Croatian raging on?[edit]

Is the war against the unified Serbo-Croatian raging on? Template talk:User hr. --Anatoli T. (обсудить/вклад) 01:11, 16 March 2018 (UTC)

I will repeat what I said on the talk page: there's no reason to try to make our user language templates correspond exactly to what's in Module:languages. I find it baffling that you have an objection if someone wants to say they speak Croatian and not Serbo-Croatian. DTLHS (talk) 01:13, 16 March 2018 (UTC)
The question is: if a user declares with the template "this user likes pig meat", then he/she likes pig meat; they did not say "this user likes lamb meat". Redirecting the template to some other form is neither fair nor correct. Just for communication (talk) 01:28, 16 March 2018 (UTC)
Fixing political issues is NOT our business. Use whatever ISO says because we have nothing better. Equinox 02:45, 16 March 2018 (UTC)
I'm in favor of letting users specify whatever language variety they like in their userboxes. If Croatian is going to be banned in userboxes, American and British English should be as well: they don't have full language codes or get headers in entries. — Eru·tuon 03:21, 16 March 2018 (UTC)
  • I agree. Just because we treat Serbo-Croatian as a single language for lexicographical purposes doesn't mean we can't allow userboxes to make finer distinctions. —Mahāgaja (formerly Angr) · talk 10:38, 16 March 2018 (UTC)
  • I also agree that people should be able to say whatever they want on their userpages. --WikiTiki89 12:18, 16 March 2018 (UTC)
  • This is an interesting statement because I see it and think "yes, anyone should be able to say anything on their page" (unless it's totally egregious, like posting our home addresses, or spam/propaganda without any contribs), and then I wonder why I oppose userboxes. Probably because they have a "viral" quality and people tend to copy them without thinking, and then we end up with a big infrastructure of needless rubbish. I suppose in theory I don't oppose an individual userbox. Huh! Oh well just ranting. Equinox 15:59, 16 March 2018 (UTC)
IMO, as long as "sr", "hr" etc still ultimately categorize into the "User sh" categories so the users can be found when their expertise is needed, it's fine to have Serbian-specific and Croatian-specific boxes. I think the other Babel system (that pulls from a central, off-Wiktionary repository of codes) allows them regardless. - -sche (discuss) 16:52, 16 March 2018 (UTC)
Good point. I can use {{#babel:hr}} overriding the local template or I can use a global user page on Meta including whatever the Babel extension permits. In fact, there is no link to update on pages currently in Category:User hr. --Vriullop (talk) 21:05, 16 March 2018 (UTC)

This is what I dreaded when Just for communication (a.k.a. Kubura) first contacted me – I share Atitarev's fear that this might be the first inkling of yet another war against unified Serbo-Croatian. Heaven knows I've reverted my fair share of anons who have tried to change Serbo-Croatian lemmas in one direction or another, blatantly disregarding our policy. With that said, I can also agree that letting users specify which language variety they speak in their userboxes might not be such a big deal after all. As long as we can agree – unanimously – that it's something we're willing to stomach. --Robbie SWE (talk) 19:25, 16 March 2018 (UTC)

The template has been changed back to point to Croatian by User:DTLHS in diff with a summary "it is agreed". In my opinion, that was too early in the discussion. The original poster was even offended by the term "Serbo-Croatian", calling it the "so-called", and it has been our long-fought policy for years! If we say we are apolitical, then using Serbo-Croatian/Croato-Serbian is not a political statement but linguistic common sense. Do we stand for anything? Why have language policies, language treatment documents and modules, mergers, splits and votes? --Anatoli T. (обсудить/вклад) 01:42, 17 March 2018 (UTC)
What do "language policies" have to do with what people put on their user pages? We unify Serbo-Croatian for lexicographic convenience and nothing more. And no, something isn't apolitical just because you happen to agree with it. DTLHS (talk) 01:47, 17 March 2018 (UTC)
I tend to agree with DTLHS. To echo Mahagaja: "Just because we treat Serbo-Croatian as a single language for lexicographical purposes doesn't mean we can't allow userboxes to make finer distinctions." Any edits switching the mainspace should be dealt with separately, if and when they arise. --Dan Polansky (talk) 13:00, 18 March 2018 (UTC)
As Erutuon said, there is no reason to allow British and American varieties of English and disallow varieties in other languages. If the user feels more comfortable with Croatian or Serbian, let them feel comfortable. It does not influence the main space in any way. --Jan Kameníček (talk) 13:14, 18 March 2018 (UTC)

Vote: PseudoSkull for admin[edit]

Hi! I'm a newbie who has rarely done any work on this project, and (seriously) I can hardly tell how to create a vote and I am not sure if I've done it right. It's very hard. (Usability question?) Anyway, I think it's about time that PseudoSkull becomes an admin and here is a vote about it. Please visit Wiktionary:Votes/sy-2018-03/User:PseudoSkull_for_admin Equinox 04:53, 16 March 2018 (UTC)

If you're a newbie I must be a new-newbie. I know what you meant though. DonnanZ (talk) 22:02, 16 March 2018 (UTC)

Vote: Including translation hubs[edit]

FYI, I created Wiktionary:Votes/pl-2018-03/Including translation hubs.

Let us postpone the vote as much as discussion requires, if at all. --Dan Polansky (talk) 08:54, 17 March 2018 (UTC)

Czech noun phrases[edit]

lošák zprohýbaný should be marked as "Noun", just like černá díra and black hole should be marked as nouns. This is consistent with WT:EL, which forbids the part of speech "Noun phrase". Jan.Kamenicek disagrees. A discussion is at User_talk:Jan.Kamenicek#Noun phrases, but I was not convincing enough. I do not want to engage in a revert war. --Dan Polansky (talk) 12:06, 18 March 2018 (UTC)

  • Terms that consist of a noun and an adjective (in either order) are phrases according to our definition of phrase but we always treat them as nouns here. Please change it to a noun (I don't know if it has a plural). If the other person continues to change it to a phrase, I'll give him a short block. SemperBlotto (talk) 12:14, 18 March 2018 (UTC)
  • I do oppose such a solution for Czech entries. I understand that English nouns may include noun phrases as well, but Czech nouns do not. The general understanding is that only single-word expressions can be nouns in the Czech language, which I explained in detail on my talk page. --Jan Kameníček (talk) 12:24, 18 March 2018 (UTC) E. g. houpací křeslo (rocking chair) is always analyzed either as adj + noun, or as a noun phrase, but never as a single noun. --Jan Kameníček (talk) 13:21, 18 March 2018 (UTC)
    • Czech and English are exactly the same as for noun vs. noun phrase, as I pointed out. No reason to treat Czech different from English or German. A dictionary treatment of a part of speech is not necessarily the same as a general linguistic treatment. The English linguistics does distinguish NP from N, no question about it. --Dan Polansky (talk) 12:27, 18 March 2018 (UTC)
      • Yes there is a huge reason to treat it differently, and that is that linguistic sources on Czech nouns use it differently and so do all people no matter whether they are linguists or laypeople. It is very confusing for readers if they meet here an attitude that is so different from what they are used to in real Czech language usage as well as Czech language textbooks. --Jan Kameníček (talk) 12:33, 18 March 2018 (UTC)
        • The first-time users of Merriam-Webster may experience the same confusion. But the confusion quickly withers away; they get used to it, some of them realizing that it is a part-of-speech classification and that it makes sense from a dictionary point of view. Czech = English as for linguistic sources distinguishing NP and N; no difference here. --Dan Polansky (talk) 12:47, 18 March 2018 (UTC)
          • To argue about classification of Czech words we should seek something that deals with classification of Czech words, which Merriam-Webster is not.
          • Here is also a link to an English book on Czech language that also differentiates between nouns and noun phrases. These two have always been understood as different things when analyzing Czech language and so it should be mirrored in Wiktionary as well.
          • It is not true that it has to be a part of speech classification, as various different headings are allowed, such as Phrase (which is the one I used), Prepositional phrase, Proverb, Suffix and many more...
          • The confusion does not wither away with non-regular users. --Jan Kameníček (talk) 13:00, 18 March 2018 (UTC)
            • I will break it down.
            • 1) General English linguistic sources about English distinguish NP from N.
            • 2) English dictionaries do not distinguish NP from N.
            • 3) The English Wiktionary has decided to abolish the distinction between NP and N, for all languages. It did so in keeping with 2).
            • 4) We do not have any example of a Czech dictionary that has černá díra, and ranks it either as NP or P.
            • 5) General Czech linguistic sources about Czech distinguish NP from N, similar to 1). No surprise here.
            • 6) There is no grammatical difference between černá díra, black hole and schwarzes Loch.
            • 7) ----- Therefore -----
            • 8) Let us enter Czech in a way consistent with 3). Let us do so until the decision made in 3) is reverted via general consensus of the English Wiktionary.
            • 9) Consistent with 8), please change lošák zprohýbaný to Noun, and leave it like that until you convince other people to change 3).
            • --Dan Polansky (talk) 13:34, 18 March 2018 (UTC)
              • Ad 1) and 2) Not applicable for Czech expressions.
              • Ad 3) I avoided using the heading "Noun phrase" which was rejected and used just the allowed "Phrase", although it is probably meant for different cases. I believe it is a good compromise until it is agreed whether "noun phrases" are allowed to have their own heading at least in Czech entries. If not, I would be happy just with "Phrase", too: it is not ideal, but at least it is not wrong and confusing.
              • Ad 4) Easy to explain, most dictionaries of Czech expressions do not have various phrases as separate entries, but among phrases and collocations connected with individual single words. Despite this there is evidence how the dictionary authors understand Czech nouns:
                • a) I have never seen a dictionary of Czech expressions marking noun phrases as nouns.
                • b) My Czech-English dictionary (by Josef Fronek, 2000) has loads of examples that suggest the authors do not consider noun phrases to be nouns. Every entry there has its POS marked. If the POS changes within the same entry, it is always marked again. However, when I look up elektrický (electric), marked as adj., the same entry also contains elektrické křeslo (electric chair) without marking any change of POS to "noun". Instead it is listed among other phrases and collocations of the adjective with other words. If they considered it a noun, they would mark it so.
                • c) My dictionary of Czech phraseology, part on non-verbal expressions, contains entries many of which are noun phrases. Although they never mark any POS (which can also be understood as evidence that they do not consider them to be PsOS but phrases), in various comments in the preface and other chapters they directly call them noun phrases and never nouns.
              • Ad 5) And so do English sources about Czech.
              • To sum up my arguments again: various sources on both Czech lexicography and the Czech language generally do not consider Czech noun phrases to be nouns (but phrases consisting of words of various PsOS), and neither do laypeople. Because the heading "Phrase" as such is not disallowed in Wiktionary, I hope it can stay (although allowing "Noun phrase" would be even better). --Jan Kameníček (talk) 17:30, 18 March 2018 (UTC)
Perhaps an argument can be made for using "Noun phrase" instead of "Noun", but this would have to apply to all of Wiktionary, Czech is not exceptional in this matter. Crom daba (talk) 17:56, 18 March 2018 (UTC)
Thank you. One difference I can see is that while Czech noun phrases are not understood as nouns, English ones sometimes are. A general solution allowing "Noun phrase" headings would be great (and I believe it will happen one day), but if such consensus does not emerge, tolerating just the heading "Phrase" in Czech entries, without specifying what kind of phrase it is, would suffice. --Jan Kameníček (talk) 19:06, 18 March 2018 (UTC)
I agree with Semper, we treat these as nouns because they function as nouns, can be replaced with single-word nouns without changing the grammar of the sentence, etc. There does not seem to be anything different about these terms in Czech versus Polish, French, German, English, etc that would justify treating them differently; to Jan's point about "confusing" Czech-speakers I would counter that it is likely confusing for non-Czech people (who seem more likely than Czechs to be looking up English definitions of Czech words, i.e. using en.Wikt instead of cs.Wikt) who are looking up words to see things that are clearly nouns labelled as "phrases"; certainly, it seems wrong to me, since "phrases" normally refers here to things like "a little bird told me" (and if I had encountered it without realizing there were an ongoing discussion like this, I would have simply considered it an obvious mistake and misunderstanding of en.Wikt conventions and changed it to "Noun"). - -sche (discuss) 20:01, 18 March 2018 (UTC)
I second everything -sche said. —Μετάknowledgediscuss/deeds 20:05, 18 March 2018 (UTC)
@-sche, Metaknowledge: Non-Czech speakers trying to learn Czech are not likely to use Wiktionary as the only source for learning. They are likely to use other sources and Wiktionary only as a secondary source. So Wiktionary should be in accordance with the others, not the only one which is different (and in the context of linguistics dealing with Czech language also wrong). --Jan Kameníček (talk) 20:29, 18 March 2018 (UTC)

Headings linked?

Would you consider linking headings, or some headings, e.g. Participle? It is often entered as Adjective or Verb, but it is both, and on most pages there is no clarification. sarri.greek (talk) 13:21, 18 March 2018 (UTC)

Some languages do have "participle" as a part of speech and POS header. But (standard) POS headers should never have links in them (I seem to recall that some "Abbreviation" or "Initialism" headers may have contained links at some point, but that is deprecated). - -sche (discuss) 20:07, 18 March 2018 (UTC)
I see, thank you @-sche:. A pity: the PoS on many pages remains unexplained (αγιοποιημένη). I was comparing with @el.wiktionary, which has linked PoS headings. I presume that at some stage in the future, all words in Wiktionary will be clickable/linked. sarri.greek (talk) 21:26, 18 March 2018 (UTC)

German case ordering

Newer grammars tend to use the order "nominative-accusative-dative-genitive", which has the advantage that the often identical nominative/accusative forms are grouped together (making patterns easier for learners to see). It also reflects usage: the genitive is rare and listed last. I'd like to change our templates accordingly; any objections? – Jberkel 08:59, 19 March 2018 (UTC)

A recent discussion about this. --Per utramque cavernam (talk) 09:13, 19 March 2018 (UTC)
Ah, thanks. In general editors seem to be in favor of the change (this wasn't a general discussion though). As a compromise, I could change it to the proposed order with the option of reordering it back to the "traditional" layout via the script mentioned. – Jberkel 09:34, 19 March 2018 (UTC)
Oppose, this is confusing for those of us who grew up with the classical ordering, which I suspect includes most Germans. Crom daba (talk) 13:17, 19 March 2018 (UTC)
I also learned the traditional ordering at school (ages ago), but we should think about non-native readers who want to use Wiktionary as a resource. Imagine how confused they are with our current presentation. The proposed order is already used in DaF (German ESL) and is becoming a standard in modern grammars. What is currently taught in German schools, to native speakers, I do not know (any teachers reading?) – Jberkel 14:05, 19 March 2018 (UTC)
Additionally, it doesn't fit with European case names like German dritter Fall (third case = dative), vierter Fall (fourth case = accusative), Dutch vierde naamval (fourth case = accusative), Czech čtvrtý pád (fourth case = accusative). Also the German ordering fits with the ordering of Greek, Latin, Czech and other languages. - 16:27, 19 March 2018 (UTC)
That's irrelevant; we don't have to make our system match these case names. What we do have to do is add some explanations in these entries. --Per utramque cavernam (talk) 16:37, 19 March 2018 (UTC)
  • Support, as it is easier, quicker to see and is more appropriate for the German case system. --Mahmudmasri (talk) 13:56, 19 March 2018 (UTC)
  • Support; that's the ordering we used when I was learning German. --Per utramque cavernam (talk) 15:53, 19 March 2018 (UTC)
  • Support; it's a more logical order, IMO, grouping similar forms. - -sche (discuss) 16:06, 19 March 2018 (UTC)
  • Oppose. We should follow the lexicographical conventions of the language. --WikiTiki89 16:43, 19 March 2018 (UTC)
  • Oppose: per Wikitiki89. This mostly seems moot given the ability to reorder with JS, but as long as de.Wikt uses the traditional ordering, I don't think we should break with them. —*i̯óh₁n̥C[5] 21:57, 19 March 2018 (UTC)
  • Confused: I'm quite surprised to learn that the Nominative-Accusative-Dative-Genitive ordering is "new". I learned German first from the First-year German textbook by Jedan, Helbling, Gewehr, and von Schmidt, first published in 1975 and republished in 1979 (Amazon link), and they used that ordering. Is 43 years old still "new"? ‑‑ Eiríkr Útlendi │Tala við mig 22:24, 19 March 2018 (UTC)
    It's new in comparison to two thousand years of nom-gen-dat-acc. Crom daba (talk) 22:37, 19 March 2018 (UTC)
Forgive me for not believing that German grammars have been around for 2000 years. If you are referring to Latin, I fail to see how that has any direct bearing. ‑‑ Eiríkr Útlendi │Tala við mig 22:40, 19 March 2018 (UTC)