Wiktionary:Beer parlour

Welcome, all, to the Beer Parlour! This is the place where many a historic decision has been made and where important discussions are being held daily. If you have a question about fundamental Wiktionary aspects—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don't make personal attacks, don't change other people's posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page. There are various other discussion rooms which may serve the idea behind your questions better. Please take a look to see which is most appropriate.

Sometimes discussion identifies an issue as an idea for policy development or rewriting. Such discussions may be taken out of the Beer parlour to a relevant page, or a brand new page may be created. Usually, the active policy pages will be listed in one of the sections below. See also the policy development page and the votes page.

Questions and answers will not remain on this page indefinitely, as it would very soon become too long to be editable. After a period of time with no further activity (usually a couple of weeks), information will be moved to the archives. We make a point to preserve all discussions that were started here in the archives. However, talk that is clearly not intended for this page may be moved and will not end up in the archives. Enjoy the Beer parlour!

Beer parlour archives


February 2018

Incorrect by extensions at hack

"Hack" in the computer sense initially meant a creative solution. It was only later that it came to mean compromising someone's Hotmail account or stealing files from three-letter agencies. Since we have it backwards, I'm soliciting feedback for how to fix it. Thoughts? —Justin (koavf)TCM 02:51, 1 February 2018 (UTC)

I fear this is part of a larger project. The relationships among the various senses and etymologies of hack and hacker are not at all settled. See, for example, “hack” in Douglas Harper, Online Etymology Dictionary, 2001–2018. Also, is hack ("cough") onomatopoetic? DCDuring (talk) 05:58, 6 February 2018 (UTC)
To me, the verb hack in this sense is odd. I only recall hearing it used as a noun: 10 hacks to improve something or other (where hack means "clever idea"). If someone wants to hack their love life, I would probably understand it to mean they want to stop it. —Stephen (Talk) 10:34, 6 February 2018 (UTC)
See this "hack your love life" Google Search. DCDuring (talk) 15:48, 6 February 2018 (UTC)
@Koavf, one solution would be to put the computing definitions (and their extensions) in chronological order of their development, grouping related terms together. So first "hacking" for expert coding (and thence optimising daily processes), and then "hacking" for breaching security.
Current order of verb:
  1. (transitive, slang, computing) To hack into; to gain unauthorized access to (a computer system, e.g., a website, or network) by manipulating code; to crack.
  2. (transitive, slang, computing) By extension, to gain unauthorised access to a computer or online account belonging to (a person or organisation).
  3. (computing) To accomplish a difficult programming task.
  4. (computing) To make a quick code change to patch a computer program, often one that, while being effective, is inelegant or makes the program harder to maintain.
  5. (transitive, colloquial, by extension) To apply a trick, shortcut, skill, or novelty method to something to increase productivity, efficiency or ease.
  6. (computing, slang, transitive) To work with something on an intimately technical level.
Improved order (in my opinion):
  1. (computing) To make a quick code change to patch a computer program, often one that, while being effective, is inelegant or makes the program harder to maintain.
  2. (computing) To accomplish a difficult programming task.
  3. (computing, slang, transitive) To work with something on an intimately technical level.
  4. (transitive, colloquial, by extension) To apply a trick, shortcut, skill, or novelty method to something to increase productivity, efficiency or ease.
  5. (transitive, slang, computing) To hack into; to gain unauthorized access to (a computer system, e.g., a website, or network) by manipulating code; to crack.
  6. (transitive, slang, computing) By extension, to gain unauthorised access to a computer or online account belonging to (a person or organisation).
Current order of noun:
  1. (computing, slang) An illegal attempt to gain access to a computer network.
  2. (computing, slang) A video game or any computer software that has been altered from its original state.
  3. (computing) An interesting technical achievement, particularly in computer programming.
  4. (computing) An expedient, temporary solution, such as a small patch or change to code, meant to be replaced with a more elegant solution at a later date.
  5. (colloquial) A trick, shortcut, skill, or novelty method to increase productivity, efficiency or ease.
Improved order (in my opinion):
  1. (computing) An expedient, temporary solution, such as a small patch or change to code, meant to be replaced with a more elegant solution at a later date.
  2. (computing) An interesting technical achievement, particularly in computer programming.
  3. (colloquial) A trick, shortcut, skill, or novelty method to increase productivity, efficiency or ease.
  4. (computing, slang) An illegal attempt to gain access to a computer network.
  5. (computing, slang) A video game or any computer software that has been altered from its original state.
-Stelio (talk) 10:31, 21 February 2018 (UTC)
@Stelio: This is beautiful and probably better than anything I could have made. May I suggest that in the future, you don't use green/red in case any of the readers out there are color blind? —Justin (koavf)TCM 10:38, 21 February 2018 (UTC)
Indeed yes, I'm aware of colour blindness as a barrier to visual comparison; I take pains to distinguish colours on graphs I put in professional presentations and fully support the Web Accessibility Initiative. The headers I used are meant as the main differentiators; the colouring was just for some quick visual impact. Red-blue is a safer combination, and one I usually use; mea culpa for publishing before thinking more deeply. Fixed! -Stelio (talk) 10:48, 21 February 2018 (UTC)
I've gone ahead and done that reordering, in the absence of any comments. -Stelio (talk) 09:17, 13 March 2018 (UTC)

February LexiSession: radio

This month, we suggest you focus on words for talking about radio.

Well, for those who do not know LexiSession yet, it is a collaborative transwiktionary experiment. You're invited to participate however you like and to suggest next month's topic. The idea is to look at other communities' improvements on the selected topic in order to improve our own pages. It has already brought in new collaborators who contributed for the first time on a suggested topic. If you participate, please let us know here or on Meta, so we can keep track of the evolution of LexiSession. I hope some people will be interested this month, and if you can spread the word to another Wiktionary, you are welcome to do so. Ideally, LexiSession should be a booster for every Wiktionary on the same agenda, giving us more insight into the ways our colleagues work in the other projects. Noé 09:45, 1 February 2018 (UTC)

We started with the creation of a thesaurus of radio in French and a thesaurus of waves in French. I'm eager to compare them with local Thesaurus:radio and Thesaurus:waves! For the last one, I am pretty sure we will not collect the same kind of vocabulary, and that can be very interesting. Noé 21:00, 5 February 2018 (UTC)
Thanks for your contributions! Noé 10:49, 1 March 2018 (UTC)

Middle Japanese

A new user, Kakiyomi2 (talkcontribs), has been creating entries for Middle Japanese (or Classical Japanese as the user calls it), but we don't have a code for the language. --Lo Ximiendo (talk) 16:37, 2 February 2018 (UTC)

That's me. I've added entries for かはす, かはる, かへる, かへす, かふ, and かめ, partly just to get interest going in this. I intend now to leave it some months to see what feedback it generates before starting to add more from my extensive materials.--Kakiyomi2 (talk) 17:18, 2 February 2018 (UTC)

@Eirikr, suzukaze-c, TAKASUGI Shinji, Wyang, Nibiko IMO they're beautiful entries. I would prefer the name Classical Japanese, as that's what much of the literature seems to call it, perhaps a code jap-cls? DerekWinters (talk) 18:14, 2 February 2018 (UTC)
For historical and political reasons, jap is generally eschewed in favor of ja or jp where a two-letter code might suffice, and jpn, where a three-letter code is needed.
In monolingual Japanese sources, the stage of the written language from roughly the Heian period (800s) through to the Meiji period (late 1800s) is broadly described as 文語 (bungo, literally literary language), in contrast to 口語 (kōgo, spoken language, vernacular, literally mouth language). There is precedent for using some variation of the term literary, as in the three-letter code ltc for Middle Chinese / Classical Chinese (presumably derived from literary Chinese). By extension, I'd prefer ltj if we can use just a three-letter code. If we need a 3-3 code, I'd suggest jpn-ltj. ‑‑ Eiríkr Útlendi │Tala við mig 19:24, 2 February 2018 (UTC)
PS: To my knowledge, the ISO only has codes for Old Japanese (ojp) and Japanese (ja). I'm not aware of any extant standardized codes for anything in between circa 800 and the modern era. ‑‑ Eiríkr Útlendi │Tala við mig 19:30, 2 February 2018 (UTC)
We do not get to make up two or three letter codes. Japanese is ja as a two-letter, ISO 639-1 code, and jpn as a three-letter, ISO 639-2/3 code. "jap" is an obsolete code for Madi, now merged into Yamamadi (jaa).
If we need a three-letter code, qaa–qtz are reserved for local use. ojp covers "7th–10th centuries AD", according to the Linguist List, which basically controls the extinct section of ISO 639-3. Japanese is listed as the child of Old Japanese, so presumably it covers everything from then until now.--Prosfilaes (talk) 21:50, 2 February 2018 (UTC)
Right, we can't make up a two- or three-letter code (because the ISO might later assign it, and besides it'd be confusing). If a code is needed, the customary naming scheme, described in Wiktionary:Languages, is to use the nearest ISO family code and then three letters that approximate the language named, so the code should be "jpx-ltj" if we call it "Literary Japanese", or "jpx-mja" if we call it "Middle Japanese", or something else starting with "jpx-". - -sche (discuss) 23:04, 2 February 2018 (UTC)
Re: codes, thank you both for the pointers. I dimly remembered that there was a mechanism for creating our own codes (the prefix of three letters from qaa through qtz), but I encounter such issues so rarely that I couldn't recall any useful details. I'm happier with the jpx- prefix, as that's a lot easier to remember than anything beginning with q.
Re: dating, there's some terminology confusion. OJP is variously described in English as including everything textual prior to the Heian period (i.e. 794 and before), or up through the end of the Heian period (1185), or until some relatively arbitrary point in the middle of the Heian period (probably where Linguist List gets its dating). For EN WT purposes, so far as I've understood it, we're using the earlier dating, in alignment with Japanese sources. The main inflection point in the development of the language is the loss of certain vowel distinctions recorded using w:Jōdai Tokushu Kanazukai, which shift was apparently complete by the start of the Heian period. The EN WP article on w:Early Middle Japanese describes some of this in more detail. (NB: Anything pre-historic, i.e. before the first texts, is usually described as Ancient Japanese, Proto-Japanese, or Proto-Japonic.) ‑‑ Eiríkr Útlendi │Tala við mig 00:32, 3 February 2018 (UTC)
It is not bad at all to use ojp if there is no other appropriate code. In the Middle Ages spoken Japanese changed, but written Japanese stayed similar. — TAKASUGI Shinji (talk) 01:12, 3 February 2018 (UTC)
@Shinji: If ojp implies Jōdai Tokushu Kanazukai and the underlying vowel distinctions, then using ojp for later developments of Japanese could be unnecessarily confusing, no? ‑‑ Eiríkr Útlendi │Tala við mig 22:24, 5 February 2018 (UTC)
@everyone who finds this thread relevant: I did some reformatting of the provisional entry at かへる, to: 1) bring the formatting more in line with WT entries in general and other JA entries more specifically; 2) add in kanji usage information in a way that mirrors JA WT and other monolingual dictionaries. I'm less certain about #2, since what I added isn't deeply researched (mostly I wanted to provide a quick-and-dirty visual example), and it's based on modern sources and historical kanji usage can be very divergent. Thoughts? ‑‑ Eiríkr Útlendi │Tala við mig 22:24, 5 February 2018 (UTC)
Also, I was just looking at Okinawan today, and the literature suggests an Old (first documented until early 1600s), Middle (1600s to 1800s), and Modern stratification. If you all think it's appropriate, we could add those languages as well. DerekWinters (talk) 18:16, 2 February 2018 (UTC)
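The code-naming conventions mentioned in this thread (an ISO family code such as jpx, a hyphen, then three letters approximating the language name, as in jpx-ltj; and the ISO range qaa–qtz reserved for local use) can be sketched as a small validator. This is a minimal Python sketch under stated assumptions, not Wiktionary's actual language-data modules: the set of family codes is a hypothetical, illustrative subset, and the function names are invented for this example.

```python
import re

# Hypothetical, illustrative subset of ISO 639-5 family codes
# (jpx = Japanese family, mun = Munda, ira = Iranian).
FAMILY_CODES = {"jpx", "mun", "ira"}


def is_local_use_code(code: str) -> bool:
    """True if `code` falls in the ISO 639 range qaa-qtz reserved for local use."""
    return (
        len(code) == 3
        and code[0] == "q"
        and "a" <= code[1] <= "t"
        and "a" <= code[2] <= "z"
    )


def make_family_code(family: str, abbrev: str) -> str:
    """Build a code such as 'jpx-ltj' from a family code plus three letters."""
    if family not in FAMILY_CODES:
        raise ValueError(f"unknown family code: {family!r}")
    if not re.fullmatch(r"[a-z]{3}", abbrev):
        raise ValueError(f"abbreviation must be three lowercase letters: {abbrev!r}")
    return f"{family}-{abbrev}"


def is_family_style_code(code: str) -> bool:
    """True if `code` has the shape <family>-<abc> with a known family prefix."""
    match = re.fullmatch(r"([a-z]{3})-([a-z]{3})", code)
    return match is not None and match.group(1) in FAMILY_CODES


print(make_family_code("jpx", "ltj"))  # → jpx-ltj
print(is_local_use_code("qaa"), is_local_use_code("qua"))  # → True False
```

Under these assumptions, a made-up two-part code like "jap-cls" is rejected because "jap" is not a family code, which mirrors the objection raised above.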

Related: Status of hiragana entries

For modern Japanese, hiragana entries are only ever soft redirects to the kanji spellings (except for those words that have no associated kanji).

The かへる entry in its current state is laid out as the lemma for the classical form of modern 帰る (kaeru, to return, to go back to one's starting point, to go home, intransitive).

I see a few problems with this.

(For our readers unfamiliar with the Japanese language, I hope the above makes clear some of the lexicographical challenges inherent in the language and its writing system.)

I feel rather strongly that we should have the same policy for both modern Japanese and older stages of the language with regard to choosing lemma spellings.

I'd suggest one of two approaches:

  1. Align Middle / Classical Japanese practice with modern Japanese, and use the kanji spellings for the lemma, with hiragana entries as soft redirects.
      • Pro: We already have this practice, editors are used to it, and we can likely repurpose a good bit of the supporting infrastructure (templates, modules, etc.).
      • Con: As illustrated above, using kanji for the lemmata obscures the fact that, in many cases, we have one word spelled in multiple ways, with each spelling imparting a shade of meaning, but not fundamentally altering the basic theme.
      • Con: We must also either duplicate a lot of data, or arbitrarily choose one kanji as the "main" and create the others as soft-redirect "alternative form" entries. This can also obscure relationships between senses and spellings.
      • Con: When one kanji spelling has multiple readings, and more than one reading belongs to the same category, only the last reading on the page actually gets added to the category. This appears to be a fundamental flaw in the underlying MediaWiki database software. See 避く (saku, yoku, to dodge, to avoid) as one such example: although both readings are marked for inclusion in [[Category:Japanese_shimo_nidan_verbs]], only the yoku reading actually appears on that page.
  2. Drastically rework our approach to Japanese to use hiragana spellings as the lemma, breaking each derivation out under its own ===Etymology=== section, indicating on each sense line which kanji spelling is most commonly used. Kanji-spelling entries would instead be stubs redirecting to the hiragana entries.
      • Pro: This aligns with the common practice of monolingual Japanese dictionaries, including JA WT, and is also closer to how many bilingual dictionaries function.
      • Pro: This is easier for learners, who may know how a word sounds (and can thus work out the hiragana), but might not know how to spell it in kanji.
      • Pro: This is easier for learners when looking for the various meanings that might apply to a particular verbal or otherwise-non-kanji context. In our current setup, unless the hiragana entries include glosses for all the kanji spellings, users have to click through each separate spelling to try to find the appropriate meaning. Maintaining glosses in multiple places can be difficult.
      • Pro: Categories will index more appropriately. While a single kanji spelling might have multiple readings that must all be indexed within the same category (but cannot be due to the software), a single hiragana spelling is already the reading, and will thus only need to be indexed once within the same category.
      • Con: We'd need to rework all of our existing entries and infrastructure.

In terms of simple numbers of pros versus cons, it seems clear that hiragana spellings would be the better choice. However, that one con is a huge one. If we were starting from scratch, I'd definitely argue wholeheartedly that we go that route. Given the current state, I still argue in favor of hiragana spellings, for both modern Japanese and older forms, albeit with an awareness of the enormousness of the work required to convert our existing entry base.

Thoughts? ‑‑ Eiríkr Útlendi │Tala við mig 20:41, 2 February 2018 (UTC)
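The category problem described above for 避く can be illustrated with a simplified model, assuming (as described) that a page is listed at most once per category and that a later sort key on the same page replaces an earlier one. The function name and data layout below are hypothetical; this is a sketch of the behaviour as reported in this thread, not MediaWiki's implementation.

```python
from collections import defaultdict


def build_category_index(category_links):
    """Simplified model of category listings (hypothetical, for illustration).

    category_links: (page, category, sort_key) tuples in the order they occur.
    Each page is listed at most once per category; a later sort key for the
    same page overwrites an earlier one.
    """
    index = defaultdict(dict)  # category -> {page: sort_key}
    for page, category, sort_key in category_links:
        index[category][page] = sort_key
    # Return each category's listing ordered by sort key.
    return {
        category: sorted(entries.items(), key=lambda item: item[1])
        for category, entries in index.items()
    }


CATEGORY = "Japanese shimo nidan verbs"

# Kanji lemma: both readings of 避く target the same category on one page.
print(build_category_index([("避く", CATEGORY, "saku"), ("避く", CATEGORY, "yoku")]))
# → {'Japanese shimo nidan verbs': [('避く', 'yoku')]}

# Hiragana lemmas: each reading is its own page, so both are listed.
print(build_category_index([("さく", CATEGORY, "さく"), ("よく", CATEGORY, "よく")]))
# → {'Japanese shimo nidan verbs': [('さく', 'さく'), ('よく', 'よく')]}
```

In this model the saku reading disappears from the listing for the kanji lemma, while hiragana lemmas avoid the collision because each reading is a separate page.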

If hiragana spellings are the lemmas in monolingual Japanese and many bilingual dictionaries, then I think that already speaks in favour of that approach. —Rua (mew) 21:04, 2 February 2018 (UTC)
Some past suggestions:
 :) Wyang (talk) 07:20, 3 February 2018 (UTC)
I also like hiragana entries as far as Classical Japanese is concerned. Modern Japanese officially uses mixed spellings and we can very easily check real usage. — TAKASUGI Shinji (talk) 07:41, 3 February 2018 (UTC)

POS headers

In Wiktionary:Entry_layout#Part_of_speech it says "Some POS headers are explicitly disallowed:" which includes "Abbreviation" and "Initialism" but it doesn't suggest what should be used instead in those cases. And of course they end up in a cleanup category. DonnanZ (talk) 14:04, 3 February 2018 (UTC)

Please use the actual part of speech as if it were a normal word or phrase used in the same context, like PC (personal computer) is "Noun". SNAFU has both "Phrase" and "Noun". --Daniel Carrero (talk) 14:12, 3 February 2018 (UTC)
OK, I guess that can be done, but a guidance note there wouldn't go amiss. Another odd one is "Symbol" which isn't even mentioned, but is used at Translingual CH. DonnanZ (talk) 14:20, 3 February 2018 (UTC)
Absolutely, I support having some guidance note. But actually "Symbol" is mentioned, in the part that says "Symbols and characters: Diacritical mark, Letter, Ligature, Number, Punctuation mark, Syllable, Symbol". --Daniel Carrero (talk) 14:25, 3 February 2018 (UTC)
Heh, so it is, even though CH may be an abbreviation.... DonnanZ (talk) 14:37, 3 February 2018 (UTC)
I think it makes sense to say that "CH" is a "Symbol", because chemical elements and formulae are not phrases and so don't use nouns; they use letters as symbols, which may be in diagrams as opposed to text. That's just my personal interpretation. Feel free to disagree if you want. Aside from that, it seems normal in the English Wiktionary to use "Symbol" for chemical elements and formulae, so at least it's consistent if nothing else. --Daniel Carrero (talk) 16:10, 3 February 2018 (UTC)
Another one: Even though "Idiom" is specifically disallowed, it can still be selected if you use NEC (new entry creator). I didn't check if there are other disallowed ones there, I think there are. DonnanZ (talk) 14:42, 4 February 2018 (UTC)
What should be used instead of Idiom? There are many such lexical items in Japanese that don't fit into POS categories (they aren't nouns, verbs, adjectives, etc, but rather four-character set phrases in some cases, or whole sentences in others). ‑‑ Eiríkr Útlendi │Tala við mig 18:20, 7 February 2018 (UTC)
Many entries use "Definitions", which doesn't have a voted consensus. What should be done with those? —Rua (mew) 18:25, 7 February 2018 (UTC)
So far I've only encountered a Definitions header in Chinese entries, for which the Chinese editor community has made a strong argument (as I've understood it, largely based on Chinese terms not fitting nicely into POS categories). Is this header in use in the entries of any other languages? ‑‑ Eiríkr Útlendi │Tala við mig 18:29, 7 February 2018 (UTC)

Cuneiform and Unicode

I tried looking up a sentence in Hittite, but I've had trouble finding it. I've begun to suspect that the signs don't match. Look at the thirty-third line in KBo 6.3 i.

The text is transliterated as:

ku-iš-ma-kán ke-e-el tup-pí-aš 1-an-na me-mi-an wa-aḫ-nu-zi na-an-kán ku-u-uš li-in-ki-ya-aš DINGIR.MEŠ-eš ar-ḫa ḫar-ni-in-kán-zi.

If we use the characters provided by Unicode we get:

𒆪𒅖𒈠𒃷 𒆠𒂊𒂖 𒁾𒁉𒀸 𒁹𒀭𒈾 𒈨𒈪𒀭 𒉿𒄴𒉡𒍣 𒈾𒀭𒃷 𒆪𒌋𒍑 𒇷𒅔𒆠𒅀𒀸 𒀭𒈨𒌍𒌍 𒅈𒄩 𒄯𒉌𒅔𒃷𒍣.

Apparently the cuneiform characters given by Unicode are wrong. I've recognized several of the erroneous signs, including "e", "ku", "an", and "kan". I've also noticed that the Neo-Assyrian sign for "an" mentioned here matches the Hittite "an" sign perfectly, while here on Wiktionary we write "an" as "𒀭". It looks as if Hittite was written in Neo-Assyrian cuneiform, but we are instead writing it in some earlier form. As far as I've read, Unicode doesn't seem to have signs for Neo-Assyrian cuneiform.

The site assyrianlanguages.org also offers Hittite texts with their corresponding transliteration. I don't know what we're supposed to do in this kind of situation. --Tom 144 (𒄩𒇻𒅗𒀸) 15:05, 4 February 2018 (UTC)

Btw, Wikipedia's article w:Hittite cuneiform is also based on Unicode. --Tom 144 (𒄩𒇻𒅗𒀸) 15:24, 4 February 2018 (UTC)

I am afraid this is a problem with all paleoscripts. There is no Hittite-specific Unicode encoding, only a general cuneiform script not specific to any one writing tradition. The Unicode block for cuneiform does not cover all the variant sign forms used in different traditions, nor their allographs; it provides only a standard representation of each glyph based on the most common language. In general, it is hard to reproduce an original inscription in any paleoscript with Unicode. --Vriullop (talk) 12:27, 5 February 2018 (UTC)
@Vriullop: In that case, should we use pictures as in Egyptian?--Tom 144 (𒄩𒇻𒅗𒀸) 01:42, 6 February 2018 (UTC)
Probably. Doing so should be easier for Hittite than for Egyptian, if the Hittite signs are not stacked in various ways like hieroglyphs are: we could probably just get images of every variant sign and make a template which would convert input text like "fu bar2" into the sign for "fu" and the second of two variant signs for "bar". Although he's busy (aren't we all?), I think @JohnC5 had interest in doing something similar for Italic languages and might be interested in this idea. Wikimedia Commons hopefully already possesses all the needed images. - -sche (discuss) 01:51, 6 February 2018 (UTC)
FWIW, I am just doing it for Iberian script: ca:Viccionari:Escriptura ibèrica. --Vriullop (talk) 09:43, 6 February 2018 (UTC)
I'm afraid we don't have them all. I've looked up some online sources, but they contradict each other. I'm thinking of trying to contact a well-known author, or would this be too much?--Tom 144 (𒄩𒇻𒅗𒀸) 23:20, 6 February 2018 (UTC)
You could try to make them yourself if you have the necessary expertise. DTLHS (talk) 23:25, 6 February 2018 (UTC)
@DTLHS: You mean as drawing and uploading them? --Tom 144 (𒄩𒇻𒅗𒀸) 15:17, 7 February 2018 (UTC)
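The template -sche suggests could start from a parser that turns input like "fu bar2" into (sign, variant) pairs and maps each pair to an image file. The following is a minimal Python sketch, assuming whitespace-separated sign names with an optional trailing variant number (no number meaning the first variant); the function names and the file-name pattern are purely hypothetical and do not correspond to any existing files on Wikimedia Commons.

```python
import re

# A sign name (no digits or spaces) optionally followed by a variant number.
TOKEN_RE = re.compile(r"([^\d\s]+)(\d*)")

# Hypothetical file-name pattern; not an existing Commons naming scheme.
FILENAME_PATTERN = "Hittite sign {name} {variant}.svg"


def parse_signs(text):
    """Turn 'fu bar2' into [('fu', 1), ('bar', 2)]; no number means variant 1."""
    signs = []
    for token in text.split():
        match = TOKEN_RE.fullmatch(token)
        if match is None:
            raise ValueError(f"cannot parse sign token: {token!r}")
        name, number = match.groups()
        signs.append((name, int(number) if number else 1))
    return signs


def sign_images(text, pattern=FILENAME_PATTERN):
    """Map each parsed sign to an image file name built from `pattern`."""
    return [pattern.format(name=name, variant=variant) for name, variant in parse_signs(text)]


print(sign_images("fu bar2"))
# → ['Hittite sign fu 1.svg', 'Hittite sign bar 2.svg']
```

Keeping parsing separate from file naming means the image naming scheme can change once the actual images are drawn or located, without touching the input syntax.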

Wikimania 2018 call for submissions now open

On behalf of the program committee of Wikimania 2018 - Cape Town, we are pleased to announce that we are now accepting proposals for workshops, discussions, presentations, or research posters to give during the conference. To read the full instructions visit the event wiki and click on the link provided there to make your proposal:


The deadline is 18 March. This is approximately 6 weeks away.
This year, the conference will have an explicit theme based in African philosophy:

Bridging knowledge gaps, the ubuntu way forward.

Read more about this theme, why it was chosen, and what it means for determining the conference program at the Wikimedia blog. Sincerely, Wittylama 08:22, 5 February 2018 (UTC)

News from French Wiktionary

January issue of Wiktionary Actualités just came out in English!

Actualités reach the stars! Despite a missing admin, this edition offers you four articles: about thesauri in the French and English Wiktionaries; new words coined by the French government; a funny dictionary; and the suffix -gate. Surrounded by shiny pictures are the shorts, galactic stats, nice videos and a note about the last LexiSession. Big news: we now include stats on the number of pictures in the French Wiktionary, and we plan to reach 100,000 this year!

This issue was written by eight people and was translated for you by Pamputt and me. This translation may be improved by readers (wiki-spirit), as it was last month by Stephen G. Brown and Xbony2 (thanks, mate!). We still receive zero money for this publication, and your comments are welcome. To celebrate the new year, I worked on a description of our workflow to explain how we produce our journal, and I translated it into English for you! I'll be happy to help if you want to start your own journal here in the future. Noé 20:53, 5 February 2018 (UTC)


Currently Wiktionary:Criteria_for_inclusion has a "Formatting" section, but its content is unrelated to whether a term should be included. I propose to remove this section.--Zcreator (talk) 22:31, 5 February 2018 (UTC)

I agree- does this need a vote? DTLHS (talk) 23:08, 5 February 2018 (UTC)
do we want to move it to anywhere, like WT:ELE? - -sche (discuss) 01:38, 6 February 2018 (UTC)

User:Rua removing information from Portuguese entries

About a week ago, Rua tried to remove the distinction between “epicene” nouns and those with sociolinguistic variation in gender usage from the Portuguese noun headword-line module. This caused {{pt-noun}} to display incorrect information and filled Category:Portuguese nouns with varying gender with thousands of entries that were never intended to be there. She had already tried to do this many times, and as before I had to stop what I was doing to write a hasty fix.

More recently, she speedied {{pt-noun-form}}. The reason given was “Deleted per RFD, RFDO”. Where is the RFD? I recall that some HWL templates that were redundant to {{head}} were RFDed, but pt-noun-form was not completely redundant: it had a parameter that made it display information about metaphonic plurals and add the entry to the appropriate category. Rua had her bot convert the template, in some cases manually removing said parameter. As a result, Category:Portuguese metaphonic plurals is now empty and the information about metaphonic plurals is gone.

As many here may remember, Rua (then CodeCat) was pulling the same crap on our Thai content a while back, trying to meddle with languages that she doesn’t contribute to nor understands. I hope I can avoid a drama as big as that one, but I confronted her about the removal of information and she didn’t respond, so I have to post here. The bot guidelines say that an operator must undo damage caused by their bot, and that’s what she should do. — Ungoliant (falai) 13:34, 6 February 2018 (UTC)

Rua has committed three fouls, by my count. She has removed valuable information from entries, which is easily solved by doing a bot run to undo what she did. Secondly, she deleted a template that does not seem to have been RFDed (or at least it was not linked to in any RFDO discussions) as having failed RFD, which is a misrepresentation. Thirdly, she did not respond to Ungoliant's question, which is irresponsible both as an admin and as a conscientious fellow editor. These are all related to problems in the past, which she swore would be nullified by her seeking consensus. @Rua, Chuck EntzΜετάknowledgediscuss/deeds 17:19, 7 February 2018 (UTC)
The edit summaries on past revisions of MOD:pt-noun are definitely troubling. —AryamanA (मुझसे बात करेंयोगदान) 18:03, 7 February 2018 (UTC)
Spite fights like the one in "Redoing the work that was too hard for poor Ungoliant" are not what edit summaries are for; that's my two cents. mellohi! (僕の乖離) 20:57, 7 February 2018 (UTC)
I would like to update that she has edited this discussion page after this was posted, and therefore is actively ignoring this thread. —Μετάknowledgediscuss/deeds 21:50, 7 February 2018 (UTC)
I have no interest in participating in a show trial. I can only make it worse, so I'm staying quiet and awaiting the inevitable storm. —Rua (mew) 23:05, 7 February 2018 (UTC)
@Rua: No, you can make it better. I listed the solutions in my earlier comment. To make it abundantly clear: if you restore the template and do a bot run to replace it where you removed it (or choose another solution for denoting metaphonicity in those entries, it need not be that particular template), and acknowledge that you made a change without consensus and will avoid that in the future, I (and I expect everyone else, including Ungoliant) will be satisfied. The storm is not inevitable unless you choose it to be. There would never have been a BP thread if you had responded to the message on your talk page, and the thread need not continue if you choose to fix the problem. —Μετάknowledgediscuss/deeds 00:44, 8 February 2018 (UTC)
This is what I mean. All eyes are on me, I'm the only one who does everything wrong. —Rua (mew) 13:23, 8 February 2018 (UTC)
That's a straw man. You removed lexicographical information from dictionary entries, so this is a clear-cut issue. You can choose to fix it, or indulge in an ill-advised persecution complex. —Μετάknowledgediscuss/deeds 15:29, 8 February 2018 (UTC)
Category:Portuguese nouns with metaphonic plurals. Happy now? If you want me to cooperate in the future, I suggest cooperating with me, too, rather than working against me all the time. Good will is earned. —Rua (mew) 15:37, 8 February 2018 (UTC)
No, I'm not happy. There are entries where you removed the information and have not replaced it. It used to display (show the reader the information) on the plural form, which is the actual one affected by the process; now all we have is categorisation of the lemma (with no display there either). Indeed, it is true that you have to earn good will. —Μετάknowledgediscuss/deeds 15:55, 8 February 2018 (UTC)
@Rua, you are ignoring me. You have not fixed the problem, but instead did a little bit of work toward fixing it. That is not acceptable. —Μετάknowledgediscuss/deeds 01:55, 10 February 2018 (UTC)
@Rua, this won't just go away. I am concerned by your behaviour here. —Μετάknowledgediscuss/deeds 17:03, 13 February 2018 (UTC)
I'd like to mention that both Rua and Ungoliant engaged in wheel-reversion far more times before seeking community help than good practice would permit. Please try to be more aware about becoming trapped in unproductive behaviour. Korn [kʰũːɘ̃n] (talk) 10:31, 8 February 2018 (UTC)
What reason would CodeCat/Rua have to take corrective action? Similar faits accomplis have worked for them in the past. After Wiktionary:Votes/sy-2017-11/Desysopping CodeCat aka Rua failed spectacularly, why should CodeCat/Rua bother? Their best strategy is to do nothing and continue in the same vein as they have for the last several years. I do not remember CodeCat ever undoing anything of theirs for which it turned out there was no consensus, but my memory may fail me. I mean, if you misbehave for multiple years and the community then gives you a resounding approval, why would you change your behavior at that point? --Dan Polansky (talk) 10:32, 18 February 2018 (UTC)

Talahedeshki and Tudeshki

Does anyone know if the Iranian dialects Talaxedeškí[1]/Talakhedeshk[2]/Talahedeshk[3] and Tūdeški[4] are one and the same? --Victar (talk) 10:52, 7 February 2018 (UTC)

OK, I figured out Talahedeshki is an old dialect of Gurani, and not the same as Tudeshki, a Kermanic dialect. Mystery solved. --Victar (talk) 17:03, 7 February 2018 (UTC)


It would be great if we had a code for Proto-Munda, the common ancestor of the Munda languages (mun). There are a lot of reconstructions here. @-sche. —AryamanA (मुझसे बात करेंयोगदान) 16:16, 8 February 2018 (UTC)

Yes; added. - -sche (discuss) 16:25, 8 February 2018 (UTC)

Learned borrowing category and template

How long has this been around? I only just noticed it on an Armenian entry. Someone should have told the word dewd about this. This can actually be a useful distinction for some languages, like the Romance ones, and I've been particularly interested in using something like this for Albanian, to distinguish between terms it borrowed (or took) from Vulgar Latin in ancient times through a natural process of prolonged interaction and the much later learned borrowings from Classical Latin over the last couple of centuries. That is one reason why I've been using the der template for ancient Latin loans rather than explicitly calling them borrowings, since the process was very different. But now I don't have to do that. Word dewd544 (talk) 22:22, 10 February 2018 (UTC)

@Word dewd544: Which entry? —Justin (koavf)TCM 08:43, 11 February 2018 (UTC)
Like կիթառ (kitʿaṙ), for example. Word dewd544 (talk) 16:27, 11 February 2018 (UTC)
@Word dewd544: We'd have thousands of entries to fix, that's why I've never bothered with it... But yes, it could be an interesting distinction. I wish it were developed a bit more in the documentation page though. --Per utramque cavernam (talk) 11:57, 14 February 2018 (UTC)
Yeah, I know. I don't have the stamina to start using it for many languages. There's too much to do. For most other languages it's understood that borrowings from Latin were learned. Albanian was just a unique case since there were at least two distinct "layers" or periods of borrowing/incorporation, the first of which happened organically in the distant past, sometimes from vulgate terms that weren't even fully attested. And Armenian can be an applicable language too, I guess, although just using the regular 'borrowed' from Old Armenian wouldn't really be that different. Same can go for modern Greek words borrowed from its Ancient counterpart. I also agree that it should've been described in more detail in the doc page; that could have been useful. I guess it never really took off. Word dewd544 (talk) 22:02, 14 February 2018 (UTC)

Allowing automatic transcription of Khmer terms

Can an administrator or template editor please add ["km"] = "Module:km" to the list phonetic_extraction on Module:links (with a comma after the Thai line)? The relevant discussion can be found at Wiktionary talk:Khmer romanization. Thanks! Wyang (talk) 13:04, 13 February 2018 (UTC)

You know you're an administrator, right? DTLHS (talk) 16:56, 13 February 2018 (UTC)
@DTLHS: I believe that Wyang is hoping to avoid reigniting conflict with Rua. —Μετάknowledgediscuss/deeds 17:00, 13 February 2018 (UTC)
Perhaps one way to do that: @Rua, any problems with this change? - TheDaveRoss 19:03, 13 February 2018 (UTC)
@TheDaveRoss: Rua has not been active for a couple of days; their last edit was on this page, in the section "User:Rua removing information from Portuguese entries".
I think the edit request just preserves the status quo, so it should be okay. —AryamanA (मुझसे बात करेंयोगदान) 23:25, 13 February 2018 (UTC)
I agreed not to edit the relevant modules, so I was hoping another admin or template editor could make the change. Hopefully it doesn't take too long. Wyang (talk) 21:51, 13 February 2018 (UTC)
Can this please be added? Wyang (talk) 07:15, 17 February 2018 (UTC)
Resolved. Wyang (talk) 06:26, 18 February 2018 (UTC)

Western Canadian Inuktitut (ikt)

Can we rename this to Inuvialuktun? This is the name that I believe is more commonly used throughout the literature (especially modern literature) and is simpler than Western Canadian Inuktitut. DerekWinters (talk) 08:32, 15 February 2018 (UTC)

Well-spotted. I agree it should be renamed. It looks like about 70 entries (translation tables, modules, etc.) will be affected. I can rename it in a day or so, if no-one wants to beat me to it (feel free to!) or raise objections. - -sche (discuss) 09:51, 15 February 2018 (UTC)
@-sche: Thank you! DerekWinters (talk) 10:11, 15 February 2018 (UTC)
@-sche Just pinging as a reminder, for when you're done with all the Polynesian mess that's going on. DerekWinters (talk) 22:29, 15 February 2018 (UTC)
Done. - -sche (discuss) 13:46, 16 February 2018 (UTC)

Proto-Central Malayo-Polynesian

In accordance with this discussion from three years ago, I've changed all languages formerly called Central Malayo-Polynesian languages to being Central-Eastern Malayo-Polynesian languages. But we do still have CAT:Proto-Central Malayo-Polynesian language with several lemmas and many words in various languages said to be derived from those lemmas. So my questions:

  1. Do we want to eliminate Proto-CMP (plf-pro) and replace it with Proto-CEMP (poz-cet-pro)? Some of the Proto-CMP entries already have identically spelled Proto-CEMP correspondents; but the others would have to be moved.
  2. If so, is someone with a bot willing to change all instances of plf-pro to poz-cet-pro in mainspace?

Pinging @Amir Hamzah 2008, Chuck Entz, Metaknowledge, -sche, Tropylium. —Mahāgaja (formerly Angr) · talk 14:46, 15 February 2018 (UTC)

Merging the ones that are spelled identically (and not just changing the code in mainspace entries but deleting then-redundant links like so) is a no-brainer; I'll have a go at mainspace entries where CMP can be merged into CEMP that way with AWB. The other (Proto-CMP) entries should, I think, be moved, per the linked-to discussion. - -sche (discuss) 21:32, 15 February 2018 (UTC)
OK, since -sche removed the links, I've deleted all the categories and removed plf-pro from Module:languages/datax. —Mahāgaja (formerly Angr) · talk 09:34, 16 February 2018 (UTC)
As an aside, our reconstruction pages in this area have a lot of overlap, e.g. a lot of descendants are listed manually on both Reconstruction:Proto-Malayo-Polynesian/əpat and Reconstruction:Proto-Austronesian/Səpat (which were among the last remaining instances of plf, which I just removed since they were now categorizing into CAT:E). - -sche (discuss) 13:43, 16 February 2018 (UTC)

Subgroupings of Polynesian

@Amir Hamzah 2008, Chuck Entz, Metaknowledge, -sche, Tropylium again: the same discussion I linked to above also points out that we have CAT:Proto-Nuclear Polynesian language and CAT:Proto-Eastern Polynesian language but no corresponding CAT:Nuclear Polynesian languages or CAT:Eastern Polynesian languages, thus we have two proto-languages without any descendants. How do we want to resolve this? I see two options: (1) We make Proto-NP and Proto-EP into etymology-only synonyms of Proto-Polynesian (which entails moving the existing PNP and PEP lemmas to PP), or (2) We recognize NP (poz-pnp) and EP (poz-pep) as families and start sorting languages into them in accordance with w:Polynesian languages#Languages. What do you think? —Mahāgaja (formerly Angr) · talk 20:03, 15 February 2018 (UTC)

It's never been clear to me what our end goal is in terms of grouping languages. If we want to eventually provide a code for every well demonstrated monophyletic grouping of languages, then #2 is the way to go. —Μετάknowledgediscuss/deeds 23:59, 15 February 2018 (UTC)
Providing a code for every well demonstrated and widely accepted monophyletic grouping seems to definitely be our goal for the Indo-European languages, so why not for other families? I guess what my question amounts to is this: are EP and NP well demonstrated and widely accepted as being both monophyletic and clearly distinct from general Polynesian, with the members listed on Wikipedia? —Mahāgaja (formerly Angr) · talk 09:54, 16 February 2018 (UTC)
Having codes for groupings is fine, but my opinion on the proto-languages is the same as three years ago: we do not need proto-languages with only minuscule differences from their parent as separate languages, and they are probably best treated as simply dialect labels of their parent. --Tropylium (talk) 00:32, 18 February 2018 (UTC)
I agree, but that's beside the point at this stage. We do currently have the proto-languages but not the groupings; I'm looking for agreement to add the groupings. Whether we want to remove the proto-languages is a different issue, and one I don't know enough about Polynesian linguistics to weigh in on. —Mahāgaja (formerly Angr) · talk 15:11, 18 February 2018 (UTC)
OK, I've added the language families poz-pnp (Nuclear Polynesian) and poz-pep (Eastern Polynesian). —Mahāgaja (formerly Angr) · talk 21:54, 18 February 2018 (UTC)

Need help with using JavaScript

Hello. I am an admin from Turkish Wiktionary. I need some help with storing JavaScript arrays in data files, just like we do with Lua modules. We have this JS file (tr:MediaWiki:YeniMadde.js), which helps users who don't know how to create a new entry, and it uses some arrays. I have also created this page, tr:MediaWiki:YeniMadde.js/Menüler.js, to store all the arrays. But I couldn't manage to access them from the main JS file. I have read the mw:Manual:Interface/JavaScript page, which has useful information, but I still do not understand how I can access an array from an external JS file. If anyone could help me, I would appreciate it. Thanks! ~ Z (m) 10:23, 16 February 2018 (UTC)

@HastaLaVi2: The way in which I transfer items between scripts is, in script 1, placing the items in the window object, then in script 2 loading script 1 with jQuery.getScript and using the items in a callback: jQuery.getScript(/* script URL */, function () { /* code that uses the items in this script */ }). You can see an example of this technique in MediaWiki:Gadget-AcceleratedFormCreation.js, where User:Conrad.Irwin/creationrules.js is loaded and its function window.generate_entry is used. (That's where I got the technique originally.) Maybe there is a more elegant way to do this, I don't know. I like the Lua way, in which modules don't write to the global object. 19:39, 16 February 2018 (UTC)
Thanks a lot for your response! Now I see it: using the window object is actually a good way of doing this. I agree with you on that now. I am really new to coding extensions for the wiki, but I hope to get better in time. So thanks again for your help! :) ~ Z (m) 10:30, 17 February 2018 (UTC)
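A minimal sketch of the window-object technique described above, adapted to the two pages in the question. The global name `YeniMaddeMenuler`, the sample arrays and the helper `initYeniMadde` are hypothetical placeholders, not anything that exists on Turkish Wiktionary. Both parts are shown in one block for convenience; on the wiki they would live on separate pages. The loader is passed in as a parameter (normally `jQuery`) and `globalThis` is only a fallback for when `window` does not exist, so the sketch can also be exercised outside a browser.

```javascript
// --- Data page (e.g. MediaWiki:YeniMadde.js/Menüler.js) ---
// Put the arrays on the global object instead of in local variables, so that
// other scripts can read them once this file has run. The name
// YeniMaddeMenuler and the sample arrays are hypothetical placeholders.
(function (root) {
  root.YeniMaddeMenuler = {
    turler: ["ad", "fiil", "sıfat"],
    diller: ["Türkçe", "İngilizce"]
  };
})(typeof window !== "undefined" ? window : globalThis);

// --- Main script (e.g. MediaWiki:YeniMadde.js) ---
// $ is normally jQuery. jQuery.getScript(url, success) loads the data page,
// and the success callback is where the globals it set are read.
// initYeniMadde is a hypothetical helper name.
function initYeniMadde($, dataUrl, onReady) {
  $.getScript(dataUrl, function () {
    var root = typeof window !== "undefined" ? window : globalThis;
    onReady(root.YeniMaddeMenuler);
  });
}
```

In a gadget the call might look like `initYeniMadde(jQuery, url, function (menus) { /* build the form */ })`, where `url` points at the raw script, for instance one built with `mw.util.getUrl('MediaWiki:YeniMadde.js/Menüler.js', { action: 'raw', ctype: 'text/javascript' })`; that URL form is an assumption worth checking against mw:Manual:Interface/JavaScript.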

Requesting rollback

Hi. I am already a rollbacker on Simple English Wiktionary. I am also autopatrolled here. I am trusted here and I regularly look into recent changes and revert vandalism. Therefore, I would like to request the rollback right. Pkbwcgs (talk) 15:11, 16 February 2018 (UTC)

@Pkbwcgs: Looking at your edits, I don't actually see many which are undoing edits other than your own; most of your work seems to be fixing systematic formatting problems, which is still very helpful, thank you! Still, you've been around here and around Simple for a year and you are a rollbacker there, and I see no reason to deny this request (as another admin pointed out once, people can just undo edits or write js to acquire the same one-click functionality as the rollback feature; it's not a restricted ability the way being able to delete things or block people is), so I have granted it. - -sche (discuss) 15:49, 18 February 2018 (UTC)
Thanks. Pkbwcgs (talk) 16:01, 18 February 2018 (UTC)

Ancestor of Middle Indo-Aryan

I'd like to start a discussion about the ancestor of the various Middle Indo-Aryan lects. As stated by {{R:inc:Kobayashi:2004}}, "Vedic was probably a specific dialect of Old Indo-Aryan; it was quite close to, but not identical with the language from which Middle Indo-Aryan developed." This is clearly illustrated by various archaisms found in MIA, such as the absence of the *gẓʰ-*kṣ merger in Gandhari and Pali, so to say they are descended from Vedic is demonstrably inaccurate:

What are people's thoughts on this? @AryamanA, माधवपंडित, JohnC5, Rua, -sche --Victar (talk) 03:49, 17 February 2018 (UTC)

@Victar: This issue has been raised before. Like last time, I think we should treat Vedic as representative of all Old Indo-Aryan dialects (which is the status quo now); it's just a technicality, and in 99% of cases the Sanskrit and MIA forms match perfectly. And if we take Vedic as representing all OIA dialects, it's not "demonstrably false" at all. Furthermore, MIA languages underwent later standardization where the thorn cluster Sanskrit क्ष् (kṣ) was standardized to kh (ch in Maharashtri Prakrit). For example we have Sindhi [script needed] (khã̄iṇu) and Kashmiri [script needed] (chawun) for the word you give as an example. (and oh look the Dardic matches the Sanskrit, how interesting)
Also, the layout you gave does not reconcile the Sanskrit dialects. How can *झापयति (jhāpayati) lead to क्षापयति (kṣāpayati)? The example also completely ignores the Prakrits, which are IMO equally if not more important than the languages here. It is also generally accepted that Sauraseni Prakrit is a direct descendant of Rigvedic Sanskrit. Ashokan Prakrit is missing too, which is of much greater antiquity than either Gandhari or Pali, comprising the "Early Middle-Indo-Aryan" stage. They're important if we intend to discuss the ancestor of all MIA languages.
I totally refuse to format etymologies in this manner. I make a *lot* of Hindi entries, and I am not changing anything to reconstructed Sanskrit unless it is necessary (like at Hindi झरना (jharnā), where Proto-Indo-Aryan is more than enough).
Anyway, I'll find a link to the old discussion ASAP. —AryamanA (मुझसे बात करेंयोगदान) 05:54, 17 February 2018 (UTC)
Here it is: Wiktionary:Beer_parlour/2017/August#Sanskrit_vs._Old_Indo-Aryan and Category talk:Hindi Tadbhava. I don't know why so few people responded. Anyways, given the history of our discussion of this topic, this discussion will drag on forever. Honestly, I think the status quo is good, so I'm going to be less willing to change stuff at this point. Also @DerekWinters, Kutchkutch. —AryamanA (मुझसे बात करेंयोगदान) 06:04, 17 February 2018 (UTC)
Woah, @AryamanA, slow your roll. No one is telling anyone to do anything -- I was just opening it up to dialog. I'm totally fine having "Sanskrit" represent all dialects of OIA; it's only when we start calling it "Vedic" that I find we run into a problem, which current literature would agree with. And I'm not "ignoring" Prakrits in my example. My intention wasn't to detail the whole of the IA tree; I was simply illustrating the *gẓʰ-*kṣ merger discrepancy I mentioned above it. If we ever added reconstructions for these unattested Sanskrit forms, I haven't even put thought into the transcription of it. I'm also well aware of the Sanskritization process.
I certainly would be opposed to creating a bunch of reconstructed Sanskrit entries that are identical to Vedic Sanskrit, but I don't see a problem with creating Sanskrit reconstruction entries for differing ancestral dialectal forms. I also don't see a problem reflecting this dialectal form in descendant trees, if not as a separate level, perhaps on the same line, e.g. Sanskrit: kṣāpayati, *jhāpayati. All in all, it's not very different from what we already do for Latin. What are your thoughts on that? --Victar (talk) 07:04, 17 February 2018 (UTC)
@Victar: Sorry if my response was too aggressive, I'm just putting all my cards on the table so this discussion doesn't drag on like our previous discussion on this topic. I think Sanskrit *झापयति (jhāpayati) is unnecessary if we already have Proto-Indo-Aryan *gẓʰāpa­ya­ti. Maybe we could keep it unlinked or something, but I feel that having a full-blown entry for Sanskrit *झापयति (jhāpayati) is redundant.
That's what I would propose. —AryamanA (मुझसे बात करेंयोगदान) 14:56, 17 February 2018 (UTC)
@AryamanA: No worries. If I were to sum up your previous discussion on this topic, it was that we're treating Sanskrit as Latin, placing all forms in a developmental and dialectal continuum. I'm on board with that, but then, like with Latin, we need to address even the unattested forms. Compare *accatto to *झापयति (jhāpayati). I still take issue with calling *झापयति "Vedic" because it nullifies the whole advantage of the temporal and dialectal vagueness of a unified Sanskrit. Why not just K.I.S.S., as we do for Old French and Anglo-Norman French, and simply keep them all on the same line, like so? --Victar (talk) 18:57, 17 February 2018 (UTC)
@Victar: That works perfectly! I can get on board with that. —AryamanA (मुझसे बात करेंयोगदान) 21:16, 17 February 2018 (UTC)
@AryamanA: Happy to be on the same page. =) --Victar (talk) 02:34, 18 February 2018 (UTC)
@Victar: yeah I too feel the actual ancestors of IA languages were so close to Sanskrit that distinguishing between them is often pointless. I don't oppose reconstructing Sanskrit terms if someone can. A slight problem may be posed if there's an IIR/IE etymon and we use {{inh|sa}} or {{der|sa}} in the reconstructions as it's going to cause CAT:Sanskrit terms derived from Proto-Indo-European to display unattested words. It can be resolved by entering "see kṣā­pa­ya­ti" in the etymology. -- माधवपंडित (talk) 09:49, 17 February 2018 (UTC)
As to the matter of chronology to keep in mind, Middle Indic dialects existed at the same time as Vedic Sanskrit. Even the Rig Veda has many words that clearly come from synchronic basilects spoken daily (as opposed to the conservative, ceremonial acrolect used in the Rig Veda). These dialects gave vocabulary, phonology, and morphology which appear all over the Rig Veda. It's a very frustrating issue, since Vedic Sanskrit cannot be their ancestors but existed within a dialectal continuum with them at the time of the composition of the hymns. Our lexicographical issue stems from the fact that only one dialect is recorded from this period. I'm not proposing a solution to this issue, but merely ensuring that when we talk about MI potentially “coming from Vedic,” we realize that this is deceptive because MI already existed by then. —*i̯óh₁nC[5] 11:10, 17 February 2018 (UTC)
I thought that Sanskrit is only an excellent proxy for the ancestor of Indo-Aryan languages and not the thing itself, so that we use it as such for convenience. Making reconstructed Sanskrit entries seems to me both inconvenient and technically incorrect. I feel the same way about reconstructed Ashokan Prakrit, but ultimately I believe that decisions like these should belong to those who do the work. Crom daba (talk) 13:54, 17 February 2018 (UTC)
@Crom daba: Who's to say the term Sanskrit can't refer to the collection of OIA lects, of which only one was standardized and made the prestige dialect. If we look at it that way, the Sanskrit reconstructions of other dialectical forms are perfectly correct. DerekWinters (talk) 15:39, 17 February 2018 (UTC)
We could say that if it pleases us. But there does seem to be an understanding philologically that when we speak of Sanskrit we mean a specific corpus of texts (especially when we talk of Vedic Sanskrit and so on) and a certain usage of the language (as a language of religion and higher learning); if I'm not mistaken, its very name refers to this.
We could also reconstruct Old Church Slavonic or 18th-century Slaveno-Serbian or Classical Mongolian or Old Turkic, but it seems inconvenient and not necessarily correct. Crom daba (talk) 16:38, 17 February 2018 (UTC)
@माधवपंडित: If we're calling Sanskrit an OIA continuum, a Pali etymology reading from {{inh|pi|sa|*झापयति|tr=jhāpayati}}, {{m|sa|क्षापयति|tr=kṣāpayati}} would be just fine. --Victar (talk) 19:41, 17 February 2018 (UTC)
One could also do something like from dialectal {{inh|pi|sa|*झापयति|tr=jhāpayati}} (compare {{m|sa|क्षापयति|tr=kṣāpayati}}), from... --Victar (talk) 22:49, 17 February 2018 (UTC)
@Victar: That's fine. However in the reconstructed Sanskrit entries, the user may be directed to the attested variation for further etymology. -- माधवपंडित (talk) 02:31, 18 February 2018 (UTC)
@माधवपंडित: Yep, see the example entry I created earlier, *झापयति (jhāpayati). --Victar (talk) 02:34, 18 February 2018 (UTC)
Shouldn't "Proto-Indo-Aryan" already cover this distinction? Or is there a language that descends from PIA, but not (pre-Vedic) Sanskrit? Crom daba (talk) 09:54, 17 February 2018 (UTC)
Also, just had a thought this morning. Ashok became Buddhist. Does that mean Pali predated Ashokan Prakrit o_O? DerekWinters (talk) 12:32, 17 February 2018 (UTC)
@DerekWinters: I think the Buddhist Canon was transcribed during or after the time of Ashoka, so it didn't really "predate" it, but was probably only a little later. An interesting thing to note is that the Girnar dialect of Ashokan Prakrit and Pali share a lot of features. —AryamanA (मुझसे बात करेंयोगदान) 14:56, 17 February 2018 (UTC)
@AryamanA: But didn't Buddha speak Pali natively (or a very similar version)? Also yeah, the Gujjars came in from the northwest so I wonder how much of the Girnar dialect they absorbed. DerekWinters (talk) 15:29, 17 February 2018 (UTC)
@DerekWinters: Hmm, according to Masica both Pali and Ashokan Prakrit are of the same stage, the "Early Middle Indo-Aryan", along with Old Ardhamagadhi. I guess they were both spoken at the same time. And anyways, Ashokan Prakrit is not really one language, more of a pan-India group of early Prakrits that were mutually intelligible. —AryamanA (मुझसे बात करेंयोगदान) 16:07, 17 February 2018 (UTC)
@DerekWinters: We have Mitanni-Aryan. --Victar (talk) 18:57, 17 February 2018 (UTC)
If we want to have various PIA dialects as "Sanskrit", I agree with Victar that the sub-label "Vedic Sanskrit" needs to be limited to the actual attested Vedic, and not for other early forms from the same period. Merging things as Proto-Indo-Aryan instead of Sanskrit would probably work. I do not think Mitanni Aryan is a major issue here, since last I checked, the evidence to consider it a part of Indo-Aryan specifically at all, instead of simply early Indo-Iranian NOS, is pretty weak. It's clearly neither Nuristani nor Iranian, but that doesn't mean it has to be IA.
Also, minor historical phonology note: Sanskrit kṣ versus MIA jh implies PII *ĵž, not *gž (which I believe should result in kṣ ~ gh, not that I know of any examples OTTOMH). --Tropylium (talk) 00:27, 18 February 2018 (UTC)
@Tropylium: It seemed to be so, but Sanskrit kṣárati (flows) is comparable to Pali gharati; cf. Prakrit jhara-i. -- माधवपंडित (talk) 02:31, 18 February 2018 (UTC)
Huh, I guess that requires an OIA dialect that merged the "thorn" clusters by POA, but not by voice (*kš, *ĉš > *ch, but *gž, *ĵž > *jh)? Of course ch as the usual correspondence/reflex of kṣ in parts of MIA already suggests something of the sort. --Tropylium (talk) 10:17, 18 February 2018 (UTC)

Transliteration modules and 몽골어 물리

As of late, 몽골어 물리 has been editing the following obscure transliteration modules and policies:

Can anyone confirm any of this business? Also, could I get you to look at this user, @Chuck Entz? —*i̯óh₁nC[5] 21:22, 17 February 2018 (UTC)

This is a long-term problematic editor; see Special:Contributions/키르기즈스탄_공화국 for a previous incarnation. I don't know enough about these Turkic languages, although we should probably revert them. Pinging Turkic editors: @Curious, Anylai, Borovi4ok, AtitarevΜετάknowledgediscuss/deeds 21:46, 17 February 2018 (UTC)
I'm sorry, Metaknowledge, I don't know much about these languages, so I can't check their edits. -- Curious (talk) 11:32, 18 February 2018 (UTC)
They seem somewhat negligent, but put in work. The Kumyk and Karakalpak changes are correct as far as I can tell (although the gsub function doesn't work that way, so they aren't working as intended currently). The Tofalar module change (I didn't think we had an automatic Tofalar transliteration, weird) was consistent with WT:Tofa language (which was apparently made by an earlier incarnation of the user still), but seems to have deleted the h character, probably by accident.
Proto-Turkic entries they made seem basically correct but also riddled with mistakes. This user requires cleaning after, but I'm hoping they can evolve into an asset. Crom daba (talk) 22:56, 17 February 2018 (UTC)
The problem is that we don't know what sources they're using, and they won't communicate with us. They're still making lots of mistakes after (at least) two years, and they've been quite disruptive at times, even apparently creating a throw-away new account to continue an edit-war (see the revision history at Module:ba-translit). They won't improve if they refuse to listen to us. Chuck Entz (talk) 02:34, 18 February 2018 (UTC)
I can only look at edits in the last 90 days, so I can't say anything about 키르기즈스탄_공화국, but 몽골어 물리 and Örümçek are technically indistinguishable, and all the edits in the IP ranges and for the past 90 days are technically indistinguishable from one or the other of the devices used by both of the logged-in accounts. I'll let everyone else decide what to do about it. Chuck Entz (talk) 02:34, 18 February 2018 (UTC)
He is somewhat trying to be useful but leaves lots of mess around to deal with. I believe there are a couple of accounts linked to him, along with various IP addresses that do the same. Recently I created (plant) and mentioned its possible relation to *ïgač (tree). He immediately created two entries for one root: one as "ïgač" (what I mentioned) and the other as "ɨ(ń)gač" (from Starling), probably not realizing they are the same. Part of the mess he leaves behind is related to orthography: looking at the entry "ɨ(ń)gač", he transliterated the Old Uyghur word as "îġać", which is amusing, along with other orthographies. It is rather interesting to see such dedication to adding stuff that is so wrong. I witnessed him trying to add transliterations for Old Uyghur just by looking at some words I listed on PT pages; it seems that he has no idea and is trying to come up with something that might resemble the transliteration. He created *jāg (fat) and immediately decided that *jagɨ (enemy) should have a long /a/ as well, and created that entry. He put "Starostin, Sergei; Dybo, Anna; Mudrak, Oleg (2003), “*jāgɨ”" in his reference, not even bothering to notice that the source has the /a/ short.
A lot of the time he is just inventing stuff and being annoying to deal with as he seems to be running alternative accounts. --Anylai (talk) 09:04, 18 February 2018 (UTC)
Maybe we need a Korean regular to communicate some rules to them. If they keep making more mistakes than we have the manpower to handle, blocking them might be a better option. Crom daba (talk) 11:51, 18 February 2018 (UTC)
I really hope this doesn't reflect their actual views. —suzukaze (tc) 19:28, 18 February 2018 (UTC)
As long as they preface it with "According to the controversial Anglo-Uralic theory" I have no problem with this. Crom daba (talk) 20:20, 18 February 2018 (UTC)
Reminds me of this "journal" article. I love the disclaimer: "Individual authors are responsible for facts included and views expressed in their articles". So much for peer review... Chuck Entz (talk) 07:18, 19 February 2018 (UTC)
Let me voice my opinion.
This user leaves behind mess that needs cleaning. For some time now, each of my sessions has begun with looking at my Watchlist and cleaning up what this user has done recently.
This user does not appear to consult dictionaries, and invents stuff, e.g. based on cognates.
This user misses some of the fundamentals of Turkology - this is unfortunate, as s/he often edits the Etymology section.
This user won't communicate, although I have suggested that s/he register so we could communicate.
All of this is a pity. It would be nice to have a communication with this user. Would be ideal to see this user grow into a reliable and responsible editor — every contributor can potentially make a difference. Borovi4ok (talk) 08:41, 19 February 2018 (UTC)
Anglo-Uralic theory is a thing??? But why??? —AryamanA (मुझसे बात करेंयोगदान) 00:07, 20 February 2018 (UTC)
Other theories can't explain how Old English had front rounded vowels, also how Russian English use Cyrillic the same as Komy-Permyak. Crom daba (talk) 11:32, 20 February 2018 (UTC)

Appendix for English anagrams?

Although some users don't care for the anagram sections in entries, I've gotten so that I rather enjoy them (just cracked a smile, for example, when I saw that gone to the dogs and get the goods on are a pair). This is one of those quirky extras that contributes to Wiktionary's thoroughness and uniqueness.

Word lovers might also appreciate a system-maintained list of all the anagrams in English Wiktionary, perhaps in the form of an alphabetical appendix containing 2 entries for each anagram (1 for each member of the pair). I'm not a programmer, but expect that bots could probably build and maintain it. Does anyone else like this proposal? -- · (talk) 00:31, 18 February 2018 (UTC)

The gone to the dogs and get the goods on coupling is cool, admittedly. It would be equally cool to see other long anagrams. --Otra cuenta105 (talk) 21:14, 18 February 2018 (UTC)
The list is too big for one page. How should it be organized? DTLHS (talk) 22:22, 18 February 2018 (UTC)
Puzzlers' books typically order them by alphagram. Equinox 22:35, 18 February 2018 (UTC)
Perhaps we could get a new boring namespace: Anagrams:hist would include the alphagram for hits, shit, this. --Otra cuenta105 (talk) 13:29, 19 February 2018 (UTC)
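A minimal sketch of how a bot might build the proposed list, keyed by alphagram as suggested above. It assumes the bot already has a list of English entry titles (how it obtains them is out of scope), and the function names `alphagram` and `anagramGroups` are hypothetical. Letters outside a–z are simply dropped, so accented letters would need extra handling in a real bot.

```javascript
// Alphagram: the term's letters sorted alphabetically, e.g. "this" -> "hist".
// Case, spaces and punctuation are ignored (ASCII-only, an assumption), so
// multiword terms such as "gone to the dogs" and "get the goods on" share a key.
function alphagram(term) {
  return term
    .toLowerCase()
    .replace(/[^a-z]/g, "")
    .split("")
    .sort()
    .join("");
}

// Group entry titles by alphagram, keep only groups with 2+ members,
// and return [alphagram, titles] pairs sorted by alphagram.
function anagramGroups(titles) {
  const groups = new Map();
  for (const title of titles) {
    const key = alphagram(title);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(title);
  }
  return [...groups.entries()]
    .filter(([, members]) => members.length > 1)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
}
```

Each resulting group could become one page in the proposed layout (e.g. Anagrams:hist), or the groups could be split across pages by the first letter of the alphagram to keep page sizes manageable.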

Ligurian orthography

I've been adding a number of Ligurian entries as of late, and it occurred to me that there should probably be some kind of consensus on which orthography to use for the entries. The "official orthography" I use does not have an actual "official" status (it is promoted by the Académia Ligùstica do Brénno, which deals with Genoese, which - AFAIK - is the variant Ligurian is based on), and uses different levels of "accuracy".

  • Using only a few "mandatory" accents, in a colloquial context
    • on the final vowel of oxytone words of 2+ syllables
    • on the final vowel of monosyllabic words ending in two vowels, where the second one is stressed (e.g. nuâ)
    • on the verbal inflections ê ([you] are, singular), é ([he/she/it] is), ò ([I] have), à ([he/she/it] has)
  • Using only certain accents, in a formal context:
    • using all the "mandatory" accents
    • marking all monosyllabic words ending in a long vowel (e.g. )
    • marking every ö and ò
    • marking every stressed òu diphthong
    • marking every long vowel (except for the stressed ones, unless they fall into one of the above cases)
  • Using every accent, in a didactic context (which is what I do, so that a word's phonemic realization is clear)

Since no official orthography exists, I wanted to see if anyone has any thoughts on this. Also, summoning @Lo Ximiendo, as a seemingly active user regarding the Ligurian language -- GianWiki (talk) 14:20, 18 February 2018 (UTC)

@GianWiki Could we make a survey for speakers of Ligurian in order to see what they think of this? --Lo Ximiendo (talk) 18:07, 18 February 2018 (UTC)
A survey is not reasonable. The best course of action is to take the standard you've selected and document it at Wiktionary:About Ligurian so other people can find it. @Ungoliant MMDCCLXIV may also find this of interest. —Μετάknowledgediscuss/deeds 18:28, 18 February 2018 (UTC)
@Metaknowledge So stick with the "didactic" level of accents, then? — GianWiki (talk) 18:46, 18 February 2018 (UTC)
@GianWiki, I'd choose the one that most closely approximates how people actually write and add more diacritics in the headword line, but I don't know anything about Ligurian. —Μετάknowledgediscuss/deeds 19:14, 18 February 2018 (UTC)
@GianWiki, in any case, we'll let you decide what to do. --Lo Ximiendo (talk) 20:51, 18 February 2018 (UTC)
@GianWiki, for etheric guidance, pray to either the Christian God, or even to Indo-European gods and goddesses such as Wotan or Freyja or whatever is appropriate. --Lo Ximiendo (talk) 20:53, 18 February 2018 (UTC)
Most Ligurian entries I added came from A Compagna, the magazine published by Académia Ligùstica do Brénno. I recall that the majority of articles were written without the extra accents but, unlike in Italian, it was not rare to find running text with didactic accents.
I have no preference. Whatever GianWiki supports I will support. — Ungoliant (falai) 11:18, 19 February 2018 (UTC)

Standardizing declension tables

Hey all, I wanted to start a discussion about the formatting of declension tables across PIE descendants. Right now, they're very unstandardized, as demonstrated here. I'm wondering if it wouldn't be a good idea to write a unified module that all these languages can piggyback on instead. What are people's thoughts on that? @Erutuon, JohnC5, Rua, Metaknowledge, Mahagaja, Per utramque cavernam --Victar (talk) 18:14, 18 February 2018 (UTC)

I don't see any need to standardise them, unless we intend to standardise all inflection templates across all languages (which I think would meet with a great deal of resistance). They have different aesthetics, many of which were created in concert with other templates for those languages, and that's just fine. —Μετάknowledgediscuss/deeds 18:26, 18 February 2018 (UTC)
I think if you go beyond PIE descendants, it becomes more difficult to standardize. Also, I'm only talking about declension tables, not verbal inflection tables, etc. Looking past formatting, many languages lack declension tables, and creating a sort of plug-and-play module would help remedy that. --Victar (talk) 18:52, 18 February 2018 (UTC)
This seems like a conspiracy to introduce the barbaric practice of placing the accusative before the genitive to more languages, I oppose it totally. Crom daba (talk) 19:04, 18 February 2018 (UTC)
NAGD is very reasonable for West Germanic, I'll have you know. Glory to its creator, boo Germany for not picking up on it. Jokes done, I don't see a need for unification either. The current practice allows for tailored tables and shows no major detriments. Writing some IE-module can be done without changing the current tables and the covens of the individual languages then can decide whether to migrate. Korn [kʰũːɘ̃n] (talk) 19:59, 18 February 2018 (UTC)
I think looking gross *is* a detriment. Drop shadows were cool in the 90s. —AryamanA (मुझसे बात करेंयोगदान) 22:36, 18 February 2018 (UTC)
NAGD isn't very reasonable for West Germanic. NADG would be reasonable. Accusative and dative (sometimes merged into a single accu-dative case) are more similar than genitive and accusative or genitive and dative. Instrumental and vocative however are another thing.
For Latin, something like NV[Acc]-G(L)[Abl]D, and equally NV[Acc]-GD[Abl](L), might be more reasonable: the optional locative is somewhere between genitive and ablative, and dative is between genitive (1st and 5th declension sg.) and ablative (1st and 3rd to 5th declension pl., 2nd declension). Considering Vulgar Latin and the Romance languages, ablative should be near accusative, as in NV[Acc][Abl]D(L)G. This would even fit with the basic West Germanic NADG. For PIE as a whole, however, an order based on tradition such as NGD[Acc]V[Abl]LI might be less controversial and might make more sense. - 20:46, 18 February 2018 (UTC)
LOL, I laugh because I think you're joking, but I honestly don't know. You could always make order an option of the module. --Victar (talk) 22:04, 18 February 2018 (UTC)
I am joking (mostly, NAGD does irk me), but it was meant to point out that standardization may be incompatible with the respective grammar traditions of languages (such as ordering).
As far as diversity of table styles is concerned, I like it, but maybe it could be seen as unprofessional, no strong feelings either way. Crom daba (talk) 23:22, 18 February 2018 (UTC)
I would support any kind of push towards standardization. —AryamanA (मुझसे बात करेंयोगदान) 22:00, 18 February 2018 (UTC)
Agreed. Why not do the same for conjugation, pronoun tables etc? The latter are nicely done in Wiktionnaire, for example wikt:fr:Modèle:pronoms_personnels/fr (/es, /pt etc). – Jberkel 23:02, 18 February 2018 (UTC)
I agree too. About the ordering of cases, this shouldn't be a problem. Erutuon has already written a script for rearranging them in one's preferred order. --Per utramque cavernam (talk) 23:13, 18 February 2018 (UTC)
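Erutuon's script itself is not reproduced here. As a hypothetical sketch of the general idea, the Python below reorders the rows of a table according to a reader's preferred case order (NADG, NGDA, and so on); the case codes, the dict standing in for a table, and the function name are all assumptions for illustration.

```python
# Hypothetical case codes; "Abl" avoids clashing with "A" (accusative).
CASE_CODES = {
    "N": "nominative", "V": "vocative", "A": "accusative", "G": "genitive",
    "D": "dative", "Abl": "ablative", "L": "locative", "I": "instrumental",
}


def reorder_cases(rows, preferred):
    """Return (case, form) pairs in the preferred order.

    rows: dict mapping case name -> form, in the table's original order
    preferred: sequence of case codes, e.g. ["N", "A", "D", "G"]
    """
    wanted = [CASE_CODES[code] for code in preferred]
    # Cases named in the preference come first, in that order
    first = [case for case in wanted if case in rows]
    # Any remaining cases keep their original relative order at the end
    rest = [case for case in rows if case not in first]
    return [(case, rows[case]) for case in first + rest]
```

Cases left out of the preferred order are appended after the listed ones, so a partial preference never hides a row.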
This discussion should be mentioned as context for this debate. Part of the issue is whether to display transliterations in Sanskrit on a separate line as we do in Russian, Arabic, and Ancient Greek, which I feel is clean, clear, and allows you to read the table easily either in the native alphabet or in transliteration. Victar feels that each term should be followed by its transliteration. That should certainly be part of this discussion, though there does not seem to be much impetus to standardize them at this point. —*i̯óh₁nC[5] 05:22, 19 February 2018 (UTC)
That discussion did rekindle my interest in this, but how to display non-Latin text next to transliterations is a conversation to be had down the road. The more important question at hand is the technical feasibility of and community support for the idea, and I'd rather not muddy things by interjecting my personal formatting opinions, but people are welcome to chime in at the other discussion. --Victar (talk) 05:37, 19 February 2018 (UTC)
There was a giant programming project undertaken by someone to make a general inflection table interface module (maybe for Uzbek?) that could be used for all languages. @Erutuon, do you remember what I'm talking about? —*i̯óh₁nC[5] 05:53, 19 February 2018 (UTC)
Module:inflection Chuck Entz (talk)
I don't care to bikeshed it, especially not the order of the cases, which can remain unstandardized for all I care, but something a little less random in the colors and typesizes would be nice.--Prosfilaes (talk) 23:04, 19 February 2018 (UTC)
A simpler first step towards standardisation would be to use one CSS style for all of these templates (so that the look at least is consistent). Any future changes can then be made in one place instead of having to maintain them all separately. This has the additional advantage that users could then choose to override the style formatting in their personal CSS file, and it would cascade through to all tables in all languages. The wikitable class is one example standard. -Stelio (talk) 16:06, 22 February 2018 (UTC)
This makes perfect sense to me. While different languages have different requirements in the structure and content of their declension and conjugation tables, that does not mean we cannot employ a single, uniform style across all of those tables. Each language can keep the structure which is most appropriate (as is a concern of several above) and still look like similar content across the whole project. - TheDaveRoss 13:43, 1 March 2018 (UTC)
Am I the only one who actually likes the variety in language-specific styles of declension tables? mellohi! (僕の乖離) 13:56, 1 March 2018 (UTC)
Hear, hear! A different visual style for a given language has the added usability advantage of making it immediately obvious that you're looking at something specific to that language. Frankly, I don't want a unified visual style for all languages. I like that Spanish tables have their own color coding (see abrir), different from Finnish (kaahata), different from Navajo (atłʼó), different from German (gucken), different from Japanese (開く (aku)), etc. etc. ‑‑ Eiríkr Útlendi │Tala við mig 18:03, 1 March 2018 (UTC)

500,000 English lemmas

The 500,000th English lemma is motlopi. Congratulations and here's to the next 500,000. DTLHS (talk) 21:03, 19 February 2018 (UTC)

Exciting! And (on the subject of milestones) before the month is out, we should make it to 5.5 million entries. We also have entries from about half of the world's languages at this point. - -sche (discuss) 21:29, 19 February 2018 (UTC)
I just dun a graph.
Line graph plotting number of en.wikt entries against time.
Equinox 23:52, 19 February 2018 (UTC)
I'm surprised it's linear. Are we never going to run out of words? (Also Hindi just hit 10,000, but that's hardly anything) —AryamanA (मुझसे बात करेंयोगदान) 00:05, 20 February 2018 (UTC)
Obviously, the pool of English lemmas in existence has an input rate lower than our rate of adding English lemmas to Wiktionary, but not all such lemmas are equally easy to add, so we're not going to run out so much as come up against words that are increasingly difficult to define and cite (and you could say that we're already seeing the first signs of that). We're currently mainly limited by effort, so the number of lemmas in existence is irrelevant; while growth is linear, we can't "feel" the ceiling. —Μετάknowledgediscuss/deeds 00:12, 20 February 2018 (UTC)

Still a lot to do before reaching the ceiling for foreign languages...

(taken from Wiktionary talk:Statistics#Lemmas pie chart)

(all the foreign-language ones still have much room for improvement)

Plot of #gloss definitions over the last few years (using sequential data from Wiktionary:Statistics):


Wyang (talk) 01:48, 20 February 2018 (UTC)

lol, I can see my editing hiatus in the Hindi graph.

It would be nice to have lemma figures to go with the pie chart. DonnanZ (talk) 13:30, 20 February 2018 (UTC)

Congrats! That's great! It's a beautiful milestone! By comparison, the French Wiktionary has only 360,000 lemmas for French (but it has about a third as many contributors, and the project is a year and a half younger). @Wyang I would be very interested if you could make a chart for French here, to discuss it with my colleagues. Noé 12:38, 21 February 2018 (UTC)
@Noé Sure :), here you go:
Wyang (talk) 12:57, 21 February 2018 (UTC)
Noé 13:01, 21 February 2018 (UTC)
  • I had suggested some time ago that we have something like a "language of the month" where we pick a language that could use a lot of expansion and focus on that for a month. I still think it would be an interesting thing to try. I am curious as to whether we have exhausted the supply of translation dictionaries in the public domain. bd2412 T 14:27, 20 February 2018 (UTC)
    • @BD2412: No way, there is so much literature we haven't even touched, coming from the context of Indian languages. But I totally agree that "language of the month" would be a good idea. —AryamanA (मुझसे बात करेंयोगदान) 21:00, 20 February 2018 (UTC)
      • I am rather worried about what might result if people all started adding entries to a language with few native speaker editors, leaving them swamped with work just to catch our inevitable errors. —Μετάknowledgediscuss/deeds 21:02, 20 February 2018 (UTC)
Well Wyang you put my crappy graph to shame :D Thanks for your nice chart. Let us fight over the sweet sugar of the 5,500,000 milestone, I WANT IT. Equinox 15:58, 21 February 2018 (UTC)
lol, I didn't realize we were so close to 5.5 mil. —AryamanA (मुझसे बात करेंयोगदान) 22:15, 21 February 2018 (UTC)

Renaming {{PIE root}} to {{root}}

What do people think about renaming {{PIE root}} to {{root}} and changing the format to {{root|ine-pro|iir-pro|*h₃er-}}? That seems more in line with the other etymology templates, and we could potentially use it for other languages, like Sanskrit, e.g. {{root|sa|hi|घट्}}. @Rua, Erutuon, AryamanA, JohnC5 --Victar (talk) 02:02, 20 February 2018 (UTC)

@Victar: Sounds great to me. Also something like {{root|sa|sa|धे}} (or {{root|sa|धे}} ideally) to categorize within a language. —AryamanA (मुझसे बात करेंयोगदान) 02:06, 20 February 2018 (UTC)
@AryamanA, actually, I think the best method, assuming it's possible, would be through {{head}}, e.g. {{sa-noun|root=घट्}}. --Victar (talk) 02:27, 20 February 2018 (UTC)
Hmm, maybe. Then {{root}} would not be needed I guess. —AryamanA (मुझसे बात करेंयोगदान) 02:33, 20 February 2018 (UTC)
@AryamanA: Yeah, {{root}} would only be used on child language entries, like Hindi entries with {{root|sa|hi|घट्}}. --Victar (talk) 02:39, 20 February 2018 (UTC)
@Victar: I'd be interested in this proposal, but @Rua is the one to convince. —*i̯óh₁n̥C[5] 03:13, 20 February 2018 (UTC)
I might suggest that the first and second params be switched, to better match the current behavior of {{bor}}, {{inh}}, etc. ‑‑ Eiríkr Útlendi │Tala við mig 02:07, 20 February 2018 (UTC)
@Eirikr, if we could do the same thing in PIE using {{ine-noun|root=*h₃er-}}, then it wouldn't be a problem switching the |lang= order, but otherwise it might be confusing, having to also use {{root|ine-pro|*h₃er-}}. Maybe not. --Victar (talk) 02:27, 20 February 2018 (UTC)

I created {{root}} to get the ball rolling for Sanskrit and PII roots. @Erutuon, could I ask you a favor: could you make Module:category tree/PIE root cat work for both {{PIE root}} and {{root}}, since you created it and are most familiar with it? I took a whack at it and it did not go well. --Victar (talk) 01:33, 23 February 2018 (UTC)

Wiktionary:Votes/2018-02/Moving Lojban entries to the Appendix

I have created this vote concerning Lojban entries. I would appreciate it if other editors could look it over and offer their thoughts. —Μετάknowledgediscuss/deeds 05:07, 20 February 2018 (UTC)

Categories for intentionally nonstandard terms

I’m thinking we could have a separate category for deliberate/satiric misspellings and typos (Amerikkka, Grauniad, Micro$oft, etc.) and one for deliberately ungrammatical terms (who'd have thunk it, you pays your money and you takes your chances, no can do, etc.). These two, along with Category:Pronunciation spellings by language and Category:Eye dialect by language, can be added to a supercategory encompassing all intentionally nonstandard terms. — Ungoliant (falai) 12:49, 20 February 2018 (UTC)

I agree. I would split the two classes you mentioned: Category:Deliberate misspellings by language and Category:Deliberate grammatical errors by language. (The latter one probably needs a better name.) —Μετάknowledgediscuss/deeds 20:53, 20 February 2018 (UTC)
I broadly agree that deliberately nonstandard spellings are worth categorizing, but I'm not sure it's right to call "Amerikkka" (for example) a "misspelling", since it is deliberately/intentionally used to invoke "KKK"; it's not an error made out of ignorance. Deliberately nonstandard grammar seems harder to separate from dialectally typical grammar (of e.g. an immigrant community) from which I would expect most examples to derive; it also seems similar to uses of fossilized / no longer standard grammar ("if need be", etc), so the criteria for inclusion in such a category seem fuzzier, although they may not be a barrier to having such a category. - -sche (discuss) 21:08, 20 February 2018 (UTC)
Wikipedia calls Amerikkka a satiric misspelling, I don’t know what else to call it.
Category:Leet by language could probably be added to this category too. — Ungoliant (falai) 14:42, 21 February 2018 (UTC)

Transcription parameter again

I've just noticed that @Victar has been using the |tr= parameter to enter both transliteration and transcription of a term in Proto-Iranic entries (for example Sogdian [script needed] (ʾʾsʾwk’ /āsūk/, gazelle)).

I like how this looks and it satisfies the need I've been talking about in previous discussions on transliteration and transcription. I would suggest that an additional parameter is added, for example |tsc=, that will produce the same formatting while allowing transliteration to be automatically generated.

This could be used for: languages written in cuneiform; sparsely attested languages written in abjads (Middle Iranian and Middle Turkic languages, Middle Mongol in Arabic script); Old Turkic; Khitan and Jurchen (once they're properly encoded); and even Kalmyk (where phonemic schwas are unwritten).

Crom daba (talk) 15:09, 20 February 2018 (UTC)

A recent discussion about this: Talk:-ւ. --Per utramque cavernam (talk) 15:27, 20 February 2018 (UTC)
The takeaway is that Isomorphyc has been working on it (at User:Isomorphyc/Sandbox8), and will work on it again at some point, I think. --Per utramque cavernam (talk) 20:58, 20 February 2018 (UTC)
Support. At times I've had to do the same thing for Hittite. --Tom 144 (𒄩𒇻𒅗𒀸) 16:00, 20 February 2018 (UTC)
Support. --Vahag (talk) 19:03, 20 February 2018 (UTC)
Support. It's remarkable that we've had so many discussions and conflicts regarding this, but it still has not come to pass. —Μετάknowledgediscuss/deeds 20:56, 20 February 2018 (UTC)
Support. I've been doing the same thing as Victar for Middle Persian and Old Persian. —AryamanA (मुझसे बात करेंयोगदान) 21:02, 20 February 2018 (UTC)
Support: My only stipulation is that it be made clear that it shouldn't be used for IPA pronunciations. --Victar (talk) 00:32, 21 February 2018 (UTC)
I agree that the parameter should not be used for IPA, because the example /āsūk/ isn't IPA, and in order for IPA to be correctly formatted (and non-IPA not to be formatted incorrectly as IPA), the parameter can only be used for one or the other. — Eru·tuon 07:27, 2 March 2018 (UTC)
Also, I vote for |ts= (transcription). (And maybe we can rename |tr= to |tl= in the future.) --Victar (talk) 08:38, 2 March 2018 (UTC)
Support: I fully support this, if you are willing to solve this before I am able. Let me know if I can help with advice. Thank you! Isomorphyc (talk) 21:33, 4 March 2018 (UTC)

@Crom daba, this seems to have gotten wide support. Did you want to move forward? --Victar (talk) 04:40, 2 March 2018 (UTC)

I could, but what about Isomorphyc's work? Crom daba (talk) 10:30, 2 March 2018 (UTC)
@Crom daba, sorry, I thought you were a coder when I put that to you. Isomorphyc's module is too divergent from the standard module. @Erutuon, did you want to try your hand at implementing this? --Victar (talk) 21:20, 4 March 2018 (UTC)
@Erutuon, I threw this together. Did you want to check to see if it's too "hacky". :) {{User:Victar/Template:link|xpr|𐫙𐫢𐫗𐫇𐫍𐫡|t=grace, gratitude|ts=išnōhr}} → {{User:Victar/Template:link|xpr|𐫙𐫢𐫗𐫇𐫍𐫡|ts=išnōhr|t=grace, gratitude}}. --Victar (talk) 22:34, 4 March 2018 (UTC)
@Victar: I don't want to work on this at the moment. I would suggest a comma or some other separator between the transliteration and transcription, and something to identify transcriptions in the outputted HTML, so they can be found by CSS and JavaScript. (Transliterations have class="tr Latn" or class="tr mention-tr Latn" as well as a language code lang="xx-Latn", added by a function in Module:script utilities. That allows me to manipulate Russian and Modern Greek transliterations using JavaScript; see User:Erutuon/scripts/modifyRussianTranslit.js and User:Erutuon/scripts/phonemicGreekTransliteration.js.) — Eru·tuon 20:56, 6 March 2018 (UTC)
@Erutuon OK, span added to the transcription as well. I thought about having a comma, but I think it isolates it from the transliteration. Is there any reason you don't wish to implement this? --Victar (talk) 21:28, 6 March 2018 (UTC)
@Victar: I have no objection to the feature; I don't feel like working on any major projects at the moment. I might be willing to fix bugs, though. — Eru·tuon 00:42, 7 March 2018 (UTC)
@Erutuon Cool. Hope things are OK with you. It's already coded; I'm just looking for approval to add it to the module. If you don't have time, could you ping someone else with edit access whom you trust? Thanks. --Victar (talk) 00:51, 7 March 2018 (UTC)

Is it typical for non-IPA transcriptions to be enclosed in slashes? That is likely to cause confusion about which transcriptions are IPA and which aren't, and about the purpose of the parameter: is it intended for IPA because the transcription contains slashes? Then again, there probably has to be something to distinguish the strict transliteration from the transcription in the output of the template, and slashes serve that purpose. — Eru·tuon 07:27, 2 March 2018 (UTC)

Could ⟨orthography brackets⟩ be used for the transliteration (letter-to-letter mapping), or is that also inappropriate? —suzukaze (tc) 07:32, 2 March 2018 (UTC)
Durkin-Meisterernst's (a single person's) Dictionary of Manichean Middle Persian and Parthian is one example that uses slashes for non-IPA transcription.
MacKenzie's Pahlavi has non-IPA transcription as a headword with transliteration(s) written inside square brackets.
Some Russian dictionaries also have transcription inside square brackets written with (modified) Cyrillic. Crom daba (talk) 10:36, 2 March 2018 (UTC)
I think angle brackets are too similar to parentheses, and square brackets will always mean strict IPA transcriptions to me. --Victar (talk) 16:54, 2 March 2018 (UTC)
I agree that the ⟨orthography angle brackets⟩ can be hard to distinguish from (regular parentheses) without looking closely. What about <regular angle brackets>? As far as I know, these aren't used for anything in IPA, and they're visually more distinct. ‑‑ Eiríkr Útlendi │Tala við mig 17:21, 2 March 2018 (UTC)
I suppose I'm also just used to //. This is how entries look in {{R:xpr:DMMPP}}: hwfryʾd Pa/MP /bufrayād/ a. 'helping well, helpful; helper'. --Victar (talk) 05:05, 4 March 2018 (UTC)

Just to mention, the param would need to be |tsN=, because there are often several variants. This is surely pushing it, but I was also wondering if some sort of transcription qualifier would be warranted, e.g. {{l|pal|tr=bʾčwk|ts1=bāzūk|ts2=bāzūg|tsq1=early|tsq2=late}}[script needed] (bʾčwk /bāzūk (early), bāzūg (late)/). Or this could be pointing to the need for a separate module. --Victar (talk) 18:28, 2 March 2018 (UTC)

@Victar Why not simply {{l|pal|tr=bʾčwk|ts=bāzūk (early), bāzūg (late)}}, simpler templates seem to hold up better over time. If it gets more complicated than that (and possibly already at that point) that information should be at the main entry. Crom daba (talk) 22:28, 2 March 2018 (UTC)
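To make the two options concrete, here is a hypothetical Python sketch of how numbered |tsN= parameters with optional |tsqN= qualifiers could be folded into the output format used above, e.g. bʾčwk /bāzūk (early), bāzūg (late)/. It is not the actual code of Module:links; the function names and the plain dict of template arguments are assumptions.

```python
def collect_transcriptions(args):
    """Gather |ts=/|ts1=, |ts2=, ... in numeric order, appending any |tsqN= qualifier."""
    items = []
    n = 1
    while True:
        # |ts= and |tsq= are treated as aliases of |ts1= and |tsq1=
        ts = args.get(f"ts{n}") or (args.get("ts") if n == 1 else None)
        if not ts:
            break  # stop at the first missing number
        qual = args.get(f"tsq{n}") or (args.get("tsq") if n == 1 else None)
        items.append(f"{ts} ({qual})" if qual else ts)
        n += 1
    return items


def format_annotation(args):
    """Build the parenthesised annotation, e.g. "(tr /ts/, gloss)"."""
    parts = []
    tr = args.get("tr")
    transcriptions = collect_transcriptions(args)
    if tr and transcriptions:
        parts.append(f"{tr} /{', '.join(transcriptions)}/")
    elif tr:
        parts.append(tr)
    # transcriptions without a transliteration are not shown in this sketch
    if args.get("t"):
        parts.append(args["t"])
    return f"({', '.join(parts)})" if parts else ""
```

Crom daba's simpler alternative, a single free-text |ts=bāzūk (early), bāzūg (late), yields the same visible output here with no extra parsing, which is part of the case for it.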

As will surprise no one, since I have advocated for this multiple times in the past, I support this proposal. It did occur to me that we should perhaps limit the languages that may use this parameter to prevent abuse. As @Erutuon worried, I think users might misuse this parameter by giving IPA transcriptions of words. Furthermore, I think that in confusion, users might use |ts= instead of the more appropriate |tr= for normal transcriptions. I think we could greatly alleviate both issues by implementing a system similar to the one used for overriding manual transliteration. In this way, we can specify which languages allow |ts= based on the languages' less informative writing systems (partial syllabaries, abjads, cuneiform nightmare-scapes, etc.). Indeed, we could even prevent |ts= from functioning in the absence of |tr=. This would prevent the bad practice of providing only a transcription instead of both, a habit I've noticed with some frequency. The transcription should (in my opinion) always be secondary to the transliteration, as transcriptions are often a reconstructed abstraction from the actual, attested symbols. I'd also mention that if we implement this, some cleanup will be necessary for instances where users have used |pos= to provide transcriptions for Mycenaean Greek, where the overridden |tr= parameter prevents giving transcriptions. I think this can mostly be accomplished by finding all uses of |pos= that contain at least two / characters and converting them to |ts= manually. —*i̯óh₁n̥C[5] 11:16, 5 March 2018 (UTC)
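The cleanup step described above (finding |pos= values that contain at least two / characters so they can be moved to |ts= by hand) could be approximated with a regular expression over page wikitext. This Python sketch is only an assumption about how one might do it: it does not handle nested templates, and it merely reports candidates rather than editing anything.

```python
import re

# Captures a |pos= value up to the next "|" or "}" (nested templates not handled).
POS_PARAM = re.compile(r"\|\s*pos\s*=\s*([^|}]*)")


def find_slash_pos_values(wikitext):
    """Return |pos= values containing at least two "/" characters, for manual review."""
    return [
        m.group(1).strip()
        for m in POS_PARAM.finditer(wikitext)
        if m.group(1).count("/") >= 2
    ]
```

Because it only lists matches, a human still decides whether each value really is a transcription before converting it to |ts=.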

@JohnC5, yeah, that was my concern, people using this as an IPA transcription. We could make it so that it only works for languages with non-Latn scripts. I'm not sure if that's possible, or how much of a drain that query would be. --Victar (talk) 17:53, 5 March 2018 (UTC)

The |ts= parameter is up and running, {{l|xpr|𐫙𐫢𐫗𐫇𐫍𐫡|t=grace, gratitude|ts=išnōhr}}𐫙𐫢𐫗𐫇𐫍𐫡 (ʿšnwhr /išnōhr/, grace, gratitude). Thanks, @JohnC5. I wanted to again address the issue John mentions above, which is preventing the misuse of |ts=. Perhaps we should only allow it to function if the language makes use of non-Latin scripts. We could also create a Category:Terms with manual transcriptions using IPA chatacters to monitor. @JohnC5, Metaknowledge, Erutuon, Per utramque cavernam, Tom 144, Vahagn Petrosyan --Victar (talk) 01:50, 9 March 2018 (UTC)

Those both sound like good proposals, although we'll want to spell characters correctly. —Μετάknowledgediscuss/deeds 06:28, 9 March 2018 (UTC)
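The safeguards proposed in this thread (accepting |ts= only for languages written in non-Latin scripts, requiring |tr= alongside it, and tracking transcriptions that contain IPA characters) might combine roughly as in this Python sketch. The language-to-script table, the category name, and the choice of the IPA Extensions block (U+0250 to U+02AF) plus stress and length marks as the "IPA" test are all assumptions, not existing Wiktionary data or code.

```python
# Hypothetical language -> script-code table; real data would come from
# Wiktionary's language modules.
LANG_SCRIPTS = {"xpr": ["Mani", "Prti"], "en": ["Latn"]}

# Stress and length marks, which sit outside the IPA Extensions block.
IPA_EXTRA = {"ˈ", "ˌ", "ː"}

TRACKING_CATEGORY = "Terms with manual transcriptions using IPA characters"


def looks_like_ipa(text):
    """Heuristic: any IPA Extensions character (U+0250..U+02AF) or stress/length mark."""
    return any(0x0250 <= ord(ch) <= 0x02AF or ch in IPA_EXTRA for ch in text)


def check_transcription(lang, tr, ts):
    """Return (transcription_to_display_or_None, tracking_categories)."""
    categories = []
    if not ts:
        return None, categories
    scripts = LANG_SCRIPTS.get(lang, [])
    # |ts= is ignored for unknown or Latin-only languages
    if not scripts or all(sc == "Latn" for sc in scripts):
        return None, categories
    # |ts= is only shown alongside a transliteration
    if not tr:
        return None, categories
    if looks_like_ipa(ts):
        categories.append(TRACKING_CATEGORY)
    return ts, categories
```

An IPA-looking transcription is still displayed but lands in the tracking category, so it can be monitored rather than silently dropped.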

Speak-only languages

What is this wiki's policy on collecting speak-only (unwritten) languages? Must their entries be written in Latin script, in IPA, or not included at all? --Octahedron80 (talk) 20:42, 20 February 2018 (UTC)

You should follow the conventions of any published materials that exist, such as scholarly papers or dictionaries. DTLHS (talk) 20:55, 20 February 2018 (UTC)
(e/c) We have to select an orthography for unwritten languages, and document it on the relevant About page. If only linguists have documented it, and those linguists use IPA, then it may be most appropriate to follow their lead and add entries in IPA. If they have a working orthography (maybe Latin script, to make documentation work easier), or if any orthography is being taught to the native speakers (maybe using a modification of Thai script if this is a regional language in Thailand), that should be selected instead. —Μετάknowledgediscuss/deeds 20:56, 20 February 2018 (UTC)
You may be interested in Wiktionary:About_Western_Yugur. Crom daba (talk) 01:21, 21 February 2018 (UTC)


The Discord server that was discussed last month is up and running thanks to @PseudoSkull, click here to join. Hopefully this can replace our inactive #wiktionary IRC channel. —AryamanA (मुझसे बात करेंयोगदान) 20:58, 20 February 2018 (UTC)

I dropped in to look around. Glad it's got a way to join without "signing up" or installing anything, but I do feel that a free open project like Wiktionary should use free open tech like IRC, not a closed-source thing that may require a specific client, or be restricted by the makers in future (this does happen, e.g. LogMeIn). Equinox 21:42, 20 February 2018 (UTC)
Discord is nearly malware in my view, it is almost impossible to remove once you have installed it on PC. IRC, with all of its problems, is my preference. - TheDaveRoss 21:49, 20 February 2018 (UTC)
Yeah, before someone accuses me of a being a Luddite who hates graphics, sounds, etc.: I'd be fine with those things built optionally on top of free open tech. (Imagine the uproar if Wikimedia replaced e-mail as a contact medium with Facebook. Heh.) I'm gonna go on IRC RIGHT NOW and raise a ruckus <3 Equinox 21:57, 20 February 2018 (UTC)
Dave, if you're afraid of installing it, you can use Discord in your browser. Eq, I can't remember whether you hate xkcd or not, so have this. —Μετάknowledgediscuss/deeds 00:06, 21 February 2018 (UTC)
That comic accuses an open-source fan of being smug and autistic for wanting free open tech. I am the exact opposite of that, which is why I dislike xkcd, which is reliably smug and autistic. The comic doesn't even have a point. Huh! (P.S. The Wiktionary IRC is still pretty good, though I go there less than once a month and see the same four or five faces. Smart faces. Heheh.) Equinox 00:42, 21 February 2018 (UTC)
I would suggest that a character in the comic makes those accusations and that character is the butt of the joke, but that isn't really important. Also, it would be great if more people used the IRC if only on a semi-regular basis like Equinox. I think the Wiktionary community was closest back when a good chunk of regular contributors (20%?) were frequently able to engage in casual conversation. - TheDaveRoss 13:05, 22 February 2018 (UTC)
There's also an "official" English Wikipedia Discord FWIW. —AryamanA (मुझसे बात करेंयोगदान) 22:06, 20 February 2018 (UTC)
"Esperanza was a Wikipedia project founded on 12 August 2005..." Equinox 00:41, 21 February 2018 (UTC)
That's a really weird story, but I suppose it works as a cautionary tale too. —AryamanA (मुझसे बात करेंयोगदान) 22:18, 21 February 2018 (UTC)
There are quite a few self-hosting alternatives, w:Mumble (software) and w:TeamSpeak (proprietary iirc, but freeware) being two off the top of my head. (Also, Nextcloud's Talk, a FLOSS implementation of Spreed, allows video conferencing, but my server is bandwidth-limited to 9 Mbps.) The problem with Discord is the time-worn observation that if anything on the internet is free, you are the product being sold. Quite a few of us maintain servers of varying abilities on the internet, which we prefer to "free" services. - Amgine/ t·e 04:26, 21 February 2018 (UTC)
I doubt many, if any, project members are going to use the Discord voice chat feature, but it makes for a good chatroom software. I'm perfectly fine with them selling everything I write, just as anyone has open access to what I write in public discussions here. It makes no difference to me. --Victar (talk) 22:40, 2 March 2018 (UTC)


Is there a reason why we don't separate English thesaurus entries from other languages? E.g. thesaurus:die and thesaurus:死亡 are put in the same category. Seems strange to me. ---> Tooironic (talk) 04:38, 21 February 2018 (UTC)

This was discussed several times (for example here, see Dan Polansky's vote and the talk page; or here), but no solution has been reached yet. --Per utramque cavernam (talk) 15:22, 21 February 2018 (UTC)

Books of the Apocrypha or Catholic deuterocanon

The stance on their inclusion in Biblical canons varies across our definitions. Baruch and 1 and 2 Maccabees only mention being apocryphal, Sirach and Wisdom don't mention it at all, Tobit only mentions being in the Catholic canon, not the Eastern Orthodox ones, and only Judith actually seems unbiased. I propose using the definition given for Tobit for all seven, but with an additional mention of the Eastern Orthodox. (Sirach is especially interesting, because the entry for its synonym Ecclesiasticus mentions that some groups do not consider it canonical.)

As a tangential note, I also updated Appendix:Books of the Bible to include them in the Catholic canon listed. That one was also odd because even though our listed source, Catholic Online, includes them now, I checked the Wayback Machine, and it didn't when the list was previously said to have been retrieved.

--RoseOfVarda (talk) 16:13, 22 February 2018 (UTC)

Be bold and have at it! :) And I suggest you link to this discussion in your edit summaries so that anyone who wants to propose some other wording can do so in this central discussion. - -sche (discuss) 19:06, 23 February 2018 (UTC)

Wikidata and CC0 licence for lexicographical data


You may have heard already that the Wikidata people are very interested in Wiktionary data. They are now at the stage of creating a dedicated Lexeme namespace in Wikidata. Lydia, who is in charge of this project, has called for a vote on the licensing of this new namespace. I think we Wiktionarians are concerned by this vote, because it may change the kinds of connections we can make between the Wiktionaries and Wikidata. Lydia only offered arguments for CC0, but there are many against it too. I summarized some there, but I call on your expertise and judgment in this matter. I think we can offer a different perspective not so much on the legal side as on the psychological and ethical aspects, since we are, and know, people who have lexicographical data to share and people who reuse Wiktionary data.

I think we need to imagine some prospective, because they may have built some but they didn't share the potential consequences for each possibility, and I am quite worry with their agenda. In this perspective, the Wikidata team asked for a Wikilegal note about lexicographical data but it is a draft that need to be severely improve, as it doesn't include some fundamental aspects of Wiktionaries so far. Your comments on this essay are welcome too.

Well, sorry if you feel this is not of your concern. I think it can't be bad to know more, to be able to collaborate rather than be notice of a undesired change too late Face-smile.svg Noé 08:35, 23 February 2018 (UTC)

I'm curious how this will play out in practice. I'm all pro-sharing and making data available as widely as possible, but this basically means that Wikidata has to start from scratch, and that the collaboration between the projects will always be complicated (at least in one direction; taking data from Wikidata is fine). But if I write a bot to update Wikidata items from Wiktionary, I would technically violate licensing terms. – Jberkel 10:24, 23 February 2018 (UTC)
For the last part, yes, and I am curious how they will prevent violations of the SA clause of CC BY-SA. For the first part, having some pieces of a page written in CC0 but displayed as CC BY-SA might also be considered copyfraud, I think. So both projects may end up independent and incompatible in every way. Strange. Noé 12:47, 23 February 2018 (UTC)
There would be no issue with using CC0 content within any other context. - TheDaveRoss 14:10, 23 February 2018 (UTC)
True, if there were scrupulous curation of data aimed at excluding infringing material. But so far Wikidata has been making massive imports regardless of the source's license. They did not go as far as importing wholesale from vigilant communities like OSM, but they seem reckless about mass extraction from various Wikipedias, for example. This casts doubt on the legality of the whole database, and that doubt propagates to any project using it. --Psychoslave (talk) 16:12, 27 February 2018 (UTC)

I think it's sad that licensing issues led to this situation, but I don't know the best way out of it.

Question: do contributors to Wikimedia projects have the rights to republish their contributions under a more permissive license? If some users think that it is acceptable (or even preferable) that their work is published under CC0, that should slightly reduce the issue of duplication.

The consequences of this proposal for the wiktionaries if it goes through, and of the introduction of lexicographic data on Wikidata more generally, are difficult to predict. Some amount of time that users would otherwise spend working on the wiktionaries locally will likely be lost through them working on Wikidata instead. On the other hand, a lot of work might become redundant locally through the increased efficiency that centralisation of data in theory is capable of providing. Furthermore, some number of users who would otherwise work little or not at all with lexicographic data on the wiktionaries could end up working a lot with such data on Wikidata because that format appeals more to them (and they could also end up doing more work, or indeed any at all, directly on the wiktionaries as a consequence of this).

The relative strengths of such effects are difficult to predict, and hence whether the introduction of lexicographic data on Wikidata will have a net positive or net negative effect on the wiktionaries. --Njardarlogar (talk) 11:21, 25 February 2018 (UTC)

My 2 cents. Just as you can import public domain data to Wiktionary under CC-BY-SA, there is no problem importing CC0 data from Wikidata. The other way round, you cannot republish CC-BY-SA data under CC0. The underlying point is that facts are not copyrightable. So what is a fact, and what is a creative work, in Wiktionary? As far as I understood, Wikidata will import facts such as the information in the headword line and lexical categories, but not definitions at all. Other data, such as pronunciation, may be borderline. --Vriullop (talk) 17:02, 27 February 2018 (UTC)

Another paper about Wiktionary

Automatic Generation of Wiktionary Entries for Finno-Ugric Minority Languages. —AryamanA (मुझसे बात करेंयोगदान) 18:39, 23 February 2018 (UTC)

Interesting, thanks for sharing. They mention that they asked for permission to upload the terms they generated, anyone know where that request is? - TheDaveRoss 18:44, 23 February 2018 (UTC)
I don't think the entries were created on the English Wiktionary. DTLHS (talk) 18:57, 23 February 2018 (UTC)
It's mainly in the Hungarian and Finnish Wiktionaries; see Global edits of Finnotka and fi:Keskustelu käyttäjästä:Finnotka. Wyang (talk) 22:16, 23 February 2018 (UTC)

Plurale tantum vs. pluralonly

When should we mark a term with plurale tantum as opposed to using {{en-plural noun}} (which produces the gloss "plural only")? I tend to prefer the latter as it avoids jargon. Is there a real difference? Equinox 13:13, 24 February 2018 (UTC)

I don't think there is a real difference. For English, at least, our users would probably prefer "plural only", which doesn't need much explanation. DCDuring (talk) 16:32, 24 February 2018 (UTC)
I would say "plurale tantum" should be automatically converted to "plural only". Andrew Sheedy (talk) 19:08, 24 February 2018 (UTC)
Yes, whichever wording we decide on should be displayed by both templates/values (templates should accept one as an alias of the other). This problem has been noted for ten years, by the way! - -sche (discuss) 23:09, 24 February 2018 (UTC)
I wholeheartedly agree that "plurale tantum" should be done away with. DonnanZ (talk) 21:04, 12 March 2018 (UTC)
I have changed it to "plural only" in all the modules I could find, and will now look at entries. (Ideally, all templates/modules that currently accept one should be made to accept both as input, and just display plural only for both.) After ten years, let's finally fix this! - -sche (discuss) 22:37, 12 March 2018 (UTC)
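To illustrate the "accept either as input, display one gloss" behaviour described in this thread, here is a minimal Python sketch. It assumes a simple lookup table; the names `LABEL_ALIASES` and `display_label`, and the treatment of "pluralonly" as an accepted input, are hypothetical, and the actual Wiktionary templates and modules (written in Lua) are not reproduced here.

```python
# Hypothetical label table: keys are accepted inputs, values are the single
# gloss shown to readers. Real templates/modules may store this differently.
LABEL_ALIASES = {
    "plurale tantum": "plural only",
    "pluralonly": "plural only",
    "plural only": "plural only",
}


def display_label(label: str) -> str:
    """Return the gloss to display for a number label.

    Both "plurale tantum" and "plural only" (in any capitalisation) are
    accepted and rendered identically; unknown labels pass through unchanged.
    """
    return LABEL_ALIASES.get(label.strip().lower(), label)


if __name__ == "__main__":
    for raw in ("plurale tantum", "Plural only", "pluralonly", "uncountable"):
        print(f"{raw!r} -> {display_label(raw)!r}")
```

Keeping a single table means that changing the displayed wording later is a one-line edit rather than a hunt through every template.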

Inconsistent and confusing romanisation formats given by various templates and modules

Example: русский (russkij), where about half of the romanisations are italic, and half are not.

Something I have wondered for a long time ― why is there a need to format romanisations differently in {{l}}, {{m}} and {{head}}? Why not italicise romanisations by default? And, is it really necessary to have both {{l}} and {{m}} for languages written in scripts not affected by italicisation? Wyang (talk) 00:30, 25 February 2018 (UTC)

@Wyang: I believe the notion is that for scripts which we don't italicize in mentions (Russian, Greek, etc.), we italicize the romanization to show the distinction between the mention and non-mention formats. Was it @Erutuon who implemented this? —*i̯óh₁n̥C[5] 00:46, 25 February 2018 (UTC)
@JohnC5: I think transliteration was italicized before I started messing with stuff. I just added extra classes to transliteration so that it could be located by CSS and JavaScript. — Eru·tuon 02:28, 25 February 2018 (UTC)
I think {{l}} and {{m}} have been generating differently formatted romanisations like this for quite some time, although I never really understood why romanisations are unitalicised in {{l}} and {{head}}. It is inconsistent and looks unprofessional on entries when romanisations are formatted differently: some are rússkij (italic) and some are rússkij (upright). Wyang (talk) 02:42, 25 February 2018 (UTC)
I'll chime in from a Japanese-entry editor standpoint to state that I agree that the difference is weird, and I'd prefer it if {{l}}, {{m}}, and {{head}} were aligned to show romanizations in italics. ‑‑ Eiríkr Útlendi │Tala við mig 04:35, 25 February 2018 (UTC)
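The alignment asked for here amounts to routing every template through one shared romanisation formatter. The Python sketch below shows that idea only; it assumes HTML output, and the class name `tr` and the functions `format_translit` and `render` are hypothetical stand-ins, not the real module code. Only the romanisation part of the output is modelled.

```python
import html

# Hypothetical CSS class so that CSS and JavaScript can locate romanisations;
# the class names used by the real modules may differ.
TRANSLIT_CLASS = "tr"


def format_translit(translit: str) -> str:
    """Single shared formatter: the romanisation is always italicised."""
    return f'<i class="{TRANSLIT_CLASS}">{html.escape(translit)}</i>'


def render(template: str, term: str, translit: str) -> str:
    """Toy output for {{l}}, {{m}} and {{head}}.

    Every template delegates to the same formatter, so the romanisation
    cannot be italic in one template and upright in another.
    """
    if template not in ("l", "m", "head"):
        raise ValueError(f"unknown template: {template}")
    return f"{html.escape(term)} ({format_translit(translit)})"


if __name__ == "__main__":
    for name in ("l", "m", "head"):
        print(name, render(name, "русский", "russkij"))
```

The design choice is simply to remove the per-template branch: if italics are decided in one place, the three templates stay consistent by construction.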

Anachronisms within PIE

The way in which we reconstruct PIE's morphology is anachronistic. There is a general consensus that Anatolian left early, and that many features of traditional reconstructions are really post-PIE innovations, such as the feminine gender, the optative & subjunctive moods, the reconstructed dative and ablative plurals, dual number(?). I think we should update our terminology and reconstruct PIE in two stages. One would be the closest common ancestor of all IE languages excluding Anatolian (Proto-Nuclear-Indo-European), and the other would be the common ancestor of PNIE and PAnatolian (Proto-Indo-European). This would mean that we would need to move all PIE pages to PNIE, and remove the Anatolian descendants and place them under cognates or add them to the etymology section. I realize that this idea is probably going to face a lot of opposition, but I believe it's necessary if we want to accurately represent PIE. What do you think? @Rua, JohnC5, Victar, AryamanA, Mahagaja, Chuck Entz. --Tom 144 (𒄩𒇻𒅗𒀸) 01:07, 25 February 2018 (UTC)

I support this, having proposed this before. —*i̯óh₁n̥C[5] 01:46, 25 February 2018 (UTC)
Right, I forgot to credit you and to link to the original discussion.--Tom 144 (𒄩𒇻𒅗𒀸) 02:13, 25 February 2018 (UTC)
I oppose moving all of our PIE material to "PNIE", that's just ridiculous. I encourage you to add the relevant information to our entries, perhaps under the Reconstruction sub-heading, and generally think of reconstruction pages as places to organize our current knowledge about an etymon rather than absolutely final representations of words as they really were.
If this information really is that huge and incompatible with our current PIE entries, then I'd approve of adding an Indo-Hittite (I don't like this term either) language to host it. 03:22, 25 February 2018 (UTC) —This unsigned comment was added by Crom daba (talk • contribs).
I think that would be quite a drastic move, even if it is more accurate. Maybe an Indo-Hittite language would be better for the common ancestor of PIE and PAnatolian. Also pinging @माधवपंडित who is more knowledgeable in PIE (and Old Indo-Aryan) than me. —AryamanA (मुझसे बात करेंयोगदान) 03:38, 25 February 2018 (UTC)
I proposed this somewhere, but the term Indo-Hittite isn't very popular. Maybe we could work around this using synonymous terminology such as Early-PIE and late-PIE, but this terminology is generally used by revival sites, and after reading them it is difficult to take them seriously. Proto-Indo-Anatolian does not sound better. Certainly the most aesthetic solution is to redirect everything. --Tom 144 (𒄩𒇻𒅗𒀸) 03:57, 25 February 2018 (UTC)
Like @AryamanA said, this is a very big change. Because of the human tendency to resist change, this idea is alarming to me right now (moving most PIE entries to a new and comparatively unheard of language, PNIE!), but this is not something I cannot get behind. If the literature confirms this, we can implement this change. Wiktionary should strive to be correct and consistent so that in the future people can refer to and rely on Wiktionary's information rather than having to consult various sources. -- माधवपंडित (talk) 05:29, 25 February 2018 (UTC)
@माधवपंडित: I'll leave these quotes from respected authors that support the view:
"Interestingly, there is by now a general consensus among Indo-Europeanists that the Anatolian subfamily is, in effect, one half of the IE family, all the other subgroups together forming the other half; and it is beginning to appear that within the non-Anatolian subgroup, Tocharian is the outlier against all other subgroups." – Ringe, Don (2006) From Proto-Indo-European to Proto-Germanic, Oxford University Press, page 5
"If we compare the New Zealand tree of IE with the Pennsylvania tree, we see that they share some fundamentals on the interrelationship of the IE languages. In both models, the first split in the tree is between the Anatolian group of languages and all the others, and the second is between Tocharian and the rest of the family. This is in accordance with the views of the majority of Indo-Europeanists at present. Anatolian is radically different from the rest of the family in many respects..." – Clackson, James (2007) Indo-European Linguistics: An Introduction (Cambridge Textbooks in Linguistics), Cambridge: Cambridge University Press, page 13
"Support for the Indo-Hittite scenario (sometimes under a different name) has increased in recent years (since 1995). There is a growing body of evidence which is best explained on the assumption that Proto-Anatolian did not share all the common changes which characterize the other IE languages." – Beekes, Robert S. P. (2011) Comparative Indo-European Linguistics: An Introduction, revised and corrected by Michiel de Vaan, 2nd edition, Amsterdam, Philadelphia: John Benjamins Publishing Company, page 31
"The ‘Indo-Hittite hypothesis’ has been much discussed over the years, even resulting in a monograph (Zeilfelder 2001). Although at first scholars were sceptical, in the last decade it seems as if a concensus is being reached that the Anatolian branch indeed was the first one to split off of the Proto-Indo-European language community." – Kloekhorst, Alwin (2008) Etymological Dictionary of the Hittite Inherited Lexicon (Leiden Indo-European Etymological Dictionary Series; 5), Leiden, Boston: Brill, page 22
"But evidence has been growing that Anatolian split off at a time when the development of some of these categories (such as the s-aorist) was only nascent." – Fortson, Benjamin W. (2004) Indo-European Language and Culture: An Introduction, first edition, Oxford: Blackwell, page 155

Experts from all schools of thought and all universities tend to agree on this topic. I do not know any respectable source that argues against this.--Tom 144 (𒄩𒇻𒅗𒀸) 15:00, 25 February 2018 (UTC)
Are we going to be able to say anything different in the new framework that we couldn't before, or are we just putting a new (albeit more accurate) label on the same can? What PIE content do we have that's identifiably not NPIE? After all, we can't call anything IE if it's not attested in some form in a descendant of NPIE, so if NPIE is an abstraction, then PIE is an abstraction of an abstraction.
A change that makes what all the easily-accessible sources call PIE different from what we call PIE is going to require extra effort to avoid confusion. I'm not saying we shouldn't do it, but we need to be sure we're clear about what we're doing, and why we're doing it. Chuck Entz (talk) 06:10, 25 February 2018 (UTC)
@Chuck Entz: It's not really about what PIE has that PNIE does not, but rather the other way around. PNIE has a feminine gender, an optative, a subjunctive, a well-formed dual, simple thematic verbs, a perfect, adjectives in *-to-, and comparatives in *-yos-. PIE, however, doesn't have any of those things, and some of them had different meanings (such as the perfect < stative). A whole bunch of suffixes and word formations are not reconstructible for PIE.
Those are the issues concerning morphology, but we also have the lexicon. Not all etymons reconstructed in Wiktionary have descendants in Anatolian. As a rule of thumb, we shouldn't extrapolate a secure PNIE reconstruction back to PIE, even if it looks morphologically archaic. For example, the word "*udōr" ~ "*udnés" is reconstructible for PNIE with a "plural" collective in "-eh₂"; if we assumed that, because r/n-stems are archaic, we could extend it to PIE, we would be mistaken, because the Anatolian evidence actually shows that "*udōr" was instead the collective of *wódr̥. We should only reconstruct something for PIE when we can do it for PAnatolian and PNIE too.
There shouldn't be any confusion since we are not changing the definition of PIE. It is still the closest common ancestor of all IE languages including Anatolian. We are just adding the extra term PNIE for reconstructions that are not suitable for Anatolian, which happen to be the majority.
I understand the skepticism; honestly, I thought this would get a lot more opposition. But I believe this will have to be fixed eventually; we cannot simply ignore the contradictions in our reconstructions --Tom 144 (𒄩𒇻𒅗𒀸) 16:04, 25 February 2018 (UTC)
I'm not skeptical. I'm well aware of all that. The main question is whether we need to rework our basic structure in response rather than noting it in the entries where relevant. The point about PNIE content that doesn't apply to Anatolian is a good one, though. Chuck Entz (talk) 16:52, 25 February 2018 (UTC)
As to whether we're changing the definition of PIE: we're changing the definition of the part of PIE that most people are interested in. I also don't look forward to our having to go through all the etymologies and decide whether a reconstruction is PIE s.l. or NPIE. It seems to me like incorporating the distinction into our structure will force us into making that judgment in order to avoid misrepresentation. We're going to have to decide whether w:Schroedinger's cat is alive or dead, rather than leaving it conveniently undefined. Chuck Entz (talk) 17:10, 25 February 2018 (UTC) Update: The question I asked last night basically eliminates this concern: if everything we have inherently includes NPIE by virtue of requiring presence in some form in NPIE to be IE, there's no need to decide whether it's also PIE in the broader sense. Chuck Entz (talk) 21:30, 25 February 2018 (UTC)
This does seem like the sort of thing that would be easier to handle with labels. @Florian Blaschke may wish to comment; when I googled the phrase "Nuclear Indo-European" one of the results was him opining on Wikipedia that "There is no clear evidence for a 'Nuclear Indo-European' excluding Anatolian, either; instead, Anatolian can simply be a peripheral branch that became isolated early on and did not take part in some developments common to all or most other languages, without those forming a monophylum, but instead a dialect continuum or language area like early medieval Common Slavic or Old High German through which innovations could still spread (and instead, Anatolian went through innovations of its own)." - -sche (discuss) 17:41, 25 February 2018 (UTC)
@-sche: Mmm… I don't understand how this contradicts the existence of PNIE. Pooth has some ideas that may be similar, but easier to understand. He argues that during the period when PIE branches were splitting, they still had some common innovations. He calls this Vulgar-PIE; it's not the closest common ancestor of anything, but a "dialectal continuum". Of course, he makes this assumption to account for the absence in any PIE branch of an ergative alignment, a partitive in "*-ém", an allative in "-m", an absolutive plural in "-e", a locative plural in "-is", an allative plural in "*-ms", a sociative-associative plural in "-eh₁", and many other weird things he reconstructs. In other words, he does not believe this because he has very good arguments that support his view, but because he needs to assume it so that his reconstructions don't fall apart. It wouldn't be too crazy to say that he is biased. The dialectal continuum is necessary for PNIE's subgrouping after the split of Tocharian. We know this because common innovations such as satemization and centumization, double dentals, the northern replacement of *bh by *m, the augment, and others cannot be traced back to a tree model. But unsurprisingly, PAnatolian does not figure in those issues. PAnatolian has the triple reflex, has neither *-bh- nor *-m-, and shows no trace of an augment. There are plenty of PNIE common innovations that have no trace in Anatolian. As I showed in the above citations, this isn't a controversial issue; there is a wide consensus on this topic.--Tom 144 (𒄩𒇻𒅗𒀸) 19:45, 25 February 2018 (UTC)
I don't see what's so hard to understand about that. Anatolian was definitely spoken in Asia Minor throughout the 2nd mill. BC and most probably in the 3rd mill. BC too (at least that's what most people seem to assume nowadays, who might even go back further, into the 4th mill. BC). In the Corded Ware period, Indo-European must basically have been a vast dialect continuum centred on Europe (Tocharian is thought to have been isolated in Asia early too, as early as c. 3000 BC, prior to the expansion of Indo-Iranian out of Eastern Europe). Innovations can spread through such a continuum. Anatolian was geographically isolated, separated from this continuum by the Black Sea, the Sea of Marmara and the Aegean, so you would not expect it to participate in common Nuclear IE innovations. There is no clear evidence for a "Proto-Nuclear-IE", and when Anatolian (and Tocharian) split off, dialectal divisions within the remainder, NIE, can already have existed, such as the isogloss middle endings in *-y (Indo-Iranian, Greek, Germanic) vs. middle endings in *-r (Italic, Celtic, Tocharian, Phrygian), where Anatolian plainly belongs to the second group. In fact, the *-bʰ-/*-m- story is not so clear; there is evidence for both inside Anatolian too, and in fact inside other branches, which contradicts the idea of a simple ancient isogloss separating the "m-branches" Germanic and Balto-Slavic from the rest, and there is an alternative way to account for the alternation (see here, jump to ch. 3 starting on p. 178).
Consider a parallel: Old Icelandic was closest to Old West Norwegian, Old Norse already exhibiting clear dialectal divisions at the period of our earliest literary attestations (12th century), but Icelandic has been largely isolated from the mainland languages since after the high medieval period, and thus did not participate in the changes of the late medieval period that saw the strong influence (mostly but not exclusively lexical) from Middle Low German in Mainland Scandinavian and an erosion of word endings and morphology, as well as common phonological developments. So Mainland Scandinavian now superficially looks like it descends from a "Proto-Mainland-Scandinavian" and Icelandic separated before, but this division is not the historical truth, as we know. --Florian Blaschke (talk) 03:45, 26 February 2018 (UTC)
Also, even if you do decide to believe in Indo-Hittite sensu stricto (in the absence of compelling evidence for it, it's better to default to treating Anatolian like any other primary branch), Anatolian is poorly attested outside of Hittite, and even the Hittite evidence is often sketchy, obfuscated by the script. Various lexemes are suspected to be hidden under Akkadograms and Sumerograms. Many numerals attested everywhere else, even in Tocharian, are missing, but sometimes indirect evidence is found that may however be debated. I don't think you'll find much enthusiasm for scrapping the numerals. Relying that strongly on the absence of evidence in Anatolian is a terrible idea. Just because something is missing in Anatolian does not mean that PIE never had it; accidental failure of attestation, or loss, is always possible.
It's a lot like Proto-Germanic. Many lexemes well-attested in West Germanic, and often also in North Germanic, are missing in Gothic (including Crimean Gothic), but there is frequently no other, more compelling or inherent reason not to reconstruct them for Proto-Germanic. Earlier generations of scholars refused to reconstruct any lexeme for Proto-Germanic not attested in Gothic, but Kroonen has no such compunctions, he does reconstruct even many lexemes limited to West Germanic for Proto-Germanic, and so do we. We even reconstruct the instrumental as a noun case for Proto-Germanic, even though Gothic lacks it (Old High German has it), so even grammatical categories are affected! This is a methodological decision, but it makes sense (new evidence keeps cropping up, and not limiting reconstructions unreasonably helps). Although it is generally accepted now that Gothic is the first split from Germanic, its attestation is limited. Gothic may never have had the instrumental as a noun case, or at least some of these lexemes, but we don't care. It could have lost them; it happens. Methodological fundamentalists may object to this practice, but Wiktionary has taken a stand already and sided with the younger scholars who view the issue differently, preferring to err on the side of inclusivity. --Florian Blaschke (talk) 04:16, 26 February 2018 (UTC)
  • I agree with Chuck: you seem to be missing the point, Tom. You continue to muster more evidence, but we're talking about how to build an effective dictionary. For the time being, people will look for this information under PIE, and we have a pretty good infrastructure for keeping it as PIE. That means that the wisest course of action is probably finding some way to mark entries (with a context label, or maybe a custom template?) so that readers know to what level a reconstructed word or morphological feature can actually be reconstructed. —Μετάknowledgediscuss/deeds 20:04, 25 February 2018 (UTC)
    • If we do decide to keep a unified PIE, I think we should definitely include a disclaimer that what we call PIE is really NPIE unless we specifically say otherwise, and that applies especially to inflection sections. This reminds me a lot of Ancient Greek, where the Attic dialect, which had several significant (though far less so than in NPIE) innovations not found in the other dialects, eclipsed all the others during the Hellenistic period. We don't call Koine and Byzantine Greek "Nuclear" or "Attic" by name, even though they clearly are. Perhaps we should mark the NPIE-specific morphology the way we do the Attic declensions and other Attic-specific inflectional morphology. Chuck Entz (talk) 21:30, 25 February 2018 (UTC)
      • Well, PIE entries are listed under a single reconstruction page. We could separate them under different headings and, of course, add a disclaimer or something clarifying the different reconstructed stages. That way we can include both PIE and PNIE without changing much of the current infrastructure. Since every PIE page would presuppose a PNIE page, the obvious solution is to keep the PNIE form as the lemma. That way we wouldn't need to redirect anything, and we would present both stages whenever possible in a consistent way. If the heading solution isn't supported, then I would obviously support the solution based on the Greek dialect analogy. --Tom 144 (𒄩𒇻𒅗𒀸) 21:53, 25 February 2018 (UTC)
        • I don't know a whole lot about this topic, but I would oppose introducing either PNIE or PIH as a new proto-language. I think we can say everything we need to say with usage notes and labels while retaining ine-pro as the only IE ancestor language. —Mahāgaja (formerly Angr) · talk 07:26, 26 February 2018 (UTC)
Should I start a vote about the usage of labels then? --Tom 144 (𒄩𒇻𒅗𒀸) 22:50, 2 March 2018 (UTC)

Format of xiehouyu entries

Currently there are several formats for xiehouyu entries:

Should there be any standardization on how these entry names should be formatted? — justin(r)leung (t...) | c=› } 05:12, 26 February 2018 (UTC)

Standardization seems like a good idea to me. —suzukaze (tc) 21:10, 27 February 2018 (UTC)
I think we should strive to use the dash-separated full names whenever possible──it is pretty much the standard practice in education. Wyang (talk) 23:25, 27 February 2018 (UTC)
I would suggest redirects from whichever format isn't used (generally or for any particular/exceptional entries) to whichever format is used, if both formats are attested. - -sche (discuss) 00:49, 28 February 2018 (UTC)

Preparation of the Wikimedia Conference 2018

Dear fellow wikimedians,

In April I will go to Berlin to attend the Wikimedia Conference as Wiktionary "representative". I put the term in quotes because I know that not everybody, myself included, likes this terminology, as I obviously can't represent our whole diversity. Nonetheless, I would like to be as representative of the Wiktionary community as I can, so I come here to ask for your feedback. It's important to me to go there confident that I have gathered a decent overview of the Wiktionary community's goals, issues and needs, as well as any points that are agreed by consensus to be important.

There is already a set of questions on Meta to prepare for the conference. What messages do you want passed on there? What information do you want to get? You can reply to that set of questions on Meta, but you are also welcome to reply freely here; I'll do my best to work with you.

Please spread the word in any Wiktionary where you are active, or even where you can simply translate this message; translated feedback would also be extremely welcome.

Be bold with notifications anytime you think there is something I should read regarding this topic, and feel free to ask me anything you want. --Psychoslave (talk) 16:44, 27 February 2018 (UTC)

Username etymologies

Anybody want to have a little bit of fun on the side? I started a subpage of my user page that anyone can edit. I'm attempting to document Wiktionary username etymologies. User:PseudoSkull/Etymologies of usernames Please edit this page with any further information you know. You can put yourself on the page, too, if I forgot to add you, and do not feel hesitant to edit it. This is actually something I'm always curious about. I'm good at memorizing people's usernames, but sometimes I really wonder hard where those usernames came from. PseudoSkull (talk) 20:34, 27 February 2018 (UTC)

Is it all right if I add those? --Per utramque cavernam (talk) 21:08, 27 February 2018 (UTC)
I think that would be called flooding. DonnanZ (talk) 19:07, 28 February 2018 (UTC)

CAT:English prepositions, CAT:English prepositional phrases

The latter is currently a subcat of the former, but this is a mistake; prepositional phrases ≠ phrasal prepositions. So if nobody objects, I'm going to remove this. --Per utramque cavernam (talk) 11:50, 28 February 2018 (UTC)

@Per utramque cavernam: It might be best to keep it in the category so that it can be easily found, but to make the breadcrumbs show "Phrases" instead. — Eru·tuon 18:51, 28 February 2018 (UTC)

Misspellings: "common" criteria

I know that Google is not an acceptable reference (as per WT:REF), but could we use the number of Google results as a kind of unofficial criterion? For example, "exsercize" (for some reason how I spell exercise) has only 856 results on Google, but "excercise" has 26.8 million, so I think it's obvious that excercise should be included but not "exsercize". If this seems acceptable, we could compare "misspelling" to "accepted spelling" and come up with a ratio maximum (minimum?). For example, zymography:zimography is about 197:1 based on Google searches, so it seems acceptable to include despite zimography only having 1,560 results.

Of course, this is not a perfect method: a lot of misspelling results are just mentions; for misspellings that are words in their own right (e.g. collage for college, homophones), this method is useless; and misspellings could be proper spellings of terms not in Wiktionary (e.g. zuchetto; "zuchetto" returns many surnames). But I think it gives a good rough estimate of how widely used misspellings are. – Gormflaith (talk) 15:57, 28 February 2018 (UTC)
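The ratio test proposed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the threshold `MAX_RATIO = 1_000` are hypothetical (no threshold was agreed), the zymography count is invented to reproduce the quoted ~197:1 ratio against zimography's 1,560 hits, and the counts could just as well come from Google Books Ngrams rather than raw Google hit counts, which are known to be unreliable.

```python
def hit_ratio(accepted_hits: int, misspelling_hits: int) -> float:
    """Ratio of accepted-spelling hits to misspelling hits (197.0 means 197:1)."""
    if misspelling_hits <= 0:
        raise ValueError("the misspelling must have at least one hit")
    return accepted_hits / misspelling_hits


def common_enough(accepted_hits: int, misspelling_hits: int, max_ratio: float) -> bool:
    """Include the misspelling if the accepted spelling is at most max_ratio times as frequent."""
    return hit_ratio(accepted_hits, misspelling_hits) <= max_ratio


if __name__ == "__main__":
    # zimography: 1,560 hits (from the post); the zymography count is
    # hypothetical, chosen to reproduce the quoted ~197:1 ratio.
    zymography, zimography = 307_320, 1_560
    MAX_RATIO = 1_000  # hypothetical threshold, not an agreed value
    print(hit_ratio(zymography, zimography))                 # 197.0
    print(common_enough(zymography, zimography, MAX_RATIO))  # True
```

A ratio alone does not address the caveats listed above (mentions, homophones, surnames), so any such number could at most be a rough first filter.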

No, the "number" of Google results is utter bullshit and shouldn't be used for anything. Google Books ngrams are more useful. DTLHS (talk) 19:02, 28 February 2018 (UTC)
Oh ☹. Sorry. I guess I don't fully understand how Google works. – Gormflaith (talk) 01:14, 1 March 2018 (UTC)
Just try a search that claims that there are, say, a thousand hits on Google Books. Then page through the results as quickly as you can. I just did it for the word figpecker. (Don't ask.) The first page said 4,000 hits, the last page said 430. Sometimes the results are not as dramatic and sometimes much more. DCDuring (talk) 02:37, 1 March 2018 (UTC)
Unfortunately, even apart from the result-counting problems, there are serious numbers of content-free spam pages that copy each other, especially ones that use rare words to try to attract specialised searchers. Equinox 10:35, 2 March 2018 (UTC)

March 2018

March LexiSession: mathematics

This month, we suggest focusing on the vocabulary of mathematics. Yes, it's because of Pi Day, the 14th of March. As a starting point, you can have a look at Thesaurus:mathematics, and there are still plenty of domains to explore and structure. Let's figure it out.

By the way, for those who do not know LexiSession yet, it is a collaborative transwiktionary experiment. You're invited to participate however you like and to suggest next month's topic. The idea is to look at improvements other communities have made on the selected topic in order to improve our own pages. It has already brought new collaborators to contribute for the first time on a suggested topic. If you participate, please let us know here or on Meta, to keep track of the evolution of LexiSession. I hope some people will be interested this month, and if you can spread it to another Wiktionary, you are welcome to do so. Ideally, LexiSession should be a booster for every Wiktionary on the same agenda, to give us more insight into the ways our colleagues work in the other projects.

By the way, this is the twentieth edition of LexiSession! Noé 10:56, 1 March 2018 (UTC)

I did some Spanish entries and translations for some math terms. --Otra cuenta105 (talk) 11:04, 1 March 2018 (UTC)
Time to crack open that Mongolian stats intro book. Crom daba (talk) 11:17, 1 March 2018 (UTC)
There are LOTS of mathematical terms listed in Requests for definitions in English entries. Maybe this is a good excuse for someone to have a crack at them! Kiwima (talk) 02:34, 8 March 2018 (UTC)

Unifying the display of romanisations in links and headwords: italicise romanisations by default

This follows from last month's topic Wiktionary:Beer parlour/2018/February#Inconsistent and confusing romanisation formats given by various templates and modules. I would like to propose that we make the display of romanisations in {{l}}, {{m}} and {{head}} consistent by italicising romanisations in an entry by default.

  • Rationale: consistency, clarity, and professionality.
  • Examples: Russian русский (russkij), Hindi युद्ध (yuddh), where italicised and unitalicised romanisations appear alternately in the entries.

Wyang (talk) 12:18, 1 March 2018 (UTC)

Support --Per utramque cavernam (talk) 12:29, 1 March 2018 (UTC)
Support. I really have to clean up that Hindi entry. —AryamanA (मुझसे बात करेंयोगदान) 13:42, 1 March 2018 (UTC)
Support. I've never understood why these display differently. ‑‑ Eiríkr Útlendi │Tala við mig 18:02, 1 March 2018 (UTC)
Abstain – I understand the reasoning behind the current format, but don't mind it being changed. — Eru·tuon 23:05, 1 March 2018 (UTC)
Abstain – Per Erutuon. Crom daba (talk) 00:06, 2 March 2018 (UTC)
Caveated support: in line with {{lang}} over on Wikipedia. The implementation of a transcription param would be a prerequisite, because I wouldn't want to see 𐫁𐫏𐫇𐫡‎ (bywr /bēwar/) --Victar (talk) 03:29, 2 March 2018 (UTC)
Support, but keep romanizations in inflection tables that are on separate lines (as at e.g. алъдьи (alŭdĭi)) unitalicized. — Vorziblix (talk · contribs) 08:56, 4 March 2018 (UTC)

Speaking of transliterations, has anyone ever entertained the idea of switching them on and off at will (with a button or sumthin')? --Per utramque cavernam (talk) 22:23, 2 March 2018 (UTC)

@Per utramque cavernam:, you can, just use |tr=-. --Victar (talk) 22:26, 2 March 2018 (UTC)
@Victar: I'm thinking of a gadget that would let the user choose which scripts he wants to see transliterated, and which ones he doesn't.
While I personally need translits for Devanagari (everywhere), I don't need them for Greek (anywhere); but others will be in the opposite situation.
Thus, it would be convenient to be able to hide them on the entire website, without actually hardcoding |tr=-. --Per utramque cavernam (talk) 22:44, 2 March 2018 (UTC)
@Per utramque cavernam: That can be easily done with some custom JS in your preferences. Or for a quick fix, just add something like span[lang="el-Latn"] { display: none } to your custom CSS file. --Victar (talk) 23:12, 2 March 2018 (UTC)
@Victar That leaves empty parentheses and commas everywhere, nice idea though. Crom daba (talk) 23:59, 2 March 2018 (UTC)
@Crom daba, that's why I said it should be done in JS. I thought you knew programming. --Victar (talk) 00:08, 3 March 2018 (UTC)
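You can use the code below. It's going to remove |pos= too, which could be fixed; that's all from me.
// Hide Greek transliterations (lang="el-Latn") together with their parentheses.
// The regex strips every parenthesized group in the parent, so |pos= text goes too.
var elements = document.querySelectorAll('[lang="el-Latn"]');

[].forEach.call(elements, function (element) {
        var parent = element.parentElement;
        // Rewriting a parent's innerHTML detaches any other matches inside it; skip those.
        if (!parent) return;
        parent.innerHTML = parent.innerHTML.replace(/ *\([^)]*\) */g, "");
});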
--Victar (talk) 02:38, 3 March 2018 (UTC)
That's cool, although maybe we should rework our modules so that CSS can do it. Crom daba (talk) 15:19, 3 March 2018 (UTC)
@Crom daba, no better time to learn some actual coding yourself. --Victar (talk) 17:56, 3 March 2018 (UTC)
@Victar I have prior JavaScript experience (although I was never fluent, and I wouldn't be able to write the above code without some heavy SO consultation), I just figured it would be more elegant to solve it with CSS. Crom daba (talk) 18:31, 3 March 2018 (UTC)
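Victar's CSS one-liner and the per-script gadget idea above can be combined in a small user script that builds the hiding rule from a list of language codes the reader chooses. This is a minimal sketch, assuming (as in the thread) that transliterations carry `lang="<code>-Latn"`; the function names and the example code list are hypothetical. Like the plain CSS rule, it only hides the transliteration span itself, so the empty parentheses Crom daba mentions would remain unless the modules wrap them in a hideable element.

```javascript
// Hypothetical user-script sketch: hide transliterations for chosen languages.
// Assumes transliteration spans carry lang="<code>-Latn", e.g. lang="el-Latn".
function buildHideTranslitCss(langCodes) {
  const selectors = langCodes
    .filter((code) => /^[a-z]{2,3}(-[a-z0-9]+)*$/i.test(code)) // skip malformed codes
    .map((code) => '[lang="' + code + '-Latn"]');
  return selectors.length ? selectors.join(", ") + " { display: none; }" : "";
}

// Inject the rule into the page; does nothing outside a browser.
function applyHideTranslitCss(langCodes) {
  const css = buildHideTranslitCss(langCodes);
  if (!css || typeof document === "undefined") return;
  const style = document.createElement("style");
  style.textContent = css;
  document.head.appendChild(style);
}

// Example: hide Greek and Ancient Greek transliterations, keep Hindi ones.
applyHideTranslitCss(["el", "grc"]);
```

Generating a stylesheet rather than rewriting `innerHTML` leaves the page markup untouched, so removing the code from one's preferences restores everything.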
Oppose. Let's unify on non-italics. I don't see why the romanization should be in italics; the romanization is no more a mention than the term romanized. Better go for a simple typography. --Dan Polansky (talk) 21:53, 16 March 2018 (UTC)
@Dan, I'm confused -- what do mentions have to do with it? ‑‑ Eiríkr Útlendi │Tala við mig 23:36, 16 March 2018 (UTC)
In the referenced Beer parlour discussion, someone said "I believe the notion is that for scripts which we don't italicize in mentions (Russian, Greek, etc.), we italicize the romanization to show the distinction between the mention and non-mention formats."
That's the only argument in support of italics that I could find at the time of my post.
Now, below, someone said "Italicising romanisations of non-Latin-script foreign terms is a standard and internationally used practice in reference works and in academia": That is for romanizations in the middle of the sentence, right? Like below, we have an example: "The Arabic tāʾ marbūṭa is rendered a not ah." There, "tāʾ marbūṭa" is a mention. There, you might italicize a Czech term as well, where Czech uses roman letters by default.
In "Hindi युद्ध (yuddh)", I don't see a need to italicize "yuddh". I admit that the Jaschke Dictionary for Tibetan does italicize romanizations, but there, they are followed by English text, whereas in the uses of {{m}} and {{l}}, they are not followed by English text.
I looked in русский for visual inspection. The headword line in русский currently says "ру́сский • (rússkij)" with the romanization unitalicized, and it looks just fine; italics would not help it in any way.
What would make sense to me is italicizing romanizations in {{m}}, but not in {{head}}, {{l}} and {{t}}; this is because {{m}} can be used in the middle of the text and it italicizes roman script in general, whereas {{l}} does not italicize roman script in general.
As for the rationale "consistency, clarity, and professionality" and as for legibility: italicizing romanizations in {{m}} but not in {{head}}, {{l}} and {{t}} would be consistent with what we do for roman script; as for clarity, I do not see how italics is clearer; as for professionality, I might admit that italics could be more in keeping with what other publications do, but doing something different when well justified is not necessarily unprofessional in any bad sense; as for legibility, there is no doubt in my mind that italics is less legible, especially on computer screens. --Dan Polansky (talk) 07:50, 17 March 2018 (UTC)
On a procedural note, I created Wiktionary:Votes/2018-03/Showing romanizations in italics by default to ensure maximum audience. --Dan Polansky (talk) 08:05, 17 March 2018 (UTC)
Oppose: I'm changing my vote to Dan's side. I don't think italics brings anything but less legibility. --Victar (talk) 22:40, 16 March 2018 (UTC)
@Victar: Quite the contrary, actually.
Italicising romanisations of non-Latin-script foreign terms is a standard and internationally used practice in reference works and in academia. Take a look at Wehr and Steingass for Arabic, Oxford Dictionary for Hindi, Steingass Dictionary for Persian, Monier-Williams Sanskrit Dictionary, Jaschke Dictionary for Tibetan, The Chicago Manual of Style, The Oxford Style Manual, ... and even Wikipedia (Iran).
The International Journal of Middle East Studies guidelines become this without italics:
The Arabic tāʾ marbūṭa is rendered a not ah. In Persian it is ih. In Arabic iḍāfa constructions, it is rendered at: for example, thawrat 14 tammūz. The Persian izafat is rendered -i: for example, vilāyat-i faqīh. []
and this with italics:
The Arabic tāʾ marbūṭa is rendered a not ah. In Persian it is ih. In Arabic iḍāfa constructions, it is rendered at: for example, thawrat 14 tammūz. The Persian izafat is rendered -i: for example, vilāyat-i faqīh. []
The basic meaning of italics is that “this Latin-script word is not English”, and legibility is precisely its advantage. Wyang (talk) 00:17, 17 March 2018 (UTC)
@Wyang:, I don't think that's a comparable usage. In those examples, the foreign terms are only distinguished by being in italic. We however are enclosing them already in parentheses. And yes, I do think italics are less legible than normal text, in that it's harder to read, especially with diacritics and special characters. --Victar (talk) 01:00, 17 March 2018 (UTC)
@Victar: Parentheses, quotation marks or not, the academic practice is that romanisations are italicised by default in running text, whenever they assume an auxiliary function to the script forms. We can find plenty of people writing 여름 yelum, 여름 (yelum), or yelum 여름, but hardly anyone writing 여름 (yelum) in proper works. Also see Citation Guidelines for Chinese-language Materials, § 2.3 In Parentheses. Such practice makes it easier for readers to parse the text and identify romanisations - and simply ignore them if the readers already know the script or language. 서울 (Seoul, Seoul) is much easier to parse than 서울 (Seoul, Seoul). To me, the current links on Reconstruction:Proto-Iranian/θanǰáyati are impossible to read. It's hard to make out which is which, and eyes become strained after reading a few lines, while the italic version makes the romanisations stand out aesthetically.
There isn't really a reduced legibility of italic Latin text. We routinely italicise natively Latin-script terms in {{m}}, and some Latin-script languages are no less diacritics-heavy. For example, Vietnamese:
ủ trái cây bằng đất đèn cho mau chín. Thường mấy trái cây non, bọn buôn nó hay giú khí đá cho mau chín, nên ăn mấy trái cây ấy có ra gì đâu.
There is not really a legibility difference between the non-italic and italic sentences above. Our readers seem to read these italic letters with diacritics just fine too. Wyang (talk) 02:41, 17 March 2018 (UTC)
@Wyang, all that you just expressed are preferences and opinions. I simply disagree with them. --Victar (talk) 02:45, 17 March 2018 (UTC)
@Victar It's not preferences and opinions. It's what typically happens in academia and lexicography and the rationales behind it. You can surely disagree, but it's just unfortunate that votes can happen with complete disregard for what other way more established dictionaries adopt as standard practice, dismissed as professional preferences. Wyang (talk) 02:55, 17 March 2018 (UTC)
@Wyang, again, that's simply not true; you're citing a standard for foreign terms within running English text. We don't have a need for it as we already use parentheses, so your argument is one of stylistic preference, not of functionality. --Victar (talk) 03:07, 17 March 2018 (UTC)
@Victar Please have a closer look at the dictionary links I have given above. You are missing the point: this isn't a discussion on whether there is a functional need to italicise the romanisations. The point is: is there a stylistic need to do so? The answer is yes, on the grounds that:
This is the standard practice in academia and lexicography. Note that the “standard practice” is not that dictionaries italicise romanisations following headwords; the practice is:

Parentheses, quotation marks or not, romanisations are italicised by default in text, whenever they assume an auxiliary function to the script forms.

When we flip through Wehr, Steingass, Oxford, Monier-Williams, Jaschke, etc., they are full of examples of romanisations after headwords, in parentheses, in quotation marks, on their own, whatever, but all romanisations are italic, regardless of the environment the romanisations are found in after the headwords. On a quick glance, there were 10 romanisations in parentheses on the Wehr page I gave before alone. This simultaneous consistency of formatting romanisations as italic in reference works is what we have failed to appreciate so far in our infrastructure. And, this standard practice in reference works is supported by good rationales which are very relevant to us too. Wyang (talk) 03:50, 17 March 2018 (UTC)
@Wyang, don't be patronizing. It doesn't help your argument. I saw your links and understand the issue. I disagree and my points are above. I don't want to get into it with you any further. --Victar (talk) 04:10, 17 March 2018 (UTC)
@Victar: Sorry if you found it patronizing. The discussion is directed towards the arguments, and none towards the people. Your points were: (1) that reference works adopt italicity out of functional necessity and that such necessity is nonexistent when romanisations are enclosed in parentheses, which is incorrect as the italicity was a universal stylistic preference in lexicography as shown above, independent of text environment; (2) that italicity brings about a reduced legibility, also unsupported by our routine italicisation of Latin-script terms, many of which are no less diacritics-heavy. This is not a personal preference vote; it is a site-wide style format change which must be carefully deliberated on, and I unfortunately did not find the arguments above sufficiently reasoning-robust to balance the overwhelming evidence of a standard practice of italicisation in reference works. Wyang (talk) 04:56, 17 March 2018 (UTC)
Support --Anatoli T. (обсудить/вклад) 02:57, 17 March 2018 (UTC)

Passed and discussion closed. This and the February discussion have been around for sufficiently long, and as Korn said in the Japanese discussion below, those with interest have already expressed their opinions. No substantial opposing argumentation was put forth, compared to what we have as the literature and lexicographic evidence for unifying it as italicised. This affects and interests some people much more than others who may barely view and manage non-Latin-script entries. This isn't a popular polling station or a place for musings, but rather a “think tank” where arguments for and against should be proposed and evaluated in the presence of each other. Wyang (talk) 09:08, 17 March 2018 (UTC)

FWIW I support this. I think efforts (by the same one stick in the mud as usual) to insist on further bureaucracy can be dismissed. - -sche (discuss) 15:32, 17 March 2018 (UTC)
This has not been brought to a wide audience, and the reasoning presented initially was nearly absent ("consistency, clarity, and professionality") and free of substantiation. This is why Wiktionary:Votes/2018-03/Showing romanizations in italics by default is the proper venue. The only way the vote can fail to pass is if there actually is not a consensus. --Dan Polansky (talk) 15:36, 17 March 2018 (UTC)
And on votes sometimes being evil and such, let the reader read the top of this Beer parlour "discussion". There is no discussion; there are blank and nearly blank votes on the supporting side. A real discussion started only when I posted my oppose, and Victar changed his vote. --Dan Polansky (talk) 15:41, 17 March 2018 (UTC)
To no one's surprise, I agree with Dan. I can't speak for each personally, but the support votes look more like "whatever" votes. I was also "whatever", but I thought more into it and ended up disagreeing. I think with a vote, people will put more thought into both sides of the argument, especially since only one side was initially given. Also, @-sche, no need for personal attacks --Victar (talk) 16:03, 17 March 2018 (UTC)
As for: "arguments for and against should be proposed and evaluated in the presence of each other": Yes, let's. If I evaluate the arguments, I find in favor of my arguments. If I did not, I would change my vote above, right? Now what? That does not work. There is no mechanism of evaluation of arguments. The best we have is our venerable votes-cum-discussions. --Dan Polansky (talk) 16:22, 17 March 2018 (UTC)

I think the changes made towards italicized romanizations should be reverted immediately pending the completion of the vote. I find it completely inappropriate that @Wyang moved forward with this change. @Dan Polansky --Victar (talk) 03:59, 18 March 2018 (UTC)

I agree. This is a longstanding format that most of us have become very used to and should not be changed without a vote. --WikiTiki89 15:01, 19 March 2018 (UTC)
Not just gotten used to, but it nullifies important distinctions in Hittite and merges {{l}} and {{m}} for many languages. You were already desysopped for making changes without consensus and wheel warring. Please undo this change immediately before it escalates any further. --Victar (talk) 15:36, 19 March 2018 (UTC)
Now that I've found where the change was made, I've reverted it. --WikiTiki89 15:45, 19 March 2018 (UTC)
Thanks, @Wikitiki89. --Victar (talk) 15:52, 19 March 2018 (UTC)

Where tf were you before, when the discussions were ongoing? Bye-bye. Wyang (talk) 19:15, 19 March 2018 (UTC)

Middle Assamese

I have added a code inc-mas for Middle Assamese per Talk:ভাল. If anyone has any objections, please put them here. There are lots of cites on Google Books. —AryamanA (मुझसे बात करेंयोगदान) 18:40, 1 March 2018 (UTC)

@AryamanA, I think the usual format for that would be inc-asm (Assamese Middle), cf. frm (French Middle), goh (German Old High). --Victar (talk) 03:21, 2 March 2018 (UTC)
Pinging @-sche. --Victar (talk) 03:31, 2 March 2018 (UTC)
@Victar: But what about inc-ohi (Old Hindi), inc-ogu (Old Gujarati)? —AryamanA (मुझसे बात करेंयोगदान) 04:10, 2 March 2018 (UTC)
roa-opt (Old Portuguese) —AryamanA (मुझसे बात करेंयोगदान) 04:11, 2 March 2018 (UTC)
@AryamanA: Right, all not ISO codes, but yes, it seems wiki sub-codes are reversed, so inc-mas is appropriate. +1 --Victar (talk) 04:26, 2 March 2018 (UTC)
Yes, in my experience when we make our own codes, we tend to have the code approximate the name with the words in the same order (so, "inc-mas"); the ISO's "backwards" order may be a product of internally preferring names like "German, Old High" for sorting reasons and/or preferring codes that sort "nearby" ("fr", "frm"), or just a result of their not always approximating language names as well as they could (they couldn't use "mfr" for "Middle French" because they already use it for "Marrithiyel", despite that word not having an "f" in it). - -sche (discuss) 04:56, 2 March 2018 (UTC)
ISO codes are often quite inconsistent, e.g. owl for Old Welsh but wlm for Welsh, Middle; neither of which uses the native name Cymraeg the way cy for Modern Welsh does. —Mahāgaja (formerly Angr) · talk 15:41, 3 March 2018 (UTC)

Proposed change to Japanese entry format - using kana as the main entry form

Continuing the discussion from last month at Wiktionary:Beer_parlour/2018/February#Related:_Status_of_hiragana_entries.

In the process of cleaning up after some anon edits, I reworked the hiragana entry at うまい (umai) to show an example of what it might look like if we were to use the kana entries to store the main content, rather than the current practice of using kana entries only as soft redirects to the kanji spellings. The うまい (umai) entry is a bit of a simpler example, as this term only has one etymology. I think it can still help to illustrate how we might lay things out, and how we might show how different kanji spellings are applied to different senses of the same lemma.

For those interested in Japanese entries here, please read the linked thread above, have a look at the うまい (umai) entry, compare to 上手い (umai) and 旨い (umai) as (currently less comprehensive) examples of the conventional kanji-focused format, and discuss here as appropriate.

TIA, ‑‑ Eiríkr Útlendi │Tala við mig 21:21, 1 March 2018 (UTC)

Support. I think this is a step long overdue, but it also means there will be a lot of work... Suggestions: the kanji forms on def lines need to be made more conspicuous, cf. the 【】 notation in JA dictionaries. Maybe an additional template is indicated. Also I think the kanji forms can take even less information, provided we incorporate content into the kana entries, including the conj table, which can be extended to display multiple kanji forms in one cell. Wyang (talk) 22:01, 1 March 2018 (UTC)
I should have included this earlier -- @Wyang, please have a look at 巧い (umai) as an example of a kanji spelling entry as a soft-redirect to the fuller kana entry.
I agree that some formatting, and probably different (new?) templates, may well be called for. ‑‑ Eiríkr Útlendi │Tala við mig 00:12, 2 March 2018 (UTC)
I'd like to see more complex examples, and what sort of templates could improve them. As Wyang says, this is going to be a very big job — Chinese unification was also a big job, so I know it can be done, but more planning is necessary first. —Μετάknowledgediscuss/deeds 02:59, 2 March 2018 (UTC)

My thoughts (mostly related to practical usability):

  1. ja.wt has definitions on the kanji entry for Sinoxenic words; for example: ja:意味 (imi). Is this something we should consider doing?
    1. Do users want to manually go to 辞典 from じてん?
    2. See also ja:いみ (imi), which has definitions for native words and redirects to Sinoxenic words.
  2. Paper dictionaries use kana as the main entry because they are sorted alphabetically and have multiple words on one page.
    • Other online dictionaries don't have these problems because they are more database-like.
  3. How do we indicate the rarity of a kanji spelling?

Personally I think that the most common spelling should be used, solely because of usability (which admittedly can be unsightly), but kana entries make a lot of sense for native words. —suzukaze (tc) 03:48, 2 March 2018 (UTC)

I thought this proposal only affects native Japanese words, and that Sinoxenic words would be kept at their kanji spellings (?) Wyang (talk) 00:31, 4 March 2018 (UTC)
As initially conceived, I hadn't fully considered Sinoxenic terms. In light of the discussion above, I agree with the coalescing consensus that Japanese Sinoxenic terms (those deriving originally from Chinese borrowings) should use the kanji spellings for the lemmata, with the kana spellings serving as soft redirects -- much as the current status quo. Meanwhile, native Japanese terms and fully-nativized borrowings (such as たばこ (tabako), which is old enough that it has multiple broadly accepted kanji spellings) would have lemmata content moved to the kana spellings, with the kanji spellings serving as soft redirects -- the opposite of the current status quo. ‑‑ Eiríkr Útlendi │Tala við mig 09:08, 4 March 2018 (UTC)
What about words like 故郷 (こきょう, kokyō; ふるさと, furusato)? —suzukaze (tc) 01:43, 5 March 2018 (UTC)
You bring up a good point, that Japanese is sometimes variable enough in the kanji spellings, but consistent in the kana, that it might make sense to use kana for lemma entries even for Sinoxenic terms.
As a counterargument, Daijirin lists at least 26 different kanji spellings for the kana sequence かんせい (kansei). If we were to use かんせい as the lemma, the entry would be quite horrifically huge. This is not true for every Sinoxenic reading, but it's common enough that we need to consider the ramifications. ‑‑ Eiríkr Útlendi │Tala við mig 20:48, 13 March 2018 (UTC)
Well, the entry looks useful, but the information beneath the definitions needs more collapsing. After all, definitions are the primary function of this site, everything else is secondary data, so ease-of-access to the defs should be our primary concern. Korn [kʰũːɘ̃n] (talk) 09:16, 4 March 2018 (UTC)
@Korn -- do you mean うまい (umai)? There was a thread somewhere about making usexes auto-collapsing, similar to the current behavior of quotes. I think it was Wiktionary:Beer_parlour/2018/March#Hiding_usexes. ‑‑ Eiríkr Útlendi │Tala við mig 18:55, 14 March 2018 (UTC)
Support for now. I think we'll be able to iron out problems. —suzukaze (tc) 01:43, 5 March 2018 (UTC)
Neither way is perfect, in my opinion.
  1. A kana or mixed spelling may be more common for Sino-Japanese spellings as well, especially for complex or rare characters.
  2. If we choose kana forms for lemmas, then it would make sense to do this for Sino-Japanese terms as well. By Sino-Japanese, I mean all terms using on'yomi readings, not necessarily just terms borrowed from any form of Chinese.
  3. I prefer the status quo, but we should avoid duplicating content. Perhaps native verbs and adjectives should only have inflections in the kana entries. --Anatoli T. (обсудить/вклад) 02:36, 5 March 2018 (UTC)
As another example, I recently reworked the あばく (abaku) entry. This spelling has three different etymologies by current research, all of which seem to be at least loosely related. The terms have three different kanji spellings, and one etymology for which no kanji spelling is (yet?) attested. ‑‑ Eiríkr Útlendi │Tala við mig 20:48, 13 March 2018 (UTC)
I hope this discussion doesn't die down. I think this layout is logical, and will prove to be much superior in the long run. The kanas were designed specifically for this reason (i.e. to record wago more accurately), and this makes the hiragana forms the most suitable lemma forms, out of all the possible variant forms of a word. Wyang (talk) 10:39, 14 March 2018 (UTC)
If the discussion dies down, it likely means everything was said by everyone who cares. Then it's time to be bold and just do the stuff that was agreed on. Korn [kʰũːɘ̃n] (talk) 10:59, 14 March 2018 (UTC)

Middle Japanese again

We still have 6 Middle Japanese entries without any code for that language- can we decide what should be done? DTLHS (talk) 02:43, 2 March 2018 (UTC)

Forgive me, I can't remember how to find those. Are they in a category? ‑‑ Eiríkr Útlendi │Tala við mig 01:02, 3 March 2018 (UTC)
Sorry, here they are: かめ, かへる, かへす, かはる, かはす, かふ DTLHS (talk) 01:07, 3 March 2018 (UTC)

Book Pahlavi in Unicode

Man, am I the only one that's pissed that Book Pahlavi hasn't been added to Unicode yet? Why the heck hasn't this proposal gone forward?! --Victar (talk) 03:15, 2 March 2018 (UTC)

It seems like Unicode blogged about working on it two days ago: [1] suzukaze (tc) 03:17, 2 March 2018 (UTC)
Woot! Thanks for sharing, @Suzukaze-c. I hope they move quickly on it. --Victar (talk) 03:24, 2 March 2018 (UTC)
lol, that's great! I am annoyed at having Latin-script Middle Persian entries too. —AryamanA (मुझसे बात करेंयोगदान) 04:14, 2 March 2018 (UTC)
More frustrating for me is that we already have Manichean Unicode but no good fonts to support it. Crom daba (talk) 10:39, 2 March 2018 (UTC)
I'm still waiting for Tocharian. —Mahāgaja (formerly Angr) · talk 11:24, 2 March 2018 (UTC)
@Crom daba:, I've been using this one, which I ripped from a Unicode proposal PDF. --Victar (talk) 16:08, 2 March 2018 (UTC)
Wow, thanks! I've been periodically checking this page, but they don't list whatever font this is. Crom daba (talk) 16:15, 2 March 2018 (UTC)
@Crom daba: Yeah, it's not publicly available yet. I ripped it from https://unicode.org/charts/PDF/U10AC0.pdf. FYI: @AryamanA, Vahagn Petrosyan --Victar (talk) 16:39, 2 March 2018 (UTC)


Why is this under English lemmas? ---> Tooironic (talk) 15:55, 2 March 2018 (UTC)

Fixed. Equinox 16:00, 2 March 2018 (UTC)

Middle Persian language codes

Now that we have Unicode Manichaean, and are soon getting Unicode Book Pahlavi, I think it imperative that we rehash the conversation on whether the two should still be split into separate language codes, one for Pahlavi pal, and the other for Manichaean xmn. My arguments for unifying them under one code are as follows:

  1. Book Pahlavi and Manichean are scripts (and a religion), not languages.
  2. The general pronunciation of Pahlavi and Manichaean is mostly identical, far less distinct than Old Avestan and Younger Avestan or Vedic Sanskrit and Classical Sanskrit.
  3. Unnecessary category division, e.g. Category:Ancient_Greek_terms_borrowed_from_Manichaean_Middle_Persian.

Pinging @AryamanA, माधवपंडित, Vahagn Petrosyan, -sche, ZxxZxxZ. --Victar (talk) 17:52, 3 March 2018 (UTC)

Support Crom daba (talk) 18:00, 3 March 2018 (UTC)
Support, but we should tag the variety inside the page using {{lb}} or something in the headword line. --Vahag (talk) 20:02, 3 March 2018 (UTC)
Agreed. --Victar (talk) 20:11, 3 March 2018 (UTC)
Support -- माधवपंडित (talk) 03:07, 4 March 2018 (UTC)
Support, but as Vahag said. —AryamanA (मुझसे बात करेंयोगदान) 04:10, 4 March 2018 (UTC)

@-sche, do you have any thoughts before moving forward on this? --Victar (talk) 14:52, 14 March 2018 (UTC)

Support; based on previous discussion it does seem like we're dealing with dialects and not separate languages, especially because it sounds like the two varieties have differences within themselves (temporally, regionally or otherwise), not just between each other. ISO/SIL also split some other languages by script, e.g. Luwian. In that case, we just picked one of the codes to use for both script varieties; we could do that here; it would have the advantage that we'd be using a shorter code and a recognized (ISO) one, but the disadvantage that it might be confusing for people to see content from the second lect under the first lect's code. - -sche (discuss) 18:00, 14 March 2018 (UTC)
Support *i̯óh₁n̥C[5] 19:28, 14 March 2018 (UTC)
@-sche, why not run a bot to replace {{(.+)|xmn|(.*)}} and lang=xmn with pal? --Victar (talk) 21:33, 15 March 2018 (UTC)
That (or, the general concept of replacing the language code "xmn" with "pal", and changing the L2 headers at the same time) would work. In fact, it looks like we're dealing with so few entries that it would be feasible for me to do it with AutoWikiBrowser. Unless anyone has objections, I should have time to do that later. - -sche (discuss) 21:52, 15 March 2018 (UTC)
Does anything more complex need to be done to maintain the functionality of Module:Mani-translit beyond adding it to the data for "pal"? - -sche (discuss) 22:06, 15 March 2018 (UTC)
@-sche, nope, same functionality. If you're going to run some more bot conversions, adding |sc=Mani to those xmn entries would be awesome. --Victar (talk) 01:40, 16 March 2018 (UTC)
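As written, the pattern in Victar's suggestion would not behave as intended as a regular expression: the unescaped `|` characters act as alternation, and the greedy `(.+)` can run across several templates on one line. A minimal sketch of the same replacement, assuming the code appears either as a positional template parameter or as `lang=xmn`; the function names are hypothetical, and L2 headers would still need to be changed separately, as -sche notes.

```javascript
// Hypothetical helper: switch a language code in wikitext, e.g. "xmn" -> "pal".
// Handles the code as a positional template parameter ({{m|xmn|...}},
// {{der|grc|xmn|...}}) and as a named parameter (lang=xmn).
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function replaceLangCode(wikitext, from, to) {
  const code = escapeRegExp(from);
  // "|xmn" immediately followed by the next "|" or the closing "}}".
  const positional = new RegExp("\\|" + code + "(?=\\||\\}\\})", "g");
  // "lang=xmn" (not e.g. "sclang=xmn") followed by "|" or "}}".
  const named = new RegExp("\\blang=" + code + "(?=\\||\\}\\})", "g");
  return wikitext.replace(positional, "|" + to).replace(named, "lang=" + to);
}

console.log(replaceLangCode("{{der|grc|xmn|-}} {{m|xmn|abc}}", "xmn", "pal"));
// {{der|grc|pal|-}} {{m|pal|abc}}
```

The lookahead keeps a code such as "xmnx" (or any longer value that merely starts with "xmn") from being rewritten.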

A fair number of entries have module errors because they still use xmn. — Eru·tuon 19:23, 17 March 2018 (UTC)

I've fixed all the ones I could, but take a look at Reconstruction:Proto-Indo-European/n̥- and Reconstruction:Proto-Indo-European/speḱ-. - -sche (discuss) 22:58, 17 March 2018 (UTC)

Spelling pronunciation

I've created a simple etymology template for marking (historical) spelling pronunciations. Is this okay with everyone? Crom daba (talk) 20:09, 3 March 2018 (UTC)

P.S. Once again I can't remember where I'm supposed to put category information other than Module:category tree/poscatboiler/data/terms by etymology to make {{autocat}} work.

Wiktionary:Beer_parlour/2014/January#spelling_pronunciations --Per utramque cavernam (talk) 20:11, 3 March 2018 (UTC)
I suspected there was a hidden Chesterton fence here, but I figured this was the best way to find it.
This is basically my use case: хязгаар#Mongolian, is this valid or not? Crom daba (talk) 20:36, 3 March 2018 (UTC)
Is it not a pronunciation spelling? From what I (think I) understand, the letter г was added to reflect the pronunciation more accurately. A spelling pronunciation is the reverse process: altering the pronunciation and matching it to the spelling (pronouncing salmon /ˈsælmən/, for example). --Per utramque cavernam (talk) 21:30, 3 March 2018 (UTC)
Yes, a spelling pronunciation is a pronunciation that's been altered because of how the word is spelled, like /ˈsælmən/ for salmon and a whole lot of other examples. A pronunciation spelling, on the other hand, is a spelling that's been altered because of how the word is pronounced, like Enya for Eithne or (presumably) show for shew. —Mahāgaja (formerly Angr) · talk 21:59, 3 March 2018 (UTC)
I guess I should try to write more clearly. Classical Mongolian (g) stands for two different Proto-Mongolic phonemes (it's generally full of homography), and г (g) was added (to the [Khalkha] pronunciation, which is more faithfully reflected in Cyrillic orthography) as a misreading of (g) as *g instead of *x (actually as a mixture of both). Crom daba (talk) 12:20, 4 March 2018 (UTC)
I'm still not sure what's going on. Is the word pronounced the way it is etymologically expected to be pronounced, but "misspelled" (from an etymological point of view)? Or is it spelled the way it's etymologically expected to be spelled, but "mispronounced" (from an etymological point of view)? Or both, or neither? —Mahāgaja (formerly Angr) · talk 15:24, 4 March 2018 (UTC)
It is mispronounced. ᠬᠢᠵᠠᠭᠠᠷ (qiǰaɣar) renders Proto-Mongolic *kïjaxar, which regularly goes to *kïjaar -> *xïjaar -> xyajaar -> hyadzaar and then (or in some intermediate steps) g was inserted because the spelling is ambiguous between *kïjaxar and **kïjagar.
When I asked whether this use was valid, I meant whether everyone is fine with there being a template that would do the thing I did here (link to appendix + categorize), not whether this is an instance of a spelling pronunciation (I already know it is). Crom daba (talk) 00:00, 5 March 2018 (UTC)



Hi! From my own counting, it seems that unquestioningness‎ is the 5,500,000th page to be created here. Congratulations! Pamputt (talk) 09:26, 4 March 2018 (UTC)

For what it's worth as confirmation, I arrive at the same conclusion. (I counted back through the recent changes list of new entries as of when there were 5,500,059 entries per Special:Statistics, and again when there were 5,500,062 (double checking), to find the 5,500,000th new entry.) Congratulations to Equinox! - -sche (discuss) 10:00, 4 March 2018 (UTC)
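The counting-back check described above is simple arithmetic. Assuming the entry counter and the newest-first list of new entries stay in step (no deletions or moves while counting), the newest listed entry is number N, and entry number T sits N − T + 1 places down the list. A minimal Python sketch, with a hypothetical function name:

```python
def position_in_new_entries(current_count: int, target: int) -> int:
    """Return the 1-based position of entry number `target` in a newest-first
    list of new entries, given that the newest entry is number `current_count`.

    Assumes every listed creation raised the counter by exactly one, i.e. no
    entries were deleted or moved while counting.
    """
    if not 1 <= target <= current_count:
        raise ValueError("target must be between 1 and current_count")
    return current_count - target + 1


# The two checks described above should land on the same entry.
first = position_in_new_entries(5_500_059, 5_500_000)   # 60th newest
second = position_in_new_entries(5_500_062, 5_500_000)  # 63rd newest
```

Repeating the count at a second counter value, as was done here, guards against an off-by-one or a deletion slipping in between.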
Wiktionary:Milestones has been updated accordingly. SemperBlotto (talk) 10:09, 5 March 2018 (UTC)

Use of † in taxonomic entries

In biology, † is sometimes placed before a taxonomic name to indicate that it is extinct. Some of our Translingual taxonomic entries, although only a small portion, use this notation, e.g. at Smilodon in the headword line as well as elsewhere in the entry. Whether or not a taxon is extinct is not lexical information; indeed, any species could go extinct without its definition, etymology, gender, hyponyms, or other lexical metadata changing. Extinction status is purely encyclopaedic, and belongs at Wikipedia. It is also unexplained in entries, which may confuse readers. As a result, it would probably be best to remove it from our entries. —Μετάknowledgediscuss/deeds 20:33, 4 March 2018 (UTC)

I'm not opposed to noting that a taxon is extinct, maybe in the definition ("an extinct species of..."), or it arguably is the sort of semi-lexical information we record in other cases by using {{lb|en|historical}}. (It does seem like a lot of effort to maintain.) But given that the cross/dagger also sometimes means the word or sense is obsolete — I think Chinese entries use it this way — and given that we have the space to spell things out, it's probably best to avoid using just the symbol. - -sche (discuss) 20:54, 4 March 2018 (UTC)
I don't work with these entries, but I agree with Metaknowledge. --Per utramque cavernam (talk) 14:03, 6 March 2018 (UTC)
I don't understand how the word encyclopedic applies to the dagger. Even less can I understand how the fact of something being extinct or extant or endangered isn't information of considerable interest to someone looking at a taxonomic entry, as much as whether something is "endangered" or "red" or "large", or a bird. Have we really lost touch with normal dictionary users to that extent? As to the changeability of the status of something sub specie aeternitatis, the same applies to the very words we use to gloss other words.
As to it not being explained, the same applies to m, f, n, and plenty of terms which we use in labels, category names, etc., and even in definiens. We could link to [[†]] or to WT:GLOSS, approaches we have taken with a few of these others. DCDuring (talk) 15:25, 6 March 2018 (UTC)
@DCDuring: It looks like you've "lost touch with normal dictionary users" if you think that they will know what the daggers mean and not be confused by their only occasional usage here. Gender is explained, if you merely hover your mouse over the letter. I agree with -sche that it is often appropriate to use "extinct" in the definition line, although not always (it probably isn't particularly useful for dinosaurs, for example). We can make those kinds of decisions without reliance on daggers. —Μετάknowledgediscuss/deeds 17:56, 6 March 2018 (UTC)
I am glad you support the use of hover notices. That might be a desirable alternative to a link, though a link to a sense-id gloss can include more information (e.g. context) and is more accessible for us technically challenged contributors.
I was actually surprised that you seemed to object to semantic content in your original complaint.
The dagger is just the kind of orthographic element that we have lavishly honored in showing ligatures and obsolete English characters in entries, especially in alternative forms and in citations. DCDuring (talk) 20:05, 6 March 2018 (UTC)
@DCDuring: Parts of your response are unintelligible to me, particularly the final sentence. I see no relationship between documenting all words that meet our criteria regardless of what characters they use and what we choose to put in our entries. Can you make a clear statement about what you want to do with this issue? To be abundantly clear, I want to remove all daggers and ensure that the word "extinct" is on definition lines where it is deemed useful. —Μετάknowledgediscuss/deeds 20:27, 6 March 2018 (UTC)
Oppose DCDuring (talk) 19:44, 7 March 2018 (UTC)
  • I hereby propose that someone, not me, create a means of linking a † to the appropriate English definition at [[†]], and that the daggers be permitted on the inflection line of taxonomic names and wherever else the taxonomic name of an extinct species may appear. DCDuring (talk) 19:44, 7 March 2018 (UTC)
The dagger is the ordinary means of indicating in text that a species is extinct (as an aside, Wikipedia makes heavy use of it as well), and I see no reason to prohibit it. However, I'm also not opposed to replacing it with the word "extinct." Andrew Sheedy (talk) 11:18, 12 March 2018 (UTC)
I intend to use the dagger as a means of locating entries that would benefit from the addition of {{R:Fossilworks}}. DCDuring (talk) 11:53, 12 March 2018 (UTC)

Hiding usexes

I assume it is OK to hide longer usexes in the same manner as quotations. The only thing that worries me is the click-on "quotations" heading is slightly misleading when it's a usex. I did an example at rekke. DonnanZ (talk) 11:42, 5 March 2018 (UTC)

I think we should change it to examples ▼, and hide all 'usage examples' and 'quotations' by default. Wyang (talk) 11:44, 5 March 2018 (UTC)
That would be better, or perhaps "usage examples and quotations" (perhaps a little long). As long as {{ux}} / {{usex}} or {{quote}} are used where appropriate. DonnanZ (talk) 13:50, 5 March 2018 (UTC)
Well, they're not, so this discussion seems a little pointless. DTLHS (talk) 15:53, 5 March 2018 (UTC)
It's not necessarily pointless if users are made aware that entries can be updated. DonnanZ (talk) 16:23, 5 March 2018 (UTC)
  • I don’t like the idea very much (after all, if a usex is so long that it’s a good idea to hide it, it’s almost always a better idea to use a shorter one instead). But if you do, please keep using {{ux}} so that parsers can have a chance at knowing it is not a quotation. — Ungoliant (falai) 16:33, 5 March 2018 (UTC)
In the example I gave above it was a sentence from Wikipedia, which I believe is not allowable as a quote, so I treated it as a usex. DonnanZ (talk) 17:20, 5 March 2018 (UTC)
I think sentences from Wikipedia should be treated as quotes. They don't count toward attestation for CFI purposes because they're not durably archived, but they're still quotes and ought to be properly attributed. —Mahāgaja (formerly Angr) · talk 20:16, 5 March 2018 (UTC)
Oh right. So if a word is attestable for CFI, e.g. with dictionary references included, there's no problem with quotes from Wiktionary? DonnanZ (talk) 21:05, 5 March 2018 (UTC)
God no. Please don't start adding quotes from ourselves to pages- what an awful idea. DTLHS (talk) 21:18, 5 March 2018 (UTC)
We invent usexes so we can just shorten them as needed. Equinox 19:38, 5 March 2018 (UTC)
Y-yes, in this case it made sense to include the whole sentence. DonnanZ (talk) 20:18, 5 March 2018 (UTC)
Personally I like having more space between definitions- it makes the page easier to read for me. DTLHS (talk) 20:25, 5 March 2018 (UTC)

Category:Quotation templates to be cleaned

There's over 8,000 of them, but what exactly is needed: {{quote}} templates, or is it something else that needs attention? DonnanZ (talk) 13:42, 5 March 2018 (UTC)

They might not need to be cleaned at all, the only problem with them is that they are using a generic {{quote-text}} instead of a specific variant ({{quote-book}} usually). If we are OK with some quotes being generic, you only need to clean the category out of the template. - TheDaveRoss 20:07, 5 March 2018 (UTC)
@TheDaveRoss: Thanks. That was indeed the case in the entry I was looking at. That's one off the list, only 8,621 to go. DonnanZ (talk) 20:44, 5 March 2018 (UTC)
@TheDaveRoss There would appear to be a problem with these, getting them to register in the category for quotes. I tried adding "en" to the quote at glowing, but that isn't the solution, there must be something else that should be done, a rewrite? DonnanZ (talk) 14:25, 9 March 2018 (UTC)
@Donnanz What category for quotes? - TheDaveRoss 15:22, 9 March 2018 (UTC)
@TheDaveRoss: Category:English terms with quotations, where all of these should go. DonnanZ (talk) 15:31, 9 March 2018 (UTC)
@Donnanz: That is from {{quote}}, not the {{quote-book}} family. If you want all of those quotes to go into that category you will have to add the category to {{quote-meta}} or each of the family of templates. - TheDaveRoss 15:46, 9 March 2018 (UTC)
@TheDaveRoss: I'm not allowed to edit that template, even if I could make head or tail of it. DonnanZ (talk) 16:02, 9 March 2018 (UTC)

Moving snowclones back to the mainspace

I think that snowclones (i.e., the ones listed in Appendix:English snowclones) should be included as dictionary entries in the main namespace, as long as they follow CFI's rules on attestation and idiomaticity. The rationale is that it is far less convenient for a dictionary reader to have to look in the appendix for this than it is to look in the mainspace. They are just like any other idioms and are just as lexical; it's just harder to fit general semantic variables like someone or something into them. I think things like X is the new Y or to X or not to X are perfectly decent entry-material. So, let's make it easier for the readers and move these exceptional idioms to the mainspace. (P.S.: I remember now that a while back I created ride the ... train. This was before I knew about the appendix page or about what a "snowclone" is. I just modeled the "..." after the already existing phrasebook entry I am ... year(s) old, which may need to be changed a bit too. That name looks a little funky.) PseudoSkull (talk) 04:28, 6 March 2018 (UTC)

Support. --Daniel Carrero (talk) 08:40, 6 March 2018 (UTC)
Oppose having them in the main space. We can create redirects to an appendix. --Per utramque cavernam (talk) 12:05, 6 March 2018 (UTC)
Oppose per Per utramque cavernam. DCDuring (talk) 12:56, 6 March 2018 (UTC)
Oppose Equinox 12:57, 6 March 2018 (UTC)
Support Crom daba (talk) 13:52, 6 March 2018 (UTC)
Oppose --WikiTiki89 16:18, 6 March 2018 (UTC)
Comment: @Per utramque cavernam, DCDuring, Equinox, Wikitiki89 If I may kindly ask, can any of you do me the favor to explain (further) why you oppose this idea? PseudoSkull (talk) 16:41, 6 March 2018 (UTC)
Oppose --Victar (talk) 23:03, 6 March 2018 (UTC)
Oppose. Some, but by no means all, are already in mainspace. These constructions are not necessarily phrases suitable for inclusion in the body of a dictionary. bd2412 T 03:00, 7 March 2018 (UTC)
Important note: I think I forgot to mention that I only propose to include idiomatic snowclones. Things like I know X better than you'll ever know X, which can be deduced from its parts, should not be included (if it's even attested in the first place). ride the X train, which cannot be deduced from its parts, should. PseudoSkull (talk) 04:49, 7 March 2018 (UTC)
See also Category:English snowclones. That category is important for our discussion.
I was thinking the same thing when I voted "support", though I failed to say it. Yes, only idiomatic ones. In my opinion, phrases like Appendix:Snowclones/X with a capital Y should be included in the mainspace, because you can't deduce the meaning from the sum of its parts.
The entry awesome with a capital A was deleted in 2011 (RFD: Talk:awesome with a capital A) because it's a snowclone.
But technically, the CFI doesn't currently offer a snowclone caveat, so it seems that in theory all attestable variations of "X with a capital Y" could be created as entries (jerk with a capital J, snowclone with a capital S, dictionary with a capital D...), since they can't be deduced from their parts.
In my opinion, having a snowclone entry (X with a capital Y) as opposed to entries for all variations of a snowclone is better, because it covers all possibilities.
The entry name could be X with a capital Y or ... with a capital ..., but it's tempting to create a title like just with a capital for snowclones that have no variables in the middle, just at the extremities. Or to be even more minimalistic, just add a new sense at capital to explain the "X with a capital Y".
I gave a few ideas to be discussed, but my preference is for having a mainspace snowclone title like this: X with a capital Y.
Yes, some other snowclones are just common SOP phrases, like Appendix:Snowclones/X and Y and Z, oh my!. They don't merit mainspace entries whatsoever. --Daniel Carrero (talk) 08:50, 7 March 2018 (UTC)


I am trying to get a vote going to amend CFI to expressly allow retronyms. I think retronyms are interesting to people who study language. Quite often they will have a current meaning which is transparently equal to the sum of their parts, and that is why I feel they deserve special protection. As an example, analogue clock merits inclusion in my view even though it is really just analogue + clock. See also: Category:English retronyms. John Cross (talk) 06:33, 6 March 2018 (UTC)

How interesting are retronyms to normal people who use dictionaries? —This unsigned comment was added by DCDuring (talkcontribs).
A large number of words we have will be uninteresting to most normal people. I am interested in creating entries that a small proportion of our users find interesting. John Cross (talk) 21:09, 6 March 2018 (UTC)
I feel as though this should at least be qualified, i.e. allow retronyms unless... What are the worst types of SoP etc. that this rule would permit? Equinox 13:00, 6 March 2018 (UTC)
This would permit the likes of paper book and mechanical mouse. I don't think these are useful or belong in a dictionary. —Μετάknowledgediscuss/deeds 21:28, 6 March 2018 (UTC)
And even biological mouse, perhaps even mammalian mouse. ←₰-→ Lingo Bingo Dingo (talk) 13:05, 16 March 2018 (UTC)
I also don't think this is a useful concept for us. Retronyms are just one example of general disambiguating techniques. Just like in England, what Americans call football is called American football, and who in England is called "the Queen", in America is called "the Queen of England". When these terms are idiomatic, we include them. When they are SOP, we don't. Retronyms are no different. --WikiTiki89 22:37, 6 March 2018 (UTC)
The only thing that gives me pause is that something that wasn't SoP originally might become SoP over time. I wish I could think of an example; I'm sure this came up on here before. But suppose that a new phrase Adj+N is coined, and gets in all the dictionaries, and then Adj comes to be used more generally, with other nouns, significantly later: it seems wrong to delete the original Adj+N when it was the predecessor. Equinox 22:41, 6 March 2018 (UTC)
We do have a rule that if a word was once idiomatic but is now SOP, it is to be included. --WikiTiki89 22:46, 6 March 2018 (UTC)
If, as an example, it can be shown that hammerhead shark was used first and hammerhead came later as a shortened form, then our policies appear to allow both to be included even if hammerhead shark = hammerhead + shark. I want the same in reverse, sort of: if it can be shown that compass is the original term and magnetic compass came second to distinguish it from steady-state compasses, then I would still want to be able to include magnetic compass even if magnetic compass = magnetic + compass. -- John Cross (talk) 06:18, 7 March 2018 (UTC) (edited John Cross (talk) 06:29, 7 March 2018 (UTC))
I disagree. All that would need to be done is to have enough definitions at compass that cover all historical usage. There's no need to have magnetic compass, unless it is shown to have its own specialized meaning. --WikiTiki89 15:31, 7 March 2018 (UTC)
I agree with Wikitiki. (And the test referred to above is WT:JIFFY, for anyone who didn't already know.) - -sche (discuss) 16:07, 7 March 2018 (UTC)

News from French Wiktionary



February issue of Wiktionary Actualités just came out in English!

A snowy issue of Actualités has just fallen on Wiktionary, with not-so-chilly news and stats surrounding three articles: Wiktionarians' allies, a dictionary that went through some trouble we know well too, and a speaking orca! As usual, there are also some changes in Wiktionary projects and some recommended videos about languages and linguistics (including some in English!).

This issue was written by seven people and translated for you by Pamputt. Readers may improve this translation (wiki-spirit), as Xbony2 did last month (thanks a lot!). We still receive zero money for this publication, and your comments are welcome. You can also receive a notice on your talk page if you want. Noé 13:57, 6 March 2018 (UTC)

I like that Bahubali got a mention, haha. —AryamanA (मुझसे बात करेंयोगदान) 15:33, 6 March 2018 (UTC)

Romance words and Medieval Latin

I just thought about something. Can we truly say that a Romance word can be inherited from Medieval Latin (or Ecclesiastical Latin)? So far I've been doing that at times: for instance, if a certain word is found mostly in Medieval or Ecclesiastical Latin but underwent all the normal sound changes into the descendant language and is a common, popular word, or if the meaning matches the Medieval Latin sense of a word more than the Classical one (like coxa, "thigh" in Medieval Latin and the Romance languages but "hip" in Classical Latin). By all indicators these should be inherited terms.

But the problem is, the way I see Medieval Latin defined on Wikipedia for example, was as this kind of artificially preserved language that was no longer popularly spoken, but used in things like administration, writing, church, etc. By the time the Middle Ages came along, the Romance languages/vernaculars had already begun diverging from the spoken Vulgar Latin. So does it really make sense to say they're inherited from this register or form of Latin (the way the Wiktionary templates work now allows inherited to be used on Romance terms with any form of Latin, including New Latin even!)? How are we going to define "Medieval Latin" for Wiktionary's purposes? Is it possible that what we're really looking at for those terms is rather (inherited) descent from a parallel Late or Vulgar Latin term that was more or less the same in form (or at least meaning) as the attested Medieval Latin one? There are certainly many cases of obvious borrowings from Medieval Latin (and some Medieval Latin words even crafted or coined based on existing Romance, like Old French, words), but I'm talking about apparent inherited ones. Like how do we handle coxa for example? Maybe it's best to put that sense as Late Latin instead? Word dewd544 (talk) 16:00, 7 March 2018 (UTC)

If the meaning "thigh" is what the Romance languages have, it's probably best to call that sense Vulgar Latin. Same with focus (fire) rather than "fireplace". —Mahāgaja (formerly Angr) · talk 16:48, 7 March 2018 (UTC)
Indeed, there are many cases where a sense is shared between Vulgar Latin and Mediaeval Latin because the latter has borrowed it from a Romance language that inherited it from the former. —Μετάknowledgediscuss/deeds 18:25, 7 March 2018 (UTC)
Ok, that works for me. But it will admittedly require a bit of backtracking and redoing of some etymologies in which I've put "inherited" from Medieval Latin (because at the time I didn't know how else to handle it; I used to incorrectly treat Medieval Latin in these contexts as essentially being Vulgar Latin in the Early Middle Ages, which is more accurately what we'd be looking at for inherited terms). I assume the same goes for Ecclesiastical Latin? Like all the religious related words like presbyter, episcopus, pascha, abbas, monachus, basilica, baptizo, blasphemo, etc.? And here's another issue: say if we have a Latin word that is listed in its entry as Medieval Latin, but in the Romance descendant's etymology we use Vulgar Latin instead. I imagine it would still be linked to the main entry that is described as "Medieval Latin" (without an asterisk), since making a separate VL. reconstruction page for each instance would be ridiculous. That's just reserved for terms that were unattested in any written form of Latin. Word dewd544 (talk) 17:16, 8 March 2018 (UTC)

sports ticker and score card abbreviations

Over in [Requests for verification:English] someone has tagged a number of sports ticker abbreviations, such as UTA, LAL, WIM, etc. They are clearly a thing, but are they a thing we want in Wiktionary? The RFV process doesn't work well for them, because sports tickers are not durably archived, but they seem pretty standard in their own subculture. Given the number of these, it seems like something we could use a policy decision on. Kiwima (talk) 01:50, 8 March 2018 (UTC)

First version of Lexicographical Data will be released in April

I come bearing a message from WikiData.

After several years of discussion, and one year of development and discussion with the communities, the development team will deploy the first version of lexicographical data on Wikidata in April 2018.

A new namespace and several new datatypes will be created in order to model words and phrases in many languages. Editors will be able to describe words in Wikidata, and in the future, to query this information, and to reuse it inside and outside the Wikimedia movement.

If you’re curious to discover what these new data structures will look like, you can have a look at the data model. It suggests a technical structure, but editors will remain free to model and organize data as they prefer, with the usual open discussions and community processes that we apply on Wikidata. The documentation will be improved step by step, with the different releases and the help of the community.

Please note that the version that will be deployed in April is a first version, that will be improved in the future, thanks to your tests, comments and suggestions. Some features may be missing, some bugs may occur. We can already tell you that the following features will be included in the first version:

  • Add, edit and delete Lexemes, Forms, statements, qualifiers, references
  • Link from an Item or a Lexeme to an Item or a Lexeme
  • Basic search feature

And the following features will not be included in the first version, but are planned for the future:

  • RDF support (which means: the ability to query it with query.wikidata.org)
  • Senses will not be included in the first version, to give you all some time to get properties, processes, etc in place for Lexemes and Forms
  • Entity suggestion and better search features
  • Merge Lexemes
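To make the first-version feature lists above more concrete, here is a rough Python sketch of Lexemes with Forms and statements that link to Items or Lexemes, with no Senses. The class names, field names and placeholder IDs are all hypothetical and chosen for illustration; this is not the actual Wikidata data model or API, and the linked data model page remains the authority.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Statement:
    # Hypothetical simplified statement: a property pointing at an Item
    # or a Lexeme ID, with optional qualifiers and references.
    prop: str
    value: str
    qualifiers: dict = field(default_factory=dict)
    references: list = field(default_factory=list)


@dataclass
class Form:
    form_id: str                 # placeholder form ID
    representation: str
    grammatical_features: List[str] = field(default_factory=list)  # Item IDs
    statements: List[Statement] = field(default_factory=list)


@dataclass
class Lexeme:
    lexeme_id: str
    lemma: str
    language: str                # Item ID
    lexical_category: str        # Item ID
    forms: List[Form] = field(default_factory=list)
    statements: List[Statement] = field(default_factory=list)
    # No senses field: Senses are not part of the first version.


def linked_ids(lexeme: Lexeme) -> Set[str]:
    """Collect every Item or Lexeme ID that the lexeme or its forms link to."""
    ids = {s.value for s in lexeme.statements}
    for form in lexeme.forms:
        ids.update(s.value for s in form.statements)
    return ids


# Example with made-up placeholder IDs.
word = Lexeme(
    "L-EXAMPLE", "example", "Q-ENGLISH", "Q-NOUN",
    forms=[Form("L-EXAMPLE-F1", "examples", ["Q-PLURAL"],
                [Statement("P-EXAMPLE", "L-OTHER")])],
    statements=[Statement("P-EXAMPLE", "Q-SOMETHING")],
)
```

Here linked_ids(word) collects both the Item linked from the lexeme and the Lexeme linked from its form, mirroring the "link from an Item or a Lexeme to an Item or a Lexeme" feature.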

You can have a look at a more detailed features list. After the first deployment, we will start a discussion with all of you about which features are most important to you, so we know which ones you would like us to work on next.

Thanks to the people who have already shown support and curiosity about lexicographical data on Wikidata. We hope that once it is deployed, you will test it, experiment with the languages you know, and give us some feedback to improve the tools in the future.

While waiting for the release, here’s what you can do:

  • Improve the list of tools with ideas of tools that could be built on the top of lexicographical data
  • Add your ideas of cool queries you’d like to do with words and phrases in the future
  • Have a look at the project page and especially the talk page, where people are already asking questions and discussing how to model data, among other topics
  • If you’re involved in a Wiktionary community, discuss with them and answer any questions they might have about Wikidata. You can also register as ambassador for your community.

Last but not least, we kindly ask you not to plan any mass import from any source for the moment. There are several reasons for this. First, as mentioned above, the release will be a first version, and we need to observe how our system reacts to manual edits before considering automatic ones. The system may not be ready for massive imports at the beginning. The second reason is legal. Lexicographical data in Wikidata will be released under CC0, and each editor is responsible for making sure that the data they add is compatible with CC0. For more information, you can have a look at the advice of the WMF Legal team. Finally, we strongly encourage you to discuss with the communities before considering any import from the Wiktionaries. Wiktionary editors have put a lot of effort over the years into building definitions, and we should be respectful of this work and discuss with them to find common solutions for working on lexicographical data and enjoying its use together.

If you have any question or idea, feel free to write on Wikidata:Wikidata talk:Lexicographical data. Further discussion is also ongoing at Wikidata:Wikidata:Project chat#First version of Lexicographical Data will be released in April. Cheers! bd2412 T 02:59, 8 March 2018 (UTC)

As an offshoot, but really unrelated to the Wikidata effort, I wonder how much content on en.wiktionary would become CC0 if a small number of contributors released their work in such a manner ex post facto. There are many entries which have only been touched by a single primary author and then a number of bots for formatting; I don't know whether the bots even count as authors. If someone has the full edit history downloaded, I imagine it would be possible to do some modeling and determine how many entries here would be CC0 if the top 10, 20, 50, etc. editors were willing to transfer the license. We could get fancier by removing all reversions from the edit history (that is, any intermediate edits between two equivalent versions of the same page), or by considering entries section by section. While I am not a fan of the process that Wikidata seems to favor when interacting with other projects, I would love to be able to back our project with more structured data. I think this would open myriad doors for improvements in presentation and usefulness. - TheDaveRoss 13:02, 8 March 2018 (UTC)
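The modeling idea above can be sketched roughly as follows, assuming the full history has already been reduced to a mapping from page title to the list of usernames who edited it (one entry per edit), and assuming a known set of bot accounts that are not counted as authors. The names History, top_editors and cc0_fraction are hypothetical, no dump parsing is shown, and reverts are not filtered out.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

# Hypothetical input: page title -> usernames, one per edit.
History = Dict[str, List[str]]


def top_editors(histories: History, n: int,
                bots: FrozenSet[str] = frozenset()) -> Set[str]:
    """The n human editors with the most edits across all pages.

    Counter.most_common orders equal counts by first appearance, so editors
    tied at the cutoff are included or left out somewhat arbitrarily.
    """
    counts = Counter(user for authors in histories.values()
                     for user in authors if user not in bots)
    return {user for user, _ in counts.most_common(n)}


def cc0_fraction(histories: History, n: int,
                 bots: FrozenSet[str] = frozenset()) -> float:
    """Share of pages whose every non-bot author is among the top n editors.

    Bots are treated as non-authors, so pages edited only by bots count as covered.
    """
    if not histories:
        return 0.0
    top = top_editors(histories, n, bots)
    covered = sum(
        1 for authors in histories.values()
        if all(user in top or user in bots for user in authors)
    )
    return covered / len(histories)


sample = {"a": ["X", "X", "Bot"], "b": ["X", "Y"], "c": ["Z"]}
# n=1: only page "a" qualifies (1/3); n=3: every page qualifies (1.0).
```

Removing reversions or working section by section, as suggested, would change how each page's author list is built, not this final counting step.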
I would actually be hard-pressed to imagine a Wiktionary editor asserting any kind of copyright in their contributions. bd2412 T 21:22, 8 March 2018 (UTC)
It seems like they're treating us in the same way that we treat other dictionaries- as a source of information that can't be copied directly but can be paraphrased and used as a source. DTLHS (talk) 21:27, 8 March 2018 (UTC)
Except that unlike a print dictionary, we're an active community that they could collaborate with if they chose to do so. —Μετάknowledgediscuss/deeds 21:30, 8 March 2018 (UTC)
Lots of frustrated voices over at the project chat discussion (now several pages long). Anyway, I encourage you to participate in the technical discussion happening right now; the project is entering a crucial early phase where important decisions are being made. I'm curious when and why the project "rebranding" from "Wikidata: Structured data for Wiktionary" to "Wikidata: Structured Lexicographical Data" happened; it was mentioned a few times in the discussion (and, as pointed out, changed the tone of the collaboration). – Jberkel 12:13, 12 March 2018 (UTC)

Category for Insurance terminology

Is there a particular process for agreeing and implementing a new category for English terms? I'd like to create and populate Category:en:Insurance for terminology used within the insurance industry (a subcategory of Category:en:Finance seems most appropriate), and add automated categorisation via labels using Module:labels/data/topical. I'm holding back from "being bold" to make sure I don't step on any toes. -Stelio (talk) 10:37, 8 March 2018 (UTC)

Be bold. - TheDaveRoss 12:56, 8 March 2018 (UTC)
Done -Stelio (talk) 14:58, 8 March 2018 (UTC)
Some of the items in the category (eg, economic) seem not to have any distinct insurance sense. Unless they do, the category seems misleading. This is not unique to this topical category, but it might be well to address the problem now before populating the category recklessly. I think the problem is associated with the presence of hard categorization and the absence of {{label|insurance}} categorization. An "incategory" and "insource" Cirrus search should quickly identify the possible problems. DCDuring (talk) 16:09, 8 March 2018 (UTC)
The category may be too new for the Cirrus search term "incategory:en:Insurance" to work. The "insource" term won't be allowed to run unless it is restricted to run over a readily identified, "not-too-big" subset of Wiktionary entries. DCDuring (talk) 16:18, 8 March 2018 (UTC)
Thanks you very much for the review, @DCDuring. Yes, I had two difficulties here:
  1. Words with definitions that are more generic than a specific insurance sense. For example term, definition 8, is "Duration of a set length...". That's the insurance definition: the term of an insurance policy is the amount of time from its inception to latest expected termination. But I shouldn't label that definition with "insurance" because it applies in wider circumstances too. Would an indented insurance definition be appropriate (8.1)?
  2. Avoiding SOP terms. For example "economic assumption" is a modelling assumption (assumption is on my list to update with that sense) that relates to economic factors. That feels SOP to me, but the economic/demographic split is sufficiently important in the insurance world to merit categorising those terms. Perhaps, then, an additional sense of economic labelled as "insurance" and "of an assumption"?
I'm definitely keen to get this right and conform to established site norms, so I value this feedback. -Stelio (talk) 16:41, 8 March 2018 (UTC)
If the insurance sense of "term" is in fact covered by an existing, broader sense of "term", then I wouldn't add a subsense. (To give an extreme example: insurance documents also use "the", but it's the same "the" as everyone else also uses, so there's no need for an insurance-specific sense.) If the insurance sense is significantly different, then a subsense is merited. Whether or not terms that seem important to insurance but aren't specific/limited to it (like "economic" and maybe "term") should be categorized is less clear. As DCDuring suggests, it's an unclearness that plagues our category structure in general, and despite giving it thought, I don't know what to advise you. Many other categories do include terms that seem related/important without being limited to the category's named context. - -sche (discuss) 18:06, 8 March 2018 (UTC)
The problem is that a topic-specific glossary is useful because it contains ONLY the relevant sense and usage of polysemic words like term, policy, and economic. As a comprehensive (and historical) dictionary, we, by definition, try to include all definitions. Someone only interested in the insurance use of a term can get lost in our entries for such terms. We could have appendices (eg, Appendix:Glossary of terms used in insurance [term again!!!]) that contained links to the specific definitions using {{senseid}}. This would serve as a specialized portal for passive users as well as contributors who had specialized topical interests. DCDuring (talk) 21:30, 8 March 2018 (UTC)
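For readers unfamiliar with how label-based categorisation differs from hard-coded category links, here is a minimal Lua sketch of the idea: a label entry flags that it carries a topical category, and a helper derives a category name such as en:Insurance from the language code and the label. The table layout, the field names (display, topical_categories) and the helper topical_category are assumptions for illustration, not the actual contents of Module:labels/data/topical. The point relevant to DCDuring's concern is that only senses explicitly tagged with the label end up in the category.

```lua
-- ASSUMPTION: a simplified stand-in for a label data module; the real
-- Module:labels/data/topical may use different field names.
local labels = {}

labels["finance"] = {
	display = "finance",
	topical_categories = true,
}

labels["insurance"] = {
	display = "insurance",
	topical_categories = true,
}

-- Hypothetical helper: the topical category a label adds for a language,
-- or nil if the label is unknown or display-only.
local function topical_category(lang_code, label)
	local data = labels[label]
	if not data or not data.topical_categories then
		return nil
	end
	-- Capitalize the first letter of the display text: "insurance" -> "Insurance"
	local topic = data.display:gsub("^%l", string.upper)
	return lang_code .. ":" .. topic
end

print(topical_category("en", "insurance")) -- en:Insurance
```

Because the category comes from the label on a specific sense, a word like economic would only be categorised if one of its senses is actually labelled "insurance".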

Automatic transliteration of Biblical Hebrew

Would it be possible and desirable to implement automatic transliteration of the etymology-only language Biblical Hebrew (hbo) when it's fully pointed? That way, {{der|en|hbo|אָמֵן}} would automatically provide the transliteration ʾāmēn, but {{der|en|he|אָמֵן}} (and of course both {{der|en|he|אמן}} and {{der|en|hbo|אמן}}) would still require manual transliteration. Would that be technically possible, and if so, would other people find it a good idea? —Mahāgaja (formerly Angr) · talk 13:15, 8 March 2018 (UTC)

There are still a lot of complications that need to be solved. We already have an experimental module Module:he-translit, which is able to transliterate about 90% of words (which is not at all good enough for automatic transliterations), but without stress marks. The next step would be to implement support for stress marking, but this would also require adding stress marks or cantillation marks to Hebrew text that needs to be transliterated. We decided a while ago not to allow stress marks or cantillation marks in Hebrew text due to poor font support, which has improved a little, but not enough, over the years. Additionally, we would need to start strictly using the Unicode HEBREW POINT QAMATS QATAN (U+05C7) instead of the regular qamats mark whenever it represents a short o, and we would need a way to mark the distinction between sheva na and sheva nach, which currently do not have separate Unicode codepoints. Once all that is done, however, it should work equally well for Biblical Hebrew and Modern Hebrew. There is also the minor issue that it is impossible to distinguish between abbreviations (which should be transliterated letter-for-letter) and Hebrew numerals (which should be transliterated as "Arabic" numerals). So, in short, it's not possible until we solve some of those problems. --WikiTiki89 17:19, 8 March 2018 (UTC)
Regarding the shevas, it was pointed out that Michael Everson is a wiki user, w:User talk:Evertype; if we can point (no pun intended) to texts where the shva na and shva nach are used contrastively, we could ask him about proposing a new Unicode codepoint. - -sche (discuss) 18:22, 8 March 2018 (UTC)
@-sche: Here are a couple of examples:
Both of these examples also differentiate qamats qatan from qamats gadol. --WikiTiki89 19:39, 8 March 2018 (UTC)
@Wikitiki89 Thanks. I'm writing to him now. I see that some references say the shvas are no longer normally pronounced differently in modern Hebrew; if so, which one are they pronounced as? If Unicode, instead of adding new codepoints for both, were to treat the existing shva codepoint as one of them (with only one new codepoint added, for the other one, for those texts which distinguish it), which shva should be the "default" shva and which one should get a new codepoint? (I will see if Michael thinks it would be better to propose two new codepoints or just one.) - -sche (discuss) 20:18, 8 March 2018 (UTC)
@-sche: Regarding your question about Modern Hebrew pronunciation: Generally, the old distinction of the shvas has disappeared, but there is a new distinction between null and /e/, depending on the phonological environment or morphology, with some environments having free variation between them. Regarding your question about codepoints, I definitely don't think we need two new codepoints and I would say that simply for graphical reasons the shva nach should share the current codepoint, and the shva na should get the new codepoint, because generally when they are distinguished, the shva nach has a normal or maybe slightly reduced size, while the shva na is clearly enlarged and/or bolded. --WikiTiki89 20:40, 8 March 2018 (UTC)
OK, I left a message, with your informative explanation of how differently they are displayed. Hopefully he can either make the proposal or advise us on making it. - -sche (discuss) 21:28, 8 March 2018 (UTC)
If the two are sometimes distinguished in writing, then by all means they should have separate code points. But isn't the distinction always clear from environment anyway? Are there any words where you can't tell whether a schwa is na or nach just from its environment? —Mahāgaja (formerly Angr) · talk 11:53, 9 March 2018 (UTC)
No, the distinction is not always clear from the environment, otherwise we wouldn't have this problem. --WikiTiki89 20:11, 9 March 2018 (UTC)
Alright, Michael Everson sent me an e-mail explaining that the next step is for a point person, willing to use their real name, to e-mail him and another gentleman. It would be helpful but not obligatory (because you can always just check back in here if questions come up) if that person had some knowledge of these characters and/or of Hebrew script generally. If someone here is willing to be that person, I will send you the contact information. - -sche (discuss) 02:29, 11 March 2018 (UTC)
I'm willing to send the message and use my real name. I do not, however, have knowledge of Hebrew script (beyond the parallels it shares with Arabic script). So feel free to use me if a more suitable candidate does not arise. -Stelio (talk) 11:44, 12 March 2018 (UTC) I should probably ping you too, @-sche, in this response. -Stelio (talk) 12:16, 12 March 2018 (UTC)
Since I know Michael Everson IRL I'm willing to use my real name too. I do have a fair knowledge of the Hebrew script, although until this thread I never knew that the two schwas were sometimes distinguished in writing, so maybe my knowledge of the Hebrew script is insufficient. —Mahāgaja (formerly Angr) · talk 13:10, 12 March 2018 (UTC)
@Mahagaja: Don't beat yourself up over that. It's a recent phenomenon that is still limited to very specific religious publications, the same ones that also make a distinction between the two qamatses. If you look carefully at the second example I linked to above, you'll notice they even distinguish between the two types of dageshes. --WikiTiki89 15:01, 12 March 2018 (UTC)
@Wikitiki89: Yes, I see that now; it also distinguishes the two types of qamats. Do we want to request a new code for dagesh forte while we're at it? —Mahāgaja (formerly Angr) · talk 15:10, 12 March 2018 (UTC)
@Mahagaja: Maybe. It's less common, because it's much more straightforward to distinguish them (in 99.9% of cases). But I guess since it does exist, it might deserve a code. --WikiTiki89 15:16, 12 March 2018 (UTC)
OK, I've passed the contact info on to Mahagaja (by e-mail). Hopefully we get us some shiny new codepoints! - -sche (discuss) 00:47, 13 March 2018 (UTC)
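A minimal Lua sketch of the gating Mahāgaja asks about: auto-transliterate only for the etymology-only code hbo, and only when the text carries vowel points. The function names are hypothetical, "pointed" is approximated as "contains at least one vowel point" (a real check for full pointing would have to inspect every letter), and it does nothing about the harder problems WikiTiki89 lists (stress, sheva na vs. nach, abbreviations vs. numerals). Listing U+05C7 QAMATS QATAN separately from the ordinary qamats U+05B8 is what would let a transliterator render it as short o.

```lua
-- Hebrew vowel points as UTF-8 byte patterns (byte-level Lua patterns
-- work the same in Lua 5.1, which Scribunto uses, and in later versions):
--   U+05B0 SHEVA .. U+05BB QUBUTS -> bytes D6 B0 .. D6 BB
--   U+05C7 QAMATS QATAN           -> bytes D7 87
local VOWEL_POINT_PATTERNS = {
	"\214[\176-\187]",
	"\215\135",
}

-- ASSUMPTION: "pointed" approximated as "has at least one vowel point".
local function has_vowel_points(text)
	for _, pattern in ipairs(VOWEL_POINT_PATTERNS) do
		if text:find(pattern) then
			return true
		end
	end
	return false
end

-- Only the etymology-only code "hbo" opts in; "he" still needs manual tr=.
local AUTO_TRANSLIT_LANGS = { hbo = true }

local function should_auto_translit(lang_code, text)
	return AUTO_TRANSLIT_LANGS[lang_code] == true and has_vowel_points(text)
end

print(should_auto_translit("hbo", "אָמֵן")) -- true
print(should_auto_translit("he", "אָמֵן"))  -- false
print(should_auto_translit("hbo", "אמן"))   -- false
```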

"Proverb" PoS isn't a PoS

PseudoSkull pointed out that "proverb" is not in fact a part of speech. Please see "Proverb" POS at Wiktionary. We should presumably convert these PoS into "phrase", and possibly tag proverb as a gloss. Thoughts? Equinox 09:32, 10 March 2018 (UTC)

It isn't, but phrase isn't either. Both are accepted per WT:EL#Part of speech. - 21:24, 12 March 2018 (UTC)


I'm getting "Lua error in Module:headword/templates at line 103: attempt to index field 'falt' (a nil value)". What's this? ---> Tooironic (talk) 02:32, 12 March 2018 (UTC)
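The error message itself is standard Lua: it appears when code indexes a table field that is nil, typically because an expected template argument or data entry is missing. What line 103 of Module:headword/templates actually does is not shown here, so the following is only a generic illustration of the failure and of a common defensive pattern; the args table and the falt field are stand-ins.

```lua
-- Generic illustration; NOT the actual code of Module:headword/templates.
local args = {} -- suppose no "falt" value was supplied

local ok, err = pcall(function()
	return args.falt[1] -- indexing a field that is nil raises an error
end)
print(ok)  -- false
print(err) -- wording varies by Lua version; Lua 5.1 (Scribunto) reports
           -- "attempt to index field 'falt' (a nil value)"

-- Defensive pattern: fall back to an empty table before indexing.
local falt = args.falt or {}
print(falt[1]) -- nil, and no error
```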

Wiki Indaba 2018

Presentation for Wiki Indaba 2018.

Hi, Benoît Prieur and I will be in Tunis from 16 to 18 March to attend the WikiIndaba 2018 conference. We will go there to present Wiktionary (especially the French edition) and hope to encourage some people (everybody?) to contribute to Wiktionary in African languages that are currently underrepresented on the Wiktionaries (Arabic, Berber, Fula, ...). A talk and a workshop have been accepted. The goal of the presentation is to show the value of Wiktionary (mainly the French one) for the development and visibility of languages spoken in Africa.

I have posted a first version of the presentation. I would really appreciate it if one of you could correct the English on the slides before Thursday so that I can take the corrections into account. I can provide the .odp file if needed. I would also be happy to hear any comments about what we wrote. Thanks in advance. Pamputt (talk) 06:51, 12 March 2018 (UTC)

I really appreciate that you're doing this! Obviously, Francophones will have you as a point of contact, but for people who are more comfortable with English, I would be happy to mentor people who want to contribute in African languages to en.wiktionary. —Μετάknowledgediscuss/deeds 07:17, 12 March 2018 (UTC)
@Pamputt: In general, the slides are well written. :) On page 7, perhaps "allow understanding by all audiences" would be better than "allow understanding for all audiences". On page 8, it's not clear what "words with a confidential use" would be, or how Wiktionary would know a secret use; perhaps you mean "restricted (to certain contexts/jargons)" or "literary"...? On page 9, instead of "built languages", it's more natural to say "constructed languages" or "artificial languages". On page 11, the language is fine, it's just hard to read the yellow font "Austronesian: Malagasy" is in. On page 12, "Adding a new entry... (compare to..." sounds more natural than "Add a new entry... (to compare to..." IMO. And "limited knowledge" wouldn't normally take an article (a) AFAIK. In "Contributing help to learn its own language or rediscover it", it's unclear what "its" and "it" refer to... maybe "Language communities contributing helps them maintain and deepen knowledge of their own languages"? And one would normally speak of interest from linguists or of something being of interest to linguists. On page 15, "native speaker" is more natural and simpler (for any non-native speakers to understand) than "locutor". But again, on the whole, well-written; it sounds like an interesting and informative presentation! :) - -sche (discuss) 07:41, 12 March 2018 (UTC)
@Metaknowledge thanks for your message. Sure, I will give your name if some English speaker needs help contributing here. Pamputt (talk) 22:36, 12 March 2018 (UTC)
@-sche thank you very much for your corrections. I took them into account within the new version of the presentation. If you have more comments, do not hesitate to write them. :D Pamputt (talk) 22:36, 12 March 2018 (UTC)


I'm wondering what the best way is to add Pazend, the Middle Persian variant of Avestan, which contains an extra character (𐬮) and several unique ligatures. Would this be inappropriate?

m["pal-Avst"] = {
        canonicalName = "Pazend",
        characters = m["Avst"].characters,
        direction = "rtl",
        parent = "Avestan",
}

--Victar (talk) 19:25, 12 March 2018 (UTC)

What's the reason it needs to be a separate script? --WikiTiki89 21:02, 12 March 2018 (UTC)
It wouldn't be inconsistent with there being so many language-specific versions of Arabic script. But if what's desired is different fonts for Pazend as opposed to Avestan proper, that can be achieved without a separate script code, using the CSS selector .Avst:lang(pal) (or additional selectors if other languages were also written in Pazend). — Eru·tuon 21:16, 12 March 2018 (UTC)
It's not a difference of font, but rather the addition of a character and various ligatures, so the Unicode characters employed for spelling a word would be different. Incidentally, I'll need to create a modified typing-aids module dataset. It would also be nice to call up the name in templates, e.g. {{desc|pal|𐬯𐬞𐬁𐬵|sc=pal-Avst|sclb=1|tr=spāh}}, and to populate categories specifically for pal-Avst script requests. --Victar (talk) 21:30, 12 March 2018 (UTC)
If it doesn't need a different font, that doesn't sound like the kind of thing that needs a new script code; I mean, different languages use different subsets of Latin letters while still using just either "Latn" or "Latinx". Just make sure "Avst" covers all the characters that are used. (I wonder if we need as many Arabic script codes as we have, to Erutuon's point...) - -sche (discuss) 22:02, 12 March 2018 (UTC)
@Victar: Huh, how are the ligatures rendered if not by a separate font? — Eru·tuon 22:11, 12 March 2018 (UTC)
@Erutuon: I'm not sure how it exactly works, but e.g. Noto Sans Devanagari has different styles for Hindi and Marathi. It's one font. —AryamanA (मुझसे बात करेंयोगदान) 22:23, 12 March 2018 (UTC)
@-sche: There are stylistic differences that could be better represented in a different font, though I'm not aware of any font specifically made for Pazend. I think the best comparison is Latn to Latf. Note that Fraktur uses the same range as Latin. @Erutuon: What's recommended is actually a silly system of using U+200C (ZERO WIDTH NON-JOINER) between letters to prevent ligatures, but your idea of a separate font would be much better. I wonder how the Unicode Avestan font compares to the Google Noto font. I'll have to check that out. --Victar (talk) 22:31, 12 March 2018 (UTC)
I looked into it and there are actually only two free Unicode Avestan fonts: one, Noto Sans Avestan, uses ligatures, and the other, Ahuramzda, does not.
Ligature | Ahuramzda | Noto Sans Avestan
š + a | 𐬱 + 𐬀 → 𐬱𐬀 | 𐬱 + 𐬀 → 𐬱𐬀
š + c | 𐬱 + 𐬗 → 𐬱𐬗 | 𐬱 + 𐬗 → 𐬱𐬗
š + t | 𐬱 + 𐬙 → 𐬱𐬙 | 𐬱 + 𐬙 → 𐬱𐬙
a + h | 𐬀 + 𐬵 → 𐬀𐬵 | 𐬀 + 𐬵 → 𐬀𐬵
So, we could use Ahuramzda for Avestan (Old and Younger) and Noto Sans Avestan for Pazend. The disadvantages of this are that their stylistic differences aren't along historical lines, and that the Avestan language does employ some ligatures which aren't in Ahuramzda. I could make a variant of Noto Sans Avestan with the Pazend ligatures removed, but I don't know the legality of that, nor do I know if we could host that variant as a webfont. --Victar (talk) 04:40, 13 March 2018 (UTC)
Could one of these CSS properties disable Noto ligatures? —suzukaze (tc) 04:45, 13 March 2018 (UTC)
@Suzukaze-c: HAH! I didn't even know that existed! It does indeed work: 𐬀𐬵. No IE support, but no surprise there. I'll have to look inside Noto Sans to see if they're using distinct suffixes for those Pazend ligatures. --Victar (talk) 05:02, 13 March 2018 (UTC)
 :D —suzukaze (tc) 05:20, 13 March 2018 (UTC)
It looks like those are the only ligatures supported by Avestan Unicode anyway. So, perhaps as a first step, I recommend that .Avst be set to font-family:"Noto Sans Avestan", "Ahuramzda";font-variant-ligatures: none and .Avst:lang(pal) set to font-variant-ligatures: normal. @Erutuon, -sche, would that work for you? --Victar (talk) 05:32, 13 March 2018 (UTC)
I'm not against adding a separate script code for it if it'll have a benefit. For most scripts, the benefit is in the form of a subscript needing a different font to display correctly (e.g. display the special non-Latn letters of Latinx at all) or accurately (e.g. when rendering Latf differently from Latn). For this, if it's not the font that's different so much as the presence or absence of ligatures, I don't know what's better. In the table above, all the words display in the same font and are identically ligatured, probably because I don't have both fonts installed yet. Does anyone know if a separate script code would result in correct display for a greater number of users than the lang(pal) approach? - -sche (discuss) 15:59, 13 March 2018 (UTC)
@Erutuon, you seem most familiar with this. Are there any advantages/disadvantages to either approach? --Victar (talk) 14:55, 14 March 2018 (UTC)
@Victar: Which approaches do you mean? Using a dedicated script code versus a combination of a script code and language code? — Eru·tuon 20:03, 14 March 2018 (UTC)
@Erutuon, yes, I believe that's what @-sche is asking: .pal-Avst over .Avst:lang(pal), essentially. --Victar (talk) 20:11, 14 March 2018 (UTC)
I believe having sub-scripts is a holdover from before CSS supported language tags. As long as we're confident that enough of a proportion of users are using browsers that support language tags, then we can start switching away from sub-scripts. --WikiTiki89 20:15, 14 March 2018 (UTC)
The only benefit I can see for a language-script script code is if multiple languages will be using the same combination of CSS properties and you want to shorten the list of CSS selectors. Otherwise, both do the same thing equally well. — Eru·tuon 20:18, 14 March 2018 (UTC)
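To make the two options concrete, here is a minimal Lua sketch of the markup each approach would produce. Option (a) keeps the Avst script code and adds a lang attribute, so a stylesheet can target .Avst:lang(pal); option (b) introduces a dedicated pal-Avst code that CSS targets as a class of its own. The function names are hypothetical and the tagging done by Wiktionary's real modules may differ; the CSS in the comments is Victar's proposal from this thread.

```lua
-- Hypothetical helpers; real Wiktionary modules do their own tagging.

-- (a) Script code plus language code. Matching CSS (proposed above):
--   .Avst           { font-family: "Noto Sans Avestan", "Ahuramzda"; font-variant-ligatures: none; }
--   .Avst:lang(pal) { font-variant-ligatures: normal; }
local function tag_with_lang(text, sc, lang)
	return ('<span class="%s" lang="%s">%s</span>'):format(sc, lang, text)
end

-- (b) Dedicated language-specific script code. Matching CSS:
--   .pal-Avst { font-variant-ligatures: normal; }
local function tag_with_subscript(text, sc)
	return ('<span class="%s">%s</span>'):format(sc, text)
end

local term = "𐬯𐬞𐬁𐬵"
print(tag_with_lang(term, "Avst", "pal"))
print(tag_with_subscript(term, "pal-Avst"))
```

As Erutuon notes, both selectors style the text equally well; a dedicated code mainly shortens the stylesheet when several languages share the same rules.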

To approach this from a different angle, is having a different font and native name enough to warrant a subscript? If not, what makes Latf and the various Arab variants different? Why does it even matter? --Victar (talk) 06:31, 13 March 2018 (UTC)

Latf is displayed in very different fonts from Latn. Some of the Arab subscripts may need different fonts to display closer to how they're written as far as letter-shapes (letters having dots, or entirely different shapes from in standard Arabic) or line-slanting, but I do wonder if we need so many. - -sche (discuss) 16:01, 13 March 2018 (UTC)

Generalization of Japanese infrastructure

Currently we have templates like {{ja-def}}, {{ja-pos}}, and {{ja-r}}, but nothing for other Japonic languages (except for forks of {{ja-def}}). We should consider making these templates usable for other Japonic languages as well.

See also Module talk:ja-kanji-readings#Okinawan, Template talk:ja-readings#Separate template for Okinawan, and diff.

(Notifying Eirikr, Wyang, TAKASUGI Shinji, Nibiko, Atitarev, Dine2016, Poketalker, Cnilep, Britannic124, Fumiko Take, Dine2016): and @Erutuon. —suzukaze (tc) 20:12, 13 March 2018 (UTC)

I'd definitely be supportive of that.
There are some Japonics with weird phonetics and weird spellings, like the Miyako dialect of Ryukyuan. In addition, there's at least one non-Japonic that uses katakana (Ainu) which could also benefit from this effort. ‑‑ Eiríkr Útlendi │Tala við mig 20:41, 13 March 2018 (UTC)

I support the move in principle, but probably don't have much to contribute technically. I was going to mention Ainu, but Eirikr has that well in hand. Cnilep (talk) 00:41, 15 March 2018 (UTC)

Old West and Old East Norse

I would like to add a language code for Old West and Old East Norse respectively, just like with New and Medieval Latin. I can’t seem to find the right module though. Can someone add the two? I suggest non-oen (Old East Norse) and non-own (Old West Norse). I will use this in the etymology templates. Jonteemil (talk) 16:28, 15 March 2018 (UTC)

Sounds like what you want is etymology-only language codes. You can add those at Module:etymology languages/data. --WikiTiki89 16:38, 15 March 2018 (UTC)
For Old East Norse we do already have Old Danish and Old Swedish. —Mahāgaja (formerly Angr) · talk 20:56, 15 March 2018 (UTC)
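A sketch of what the two proposed etymology-only codes might look like as Lua data, assuming the entries follow the same canonicalName/parent shape as the existing New and Medieval Latin ones (check Module:etymology languages/data itself before editing; its exact fields may differ). The describe helper is hypothetical and only shows that each code resolves to a display name and to its parent, Old Norse (non). Given Mahāgaja's point, Old Danish and Old Swedish may already cover much of what non-oen would be used for.

```lua
-- ASSUMPTION: field names modeled on existing etymology-only entries.
local m = {}

m["non-own"] = {
	canonicalName = "Old West Norse",
	parent = "non", -- Old Norse
}

m["non-oen"] = {
	canonicalName = "Old East Norse",
	parent = "non",
}

-- Hypothetical helper: describe how a code resolves.
local function describe(code)
	local entry = m[code]
	if not entry then
		return code .. " is not an etymology-only code"
	end
	return entry.canonicalName .. " (etymology-only, parent " .. entry.parent .. ")"
end

print(describe("non-own")) -- Old West Norse (etymology-only, parent non)
```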

Is the war against the unified Serbo-Croatian raging on?

Is the war against the unified Serbo-Croatian raging on? Template talk:User hr. --Anatoli T. (обсудить/вклад) 01:11, 16 March 2018 (UTC)

I will repeat what I said on the talk page: there's no reason to try to make our user language templates correspond exactly to what's in Module:languages. I find it baffling that you have an objection if someone wants to say they speak Croatian and not Serbo-Croatian. DTLHS (talk) 01:13, 16 March 2018 (UTC)
The question is: if a user declares with the template "this user likes pig meat", then he/she likes pig meat; they did not say "this user likes lamb meat". Redirecting the template to some other form is neither fair nor correct. Just for communication (talk) 01:28, 16 March 2018 (UTC)
Fixing political issues is NOT our business. Use whatever ISO says because we have nothing better. Equinox 02:45, 16 March 2018 (UTC)
I'm in favor of letting users specify whatever language variety they like in their userboxes. If Croatian is going to be banned in userboxes, American and British English should be as well: they don't have full language codes or get headers in entries. — Eru·tuon 03:21, 16 March 2018 (UTC)
  • I agree. Just because we treat Serbo-Croatian as a single language for lexicographical purposes doesn't mean we can't allow userboxes to make finer distinctions. —Mahāgaja (formerly Angr) · talk 10:38, 16 March 2018 (UTC)
  • I also agree that people should be able to say whatever they want on their userpages. --WikiTiki89 12:18, 16 March 2018 (UTC)
  • This is an interesting statement because I see it and think "yes, anyone should be able to say anything on their page" (unless it's totally egregious, like posting our home addresses, or spam/propaganda without any contribs), and then I wonder why I oppose userboxes. Probably because they have a "viral" quality and people tend to copy them without thinking, and then we end up with a big infrastructure of needless rubbish. I suppose in theory I don't oppose an individual userbox. Huh! Oh well just ranting. Equinox 15:59, 16 March 2018 (UTC)
IMO, as long as "sr", "hr" etc still ultimately categorize into the "User sh" categories so the users can be found when their expertise is needed, it's fine to have Serbian-specific and Croatian-specific boxes. I think the other Babel system (that pulls from a central, off-Wiktionary repository of codes) allows them regardless. - -sche (discuss) 16:52, 16 March 2018 (UTC)
Good point. I can use {{#babel:hr}}, overriding the local template, or I can use a global user page on Meta including whatever the Babel extension permits. In fact, there is no link to update on pages currently in Category:User hr. --Vriullop (talk) 21:05, 16 March 2018 (UTC)

This is what I dreaded when Just for communication (a.k.a. Kubura) first contacted me – I share Atitarev's fear that this might be the first inkling of yet another war against unified Serbo-Croatian. Heaven knows I've reverted my fair share of anons who have tried to change Serbo-Croatian lemmas in one direction or another, blatantly disregarding our policy. With that said, I can also agree that letting users specify which language variety they speak in their userboxes might not be such a big deal after all. As long as we can agree – unanimously – that it's something we're willing to stomach. --Robbie SWE (talk) 19:25, 16 March 2018 (UTC)

The template has been changed back to point to Croatian by User:DTLHS in diff with a summary "it is agreed". In my opinion too early in the discussion. The original poster was even offended by the term "Serbo-Croatian", calling it the "so-called", and it has been our long-fought policy for years! If we say we are apolitical, then using Serbo-Croatian/Croato-Serbian is not a political statement but a linguistic common sense. Do we stand for anything? Why have language policies, language treatment documents and modules, mergers, splits and votes? --Anatoli T. (обсудить/вклад) 01:42, 17 March 2018 (UTC)
What do "language policies" have to do with what people put on their user pages? We unify Serbo-Croatian for lexicographic convenience and nothing more. And no, something isn't apolitical just because you happen to agree with it. DTLHS (talk) 01:47, 17 March 2018 (UTC)
I tend to agree with DTLHS. To echo Mahagaja: "Just because we treat Serbo-Croatian as a single language for lexicographical purposes doesn't mean we can't allow userboxes to make finer distinctions." Any edits switching the mainspace should be dealt with separately, if and when they arise. --Dan Polansky (talk) 13:00, 18 March 2018 (UTC)
As Erutuon said, there is no reason to allow British and American varieties of English and disallow varieties in other languages. If the user feels more comfortable with Croatian or Serbian, let them feel comfortable. It does not influence the main space in any way. --Jan Kameníček (talk) 13:14, 18 March 2018 (UTC)
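-sche's condition above (variety-specific userboxes are fine as long as they still categorize into the "User sh" categories) can be stated as a simple mapping. The following is a hypothetical Lua sketch, not an existing module; the "User xx-N" category naming is assumed for illustration.

```lua
-- Hypothetical sketch: variety-specific codes that should also populate
-- the unified Serbo-Croatian userbox categories.
local UNIFIED = { hr = "sh", sr = "sh", bs = "sh" }

local function userbox_categories(code, level)
	local cats = { ("User %s-%s"):format(code, level) }
	local unified = UNIFIED[code]
	if unified then
		-- Keep the variety-specific category, but also list the user
		-- under "sh" so they can be found when their expertise is needed.
		table.insert(cats, ("User %s-%s"):format(unified, level))
	end
	return cats
end

print(table.concat(userbox_categories("hr", 3), "; ")) -- User hr-3; User sh-3
print(table.concat(userbox_categories("de", 2), "; ")) -- User de-2
```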

Vote: PseudoSkull for admin

Hi! I'm a newbie who has rarely done any work on this project, and (seriously) I can hardly tell how to create a vote and I am not sure if I've done it right. It's very hard. (Usability question?) Anyway, I think it's about time that PseudoSkull becomes an admin and here is a vote about it. Please visit Wiktionary:Votes/sy-2018-03/User:PseudoSkull_for_admin Equinox 04:53, 16 March 2018 (UTC)

If you're a newbie I must be a new-newbie. I know what you meant though. DonnanZ (talk) 22:02, 16 March 2018 (UTC)

Vote: Including translation hubs

FYI, I created Wiktionary:Votes/pl-2018-03/Including translation hubs.

Let us postpone the vote as much as discussion requires, if at all. --Dan Polansky (talk) 08:54, 17 March 2018 (UTC)

Czech noun phrases

lošák zprohýbaný should be marked as "Noun", just like černá díra and black hole should be marked as nouns. This is consistent with WT:EL, which forbids the part of speech "Noun phrase". Jan.Kamenicek disagrees. A discussion is at User_talk:Jan.Kamenicek#Noun phrases, but I was not convincing enough. I do not want to engage in a revert war. --Dan Polansky (talk) 12:06, 18 March 2018 (UTC)

  • Terms that consist of a noun and an adjective (in either order) are phrases according to our definition of phrase but we always treat them as nouns here. Please change it to a noun (I don't know if it has a plural). If the other person continues to change it to a phrase, I'll give him a short block. SemperBlotto (talk) 12:14, 18 March 2018 (UTC)
  • I do oppose such a solution for Czech entries. I understand that English nouns may include noun phrases as well, but Czech nouns do not. The general understanding is that only single-word expressions can be nouns in Czech, which I explained in detail on my talk page. --Jan Kameníček (talk) 12:24, 18 March 2018 (UTC) E. g. houpací křeslo (rocking chair) is always analyzed either as adj + noun, or as a noun phrase, but never as a single noun. --Jan Kameníček (talk) 13:21, 18 March 2018 (UTC)
    • Czech and English are exactly the same as for noun vs. noun phrase, as I pointed out. No reason to treat Czech differently from English or German. A dictionary treatment of a part of speech is not necessarily the same as a general linguistic treatment. The English linguistics does distinguish NP from N, no question about it. --Dan Polansky (talk) 12:27, 18 March 2018 (UTC)
      • Yes there is a huge reason to treat it differently, and that is that linguistic sources on Czech nouns use it differently and so do all people no matter whether they are linguists or laypeople. It is very confusing for readers if they meet here an attitude that is so different from what they are used to in real Czech language usage as well as Czech language textbooks. --Jan Kameníček (talk) 12:33, 18 March 2018 (UTC)
        • The first-time users of Merriam-Webster may experience the same confusion. But the confusion quickly withers away; they get used to it, some of them realizing that it is a part-of-speech classification and that it makes sense from a dictionary point of view. Czech = English as for linguistic sources distinguishing NP and N; no difference here. --Dan Polansky (talk) 12:47, 18 March 2018 (UTC)
          • To argue about classification of Czech words we should seek something that deals with classification of Czech words, which Merriam-Webster is not.
          • Here is also a link to an English book on Czech language that also differentiates between nouns and noun phrases. These two have always been understood as different things when analyzing Czech language and so it should be mirrored in Wiktionary as well.
          • It is not true that it has to be a part of speech classification, as various different headings are allowed, such as Phrase (which is the one I used), Prepositional phrase, Proverb, Suffix and many more...
          • The confusion does not wither away with non-regular users. --Jan Kameníček (talk) 13:00, 18 March 2018 (UTC)
            • I will break it down.
            • 1) General English linguistic sources about English distinguish NP from N.
            • 2) English dictionaries do not distinguish NP from N.
            • 3) The English Wiktionary has decided to abolish the distinction between NP and N, for all languages. It did so in keeping with 2).
            • 4) We do not have any example of a Czech dictionary that has černá díra, and ranks it either as NP or P.
            • 5) General Czech linguistic sources about Czech distinguish NP from N, similar to 1). No surprise here.
            • 6) There is no grammatical difference between černá díra, black hole and schwarzes Loch.
            • 7) ----- Therefore -----
            • 8) Let us enter Czech in a way consistent with 3). Let us do so until the decision made in 3) is reverted via general consensus of the English Wiktionary.
            • 9) Consistent with 8), please change lošák zprohýbaný to Noun, and leave it like that until you convince other people to change 3).
            • --Dan Polansky (talk) 13:34, 18 March 2018 (UTC)
              • Ad 1) and 2) Not applicable for Czech expressions.
              • Ad 3) I avoided using the heading "Noun phrase" which was rejected and used just the allowed "Phrase", although it is probably meant for different cases. I believe it is a good compromise until it is agreed whether "noun phrases" are allowed to have their own heading at least in Czech entries. If not, I would be happy just with "Phrase", too: it is not ideal, but at least it is not wrong and confusing.
              • Ad 4) Easy to explain: most dictionaries of Czech expressions do not give such phrases separate entries, but list them among the phrases and collocations under individual single words. Despite this, there is evidence of how the dictionary authors understand Czech nouns:
                • a) I have never seen a dictionary of Czech expressions marking noun phrases as nouns.
                • b) My Czech-English dictionary (by Josef Fronek, 2000) has loads of examples that suggest the authors do not consider noun phrases to be nouns. Every entry there has marked its POS. If they want to change it to a different POS within the same entry, they always mark it again. However, when I look up elektrický (electric) marked as adj., they have also got there elektrické křeslo (electric chair) within the same entry without marking any change of POS to a "noun". Instead they have got it among other phrases and collocations of the adjective electric with other words. If they considered it a noun, they would mark it so.
                • c) My dictionary of Czech phraseology, part on non-verbal expressions, contains entries many of which are noun phrases. Although they never mark any POS (which can also be understood as evidence that they do not consider them to be PsOS but phrases), in various comments in the preface and other chapters they directly call them noun phrases and never nouns.
              • Ad 5) And so do English sources about Czech.
              • To sum up my arguments again: various sources on both Czech lexicography and the Czech language in general do not consider Czech noun phrases to be nouns (but rather phrases consisting of words of various PsOS), and neither do laypeople. Because the heading "Phrase" as such is not disallowed in Wiktionary, I hope it can stay (although allowing "Noun phrase" would be even better). --Jan Kameníček (talk) 17:30, 18 March 2018 (UTC)
Perhaps an argument can be made for using "Noun phrase" instead of "Noun", but this would have to apply to all of Wiktionary; Czech is not exceptional in this matter. Crom daba (talk) 17:56, 18 March 2018 (UTC)
Thank you. One difference I can see is that while Czech noun phrases are not understood as nouns, English ones sometimes are. A general solution allowing "Noun phrase" headings would be great (and I believe it will happen one day), but if no such consensus emerges, tolerating just the heading "Phrase" in Czech entries, without specifying what kind of phrase it is, would suffice. --Jan Kameníček (talk) 19:06, 18 March 2018 (UTC)
I agree with Semper, we treat these as nouns because they function as nouns, can be replaced with single-word nouns without changing the grammar of the sentence, etc. There does not seem to be anything different about these terms in Czech versus Polish, French, German, English, etc that would justify treating them differently; to Jan's point about "confusing" Czech-speakers I would counter that it is likely confusing for non-Czech people (who seem more likely than Czechs to be looking up English definitions of Czech words, i.e. using en.Wikt instead of cs.Wikt) who are looking up words to see things that are clearly nouns labelled as "phrases"; certainly, it seems wrong to me, since "phrases" normally refers here to things like "a little bird told me" (and if I had encountered it without realizing there were an ongoing discussion like this, I would have simply considered it an obvious mistake and misunderstanding of en.Wikt conventions and changed it to "Noun"). - -sche (discuss) 20:01, 18 March 2018 (UTC)
I second everything -sche said. —Μετάknowledgediscuss/deeds 20:05, 18 March 2018 (UTC)
@-sche, Metaknowledge: Non-Czech speakers trying to learn Czech are not likely to use Wiktionary as the only source for learning. They are likely to use other sources and Wiktionary only as a secondary source. So Wiktionary should be in accordance with the others, not the only one which is different (and in the context of linguistics dealing with Czech language also wrong). --Jan Kameníček (talk) 20:29, 18 March 2018 (UTC)
Displaying noun phrases as nouns makes sense specifically in English because English noun phrases and equivalent compound nouns are often found in free variation. Consider synonymous forms like healthcare vs. health care, distinguished only by a bit of whitespace; in a dictionary, it makes sense to format such pairs of entries in the same way. And perhaps the same consideration applies to other analytic languages such as Chinese.
By contrast, in strongly inflected languages, for instance in any Slavic language, noun phrases and compound nouns behave extremely differently. First of all, if you remove space between Slavic words, you don't end up with a compound noun; you end up with garbage that violates the language's morphological rules. (You cannot meld охрана здоровья (oxrana zdorovʹja) into *охраназдоровья; the correct compound would be здравоохранение (zdravooxranenije).) Second, you have a lot of freedom to reorder the words in a Slavic noun phrase or even interleave it with other words and phrases in the middle.
The English Wiktionary's insistence on classifying Russian noun phrases as nouns has always grated on me; it feels and looks utterly wrong for my language, and I am sure that speakers of Czech like Jan.Kamenicek are of the same opinion. If someone starts a vote to abolish the noun/NP merger, which makes no sense for non-English entries, I would be 100% in favor. Tetromino (talk) 16:04, 21 March 2018 (UTC)

Headings linked?

Would you consider linking headings, or at least some of them? For example, Participle: it is often entered as Adjective or Verb, but it is both, and on most pages there is no clarification. sarri.greek (talk) 13:21, 18 March 2018 (UTC)

Some languages do have "participle" as a part of speech and POS header. But (standard) POS headers should never have links in them (I seem to recall that some "Abbreviation" or "Initialism" headers may have contained links at some point, but that is deprecated). - -sche (discuss) 20:07, 18 March 2018 (UTC)
I see, thank you, @-sche. A pity: the PoS on many pages remains unexplained (αγιοποιημένη). I was comparing with el.wiktionary, which has linked PoS headings. I presume that at some stage in the future, all words in Wiktionary will be clickable/linked. sarri.greek (talk) 21:26, 18 March 2018 (UTC)

German case ordering

Newer grammars tend to use the order "nominative-accusative-dative-genitive", which has the advantage that the often identical nominative/accusative forms are grouped together (easier to see patterns for learners). It also reflects usage: genitive is rare and listed last. I'd like to change our templates accordingly, any objections? – Jberkel 08:59, 19 March 2018 (UTC)

A recent discussion about this. --Per utramque cavernam (talk) 09:13, 19 March 2018 (UTC)
Ah, thanks. In general editors seem to be in favor of the change (this wasn't a general discussion though). As a compromise, I could change it to the proposed order with the option of reordering it back to the "traditional" layout via the script mentioned. – Jberkel 09:34, 19 March 2018 (UTC)
Oppose, this is confusing for those of us who grew up with the classical ordering, which I suspect includes most Germans. Crom daba (talk) 13:17, 19 March 2018 (UTC)
I also learned the traditional ordering at school (ages ago), but we should think about non-native readers who want to use Wiktionary as a resource. Imagine how confused they are by our current presentation. The proposed order is already used in DaF (German as a foreign language) and is becoming a standard in modern grammars. What is currently taught in German schools, to native speakers, I do not know (any teachers reading?) – Jberkel 14:05, 19 March 2018 (UTC)
Additionally, it doesn't fit with European case names like German dritter Fall (third case = dative), vierter Fall (fourth case = accusative), Dutch vierde naamval (fourth case = accusative), Czech čtvrtý pád (fourth case = accusative). Also the German ordering fits with the ordering of Greek, Latin, Czech and other languages. - 16:27, 19 March 2018 (UTC)
That's irrelevant; we don't have to make our system match these case names. What we do have to do is add some explanations in these entries. --Per utramque cavernam (talk) 16:37, 19 March 2018 (UTC)
  • Support, as it is easier, quicker to see and is more appropriate for the German case system. --Mahmudmasri (talk) 13:56, 19 March 2018 (UTC)
  • Support; that's the ordering we used when I was learning German. --Per utramque cavernam (talk) 15:53, 19 March 2018 (UTC)
  • Support; it's a more logical order, IMO, grouping similar forms. - -sche (discuss) 16:06, 19 March 2018 (UTC)
  • Oppose. We should follow the lexicographical conventions of the language. --WikiTiki89 16:43, 19 March 2018 (UTC)
  • Oppose: per Wikitiki89. This mostly seems moot given the ability to reorder with JS, but as long as de.Wikt uses the traditional ordering, I don't think we should break with them. —*i̯óh₁n̥C 21:57, 19 March 2018 (UTC)
  • Confused: I'm quite surprised to learn that the Nominative-Accusative-Dative-Genitive ordering is "new". I learned German first from the First-year German textbook by Jedan, Helbling, Gewehr, and von Schmidt, first published in 1975 and republished in 1979 (Amazon link), and they used that ordering. Is 43 years old still "new"? ‑‑ Eiríkr Útlendi │Tala við mig 22:24, 19 March 2018 (UTC)
    It's new in comparison to two thousand years of nom-gen-dat-acc. Crom daba (talk) 22:37, 19 March 2018 (UTC)
Forgive me for not believing that German grammars have been around for 2000 years. If you are referring to Latin, I fail to see how that has any direct bearing. ‑‑ Eiríkr Útlendi │Tala við mig 22:40, 19 March 2018 (UTC)
FWIW, re-reading my post above, it comes across much snottier than I intended. Apologies, Crom daba. ‑‑ Eiríkr Útlendi │Tala við mig 05:15, 21 March 2018 (UTC)
I'm not a big fan of monkey patching MediaWiki with user scripts to fix usability problems (how does this help anon/casual users?), but as there's no consensus for making global changes I adapted the noun declension tables to work with the changeCaseOrder.js script. @Erutuon could you change the order to NADG? There is also a table which lists ablative and vocative forms {{de-decl-noun-table-sg-av}}; they should be last (currently listed first with the script). – Jberkel 10:03, 21 March 2018 (UTC)
@Jberkel: Done. — Eru·tuon 21:10, 21 March 2018 (UTC)
  • Oppose: The current order (nominative-genitive-dative-accusative) is just as easy, just as quick to see, just as appropriate, and just as logical. This order is found in many books, is the order many schools use in teaching, and is the order shared by many Indo-European languages on this Wiktionary, including Russian, Czech, Polish, German, Latin, Lithuanian, Belarusian, Ancient Greek, Serbo-Croatian, Upper Sorbian, Lower Sorbian, and Ukrainian (including the German declension tables on de.wiktionary). When a student who has learnt declensions using a certain order is later asked to adjust to a different order, he thinks of little things such as two forms being adjacent, alphabetical order, chronological order (the order in which the forms were presented in class), or any number of other superficial features that he can think of to "prove" his argument that his way is best. It is simple rationalization (i.e., a psychological defense mechanism in which perceived controversial behaviors are logically justified, also known as "making excuses"). There is one good argument, and that is the order in which the student learned them. For anyone who learned declensions in the current order, the current order is best (easiest, quickest to see, most appropriate, most logical). For anyone who learned declensions in a different order, the different order is the easiest, etc. —Stephen (Talk) 12:46, 21 March 2018 (UTC)

Category:Numeral symbols by language

What is the purpose of Category:Numeral symbols by language exactly? Probably Category:Gujarati numeral symbols is what it is intended for, but Category:English numeral symbols and Category:Korean numeral symbols look very different. — TAKASUGI Shinji (talk) 05:53, 21 March 2018 (UTC)

I don't understand your question. Why would it include Gujarati numerals, but exclude English and Korean numerals? And what does "look very different" have to do with it? There are many sets of numerals, and the numerals in one set usually look different from the numerals in any other set. If every language used one and the same set of numerals, there would be no need for this. It's needed because the numeral sets often vary according to language. —Stephen (Talk) 13:03, 21 March 2018 (UTC)