Wiktionary:Beer parlour


Welcome, all, to the Beer Parlour! This is the place where many a historic decision has been made and where important discussions are being held daily. If you have a question about fundamental Wiktionary aspects—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don't make personal attacks, don't change other people's posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page. There are various other discussion rooms which may serve the idea behind your questions better. Please take a look to see which is most appropriate.

Sometimes discussion identifies an issue as an idea for policy development or rewriting. Such discussions may be taken out of the Beer parlour to a relevant page, or a brand new page may be created. Usually, the active policy pages will be listed in one of the sections below. See also the policy development page and the votes page.

Questions and answers will not remain on this page indefinitely, as it would very soon become too long to be editable. After a period of time with no further activity (usually a couple of weeks), information will be moved to the archives. We make a point to preserve all discussions that were started here in the archives. However, talk that is clearly not intended for this page may be moved and will not end up in the archives. Enjoy the Beer parlour!


January 2018

Cleaning up after long-term abuse by Zeshan Mahmood

The user account Zeshan Mahmood is globally locked [1] because of cross-wiki abuse. In a recent sockpuppet investigations case on en.wiki, it turned out that they have extensively edited from IPs. IPs that geolocate to the same area and follow the same editing pattern have been active here as well; the following IPs match the description: [2] [3] [4] [5]. It's likely there have been many more. This user's edits were occasionally helpful, but they have often added spurious content and created hoax articles (there's one obvious example). I'm leaving it to the community to decide what is the best way of dealing with their legacy.

Though it is not highly desirable content at this stage in Wiktionary's development, I don't see any 'obvious' problem with the example you highlight, the entry for Karachi-Bela Division. Could you explain how it is a hoax? DCDuring (talk) 17:51, 1 January 2018 (UTC)
It would seem there is no "Karachi-Bela Division" of Pakistan. Bela, Pakistan and Karachi both exist, but they're in separate provinces. Divisions of Pakistan does suggest there used to be a Karachi-Bela Division, but that info could have been added by this blocked user. —Mahāgaja (formerly Angr) · talk 18:10, 1 January 2018 (UTC)
The Wikipedia article doesn't exist, so I removed the reference. Probably a candidate for RFD. DonnanZ (talk) 19:05, 1 January 2018 (UTC)
I straight up deleted it. No such thing. It was also rv'd on Wikipedia: [6]. All the Google hits seem to be WP mirrors that haven't been updated. —AryamanA (मुझसे बात करेंयोगदान) 19:28, 1 January 2018 (UTC)
If you look at the block log for some of these accounts, you'll notice we dealt with this person back in 2013 & 2014- it was obvious at the time what they were up to. My impression at the time was that this was a typical expat wannabe hardliner trying to rewrite geographical terminology to fit their Pakistani/Islamist worldview. I think you'll find that the bogus geographical entities are what would exist if certain things like the Indian occupation of territories claimed by Pakistan hadn't happened.
I agree, though, that we never properly cleaned up their edits; most of the problems were in content that's rather marginal for Wiktionary, so our review is rather hit-and-miss. Chuck Entz (talk) 22:04, 1 January 2018 (UTC)

January LexiSession: Happy New Year!

LexiSession is back! Ok, I didn't have time to write you a notice last month, sorry. We looked at tea.

This month, we are gonna be improving the pages describing words related to New Year celebrations, all around the globe. It could be interesting.

Well, for those who do not know LexiSession yet, it is a collaborative transwiktionary experiment. You're invited to participate however you like and to suggest next month's topic. If you participate, please let us know here or on Meta, to keep track of the evolution of LexiSession. I hope there will be some people interested this month, and if you can spread it to another Wiktionary, you are welcome to do so. Ideally, LexiSession should be a booster for every Wiktionary on the same agenda, to give us more insight into the ways our colleagues work in the other projects.

I hope that 2018 will be a year in which LexiSession grows in participants and page creations! Noé 20:04, 1 January 2018 (UTC)

I created uvas de la suerte. --Gente como tú (talk) 14:59, 2 January 2018 (UTC)

Wiktionary:Votes/sy-2018-01/User:Nloveladyallen for admin

There's an adminship vote going on. --Per utramque cavernam (talk) 14:42, 2 January 2018 (UTC)

Wiktionary:Votes/2017-12/User:BukhariSaeed for rollbacker

I don't know what to make of this. --Per utramque cavernam (talk) 15:10, 2 January 2018 (UTC)

I deleted it; I think Aryaman is handling this user. —Μετάknowledgediscuss/deeds 23:57, 7 January 2018 (UTC)

News from French Wiktionary


Hello!

The December issue of Wiktionary Actualités just came out in English!

Actualités this month includes an article about Trump censoring words, a presentation of a book, an investigation into the definition of peace, some words about the Tech survey, links to cool stuff, statistics, short news and nice pictures!

This issue of our regular journal was written by nine people and was translated for you by Pamputt and me. This translation could be improved by readers (wiki-spirit). We still receive zero money for this publication and we are not supported by any user group or chapter; it's just a way for us to show how cool our project and community are. Feel free to send us comments or to start your own journal (we're eager to read it and we can help you to start it!) Noé 16:59, 2 January 2018 (UTC)

Very nice! —Stephen (Talk) 19:06, 2 January 2018 (UTC)

RFD of Reconstruction pages

These are currently put in Wiktionary:Requests for deletion/Others, among rfd of templates, categories and the like, but I think they belong rather in Wiktionary:Requests for deletion/Non-English. Yes, reconstruction pages aren't in the mainspace, but they're still entries, which serve to present lexical items. --Per utramque cavernam (talk) 21:49, 2 January 2018 (UTC)

@Metaknowledge --Per utramque cavernam (talk) 18:24, 7 February 2018 (UTC)
I abstain; although your suggestion is logical, the present situation has its advantages. Like other pages deleted at RFDO, there are more special cases in which they may be deleted (say, if the page lacks descendants). RFD discussions are more grounded in the CFI. —Μετάknowledgediscuss/deeds 18:41, 7 February 2018 (UTC)

Listing of compounds under Derived terms or Related terms?

Hello!

A while ago, I looked up the definition of Derived terms in the section Derived terms at Wiktionary:Entry layout. There, I was told that Derived terms list terms that are morphological derivatives. But what exactly are morphological derivatives? I looked it up at Wikipedia (Morphological derivation).

Under the section Derivation and other types of word formation the article clearly states that from a linguistic point of view compounds are not considered to be derivations:

Derivation can be contrasted with other types of word formation such as compounding. For full details see Word formation.
Note that derivational affixes are bound morphemes – they are meaningful units, but can only normally occur when attached to another word. 
In that respect, derivation differs from compounding by which free morphemes are combined (lawsuit, Latin professor). 
It also differs from inflection in that inflection does not create new lexemes but new word forms (table → tables; open → opened).

Since my editing is mostly confined to German language entries, I subsequently figured out that this also applies to German language compounds: Derivation_(Linguistik)

Die Derivation unterscheidet sich von der Zusammensetzung (Komposition) dadurch, dass bei letzterer mindestens zwei Wörter (Grundmorpheme) eine eigenständige lexikalische Bedeutung besitzen, während bei der Derivation nur ein Wort existiert, dessen Anhängsel (Affixe) keine konkrete (jedoch eine abstrakte) lexikalische Bedeutung haben.
Beispiel eines Derivats: Frei-heit → frei ist Lexem (Adjektiv), heit besitzt abstrakte lexikalische Bedeutung, nämlich einen Seins-Zustand. Gesamtwort: Substantiv
Beispiel eines Kompositums: Haus-wand → Haus ist Lexem (Substantiv), Wand ist Lexem (Substantiv). Gesamtwort: Substantiv

The established practice at Wiktionary, however, is to include compounds under Derived terms, which seems to me somewhat contradictory. Again, WT:EL clearly states that morphological derivatives should be listed under Derived terms, so there can be no doubt.

Those words that have strong etymological connections (like compounds) but aren’t derived terms should be listed under Related terms (-> Related terms).

For this reason, I changed my way of editing, starting to list compounds under Related terms, but my edits were reverted twice so far. To resolve this confusing situation, I need some kind of clarification regarding this issue. Thanks.--91.61.113.176 00:34, 3 January 2018 (UTC)

While we are at it, we could also decide whether terms that are historically (diachronically) derived from terms in other languages, but can be constructed equivalently (synchronically) from native morphemes, should be shown as Derived or Related or both. DCDuring (talk) 01:16, 3 January 2018 (UTC)
I'm in favor of considering compounding to be a form of derivation for Wiktionariographical purposes, even if it isn't as far as theoretical morphologists are concerned. For example, we consider German verbs with separable prefixes (e.g. ˈüberˌsetzen (to pass over)) to be compounds but verbs with inseparable prefixes (e.g. überˈsetzen (to translate)) to be affixed forms. It seems silly to me to consider the latter but not the former to be a derived term of setzen.
I'm also in favor of considering transparent root+affix units synchronic derived forms even when the affixation originally happened in another language: while heavily goes back to Old English, it can be (and is) coined afresh by any English-speaking child who has learned to affix -ly to adjectives to form adverbs, even if s/he has never actually heard the word heavily before. It is thus simultaneously an inheritance from Old English and a new formation in Modern English. —Mahāgaja (formerly Angr) · talk 09:13, 3 January 2018 (UTC)
I agree, and I'd even go a bit further. I suspect that words using very common affixes should, more often than not, really be seen as new coinages only: the force of analogy is so strong that all the sound changes they would normally undergo are warded off. --Per utramque cavernam (talk) 14:14, 3 January 2018 (UTC)

We are using “derived” in a looser, everyday sense. The section lists morphological derivations, but not only these: as you see, it also lists compounds, and we could also list those Chinese formations mentioned in Wiktionary:Beer parlour/2017/November § Add pronunciation of chinese words in the table titled "Dialectal synonyms of", under the "Synonyms" header, which currently use a non-standard header. For this übersetzen example it could be advisable to separate those two kinds of derivations under two headers, or maybe even three, to distinguish other kinds of compounds that do not look like they contain a prefix: with prefix, with adverb, with other parts of speech. If the community had known about all those problems beforehand, there would not have been a successful vote … but you must still see that Related terms is too loose a relation for the compounds you add; and if you don’t take WT:EL literally, it all looks good, because no reader can complain about seeing compounds under Derived terms. Palaestrator verborum sis loquier 🗣 11:01, 3 January 2018 (UTC)

My issue is that the common practice uses Related terms to mean words that share an etymon but are not derived or directly related. Compounds do not fall into this category as they are created directly from the 2 or more parent members. Related terms should always be used to represent a more distant genetic relation. —*i̯óh₁nC[5] 11:39, 3 January 2018 (UTC)
I agree. --Per utramque cavernam (talk) 14:14, 3 January 2018 (UTC)
Agreed. Far too many editors are including derived terms (consisting of two words) as related terms, which I think is wrong. I'm not sure what the logic is. And then there's hyponyms, yet another complication and open to misinterpretation. DonnanZ (talk) 17:02, 4 January 2018 (UTC)

AWB

I am a regular user of Wiktionary and I already have AWB access on English Wikipedia and Simple English Wiktionary. I would like to help with cleaning up some of the definitions on Wiktionary and with correcting typos and formatting. I have already done some work on cleaning up some pages on the Check Wiktionary page [7]. Can I please be added to the AWB checkpage? Pkbwcgs (talk) 10:08, 3 January 2018 (UTC)

  • Seems a reasonable request. Added. SemperBlotto (talk) 21:44, 4 January 2018 (UTC)

Disallow Template:l in glosses and definitions

Can we make a rule to disallow {{l}} in edits like diff? There's absolutely no need for it. —Rua (mew) 13:04, 6 January 2018 (UTC)

That there is “no need” means it is supererogatory, not that it is bad. But it seems to me that editors are generally happiest with the anarchy. Sometimes I write square brackets, sometimes curly brackets. Both have their advantages. The syntax highlighting should display it better, though, for it seems to me that your dislike of the template in glosses arises mostly from that. And I am not saying you are overly reactionary; newer people have learnt to like it too (like me: it became easy once I got used to it, though I see that for others it is easier to write four square brackets, which I sometimes do too).
Wasn’t there a vote on whether using {{l}} should be made required? Its failure had this result: do what you want. Sometimes normalization is excessive. Palaestrator verborum sis loquier 🗣 14:05, 6 January 2018 (UTC) This is the vote: Wiktionary:Votes/2016-07/Using template l to link to English entries. Palaestrator verborum sis loquier 🗣 14:28, 6 January 2018 (UTC)
Yes please. It makes it harder for editors and gives no benefit. Equinox 14:09, 6 January 2018 (UTC)
@Rua: Isn't it needed to slow down the pages? --Rerum scriptor (talk) 14:21, 6 January 2018 (UTC)
@Rerum scriptor: I don’t know what exactly you are asking, but using {{l}} instead of square brackets slows the page down, so the square brackets are what is needed.
But there is an argument against the notion that the template makes the wikitext harder to read: the template adds some structure. Links to words are made with templates, while other links, for example those pointing outside the mainspace, get square brackets. But I take it all easy. Palaestrator verborum sis loquier 🗣 14:37, 6 January 2018 (UTC)
@Palaestrator verborum: I think he's making a joke; {{l}} can use a lot of memory if it's invoked on a page too many times. —AryamanA (मुझसे बात करेंयोगदान) 15:00, 6 January 2018 (UTC)
@AryamanA (or someone else): Do you have any numbers on this in mind? I will support the square brackets if the difference is significant. Palaestrator verborum sis loquier 🗣 15:07, 6 January 2018 (UTC)
@Palaestrator verborum: I just tried it out. {{l|en|word}} uses 1.52 MB of memory, and each successive use of {{l}} uses ~0.11 MB. So it's not horrible, but it's still unnecessary memory usage. —AryamanA (मुझसे बात करेंयोगदान) 15:11, 6 January 2018 (UTC)
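To put those figures in perspective, here is a rough back-of-the-envelope sketch. It assumes the cost grows linearly (about 1.52 MB for the first call and about 0.11 MB for each further call, as measured above). The function names and the 50 MB budget mentioned in the note below are illustrative assumptions, not measured limits.

```python
# Figures quoted in the comment above; both are one-off measurements.
FIRST_CALL_MB = 1.52   # approximate cost of the first {{l}} on a page
EXTRA_CALL_MB = 0.11   # approximate cost of each further {{l}}


def estimated_memory_mb(calls: int) -> float:
    """Rough linear estimate of memory used by `calls` uses of {{l}}."""
    if calls <= 0:
        return 0.0
    return FIRST_CALL_MB + EXTRA_CALL_MB * (calls - 1)


def calls_within_budget(budget_mb: float) -> int:
    """Largest number of {{l}} calls whose estimate stays within budget_mb."""
    if budget_mb < FIRST_CALL_MB:
        return 0
    return 1 + int((budget_mb - FIRST_CALL_MB) // EXTRA_CALL_MB)
```

Under these assumptions, `estimated_memory_mb(100)` comes to roughly 12.4 MB, and a page would need several hundred calls before a budget of around 50 MB became a concern.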
Ok, some middling support from me for the square brackets. I ping @Profes.I. lest he find out too late: it looks like a rule is forming that you shall not use {{l}} anymore for English glosses; but you can say what you think about it. Palaestrator verborum sis loquier 🗣 15:35, 6 January 2018 (UTC)
I don't mind as long as we make an exception in cases where the gloss is spelled the same as the word it's glossing, for example accident#French needs to be glossed with {{l|en|accident}} so there's actually a link; using double square brackets would result in a linkless, bold-face gloss. —Mahāgaja (formerly Angr) · talk 16:35, 6 January 2018 (UTC)
I write [[#English|accident]] in such cases. —Rua (mew) 16:42, 6 January 2018 (UTC)
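The choice being discussed can be sketched as a small helper. This is only an illustration of the three markup options mentioned in the two comments above; the function name and its `use_template` flag are hypothetical and do not correspond to any existing tool.

```python
def gloss_link(page_title: str, gloss: str, use_template: bool = False) -> str:
    """Return wikitext linking an English gloss from a non-English entry.

    A plain [[gloss]] link works unless the gloss is spelled exactly like
    the page it sits on; in that case the plain link renders as linkless
    bold text, so one of the two workarounds from the thread is used.
    """
    if gloss != page_title:
        return f"[[{gloss}]]"
    if use_template:
        # Workaround 1: let the template build the section link.
        return "{{l|en|" + gloss + "}}"
    # Workaround 2: explicit link to the English section of the same page.
    return f"[[#English|{gloss}]]"
```

For the French entry on the page "accident", `gloss_link("accident", "accident")` gives `[[#English|accident]]`, while passing `use_template=True` gives `{{l|en|accident}}`.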
I don't like that solution any better than {{l}}, so I would like to continue having the choice to use either. I'm fine with it being banned in English definitions and cases where plain square brackets work just fine, but I oppose an absolute ban. Andrew Sheedy (talk) 17:03, 6 January 2018 (UTC)
Why is it "harder for editors"? New editors? I find [[#English|accident]] more difficult to parse. The fact that HTML fragments are used for language links is an implementation detail (and conveniently abstracted in templates) – Jberkel 17:15, 6 January 2018 (UTC)
Wasn't it supposed to be essential for the proper working of "Tabbed Languages"? I need it to work properly for references to Translingual terms in definitions to avoid "orange" links. Also I note that it seems to force a type size in a way that plain wikitext and plain links do not [but in Firefox, not Chrome]. (See Cryptomonada#Hyponyms for example. If you don't see a difference try changing your default text size on your OS or browser settings.) See Template talk:l for any responses to my complaint (made today). DCDuring (talk) 17:37, 6 January 2018 (UTC)
Also, if it is bad to use it for links to English words in definitions, why is it good in lists of English words in English L2 sections? DCDuring (talk) 17:47, 6 January 2018 (UTC)
We've been trying in the last few years to wrap plain links in {{l}} so that they work properly with TabbedLanguages. TL was modified recently so that it defaults to English. So that consideration no longer applies. However, for the sake of consistency {{l}} has been used in the same places that it would be used for other languages, and I would like to keep it this way. There is a conceptual difference between {{l|...|[[x]] [[y]]}} and {{l|...|x}} {{l|...|y}}. The former is a single term in which two individual words are linked. The latter is two separate terms, each of which is linked. —Rua (mew) 18:40, 6 January 2018 (UTC)
Another kind of consistency would result from eliminating all uses of {{l|en}} in all English L2 sections. DCDuring (talk) 18:45, 6 January 2018 (UTC)
We can't eliminate {{l|en}}. Have you considered that the template has other parameters? —Rua (mew) 18:48, 6 January 2018 (UTC)
There would be a problem with ===Alternative forms=== for a start. DonnanZ (talk) 18:51, 6 January 2018 (UTC)
I didn't mean to suggest that it should be forbidden in English, only that it be discouraged in English L2 sections where other parameters are not actually needed. Most legitimate uses of alternate display function can be readily accomplished with plain links with pipes. DCDuring (talk) 19:01, 6 January 2018 (UTC)
I strongly suspect that a very small proportion of uses of {{l|en}} in English sections use other parameters, numbered or named. DCDuring (talk) 19:08, 6 January 2018 (UTC)
Well regardless, this discussion is only about getting rid of uses of {{l|en}} in places where non-English does not appear. —Rua (mew) 19:11, 6 January 2018 (UTC)
That would include uses under Alternative forms, Related terms, Derived terms, Synonyms and other semantic relations. In Etymology and Usage notes there is not much point in having {{l}}, almost all instances being better handled with {{m}}. I doubt that See also is much different from Related terms etc in that regard, non-English terms not really being appropriate there as a general rule. DCDuring (talk) 20:23, 6 January 2018 (UTC)
{{m}} gives italics, {{l}} doesn't. DonnanZ (talk) 20:54, 6 January 2018 (UTC)
That's always (almost always?) what we want in Etymologies and Usage notes. DCDuring (talk) 21:30, 6 January 2018 (UTC)

Exactly. I also use {{m}} for species, e.g. Punica granatum. Some editors are using {{l}} with {{der3}} etc. when it is not necessary if the language, e.g. lang=en, is specified. Links from {{der3}} will work just as well without {{l}}, except when there are two links on one line, or where a note is added. Then you have to use [[]] or {{l}}. DonnanZ (talk) 22:02, 6 January 2018 (UTC)

Actually, that may be due to laziness when converting from the old template to the new one. DonnanZ (talk) 22:30, 6 January 2018 (UTC)

The taxonomic authorities apparently want folks to use a type style for taxonomic names that contrasts with italics whenever the taxonomic name appears in italicized text. I couldn't figure out any good way to implement that here. (How would that work with {{sense}} or {{a}} when the taxonomic name was the only item enclosed? What about a mention of a taxonomic name? Should it contrast with the surrounding text or with the way a normal word would appear?) Our existing practice seems good enough. DCDuring (talk) 23:10, 6 January 2018 (UTC)
Regarding "X does italics and Y doesn't": let's learn from CSS (cascading stylesheets in Web design), where the aim is to separate how it looks from what it means. If we can't do something because it would have the wrong visual style, that suggests we might need a new style/markup based on the semantics. (Frankly I still miss the old days of "{{cooking}} A [[pot]] used to [[cook]] food.", but while we rely on hacky markup I can see why we need it. And I do like to be able to edit markup manually.) Equinox 00:52, 7 January 2018 (UTC)
@Equinox: The usual typographic custom is to use roman whenever italicized text calls for something to be italicized. For example, I would be really scared if I saw a Tyrannosaurus rex outside my window right now. —Mahāgaja (formerly Angr) · talk 15:58, 7 January 2018 (UTC)
Oppose, at least until we can agree upon a simpler way to make links consistently work correctly. — Ungoliant (falai) 16:47, 7 January 2018 (UTC)

Documentation template for modules

Hey there, I'm an admin from Turkish Wiktionary and have been meaning to get the documentation pages of modules right. As you can see on this page, there aren't any edit or see links. It makes it difficult for us to work with modules. But I couldn't figure out where to add a decent template for this. Can anyone help me? HastaLaVi2 (talk) 00:46, 7 January 2018 (UTC)

You need to create MediaWiki:Scribunto-doc-page-show and MediaWiki:Scribunto-doc-page-does-not-exist. --Vriullop (talk) 10:15, 7 January 2018 (UTC)
Thanks a lot! HastaLaVi2 (talk) 19:21, 7 January 2018 (UTC)

How much is the sc= parameter still needed?

Lots of our templates have a sc= parameter, but because we have script detection, I'm not sure we really need it. Are there any cases in which it's still used? Perhaps we can look at solving those cases. —Rua (mew) 21:14, 7 January 2018 (UTC)

@Rua: For what it's worth, I just surveyed 400 random English lemmas and found about 25 instances. I removed two and didn't see any difference in how the pages rendered. —Justin (koavf)TCM 22:43, 7 January 2018 (UTC)
Not even in the HTML? That's the part that matters. —Rua (mew) 22:47, 7 January 2018 (UTC)
@Rua: This edit "changed" line 365 from:
<li>Greek: <span class="Grek" lang="el"><a href="/wiki/%CF%97#Greek" title="ϗ">ϗ</a></span> <span class="mention-gloss-paren annotation-paren">(</span><span lang="el-Latn" class="tr Latn">ϗ</span><span class="mention-gloss-paren annotation-paren">)</span></li>
to
<li>Greek: <span class="Grek" lang="el"><a href="/wiki/%CF%97#Greek" title="ϗ">ϗ</a></span> <span class="mention-gloss-paren annotation-paren">(</span><span lang="el-Latn" class="tr Latn">ϗ</span><span class="mention-gloss-paren annotation-paren">)</span></li>
i.e. they are identical. —Justin (koavf)TCM 23:41, 7 January 2018 (UTC)
Templates that use an "sc" parameter and number of occurrences: User:DTLHS/sc, translation templates only: User:DTLHS/cleanup/translation sc. DTLHS (talk) 00:06, 8 January 2018 (UTC)
The |sc= parameter isn't needed if findBestScript from Module:scripts would give the same result. That is, if the template has text to work with, and a language code whose associated data file contains the script that the text is actually written in. I suspect that in the vast majority of cases the parameter is not needed. Rarely, it's actually doing damage, when Ancient Greek text is labeled as Grek (monotonic Greek) when it should be polytonic (polytonic Greek).
To actually determine if the parameter isn't needed, we need data on how often each script is actually used by each language on Wiktionary. If any script that is used is not in the language's data table, the |sc= parameter is needed, or the script needs to be added to the language's data table so that findBestScript will be able to automatically determine it. (This data would also be useful for determining which script should be first in the list for those languages that use multiple scripts.)
I suppose this could be done by bot, but it might be complicated. There are some pretty efficient Lua functions that could be translated into Python to do the actual script detection, though. — Eru·tuon 00:59, 8 January 2018 (UTC)
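The check described above can be sketched in Python. This is a loose, simplified imitation of the idea behind findBestScript, not a translation of the Lua code: the script ranges, the language table, and the function names are small hypothetical stand-ins for the real data modules.

```python
# Hypothetical, heavily simplified data; the real tables live in Lua modules.
SCRIPT_RANGES = {
    "Latn": [(0x0041, 0x005A), (0x0061, 0x007A), (0x00C0, 0x024F)],
    "Grek": [(0x0370, 0x03FF)],
    "Cyrl": [(0x0400, 0x04FF)],
}
LANGUAGE_SCRIPTS = {
    "en": ["Latn"],
    "el": ["Grek"],
    "sh": ["Latn", "Cyrl"],
}


def count_chars(text, script):
    """Count the characters of `text` that fall in the ranges of `script`."""
    ranges = SCRIPT_RANGES[script]
    return sum(1 for ch in text if any(lo <= ord(ch) <= hi for lo, hi in ranges))


def find_best_script(text, lang):
    """Pick the language's script that matches the most characters.

    Only scripts listed for the language are considered; returns None
    when none of them matches any character.
    """
    best, best_count = None, 0
    for script in LANGUAGE_SCRIPTS.get(lang, []):
        n = count_chars(text, script)
        if n > best_count:
            best, best_count = script, n
    return best


def sc_is_redundant(text, lang, sc):
    """True if an explicit sc= value equals what detection would give."""
    return find_best_script(text, lang) == sc
```

In this sketch, `sc_is_redundant("ϗ", "el", "Grek")` is true, so the parameter could be dropped, whereas Cyrillic text tagged with a language whose table lacks Cyrl is not detected and would still need either `sc=` or an addition to the language's data.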
It's about time you got a bot account isn't it? DTLHS (talk) 01:19, 8 January 2018 (UTC)
@DTLHS: I like the idea, but I've found it difficult to come up with a way to start using the Python interface. — Eru·tuon 03:47, 8 January 2018 (UTC)
I'd rather do it with tracking templates. Module:links, Module:headword and others can be modified so that if sc is provided, check if it's identical to what you get from findBestScript. —Rua (mew) 11:12, 8 January 2018 (UTC)
I've done it now for Module:links. The following tracking templates are used:
Rua (mew) 11:23, 8 January 2018 (UTC)
I did the same for Module:headword. The tracking templates are the same, just with "headword" instead of "links". —Rua (mew) 13:02, 8 January 2018 (UTC)
Sometimes, mixed Japanese-English text needs to use |sc=Jpan. (could ja be made to never use Latn or something?) —suzukaze (tc) 03:49, 8 January 2018 (UTC)
In that case, I'd suggest wrapping the English and Japanese parts each in their own template. That way the language tags will be correct too. —Rua (mew) 11:13, 8 January 2018 (UTC)
I think that using the same font for Jpan and Latn text is nicer most of the time. —suzukaze (tc) 18:28, 8 January 2018 (UTC)
@Suzukaze-c: Latn is intended for roumaji. If this is what you mean, I agree that if text contains some Latin mixed in with kanji or kana, it should be tagged as Jpan; it would look weird if sequences of Latin characters in Japanese text were script-tagged as Latn. Maybe the Lua logic should be to assign Jpan if there are any Hani, Hira, or Kana characters at all in Japanese text, and if not to decide between Latn and Brai by counting characters. findBestScript in Module:scripts isn't quite that sophisticated, though. — Eru·tuon 21:44, 8 January 2018 (UTC)
Yes, what I meant is that if text contains any Jpan character, it should be marked as Jpan. (I forgot about romaji.) —suzukaze (tc) 00:12, 9 January 2018 (UTC)
@Erutuon But it's quite feasible to turn the scripts into an ordered priority list of some sort. Given that our script tagging is generally intended to make text more legible, it makes sense that Latn should only be used if none of the fancier scripts are found, in any language. —Rua (mew) 00:25, 9 January 2018 (UTC)
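A minimal sketch of the priority rule proposed in the last few comments, for Japanese only: tag the text as Jpan if it contains any kana or kanji, and fall back to Latn only otherwise. The code-point ranges cover just the basic blocks, Braille is left out for brevity, and the function names are hypothetical.

```python
# Basic blocks only; a real implementation would need more ranges.
JPAN_RANGES = [
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
]


def is_jpan_char(ch):
    """True if `ch` is a kana or (basic-block) kanji character."""
    return any(lo <= ord(ch) <= hi for lo, hi in JPAN_RANGES)


def japanese_script(text):
    """Apply the priority order Jpan > Latn to a piece of Japanese text."""
    if any(is_jpan_char(ch) for ch in text):
        return "Jpan"   # any kana or kanji wins, even next to Latin letters
    if any(ch.isascii() and ch.isalpha() for ch in text):
        return "Latn"   # only Latin letters present, e.g. romanization
    return None         # nothing recognisable (digits, punctuation, ...)
```

With this rule a mixed string such as `"ATを使う"` is tagged Jpan as a whole, while a purely romanized string is tagged Latn; whether that is always the desired outcome is exactly what the rest of the thread debates.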
@Rua: I don't know how simple it will be to formulate a rule. Even in Japanese it may be more complex: probably Latin-script terms that are not roumaji transliteration (for instance, AT) should be tagged as Jpan. It's probably best to start small. — Eru·tuon 22:02, 12 January 2018 (UTC)
@suzukaze: What about sentences like ジス (this)?
If we code so that any sentence containing any JA text is marked in its entirety as JA, we might not get what we want.  :)
Also, it's worth noting that Japanese authors are occasionally prone to including English strings right in the middle of Japanese sentences. It's hard to search for, but this shows some examples where English appears in otherwise Japanese texts, and where the English is clearly English and not Japanese spelled in the Latin alphabet:
‑‑ Eiríkr Útlendi │Tala við mig 22:44, 12 January 2018 (UTC)
@Eirikr: Any ja sentence, not just any-language sentence :) "English strings in Japanese sentences" is exactly why I said what I said. Japanese fonts are often designed with consideration for English, but the reverse is almost certainly not true. —suzukaze (tc) 20:47, 13 January 2018 (UTC)

Western Yugur orthography standardization

(Pinging @Anylai as the only other consistent Turkic editor, but I'd like wider input too)

Western Yugur is a Turkic language spoken in China. It has no writing traditions (as far as I know) and due to the small number of its speaker community it is unlikely to get an officially recognized orthography. It is only attested in (pseudo-)phonetic transcription which differs from author to author.

In order to unify these sources and express them in a form appropriate for Wiktionary, I'm proposing a transcription system. The table compares symbols used in sources dealing with Western Yugur, Proto-Turkic (as used here, adapted from Starling) and Eastern Yugur (as used here, adapted from Nugteren in Mongolic Languages 2003) with my proposal. There are comments in wikicode (they should be notes but I forgot how to format those). There are also a few example words for comparing the given orthographies, and a sample text written in this orthography.

There are some inconsistencies here that I couldn't straighten out:

  1. T/D difference sometimes implies pre- and sometimes post-aspiration, and sometimes h is used instead.
    1. This is because post-aspiration is very common at the onset of the word and very rare in medial position, while pre-aspiration is quite common medially.
    2. Also there is no intuitive way to represent s with preaspiration.
    3. Pre-aspiration may be found before a -RT- cluster, but it is always a function of the occlusive which can then be used to signify it.
  2. It uses both a digraph and diacritics.
    1. I could perhaps use ġ for unaspirated uvular plosive, but gh feels more intuitive and in synch with Eastern Yugur.
  3. Slavic and Turkish symbols might clash too much.
    1. I needed to use Turkish symbols to represent Turkish sounds, and I needed kreska and haček to represent two series of sibilants (Pinyin was out of the question).

I'd love to hear what everyone thinks of this. Is creating new orthographies beyond everyone's comfort zone? Do you hate how it looks? Would you prefer more consistency? Any suggestions (even cosmetic)? Crom daba (talk) 18:20, 8 January 2018 (UTC)

Thank you for the research into the various orthographies. Amongst the sources, Lei Xuanchun's dictionary is part of the dictionary series for ethnic minorities in China produced by the Chinese Academy of Social Sciences, which can pretty much be regarded as the standard (unless there is evidence to the contrary). In general, I think we should limit creating orthographies to cases where absolutely no attempt at writing the language exists. Wyang (talk) 13:07, 9 January 2018 (UTC)
Lei's transcription scheme is pretty good, but it's basically IPA, and I think it would be better to have something more abstract and intuitively clear to Turkologists. I don't know if Lei's orthography is used outside of his dictionary or if it's purely ad hoc; I haven't come across it in Western literature. Are there any other Chinese sources using it? Crom daba (talk) 15:22, 9 January 2018 (UTC)
@Crom daba Sorry for the delay in reply. There are some papers citing the book and using his orthography:
  • 莊子儀(2011),回鶻文《金光明經》所反映的音韻現象,國立臺灣師範大學。
  • Yong-Sŏng Li (2014), “Some Star Names in Modern Turkic Languages-I” (and -II) (Çağdaş Türk Dillerinde Bazı Yıldız Adları-I, -II), Türk Dili Araştırmaları Yıllığı - Belleten, 62 (1): 121–156.
  • 徐丹(2015),从借词看西北地区的语言接触,《民族语文》,第2期。
  • 赤坂恒明(2016),<翻訳> 馬鈴?「哈薩克入甘続記」第一章第一・二節,埼玉学園大学紀要. 人間学部篇。
  • Li Yong-Sŏng (2016), “Finger Names in Modern Turkic Languages”, Central Asiatic Journal, 59 (1–2): 1–42.
Wyang (talk) 15:50, 12 January 2018 (UTC)
The digraph gh is a bit of an irregularity, when compared to its voiceless counterpart q. (Ts and dz are somewhat different, as they indicate affricates.) Does the sequence g + h also occur? If so, some method to distinguish the two would be needed. But consonant clusters don't look very common in the example given on the page. — Eru·tuon 21:47, 9 January 2018 (UTC)
The way I imagined it, h would only be used initially (where it is a phoneme), before sibilants to indicate pre-aspiration, and after medial stops to signify post-aspiration with the stop written as fortis. This makes it impossible to express the distinction between pre-aspirated and non-preaspirated, but I doubt that this difference is phonemic. In Lei I have found the following cases of preaspiration:
  1. [pəhltər], [buhrqan], [ɢahsqa] - but he also has [pəhldər], [buhrɢan], [ɢahsɢa] showing free variation.
  2. Words ending in -hT, this is simply because Lei treats every final stop as aspirated, but (post-)aspiration isn't distinctive here, and I couldn't find any word written with -(h)D.
  3. Words with -hTD- or -hTT-, here he uses a fortis stop because all stop clusters are treated as if containing an intervening aspiration, I couldn't find any words with -(h)DD- clusters.
  4. Words with -hT- clusters that are actually compounds of words ending with -hT, aka second case.
  5. Remaining cases are written -hD-, leading me to believe that pre-aspiration is not contrastive before post-aspirated stops.
So basically, there shouldn't be any cases where a gh might be used for anything other than the uvular. Crom daba (talk) 01:51, 10 January 2018 (UTC)
Thank you for your effort. I wish I had a general idea about phonology and Western Yugur itself so that I could comment. I liked your orthography; it is about deciding on which letter to use. But how will we know which words exist in this language if not noted in literature? We will need similar works or dictionaries, one of which maybe in future could use a totally different methodology? I think complex stuff (if there's any) should be simplified further to make use of future works (not to be left in doubt and having to create new writing rules). Very good job Crom daba. --Anylai (talk) 18:06, 25 January 2018 (UTC)
Thanks for input and kind words @Wyang, Anylai, Erutuon, I will go ahead and implement Li's orthography (only with digraphs instead of ligatures for affricates, and added /ts/ and /dz/) and add the correspondence tables to the About:Western Yugur page. Crom daba (talk) 20:24, 8 February 2018 (UTC)

Removing Scots from Wiktionary:Criteria for inclusion/Well documented languages

I've been playing with Scots a little, but things are very hard to cite. If someone RFVd jeelie bean, I couldn't back it up. Which is not to say there's any other word in Scots for "jelly bean"; it's that Susan Rennie is way more dominant as an author/translator of children's books than anyone could be in most well documented languages.

I know we don't quote from Wikipedia, but I think that's a decent source of hard numbers on how well documented a language is. https://stats.wikimedia.org/EN/Sitemap.htm shows that there's fewer active editors than many other well-documented languages, and while the number of articles put it above Icelandic, slightly, a quick comparison of the two Wikipedias shows that Scots is full of stubs and Icelandic has long articles; I found various examples, but the current random article was Watter cycle versus Hringrás vatns.

Maybe I'm biasing it by comparing it to western languages. There's two Punjabi Wikipedias, and Western Punjabi is in about the same shape as Scots, whereas (Eastern) Punjabi has fewer articles and more active editors. The Xhosa and Zulu Wikipedias are nowhere near the shape of the Scots Wikipedia; they're not my field, but I don't see why they're considered well documented languages.

I'm sure Wikipedia stats are going to annoy some people; I didn't come to this conclusion based on those numbers. I'm interested in Scots and Estonian, and as an American, books in Scots should be easier for me to access than Estonian books; I can order direct from Amazon.co.uk, if nothing else. I've found Ben-Ben-A-Go, Sweetieraptors: A Book O Scots Dinosaurs and Everson's various translations of Alice in Wonderland for modern Scots, but I can find a huge selection of modern Estonian works, so much so that I see no point in trying to enumerate them. w:List_of_newspapers_in_Estonia is an amazing list of regularly published works in Estonian; Scots Leid Associe says "The Associe furthsets the bi-annual journal Lallans, a 124-page magazine o the best nui screivin in Scots, thare is nae ither journal 100% in Scots". (That is, their biannual journal is the only periodical 100% in Scots.)--Prosfilaes (talk) 06:10, 10 January 2018 (UTC)

I think many of your arguments are not the most relevant, but I see your overall point and admit that you may very well be right that we should remove it. I suspect the original intent was to avoid sneaking in extremely rare English dialect words used in Scotland as Scots, considering how the two languages are so indistinct that at RFV, we often struggle to determine which language a text is written in. —Μετάknowledgediscuss/deeds 06:26, 10 January 2018 (UTC)
It's hard to compare against a lot of languages without some hard numbers. I didn't want to just refer to Estonian versus Scots, since I have no reason to think that Estonian should be the least of the WDLs, or that other people think so.--Prosfilaes (talk) 12:47, 10 January 2018 (UTC)
While the issue Metaknowledge highlights is a serious and recurring one, in some ways it's orthogonal, in that we're going to continue having to figure out whether things are Scots or Scottish English either way. It does seem like Scots is not that much better attested than Irish too (which was also removed a while ago). The tendency to view Scots as a form of English (which the OED still does?) may also be influencing those who want it to be subjected to the same standards; OTOH, it does seem like Scots authors are liable to unilaterally create neologisms by just Scots-ifying English words; but on the third hand, meh. I don't object to removing it. - -sche (discuss) 16:48, 13 January 2018 (UTC)
It could be argued that "Scotsified" words are merely phonetic spellings in the Scots dialect, which normally aren't too difficult to separate from true Scots words. DonnanZ (talk) 12:48, 14 January 2018 (UTC)

Reciprocal label

Why does {{lb|en|transitive}} add the word to Category:English transitive verbs when {{lb|en|reciprocal}} doesn't add the word to Category:English reciprocal verbs? Jonteemil (talk) 16:46, 11 January 2018 (UTC)

Because Module:labels/data has pos_categories = { "transitive verbs" }, under labels["transitive"] = {, but doesn't have pos_categories = { "reciprocal verbs" }, under labels["reciprocal"] = {. We could change that, though, if it seems like a good idea. —Mahāgaja (formerly Angr) · talk 17:10, 11 January 2018 (UTC)
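The mechanism described in this reply can be illustrated with a simplified model. This is only a minimal Python sketch, not the actual Lua code of Module:labels: the dictionary layout mirrors just the pos_categories field quoted above, and the helper name categories_for is hypothetical.

```python
# Simplified model of label data: only "transitive" carries pos_categories,
# mirroring the situation described for Module:labels/data.
labels = {
    "transitive": {"pos_categories": ["transitive verbs"]},
    "reciprocal": {},  # no pos_categories, so no category is added
}


def categories_for(language_name, label):
    """Return the categories a label would add (hypothetical helper)."""
    data = labels.get(label, {})
    return [f"{language_name} {cat}" for cat in data.get("pos_categories", [])]


print(categories_for("English", "transitive"))
print(categories_for("English", "reciprocal"))
```

Under this model, adding a pos_categories entry to the "reciprocal" label would be the only change needed for such entries to be categorized.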
There are possibly so few uses of such a label because one tends to split relevant words into multiple senses, compare fuck for an example where there is no label to put, apart from the unknownness of the term and the unlikeliness of the phenomenon in some languages, and maybe because it is at times hard to decide if a verb is reciprocal or just ambitransitive. However if one does use such a label I see no reason why it should not categorize. Palaestrator verborum sis loquier 🗣 17:36, 11 January 2018 (UTC)
I suggest a change since all verbs are transitive, intransitive or reciprocal (I think). Jonteemil (talk) 18:41, 11 January 2018 (UTC)
Are other parts of speech ever tagged with {{lb|foo|reciprocal}}? If so, we would have to weigh whether it is better to use a new label "reciprocal verb" to categorize such verbs, and risk that some verbs will not get categorized because people don't know better and just use "reciprocal", or else force other parts of speech to use other labels and risk that some will be miscategorized as verbs if people use bare "reciprocal" on them. - -sche (discuss) 19:53, 11 January 2018 (UTC)
@-sche: A search for insource:/lb\|[^\}]+\|reciprocal[}|]/ in mainspace yields 21 results, some of which are in Pronoun sections: се, си, միմյանց, фкя-фкянь. — Eru·tuon 20:31, 11 January 2018 (UTC)
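For readers who want to try the pattern offline, here is a minimal Python sketch of the same regular expression applied to wikitext strings. It assumes Python's re syntax is close enough to the on-wiki insource: regex for this pattern; the function name uses_reciprocal_label and the sample strings are illustrative only.

```python
import re

# Python rendering of the search pattern quoted above: "lb|", then any
# run of characters other than "}", then "|reciprocal" followed by
# either "}" or "|".
PATTERN = re.compile(r"lb\|[^}]+\|reciprocal[}|]")


def uses_reciprocal_label(wikitext):
    """Return True if the wikitext contains an lb template with a 'reciprocal' label."""
    return PATTERN.search(wikitext) is not None


print(uses_reciprocal_label("{{lb|en|reciprocal}}"))
print(uses_reciprocal_label("{{lb|en|transitive}}"))
```

Note that the pattern says nothing about the part of speech of the section in which the template sits, which is why the results had to be checked by hand.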
If it's the verb itself rather than the sense that's reciprocal, then there shouldn't be a reciprocal label there. Sense labels are for sense-specific things. —Rua (mew) 20:40, 11 January 2018 (UTC)
Thanks for doing the search. It is as I suspected (used of more than one POS). Since the label is so rare, my preference would be to introduce a new label "reciprocal verb" for verbs that need it, but bear in mind Rua's point. - -sche (discuss) 20:42, 11 January 2018 (UTC)
Yep, that stuff at the non-glosses of pronouns is misuse, those labels as in си should be just removed, they just double what should be in the non-glosses (as non-glosses can contain grammatical information, these examples are exactly what they are for). The “clitic” word should be moved into the description, it seems to me. Palaestrator verborum sis loquier 🗣 21:15, 11 January 2018 (UTC)
@rua: Well, if what I think is correct, there are reciprocal verbs and reciprocal pronouns. A reciprocal verb can express a reciprocal tense without the use of a reciprocal pronoun. So there are reciprocal verbs, tenses and pronouns. English has two reciprocal pronouns that happen to be synonymous - each other and one another. My mother tongue Swedish has two - varandra (each/one another) and sinsemellan (with each/one another). All reciprocal tenses can be expressed with ”I /verb/ (with) you and you /verb/ (with) me”. For example: We met each other=I met you and you met me. Here ”met” isn’t used reciprocally since the reciprocality is expressed with the pronoun ”each other”. In Swedish this is: ”Vi träffades här”. Here ”träffades” is used reciprocally since there is no reciprocal pronoun. Jonteemil (talk) 17:20, 12 January 2018 (UTC)

Proposal: Remove pre-1919 Chinese from well documented languages

User:Dokurrat created Template:zh-historical-ghost, indicating senses that are only found in one or more historical dictionaries. However, per Wiktionary:Criteria for inclusion, these mention-only terms are not considered attested.

Classical Chinese is essentially a dead language. Although there are plenty of texts in Classical Chinese (just like Latin), many texts from antiquity are irreversibly lost and many terms (including characters) can only be found in dictionaries. So I propose to exclude Chinese before 1919, when Classical Chinese ceased to be widely used, from well documented languages.--Zcreator (talk) 13:19, 13 January 2018 (UTC)

I always opined that if a quote is old enough then that single quote is enough, by analogy, as English and German etc. are also separated into three stages of which only the latest are considered well-attested, so for example an Arabic quote from the eleventh century is always enough.
For Chinese one has special arguments again as the Chinese have a history of burning their own literature. The question is what you promise yourself from including characters that are only found in dictionaries. For English we have Appendix:English dictionary-only terms and the template {{no entry}} used. But the words you are concerned about are maybe, and likely, not ghost words but believed to have been used, just that the only thing left from the usage is a dictionary entry – a situation that the modern English language and the modern German language do not have but their old predecessors do: Many Old High German words are only attested in no better source than one or two word-lists; still mainspace entries for such words are accepted, it seems to me.
So I’d say if the dictionary is old enough, it seems a good solution for me to include the word in the mainspace and use {{zh-historical-ghost}}. The old dictionaries haven’t habitually invented characters, have they? Palaestrator verborum sis loquier 🗣 15:04, 13 January 2018 (UTC)

I think {{zh-historical-ghost}} should be turned into a language-agnostic template ({{historical-ghost}}), and use a language parameter. --Per utramque cavernam (talk) 15:10, 13 January 2018 (UTC)

Also a good point. Some people might ask the edgy question from which point in time such usage be appropriate, but answering that question would be comparing apples and oranges, for it depends on how history has unfolded itself for each language. Rigor that is appropriate with English attestations can well be brutish with another language that is superficially prominent, and the votes about such criteria were of course biased by the privileged position of English and loosed from the reality of other language. Palaestrator verborum sis loquier 🗣 15:22, 13 January 2018 (UTC)
Chinese is split into stages too: Category:Old Chinese language (och, en:w:Old Chinese including en:w:Classical Chinese) and Category:Middle Chinese language (ltc, en:w:Middle Chinese). Is it requested to split something like (New) Chinese in some way? If that's intended, how about splitting (New) English into Early New English (e.g. Shakespeare, KJB) and younger New English (e.g. Harry Potter), and (New High) German into Early New High German (until 1650, e.g. Luther) and younger New High German (after 1650, e.g. philosophy (Kant, Nietzsche)) too? -84.161.22.125 15:41, 13 January 2018 (UTC)
No, it’s not intended. Languages are split if differences in grammar and core vocabulary create a barrier. And it seems like English and German have two such splits while Spanish has only one and Arabic and Chinese have none since their early days. If you look through the “Old Chinese lemmas” you see that they are entries under the header “Chinese” with Old Chinese pronunciations in the pronunciation section.
The question is where to put such terms that are presumably left only in dictionaries but nonetheless believed to have existed. Palaestrator verborum sis loquier 🗣 16:01, 13 January 2018 (UTC)
In my opinion terms in Appendix:English dictionary-only terms (and other dictionary-only terms), except coined protologisms and ghost words like esquivalience and zzxjoanw, should be moved to main namespace, with a notice template indicating that this is only a dictionary-only term.--Zcreator (talk) 17:07, 13 January 2018 (UTC)
@Zcreator We have {{no entry}} (ablocate for example). DTLHS (talk) 17:37, 13 January 2018 (UTC)
This is my proposed layout.--Zcreator (talk) 17:44, 13 January 2018 (UTC)
@DTLHS Yep, it looks like he knew this – I have said this supra, and what he wants is some in-between where the definitions are still in the mainspace but with proper warning around. Like: “the meanings given for this term are …” Palaestrator verborum sis loquier 🗣 17:46, 13 January 2018 (UTC)
I'm sceptical. I can think of a number of times Chinese terms have been RFVed and an editor has cited nothing but a dictionary or two (mentions) — sometimes the senses RFVed are quite elaborate or hard-to-parse, too, like Talk:坉 — and the terms have had to be failed for lack of evidence of actual use (such as would, among other things, clear up the meaning). There is more argument to be made, IMO, for allowing Middle-Chinese-and-older terms (analogous to allowing Middle English, etc), but for terms mentioned only in a dictionary from e.g. 1914, I see no compelling reason not to use the same approach as in other languages, with appendices for dictionary-only terms. - -sche (discuss) 17:08, 13 January 2018 (UTC)
IMO dictionary-only terms may have their own entries, but with a template indicating such.--Zcreator (talk) 17:30, 13 January 2018 (UTC)
I see, Chinese at wiktionary isn't really split into stages. Are there at least labels like {{lb|zh|Old Chinese}} similar to {{lb|la|Medieval Latin}} (at wiktionary Medieval Latin is part of Latin)? -84.161.22.125 18:22, 13 January 2018 (UTC)
Oh well, 1919 is too late, that is visible; 1914 words aren’t that interesting either, but having entries for older badly attested terms and having stated the uncertainty (or non-existence) is wherefore people visit Wiktionary and appreciate it. I don’t know what a good date for Chinese is, but arguably it is one that is determined by the intrusion of Westerners and their economic possibilities for publishing texts – the same with Arabic. For senses, one can use {{uncertain}} – people have to see this if with the available material the semantics cannot be reduced to a denominator, which can happen as well with many cites. Consider plant names where many descriptions are needed for knowledge of the meaning; and also consider units of measures where in fact a mention can be more valuable than a use; and of course there are always problems with ideological and religious concepts – it is still unknown what فرقان means that appears seven times in the Qurʾān, and such terms continue to be created by obscurantists. We just need to evaluate if the term has existed widely, considering if people still search it, balancing the scientific and the market-oriented approaches. Palaestrator verborum sis loquier 🗣 17:33, 13 January 2018 (UTC)

Another proposal: Accept web.archive.org and WebCite etc. as a source

Previous discussion at Wiktionary:Votes/pl-2012-08/Citations from WebCite.

Currently accepting Usenet but not web.archive.org and WebCite has some problems:

  1. Not all languages are well represented on Usenet and Usenet is somewhat English-biased. This will cause a natural English bias in Wiktionary.
  2. Use of Usenet is declining. It may be more and more difficult to find attestations of neologisms from Usenet.
  3. The decentralization of Usenet is limited. It may be accessed through Google Groups, but if you think web.archive.org and WebCite will close one day, it's not impossible that Google will too (Google was founded after Internet Archive and WebCite). It's also not impossible that Google may take down some content because of the Digital Millennium Copyright Act. If you think WebCite had major outages, Google also had ([8]).

So, it may be a good idea to accept web.archive.org and WebCite etc. as a source, at least for webpages that there's evidence that it is an original work. For safety's sake, it may be required that a webpage should be archived at at least two different archive websites. However, quality control for cites is a problem; we should discuss it in detail.--Zcreator (talk) 17:29, 13 January 2018 (UTC)

For a beginning, the quotation templates should support additional links of archived versions. Else when I use |archiveurl= in {{quote-book}} or whatever it says “archived from” though the original URL is still accessible. Though I just habitually ensure archive.org and archive.is archive versions and want to link three versions for attestation. For many words – say gamer words, Russian words used in Germany only, dialectalisms in Arabic … – to cite some forum posts plus archived versions is the best thing one can do. @Sgconlaw
Yes, archive.is can be used too though it shows ads – I believe in capitalism. Palaestrator verborum sis loquier 🗣 17:58, 13 January 2018 (UTC)
To clarify: My proposal is to accept all perennial web archiving services, but web.archive.org and WebCite are preferred as they are long-established.--Zcreator (talk) 18:08, 13 January 2018 (UTC)

Does WT:Translation requests need more rules?

First of all, I see that a lot of writers are keeping the [brackets] in when they submit their requests. I don’t think that that’s a major issue, but in any case it seems to be incorrect and either needs to be removed entirely or given a clarification.

Now more importantly, for months we’ve been receiving a lot of garbage requests, lines that when translated turn out to be bizarre nonsense, like ‘colourless green ideas sleep furiously’. I myself have made jocular or vanity requests on occasion, but these particular ones, aside from being excessive, seem completely pointless to make. They’re useless for communication, and I suspect that the lines were never written by sentient beings. Messages from amateur speakers would be one thing, but I think that these are nonsensical on purpose. As such, I propose that editors be allowed to erase them.

Nonetheless, I could see arguments against this, namely: ‘nonsense’ might be too subjective and up to interpretation, and nonsensical requests still aren’t exactly ‘harmful’, I guess. Keeping the mindless requests would annoy me, but I could deal with it in the long run. — (((Romanophile))) (contributions) 22:43, 13 January 2018 (UTC)

I’m for closing it. By its very nature it can only contain nonsense because nobody would post something personally valuable on such a high-visibility site for others to find that he has begged from others to translate it. There are also other communities more suitable for such, subreddits, Telegram groups, Discord groups, Tumblr, what not.
People might interject that sometimes it is amusing to translate, I have done it once – and only once – for this reason too, but there is no hardship with finding comparable delectations. Palaestrator verborum sis loquier 🗣 22:56, 13 January 2018 (UTC)
I see nonsense requests as abuse of a free resource. I suspect that the person(s) posting them are fully aware of the nature of their requests. I have removed them on sight.
I think that closing TRREQ is too drastic. —suzukaze (tc) 23:02, 13 January 2018 (UTC)
It can make sense to translate 'colourless green ideas sleep furiously'. For example, when translating the English wikipedia article into German, it could begin with "colourless green ideas sleep furiously (englisch für farblose grüne Ideen schlafen wütend) ist ein [englischer] Satz [...]". In case of other random words, it could be that the requester wants to have several words translated independently from each other. In case of strange English source sentences it's possible that it was translated from the user's native language to English, though not perfectly. Though of course it would have been better if both sources were provided, the non-English and the English translation. -84.161.10.167 06:43, 15 January 2018 (UTC)

Hittite lemmas

Related previous discussion: Beer parlour / 2016 / March § Hittite lemmas.

Hello,

Currently there are 115 Hittite entries in Wiktionary. Most of them are written in cuneiform except for the few ones I've created. I think that expanding the Hittite dictionary would be way easier if we wrote the lemmas in some romanization. There is absolutely no reason to keep the lemmas in cuneiform, it only makes them harder to find. All books and dictionaries transliterate or transcribe words. No reader is going to look up a word in cuneiform, they're most probably going to type the broad transcription. And if they want to see the word written in cuneiform, there's no problem, since it's shown in the declension tables (see attaš). Say a student who knows no Hittite wants to find a word: he can do one of two things, look up a cognate and hope that the word he's looking for is linked there, or go checking the entries one by one in the categories. We don't write Egyptian lemmas in hieroglyphs, so why should we write Hittite in cuneiform? Plus, the characters aren't visible in Chrome, or at least not to me, so even if the reader knew Hittite, he might not even see the signs.

Hittite has two romanization systems. The first is called the one-to-one transliteration (e.g. at-ta-aš < 𒀜𒋫𒀸); here each sign is written with its corresponding transliteration. Whenever a dictionary gives an inflection, it often gives it in this method of transcription, especially if the word is irregular. The second one is called the broad transcription, and because it is the most legible it's the one I propose to use for lemmas. Dictionaries list words according to this one. They often list them under stems, so if you wanted to find at-ta-aš you would need to look for atta-. Generally, to transcribe words, the hyphens are removed and adjacent repetitions of identical vowels are simplified (e.g., a-ša-an-zi > ašanzi, na-at > nat, but ši-uš > šiuš). Adjacent identical consonants are not simplified but remain geminate (ap-pa-an-zi > appanzi). Redundant vowels are expressed with a macron (e.g. e-eš-ḫar > ēšḫar), and silent vowels are written between brackets (e.g. at-ta-az, at-ta-za > attaz(a)). Using the broad transcription would be way more practical, for both the readers and the editors. --Tom 144 (talk) 00:41, 14 January 2018 (UTC)
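The two mechanical rules in the post above (dropping hyphens and simplifying identical vowels that meet at a sign boundary) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an established tool: the function name broad_transcription is hypothetical, the vowel inventory is assumed to be a, e, i, u, and the sketch deliberately does not handle macrons for redundant vowels or bracketed silent vowels, which need per-word judgment.

```python
VOWELS = set("aeiu")  # assumed vowel inventory of the transliteration


def broad_transcription(transliteration):
    """Join hyphen-separated signs, collapsing identical vowels at sign boundaries.

    Geminate consonants are kept. Macrons (e-eš-ḫar > ēšḫar) and bracketed
    silent vowels are NOT handled by this sketch.
    """
    result = ""
    for sign in transliteration.split("-"):
        # Drop the sign's first letter if it repeats the preceding vowel.
        if result and sign and sign[0] in VOWELS and result[-1] == sign[0]:
            sign = sign[1:]
        result += sign
    return result


for word in ["a-ša-an-zi", "na-at", "ši-uš", "ap-pa-an-zi"]:
    print(word, ">", broad_transcription(word))
```

A stem-based lemma such as atta- would still have to be chosen by an editor; the function only normalizes an attested spelling.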

It is not true that else the entries cannot be found. One writes the transcription and insource:/==Hittite==/ into the search field.
There is no harm in creating soft redirects like for Japanese and Gothic, but do you really want to duplicate content? It can easily become out of sync, having invited incompetent people to create Hittite entries in romanization in masses without the cuneiform being found or to expand Hittite entries without expanding the cuneiform entries. I warn you that it is really annoying when people edit Serbo-Croatian entries in Latin spelling only and do not touch the corresponding Cyrillic entries. Palaestrator verborum sis loquier 🗣 10:20, 14 January 2018 (UTC)
Obviously, content shouldn't be duplicated; either the romanizations or the cuneiform should soft- (or hard-?) redirect to the other.
The problem of lemmatizing (and romanizing) Hittite has been discussed before, and is a bit tricky, I'll ping users who participated in that discussion: @ObsequiousNewt, JohnC5, Rua, DerekWinters. - -sche (discuss) 15:08, 14 January 2018 (UTC)
Thank you, @-sche:. After reading that discussion I would support listing words under stems, as Kloekhorst, the CHD, and Hoffner & Melchert do. I would oppose standardizing cuneiform, since then we'd be making a false claim. Concerning attestations, unattested words should be marked with an asterisk as reconstructions generally are (e.g. the ablative in 𒉺𒀪𒄯, which is partially attested). There are two issues with this method. The first is ambiguous characters, which are divided into two types: ambiguous voicing and ambiguous vowels. Ambiguous voicing is easy to solve: we can simply use the voiceless sign, just like Kloekhorst. Hittite used voiced and voiceless signs interchangeably and showed no voice assimilation, so it's unlikely voice was a distinctive feature (as Kloekhorst argues). Hoffner & Melchert say the following about the issue:
"Some cuneiform signs have more than one phonetic value, that is, they are polyphonous. Some CV type signs whose initial consonant is a stop can have either a voiced or voiceless interpretation: BU can be bu or pu. Signs of the types VC and CVC do not indicate whether the final stop is voiced or voiceless (b or p, d or t, g or k). For example, the sign AB can be read ab or ap, ID as id or it, UG as ug or uk. Moreover, when writing Hittite, the scribes do not even use contrastively those CV signs with initial stop that distinguish voicing in the Akkadian syllabary: a-ta-an-zi and a-da-an-zi ‘they eat’, ta-ga-a-an and da-ga-a-an ‘on the ground’, ad-da-as and at-ta-aš ‘father’ (§§1.84–1.86, pp. 35–36). Nevertheless, when transcribing syllabically-written Hittite words, Hittitologists normally transliterate the obstruent according to the value of the cuneiform sign most favored by the tradition of Hittitologists. Usually the favored transliteration is that which uses the number one value (pa, not bá; du, not tù; ga, not kà). Exceptions to this pattern are the preferred transliterations utilizing the voiceless stops such as pí or pé (instead of bi), tén (instead of din or den), pár (instead of bar), pád/t or píd/t (instead of be), tág/k (instead of dag/k). CV signs possessing a number-one value of both voiced and voiceless nature, e.g., BU = bu or pu, are normally rendered with the voiceless stop."
Concerning the ambiguous vowels we have the sign 𒀪 that in both Akkadian and Hittite accounts for aḫ, eḫ, iḫ and uḫ. There seems to be preference for aḫ. There are also various characters that cannot distinguish the i from the e, here the preference is i. In those cases, I would simply follow what the source has to say, and if authors happened to contradict each other, just list the alternative form in the page. After all, they will have already transcribed the word for us.
The second problem has to do with logograms (e.g. DUMU.MUNUS, "girl"). I'd say that whenever we can reconstruct the stem, we should do it (as in 𒆜𒀸) and use the one-to-one transliteration if not. --Tom 144 (𒄩𒇻𒅗𒀸) 16:24, 14 January 2018 (UTC)
I would not be opposed to having entries for both at-ta-aš and attaš whose only content is "Romanization of 𒀜𒋫𒀸" and for KASKAL-aš whose only content is "Romanization of 𒆜𒀸" (no Etymology section, no Pronunciation section, no Inflection section, etc.). But the main entries should remain at the cuneiform spellings. —Mahāgaja (formerly Angr) · talk 16:40, 14 January 2018 (UTC)
The cuneiform script can only be added if the authors cited show the transliteration of the word. Hoffner & Melchert have a vocabulary list in their book, but they only show the broad transcription, unless they are written with sumerograms. If we used the stems as lemmas as I proposed, we could create entries based on their list, which happens to be one of the most reliable sources today. And if we happen to find the transliteration, then we can add it along with the original script. Each script is optional on the declension tables for this very reason. But if we decide to use cuneiform as a lemma, then we would be restraining ourselves from expanding the already small set of Hittite words on wiktionary. --Tom 144 (𒄩𒇻𒅗𒀸) 18:07, 14 January 2018 (UTC)
I also want to add that even though logograms are common, we also happen to know the consonantal stem of most of them. --Tom 144 (𒄩𒇻𒅗𒀸) 18:12, 14 January 2018 (UTC)
I think the end goal should be to have all lemmas in cuneiform. But in the meantime, I agree with you: it'd be good to allow users to add full-blown entries in broad transcription (still bearing in mind that they will eventually be converted to simple romanisation entries, once all their info has been moved to the cuneiform lemma.)
Would that be messy, though? For an indeterminate amount of time, we would have some lemmas in end state (full-blown entries in cuneiform), and some in middle state (full-blown entries in broad transcription). I don't know if there's any precedent to that. We do have CAT:Gothic romanizations without a main entry, but these are (already) simple romanisation entries only, and all the info still has to be encoded at the main entry. --Per utramque cavernam (talk) 18:30, 14 January 2018 (UTC)
I think I would support broad transcriptions that are soft redirects. I think the extra information should be kept to a minimum. In reference to a question I asked in the previous conversations, determinatives should not be included. —*i̯óh₁nC[5] 21:47, 14 January 2018 (UTC)
Since it's almost consensual, I guess we'll just keep the lemma forms in cuneiform and create soft redirects for the romanizations, though I'm still opposed to this solution. I agree that the broad transcription shouldn't have logograms of any kind. Concerning the terms in Hoffner & Melchert's vocabulary lists, I guess the best thing to do would be to add the lists to some appendix or request list, and create the entries only once we have the cuneiform script for them. Unattested lemmas should be dealt with in the same way we do with (Vulgar) Latin. And btw, could anybody instruct me on how to use Module:typing-aids for Hittite? --Tom 144 (𒄩𒇻𒅗𒀸) 05:07, 15 January 2018 (UTC)
@Tom 144: {{subst:chars|hit|a-ku}} produces 𒀀𒆪. That is, you type {{subst:chars|hit|[NAME OF CHARACTERS]}} to output the actual cuneiform. At the moment, there is a module for Hittite but not for Sumerian for some reason, so a Sumerian term like "𒂼𒄄" (ama-gi) does not work with this template. —Justin (koavf)TCM 05:28, 15 January 2018 (UTC)
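For readers unfamiliar with how such a typing aid works, the idea can be sketched as a lookup from hyphen-separated sign names to cuneiform characters. This is only an illustration in Python, not the actual Module:typing-aids code (which is not shown in this thread); the table `SIGNS` and the function `to_cuneiform` are hypothetical names, and the table holds only the two sign values attested in the example above.

```python
# Hypothetical sign table: only the two values attested in the example above
# (a-ku -> 𒀀𒆪). A real typing aid would need a far larger table.
SIGNS = {"a": "𒀀", "ku": "𒆪"}


def to_cuneiform(translit: str) -> str:
    """Convert a hyphen-separated, sign-by-sign transliteration to cuneiform."""
    out = []
    for sign_name in translit.split("-"):
        if sign_name not in SIGNS:
            # Mirrors the situation described for Sumerian: no table, no output.
            raise KeyError(f"no sign known for {sign_name!r}")
        out.append(SIGNS[sign_name])
    return "".join(out)


print(to_cuneiform("a-ku"))
```

A sign name missing from the table (for instance the Sumerian "ama" in ama-gi) raises an error here rather than guessing, which seems the safer design for dictionary data.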
@Koavf: Thank you! --Tom 144 (𒄩𒇻𒅗𒀸) 05:37, 15 January 2018 (UTC)
@Tom 144: No problem. I'm assuming that you have at least a passing familiarity with Sumerian, so could you please take a look at my two most recent creations? —Justin (koavf)TCM 05:45, 15 January 2018 (UTC)
@Koavf:, I'm sorry, but I don't know anything about it. But I would certainly be interested in studying the oldest written language if I got a reliable textbook. --Tom 144 (𒄩𒇻𒅗𒀸) 05:59, 15 January 2018 (UTC)
@Koavf: If Sumerian is not handled, it's probably because nobody has expressed a need for it yet. I suggest you post on the module talk page. --Per utramque cavernam (talk) 14:24, 15 January 2018 (UTC)
It would be useful for Hittite too, since sumerograms are common. Btw, would it infringe copyright to add Hoffner & Melchert's vocabulary list to Wiktionary:Requested entries (Hittite)? I guess that if we just leave the stems but erase the definitions it would be fine. --Tom 144 (𒄩𒇻𒅗𒀸) 15:57, 15 January 2018 (UTC)
Also, how would we lemmatize morphemes such as -ant-, -iya-, -ili-, -ima-, -ir-, -talla-, -ul-, -att-, -ašti-, -ašha-, -ašša-? We could just use cuneiform too, it would look ugly though. --Tom 144 (𒄩𒇻𒅗𒀸) 16:31, 15 January 2018 (UTC)

Allowing IAST Romanisation entries for Sanskrit

I propose that the result of the "Wiktionary:Votes/pl-2014-06/Romanization of Sanskrit" vote be revisited, and that IAST romanisations be allowed as alternative-form entries of the Devanagari-script lemma entries, in a manner similar to how Gothic is handled.

My main incentive is that the issue brought up by Ivan Stambuk in the talk page of that vote, as well as in "Wiktionary:Grease_pit/2014/July#Sanskrit_transliteration", has, AFAICT, never been properly addressed: namely, that "Vedic Sanskrit uses special accent marks which we don't use in Devanagari, but which are indicated in IAST transcriptions."

This means that relying entirely on the automatic transliteration from Devanagari (by way of Module:sa-translit) actually leads to a loss of information.

One could argue at this point that I should get my facts right, and that it has never been suggested to rely entirely on the transliteration module; that manual transliterations are 1) entered whenever necessary, and 2) never removed when they're present. But is this the case? I genuinely don't know, but if yes, this seems like a huge overhead (unless the automatic transliteration is, for all intents and purposes, sufficient in 95% (arbitrary number) of cases?).

In any case, I think having dedicated Romanisation entries would allow us to relax and not worry about not having complete transliterations everywhere: we would know that they can be found somewhere, and where exactly that somewhere would be.

But one might say that we could provide the manual transliteration directly in the Devanagari-script entry. Yes we could, I guess?

(it has also been suggested that we could insert invisible stress marks in the Devanagari-script, so as to make the transliteration module attain the desired result; but I agree with Ivan Stambuk that "Devising an obscure secondary system with invisible stress marks and whatever in Devanagri is absurd", not to mention impractical)

I'm totally unqualified to contribute further in any meaningful way, and probably shouldn't get involved in the first place. Still, I thought it would be good to have a new discussion about this, now that we have many users knowledgeable in Sanskrit: @AryamanA, माधवपंडित, Kutchkutch, DerekWinters, JohnC5, Victar, Mahagaja. --Per utramque cavernam (talk) 01:56, 14 January 2018 (UTC)

Oppose: Unnecessary. --Victar (talk) 02:02, 14 January 2018 (UTC)
@Victar, for users without expertise in Devanagari input, do we (EN WT as a whole) have a means for users entering IAST to find the Devanagari entries? An analogy could be made to the use of romaji for Japanese, as a set of soft redirects to get users to the main entries in kana or kanji scripts. ‑‑ Eiríkr Útlendi │Tala við mig 02:09, 14 January 2018 (UTC)
Although this idea does sound fascinating, I agree with Victar that this is unnecessary. The Devanagari transcriptions of the Vedas do indicate the high and low pitch, by means of a horizontal line above and below the character respectively. We can have those symbols. In any case, googling the IAST transcription along with the pitch accent should give the Wiktionary entry, if it exists, as one of the first results. Lastly, IAST has the same symbol for two very distinct phonemes: ळ (ḷa), which is the retroflex /l/, and ऌ (ḷ), which is the syllabic liquid /l/. Although both sounds are very rare in Sanskrit, an IAST transcription kḷp can be ambiguous between कॢप् (kḷp) and क्ळ्प् (kḷp). The current active Sanskrit editors are seeing to it that information with regards to accentuation is not lost and now with JohnC5's new declension module, even the declension tables record the accent. I personally don't see having to manually enter the accents as a hassle and enjoy working a bit more to make Wiktionary's information more accurate. -- माधवपंडित (talk) 02:53, 14 January 2018 (UTC)
The automatic Sanskrit transliteration is pretty reliable and can continue to be used. Sanskrit Devanagari is very phonetic. What is missing, from the point of view of some users, is the stress marks and some hyphens. I personally oppose the stress marks in the transliteration, since there's nothing in the native script to show the stress. The stress marks could be used in the pronunciation sections, if it's known. Hyphens are used to show the borders between compound words. I also think this is the job of the etymology sections. There won't be any loss of information if Sanskrit entries are maintained properly. I have the same opinion about Hebrew transliterations - if semi-automatic transliteration can be produced for about 70-80% of fully vocalised terms, we should use it and leave the stress marks for the entries with pronunciation sections. Alternatively, invisible symbols could be employed to mark stresses for both Sanskrit and Hebrew, which would only affect the translit, not the words in the native scripts. As it is, the automatic Sanskrit transliteration doesn't override the manual, so, if someone is not happy with the automatic one, can override it with the manual ("tr=") one but I maintain what should belong to entries, should be used there, not in every place Sanskrit terms are used. And I oppose IAST entries. --Anatoli T. (обсудить/вклад) 02:10, 14 January 2018 (UTC)
@Atitarev: There is a way to show accent in Devanagari: (), क॒ (ka), क॑ (). How else could we know where the pitch accent was if Sanskrit compilers of the Rigvedic-era texts didn't use such symbols? I think keeping these in headwords would be a good idea. —AryamanA (मुझसे बात करेंयोगदान) 04:45, 14 January 2018 (UTC)
@AryamanA:: Thanks, I am not familiar with this convention but I don't see why not, as long as everyone is happy with this particular method and there are no more common ones. It can also be made invisible in Devanagari, if purists objected. --Anatoli T. (обсудить/вклад) 04:56, 14 January 2018 (UTC)

@Atitarev: I think purists would be fine with it. There are some variants that are used only in certain texts (the Unicode block "Vedic Extensions" has them), but the ones I showed are the most common. —AryamanA (मुझसे बात करेंयोगदान) 16:37, 14 January 2018 (UTC)
Strong oppose As Madhavpandit has said, accent was in fact marked in Vedic Sanskrit, and it would make sense for us to have it as |head= parameter on the headword-line templates. But, not all Sanskrit words have a known pitch accent, and a lot of words that were borrowed later or first used in Classical Sanskrit just didn't have pitch accent (Classical Sanskrit had syllable weight-based stress). Automatic translit doesn't get rid of anything that is very necessary; pitch accent is really only useful to linguists who reconstruct PIE and priests who do Vedic chanting. As for Ivan Štambuk's comments, I don't have reason to believe he was much more knowledgeable in Sanskrit than, say, JohnC5 or Madhavpandit. (He also copied every entry he made for Sanskrit from Monier-Williams, so it's difficult to assess how much he knew about the language) Anyways, all the active Sanskrit editors do add the accent when making entries from my experience. I also add it in etymology sections for Hindi etc. now. —AryamanA (मुझसे बात करेंयोगदान) 04:45, 14 January 2018 (UTC)
@AryamanA: It's unrelated, but I must say I find his almost religious deference to Monier-Williams rather odd. This exchange and this message especially were pretty disconcerting. Saying that Monier-Williams is an exemplary piece of scholarship and saying that it's absolutely unimprovable on any account at all are two quite different things (I wouldn't see much point anyway in copying it verbatim; it's already online after all). But I still think he raised some important points. --Per utramque cavernam (talk) 16:22, 14 January 2018 (UTC)
@Per utramque cavernam: I am particularly surprised by "In other words, there are no problems with Sanskrit entries." I (and others) still are cleaning up the huge messes made by copying from Monier. Monier is also pretty old, and Sanskrit scholarship has advanced leaps and bounds in the past century. As for Sanskrit being a dead language, we still don't know the exact meanings of every Sanskrit word, and Monier didn't either; there's a lot of debate on what certain words in even the Rig Veda mean.
He also claims in the vote that IAST is a neutral way of transliterating Sanskrit and that Devanagari has a "pro-Hindu POV", which IMO is a pretty clueless thing to say. —AryamanA (मुझसे बात करेंयोगदान) 16:35, 14 January 2018 (UTC)
For all of his immense contributions to Wiktionary, Ivan Štambuk always has had problems with a battleground mentality. I think some of the more extreme things he said came from his perception that his judgment was being questioned, and the instinct to fight that off by any means available. Chuck Entz (talk) 01:42, 15 January 2018 (UTC)
Oppose Certainly to have accents on the transliteration. One major problem is that the CDSD version of MW doesn't distinguish between udatta and svarita, so a lot of people don't know about independent svaritas. The notion of correcting incorrectly accented forms isn't great. Also, a lot of academic literature will add accents to example forms of verbs that are not actually attested with accent marking (mostly because the finite forms appear in main clauses). So it's hard to know which accentuated forms are "real" without looking in Grassmann, and even with Grassmann and Whitney, you need to know to interpret things like “kanýā, kaníā” as kanyā̀. Overall, Rigvedic is obscure, difficult to get correct and very spottily attested, so I am opposed to using it in transcriptions. We could represent it in Devanagari, but several opposing and contradictory notational systems exist, so that isn't a good idea either. Though the current situation is annoying, all of the other options are way more prone to error. —*i̯óh₁nC[5] 05:17, 14 January 2018 (UTC)
Perhaps it's not worth marking accents if they are not confirmed by multiple sources, and better to leave them out altogether if there is any doubt. We don't normally mark accents for word stresses in Old-Church Slavonic or Old East Slavic, even if accents can be guessed in a large number of cases and confirmed with sources in a smaller number of cases. --Anatoli T. (обсудить/вклад) 05:24, 14 January 2018 (UTC)
Oppose One learns the script first before dealing with the language, it should not be that hard. I can’t see much value in people wanting to find Sanskrit entries without caring about the script. Also what the others said: Too many variant transcriptions, too inexact transcriptions, too bad sources, too high probability of errors. Palaestrator verborum sis loquier 🗣 10:33, 14 January 2018 (UTC)
I disagree with your argument that "One learns the script first before dealing with the language, it should not be that hard. I can’t see much value in people wanting to find Sanskrit entries without caring about the script.".
There are many possible reasons someone might want to look up entries in any non-Latin script, without having any intention of becoming a student of that language or of learning the script (such as when researching the etymologies of derived terms in other languages). And even if the user can read the script, that's not the same thing as being able to input that script easily.
This is separate from the issue of whether to include IAST entries. I simply wish to point out the potential for serious usability issues inherent in your assumptions. I am totally happy not having IAST entries, so long as users still have some means of getting to the Devanagari-spelled entries without having to search for the Devanagari strings. ‑‑ Eiríkr Útlendi │Tala við mig 11:26, 14 January 2018 (UTC)
And the “researching the etymologies of derived terms in other languages” is the only thing I could think about, I don’t see the “many possible reasons”. And those should be able to use the search, and maybe they should learn the language a bit because it is prone to errors if one adduces formations from a language without knowing anything about its morphological shapes and their frequencies.
Note that one does not “pick up some Sanskrit” to go to India, so the argument that one can make for Japanese that people might be interested in the oral language only is detached from reality.
Whatever cases you contrive, the issue here is that they need to constitute sufficient reason for the additional maintenance burden of romanization entries to be acceptable. Palaestrator verborum sis loquier 🗣 12:43, 14 January 2018 (UTC)
Yes, Devanagari is pretty much the standard script for Sanskrit now. Mediawiki has built in Devanagari input tools, hit "ctrl-m" in any text field and select Sanskrit. The popular INSCRIPT keyboard is available and so is a simple transliteration keyboard based on IAST. I use these all the time. —AryamanA (मुझसे बात करेंयोगदान) 14:41, 14 January 2018 (UTC)
@Palaestrator verborum, AryamanA, please note, I am not arguing that we need IAST entries. I am only arguing that we need to ensure that, whatever we choose to implement, we are not introducing barriers to usability.
For instance, Ctrl-M doesn't work for me at all (Chrome on Win 10), and I have no Devanagari input installed on my machine. When editing an entry, I could at least use Edittools to get Devanagari input that way. However, Edittools is not available for the search bar. Moreover, Devanagari input requires that the user know the script, which is a barrier to entry. Granted, anyone interested in Sanskrit over the long term will want to learn the script. However, everyone must start somewhere, and especially for casual users and beginning learners, we need to make sure that users can still find the Devanagari-script entries, even if they only search on Latin-script spellings. So long as that search feature works, I have no qualms. ‑‑ Eiríkr Útlendi │Tala við mig 20:30, 14 January 2018 (UTC)
@Eirikr: What you described is true for any language. We don't do this for Arabic, Persian, Hindi, Russian, etc. etc. ad nauseam even though there are plenty of learners who don't learn the Arabic script or the Cyrillic script at first. Frankly, Mediawiki's search function is good enough to locate the entries by searching for the transliteration.
I'm using Chrome on Mac (macOS Sierra) and Mediawiki's input tools work so well (and are fast enough) that I never bother using the built in input method. I don't know why they're not working for you, that's definitely a problem. —AryamanA (मुझसे बात करेंयोगदान) 20:43, 14 January 2018 (UTC)
@AryamanA: I assume by "this" in we don't do this, you mean creating romanized entries? Indeed. Searching for a term by language + romanized string does seem to work to some extent, and this thread is prompting me to re-evaluate the usefulness of romanized entries for Japanese. However, there are some hiccups: searching for "sanskrit karpasa" gives me lots of other Indian-language entries, but not the Sanskrit one at कर्पास (karpāsa). This is not the expected result. If I search just for "karpasa", the Sanskrit entry is the third one down for me. For other Latin-script strings with more overlap with other languages (say, "gola"), it's even harder to find the Sanskrit entries. Is there any way of improving the search functionality? ‑‑ Eiríkr Útlendi │Tala við mig 21:14, 14 January 2018 (UTC)
@Eirikr: Yes, I mean romanized entries, sorry if I was unclear. Adding incategory:"Sanskrit lemmas" to the search narrows down to searching only Sanskrit terms, but that isn't immediately obvious to a casual Wiktionary user. I think Japanese is a different case, because from what I know Romaji is used a lot in learner's material, whereas the books I've used to learn Sanskrit always have a unit on the Devanagari script. (I also think we should keep Pinyin redirects for Chinese, I use them a lot for learning Mandarin). —AryamanA (मुझसे बात करेंयोगदान) 21:23, 14 January 2018 (UTC)
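The narrowed search mentioned above can be sketched as a small helper that builds the query string. This is an illustration only: the endpoint URL and the helper name `sanskrit_search_url` are assumptions for the example, while the `incategory:"Sanskrit lemmas"` filter is taken from the comment above.

```python
from urllib.parse import urlencode

# Assumed search endpoint, used here for illustration only.
SEARCH_ENDPOINT = "https://en.wiktionary.org/w/index.php"


def sanskrit_search_url(romanized: str) -> str:
    """Build a search URL restricted to the 'Sanskrit lemmas' category."""
    query = f'{romanized} incategory:"Sanskrit lemmas"'
    # urlencode percent-encodes the quotes and colon and turns spaces into '+'.
    return SEARCH_ENDPOINT + "?" + urlencode({"search": query})


print(sanskrit_search_url("karpasa"))
```

Whether the narrowed query ranks the wanted entry first still depends on the search engine; the helper only saves the user from having to know the filter syntax.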
@AryamanA, Eirikr:: When I joined Wiktionary, romaji and pinyin entries had full-blown entries, as if they were the proper native Japanese and Chinese scripts. Their status has been reduced to soft-redirects and Japanese kana entries work well for disambiguations. They still enjoy higher status than any other romanisation but it's not fair to other languages. If the search functionality is improved, we don't need romanised entries. --Anatoli T. (обсудить/вклад) 22:20, 14 January 2018 (UTC)
  • I would have no objection to including an entry, for example, for vṛka, that contains no information but "Romanization of वृक (vṛka)", much as we already have for Gothic. Accent marks (both Latin and Devanagari) could be included in headword lines and stripped from links, just as macrons already are for Latin and Ancient Greek. Incidentally, the ambiguity of "ḷ" is actually easy to resolve: ळ must (I'm pretty sure) always be adjacent to a vowel, while ऌ may never be. And even if both कॢ (kḷ) and क्ळ् (kḷ) really do exist, there's nothing stopping us from having an entry for kḷ that says "1. Romanization of कॢ (kḷ) <br/> 2. Romanization of क्ळ् (kḷ)". —Mahāgaja (formerly Angr) · talk 16:53, 14 January 2018 (UTC)
    @Mahagaja: मीळ्ह (mīḷha) exists at least. I don't think we really need romanizations though, because if you search for "vrka", वृक (vṛka) is in the results anyways. —AryamanA (मुझसे बात करेंयोगदान) 20:43, 14 January 2018 (UTC)
    And in मीळ्ह (mīḷha), ळ is adjacent to a vowel, so it's not a counterexample to my statement. (I'm not sure whether you intended it to be one, though.) When I search for "vrka", वृक (vṛka) is the sixth result listed, which isn't very good. And what if I'm looking for क (ka)? If I search for "ka", क (ka) doesn't appear until the fifth page of results. Not very useful at all. —Mahāgaja (formerly Angr) · talk 23:02, 14 January 2018 (UTC)
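    To make the soft-redirect idea concrete, here is a rough Python sketch of an index from diacritic-free Latin spellings to Devanagari entries, allowing several targets per spelling as in the kḷ case above. It is not how Wiktionary's search or any existing module works; `fold`, `build_index` and the three-entry `ENTRIES` list are hypothetical and use only words cited in this thread.

```python
import unicodedata
from collections import defaultdict


def fold(iast: str) -> str:
    """Strip diacritics from an IAST string, e.g. 'vṛka' -> 'vrka'."""
    decomposed = unicodedata.normalize("NFD", iast)
    # Drop combining marks (macron, dot below, accents) and lowercase the rest.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


# Sample data: (Devanagari entry, IAST) pairs cited in this discussion.
ENTRIES = [("वृक", "vṛka"), ("कर्पास", "karpāsa"), ("मीळ्ह", "mīḷha")]


def build_index(entries):
    """Map each folded spelling to every Devanagari entry that matches it."""
    index = defaultdict(list)
    for devanagari, iast in entries:
        index[fold(iast)].append(devanagari)
    return index


index = build_index(ENTRIES)
print(index["vrka"])
```

    Because each key maps to a list, two Devanagari spellings that share one romanization simply end up side by side, which matches the numbered "Romanization of ..." lines suggested above.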
Support. The current method of using the search function is insufficient for finding entries reliably. I've had plenty of difficulty finding Russian entries, it needs to be easier. —Rua (mew) 20:45, 14 January 2018 (UTC)
Support. Redirecting people to the Devanagari entries wouldn't do any harm. To me the fastest way to find a Sanskrit entry is looking up a cognate and hoping the term I'm looking for is listed there. This would facilitate things. --Tom 144 (𒄩𒇻𒅗𒀸) 21:11, 14 January 2018 (UTC)
Another option is to browse CAT:Sanskrit lemmas, but that only works for people with a good reading knowledge of Devanagari. —Mahāgaja (formerly Angr) · talk 23:02, 14 January 2018 (UTC)
Support, without accent marks of course. I feel that @AryamanA and others are getting far too wrapped up in that instead of acknowledging that accentless IAST soft redirects could serve our users. —Μετάknowledgediscuss/deeds 23:41, 14 January 2018 (UTC)
It's my fault though, I shouldn't have presented this stuff about accents as the main reason for the proposal; in the end, it's probably the weakest of all. --Per utramque cavernam (talk) 23:46, 14 January 2018 (UTC)
@Metaknowledge: Do our users really not know about tools like this? —AryamanA (मुझसे बात करेंयोगदान) 23:49, 14 January 2018 (UTC)
FWIW, I didn't, and I think it's a fair bet that casual users of Sanskrit won't necessarily know about it either. ‑‑ Eiríkr Útlendi │Tala við mig 00:33, 15 January 2018 (UTC)
A thoroughly plausible scenario is someone seeing a romanized Sanskrit term in a dictionary's etymology or a linguistics article and wanting to find out more. Such people aren't going to know much about what tools are available, nor are they likely to bother with them if they're pointed to them.
I have no problem with romanization entries that are soft redirects, as in Gothic, as long as all the content is in the Devanagari entry. There are so many potential ways to represent Sanskrit that we need to have one designated standard to keep content from getting unmanageably scattered all over the place. Chuck Entz (talk) 01:42, 15 January 2018 (UTC)
@AryamanA: I didn't know about it either. I think you may be in too deep to realise what those of us who have never studied an Indian language are like when it comes to using a dictionary. —Μετάknowledgediscuss/deeds 03:05, 15 January 2018 (UTC)
@Metaknowledge: It would have helped me a lot to know the Persian script for Hindi etymologies, so I learned it. Before that, I used far more comprehensive dictionaries than Wiktionary to find Persian stuff.
Anyways, I would support this if it wasn't Sanskrit specific. There are many other languages (that aren't dead!) that learners could benefit from having transliteration redirects for. —AryamanA (मुझसे बात करेंयोगदान) 13:43, 15 January 2018 (UTC)
Yes, admittedly learners may have issues with foreign scripts, but there are many scripts much more complex than Devanagari, and we don't create soft-redirect entries for them. Why should Sanskrit be another privileged exception? --Anatoli T. (обсудить/вклад) 05:50, 15 January 2018 (UTC)
Oppose Sanskrit may not have had an official script initially, but the modern convention is to use Devanagari. Sanskrit is adequately represented with Devanagari, and as an abugida the individual units of the Devanagari script in most cases have a direct relationship with their transliterations and transcriptions.
Even if Devanagari is given primacy, Anatoli: "[Romanized soft redirects] will mislead users that it's OK to write Sanskrit in Roman" at all times and that Romanized forms are as equally legitimate as the Devanagari forms. The romanized alternate forms could be confused with the lemmas themselves. It would probably be better as Anatoli suggested to "help users use Devanagari and other complicated scripts and help them find what they're looking for" such as Wyang's idea to "develop reverse transliteration modules". Kutchkutch (talk) 07:07, 15 January 2018 (UTC)
@Kutchkutch -- by way of examples of soft redirect entries, please view hōhō#Japanese, kawara#Japanese, and sukī#Japanese. You'll note that all of them have zero content -- just a note that this is a romanized spelling of a term, and a link to the non-romanized entry. There isn't really any reasonable way for users to confuse these with the full lemma entries. (Note: I'm not arguing for IAST entries, I'm just offering examples of what that might look like to address specific concerns.) ‑‑ Eiríkr Útlendi │Tala við mig 09:56, 15 January 2018 (UTC)
That's what I was going to say; I'm not suggesting that we should have anything more than this. The IAST entries would simply be soft redirects, really. --Per utramque cavernam (talk) 10:12, 15 January 2018 (UTC)
@Kutchkutch, Atitarev: There are grammar books and readers of Sanskrit written entirely in romanization, e.g. Wackernagel's grammar and Liebich's reader. Granted, it tends to be 19th- and early 20th-century scholars from Germany who use the Latin alphabet exclusively, but such works do exist. I really fail to see the harm in providing soft redirects from the romanized forms to the Devanagari forms. —Mahāgaja (formerly Angr) · talk 11:07, 15 January 2018 (UTC)

😁 How is it even easy for the users to write the signs needed in the IAST romanization? The Anglo-Saxons who are not tech-savvy even fail to write ñ or – and have to learn how to write characters outside ASCII. Though the software redirects, it is doubtful that people even think so far that transliterations could be entries and then use them for getting to Devanāgarī entries, because they would think that they cannot access the entries anyway because of not being able to write IAST. I have the impression that for Anglo-Saxons on the internet it is even easier to write Indian scripts than to use correct quotation marks … Palaestrator verborum sis loquier 🗣 10:08, 15 January 2018 (UTC)

Oppose as well, mostly because it is unnecessary. Writing in Devanagari online is quite easy for even those who barely try. Sorry I'm late, just got back from abroad. DerekWinters (talk) 02:40, 19 January 2018 (UTC)

'Palaestrator verborum'

'Causing our editors distress by directly insulting them or by being continually impolite towards them.' [9], [10] Done Kaixinguo~enwiktionary (talk) 12:47, 15 January 2018 (UTC)

I think this warrants a block already; this kind of behaviour of insulting an entire community, ethnic group, etc. should not be tolerated here. Wyang (talk) 13:35, 15 January 2018 (UTC)
Unless there are more statements which are harsher than the ones linked, I do not think this warrants a blocking. To my reading, the second statement is akin to "curse the Irish for inventing Guinness." It is worthwhile to let PV know that their comments were not well taken and that they should use more discretion in the future. If the behavior persists or worsens then, or if there are other comments which I have not seen, perhaps a block may be in order. - TheDaveRoss 13:51, 15 January 2018 (UTC)
This should not be acceptable around here... —AryamanA (मुझसे बात करेंयोगदान) 15:42, 15 January 2018 (UTC)
@TheDaveRoss Seriously? It might be worthwhile to read the entire sections that came after the linked edits. When I challenged him on his statement wishing death to all Christians, at which point you might have expected him to apologise or clarify that he was joking, he demonstrated that he was, in fact, entirely serious ("Why should I like Christians? Complimenting Christianity is tantamount to outspokenly support criminality." [11]).
I decided to ignore this deliberate and frankly childlike provocation in order to work towards establishing some of the meanings of the entry at hand, and trying to assist by providing information from Persian-only sources (Dehkhoda). I was only forced to communicate with him against my better judgment due to the fact that he had deliberately used an archaic word on that page ('wherewith') even though other editors have already had cause to warn him against using archaic or poorly-worded English in entries. It can only be assumed that this was deliberate, as he himself describes his English as being at 'near-native' level on his own user page. In the first example linked, he also mis-characterises my effort to help establish the correct translation of this word as being 'entitled that the whole world rotates around them; everybody knowing what they use in their ritual acts' due to my Christianity (real or otherwise). He has unleashed a tirade of bigoted abuse directed at a whole group of people and also at myself as an individual. Kaixinguo~enwiktionary (talk) 16:11, 15 January 2018 (UTC)
In any case, any block is irrelevant and purely symbolic, as he would certainly be back to edit afterwards. I think I just wanted to draw attention to how he has behaved. Kaixinguo~enwiktionary (talk) 16:14, 15 January 2018 (UTC)
You have decided to be offended. As I said, I did not know about “your Christianity”, and apparently I could not either.
The wording “bigoted” is very striking, for this is taken from religion, and therewith it is claimed that I have to adhere to Christianity. Here I could say that this warrants a block for Kaixinguo~enwiktionary because he tries to propagate his religion by removing those who are positioned against it.
What Wyang says “insulting an entire community, ethnic group” is beside the point. People hardly choose to belong to ethnic groups, but people choose to exercise Christianity, and Kaixinguo~enwiktionary chose to throw upon me expectations of being entangled in Christianity, so that I should know what happens inside of churches. Also there isn’t such a thing as “insulting a community”. The punishable offences of “insult” always protect the honour of individuals, and communities are not individuals and attacking the phenomenon of behaving in conformance with Christianity does not reach out to the honour. And the concept is contourless:
What if I spoke out against analphabetism, drug abuse, or gluttony? It is generally agreed that these are vices and it would not be edgy to take position against them, so why should Christianity get a special treatment? Or is being a druggie acceptable if one is a druggie in a community? People choose to memorize the deeds of Jesus of Nazareth and to sit down regularly at church pews as others engage in bulge-drinking or lechery, which looks equally freely decided, so why has Christianity to be regarded more favourable? Is it because there are so many Christians around? Nobody would look up if I cursed some died-out cult from Antiquity, and yet still what I am not interested in is the lives of the Christians – I would be content if the Christians all ceased to be Christians, and if I wish death to them nobody knows it and it does not matter because it does not matter what I wish. I can wish what I want and I can wish death to whom I want as long as I do not express incitement for forceful realization of it. (Though still it is a debatable question if it is allowed to invite someone to kill himself, because he has the right to do it, but perhaps not here.) And this digresses, I have not enticed nor have I even expressed a wish of death but I reported a wish that Christianity ceases to be; if this were an offence it would mean that it is an offence to tell the truth. People here fail to distinguish between assertive illocutionary acts and directive and expressive illocutionary acts.
It would for example be not improper to tell him that I wish Christians to be dead if he asked me what I want about Christianity, because then we are talking only about the true states of things. Directives on the same are generally harmful, whereas about expressions it must be weighed, for emotions may be desired as well as undesired; but I have always recommended not to have any emotions.
Not sure about wherewith. therewith is quite common and thus the intelligibility of wherewith is not lessened even by its falling out of use; but for me it has been just a translation of womit, and a German hardly notices anything when he reads that word, and I might use the whole collection of such words by influence from legalese. Palaestrator verborum sis loquier 🗣 17:48, 15 January 2018 (UTC)
I changed my mind, let's block him for being obnoxious. - TheDaveRoss 19:59, 15 January 2018 (UTC)
This kind of language is unacceptable. —Justin (koavf)TCM 20:23, 15 January 2018 (UTC)
This geezer usually has too much to say for himself. I will go along with a block if it's considered necessary. Has he been booted off somewhere else? DonnanZ (talk) 20:39, 15 January 2018 (UTC)
I agree with Koavf. I think a one week block would let Palaestrator verborum cool down. —AryamanA (मुझसे बात करेंयोगदान) 22:05, 15 January 2018 (UTC)
Don't do it on my account, and also don't expect him to change- he isn't going to. Kaixinguo~enwiktionary (talk) 22:10, 15 January 2018 (UTC)

This looks like a witch hunt by the PC police. Palaestrator is entitled to expressing strong opinions on Wiktionary, as long as that is not the only thing he does around here. Also, I like his archaic language. He sounds like Bogorm on steroids. --Vahag (talk) 21:44, 15 January 2018 (UTC)

"analphabetism" was as far as I got... —AryamanA (मुझसे बात करेंयोगदान) 21:58, 15 January 2018 (UTC)
"This looks like a witch hunt by the PC police. Palaestrator is entitled to expressing strong opinions on Wiktionary, as long as that is not the only thing he does around here." This is where you're mistaken: this is a dictionary. His "strong opinions" about religion or ethnic groups or coffee are irrelevant. So he's free to express them as long as he bears in mind that off-topic ranting that others find obnoxious and distracting from the project of making a dictionary is absolutely a good cause for blocking him. Why is it you think that the Beer Parlour is a free hosting service for flagrantly stupid bigotry? —Justin (koavf)TCM 22:30, 15 January 2018 (UTC)
Nor is the Beer Parlour a place for piling on a user and virtue signalling. --Vahag (talk) 22:55, 15 January 2018 (UTC)
Just ignore Vahag, he has a history of having "strong opinions". I think he's joking, but I'm never sure. —AryamanA (मुझसे बात करेंयोगदान) 23:32, 15 January 2018 (UTC)
I am not joking this time. I too have been on the receiving end of such an unfair witch hunt. It starts with a hysterical and insecure user taking offence from some harmless joke or rant and looking for protection in the mob. Then the mob takes turns in haranguing the accused, taking pleasure in “protecting” some minority group from this evil person. Usually they do not belong to the “wronged” group and have no idea if they are insulted (like the Christians would need any of your protection). They are simply virtue signaling.
Wiktionary editors are not your employees. They are not robots. They are supposed to have rants and express unusual opinions from time to time, even offensive ones. If you don’t like that, don't interact with the user.
@Palaestrator verborum, please don’t be discouraged from editing. Your high-quality contributions are very welcome. --Vahag (talk) 13:10, 16 January 2018 (UTC)
@Vahagn Petrosyan: I have no interest in "protecting" Christians, I just think you're forgetting this is a dictionary. Like, what possible reason is there to say that kind of stuff on a dictionary website? There's nothing so stressful about editing a dictionary that would lead to ranting (at least in my view). There's no doubt Palaestrator has great contributions, and I've gotten tremendous help from him when I've asked, but this kind of stuff is just not acceptable. Besides, it's just a week-long block, if he really does care so much about the dictionary (and I'm sure he does), he will come back. —AryamanA (मुझसे बात करेंयोगदान) 17:06, 16 January 2018 (UTC)
This isn't some harmless joke or rant. This is explicit religious profiling: death wishing and revilement in face of one who is clearly traumatised. There is no attempt of making the “joke” light, and User:PV only upped his tirade of abuse after seeing the other party has taken offence. This isn't being “odd” like he claims himself to be; this is being obnoxiously self-obsessed. Clearly he doesn't think any of what he has written was inappropriate ― the next target will just be a matter of time. Wyang (talk) 13:47, 16 January 2018 (UTC)

Let's draw a line under this and end this discussion here. I've never seen the like of it in more than ten years (on and off) here, not even when Crazy Yalda Guy threatened me with a dictionary. I'm taking a break, which I had decided before this morning and there should be no block of PV as it won't serve any purpose. That will be an effective end to the matter, as it's clear that the root cause is that he and I are two totally and utterly incompatible people. It happens. Kaixinguo~enwiktionary (talk) 23:20, 15 January 2018 (UTC)

@Kaixinguo~enwiktionary I wish you great relish! 💛 Palaestrator verborum sis loquier 🗣 00:32, 16 January 2018 (UTC)
I’ve blocked him for 1 week now, per the suggestions by other editors above.
@Palaestrator verborum What you have said on this page and other related pages is deeply insulting to User:Kaixinguo and many other editors in the Wiktionary community. You are entitled to your opinions, but using insults and profiling as such is immature and unacceptable. Please cool down during this period and realise that those comments are not welcome here. I suggest we hide the relevant revisions. Wyang (talk) 04:40, 16 January 2018 (UTC)

I suggested earlier today on his talk page unblocking him. It's best to just move on from this IMO. Kaixinguo~enwiktionary (talk) 21:12, 16 January 2018 (UTC)

His comments were inappropriate, regardless of who was and wasn't offended. I don't think his block should have been shortened. --Victar (talk) 09:39, 17 January 2018 (UTC)

@Victar: I only did it because of what Kaixinguo said. Honestly, I don't think he's going to change no matter how long the block is. —AryamanA (मुझसे बात करेंयोगदान) 15:03, 17 January 2018 (UTC)
@AryamanA: This block was beyond simply the matter with Kaixinguo, and he was the only person I saw wishing to remove the block. Shortening the block was premature, and though it may be symbolic, I think we should be clear that this sort of dialog is unwelcome to the project. --Victar (talk) 15:30, 17 January 2018 (UTC)
@Victar: That is a very good point. I've un-shortened the block. —AryamanA (मुझसे बात करेंयोगदान) 16:47, 17 January 2018 (UTC)
The block length now seems to be taken from when it was last changed instead of from the original start date. Kaixinguo~enwiktionary (talk) 17:19, 17 January 2018 (UTC)
As I have made clear, I think the block should be lifted now. The point has been made and I have offered to take a break and we had come to an agreement. Honestly, it's not like he's going to go on a crazy spree like some people who have been blocked have done in the past, and he hasn't had another go at me (that I can see), which is probably what I would have done if it were me in his position. 'It takes two to tango' and I didn't have to react to what was written, either. I could have closed the page and done nothing but I have a fiery temper and decided to respond. So I'd really appreciate it if he can be un-blocked. From a selfish POV, I feel compelled to keep on checking back to see what has happened and I just want to leave. Kaixinguo~enwiktionary (talk) 17:33, 17 January 2018 (UTC)
The block was not because you wanted him blocked, it was because Wyang looked at what had transpired and determined that a block was in order. I think you were right to raise your concerns, and I think Wyang made a reasonable determination. It is not your "fault" that the block occurred, you can feel free to move along from the issue. - TheDaveRoss 19:41, 17 January 2018 (UTC)
Oops, fixed. Anyways, TheDaveRoss is right, there were other reasons for such a block to have happened, and it wasn't your fault Palaestrator chose to say what he did. We can't let this kind of dialogue be acceptable here. —AryamanA (मुझसे बात करेंयोगदान) 19:50, 17 January 2018 (UTC)
This sort of behavior is totally unacceptable because saying such strong words is not only irrelevant to the dictionary, but can also very easily scare editors away from the project. We definitely don't want that! PseudoSkull (talk) 01:25, 19 January 2018 (UTC)

Proposal: adding elasticity/flexibility in Chinese entries

I'll be concise for those knowledgeable, and refer to brief and basic bibliography for those who are not.

Elasticity/flexibility is a lexical property of Chinese terms, two sides of the same coin, which must be reflected in the very same entry for a given lemma.

Therefore, for example the fifth version of the prestigious XDHYCD (Xiandai Hanyu Cidian) applies mutual annotations in the respective entries, so that the entry for 煤 mei ‘coal’ reads "noun, … also called 煤炭 mei-tan ‘coal-charcoal’", and the entry for 煤炭 meitan ‘coal-charcoal’ is annotated as "noun, 煤 mei ‘coal’".

Unfortunately, Wiktionary currently reflects this wrongly: in the broadly termed 'compounds' section, as a synonym or after 'see also', and only for the monosyllabic version.

Please, before commenting read the following brief article (and if necessary further references within it); if you still have any questions, I'll be glad to try and answer them.

http://www-personal.umich.edu/~duanmu/2014Elastic.pdf

Finally, elasticity from Xiandai Hanyu Cidian 2005 has been tabulated in the following open access thesis

deepblue.lib.umich.edu/bitstream/2027.42/116629/1/yandong_1.pdf

I hope an enriching discussion ensues for this critical lexicographical issue --Backinstadiums (talk) 15:33, 15 January 2018 (UTC)

The shadow of the Wikimedia Foundation

Hi all,

Just to let you know, an admin on French Wiktionary has been globally banned by the Wikimedia Foundation. There was no contact before the sudden change on his personal page, no explanation of the reasons behind it, no possibility of appeal, and no discussion about the procedure. Classiccardinal was never contacted by the people who decided this, and neither were our community members. We suppose this ban could be based on some insult he wrote on French Wikipedia two years ago and a stupid joke he made on Commons, but maybe it is based on something completely different. He was banned on those two projects but was a great contributor to Wiktionary (10k+ edits), nice with newcomers and very helpful in answering questions politely. Sure, he used gross language from time to time, but only with colleagues, and he never became offensive; it was just his manner of communication and we had adapted to it.

I am sharing this information here after reading two conversations involving people causing problems. They may be judged by others if people here do not decide on appropriate ways to deal with them, and it can be very painful for everyone. Take care of each other, and I hope you never experience such an unfair procedure in your community. If you need assistance in a difficult situation, you can talk to stewards or discuss a global ban, but do not let some bureaucrats decide for you if there is no strong threat or harassment. We are still looking for options on how to modify this procedure, but it appears we are not welcome to be part of this aspect of the governance. So you may hear about this case again in the future, but I am not calling on you to do anything, as we are not supposed to. -- Noé 11:52, 16 January 2018 (UTC)

We've already experienced this phenomenon before at en.wikt, although the most prominent case (Liliana-60) was one where there was arguably due cause. I don't like it, and most of all I don't like that it is impossible to get them to discuss it after the fact. It bears remembering that, for better or for worse, democratic principles are not among the central ideas that inspire how the WMF works. —Μετάknowledgediscuss/deeds 15:56, 16 January 2018 (UTC)
Yes. "Shadow" is a good word for it! There is the WP:OFFICE problem where they sometimes hush things up due to legal arse-covering. ("One of the terms of the settlement was that we would not disclose any of the terms of the settlement"... where'd I see that?) Equinox 22:51, 18 January 2018 (UTC)

Nym-type in bold

I think having the nym-type in bold looks overbearing, often larger than the definition itself.

  1. mad
    Synonyms: angry

I would rather the nym-type be made normal and the whole thing be in italic.

  1. mad
    Synonyms: angry

@Rua, Erutuon --Victar (talk) 16:54, 16 January 2018 (UTC)

I didn't make it like that originally, so that reflects my preference. I don't see a reason to make it italic. —Rua (mew) 16:59, 16 January 2018 (UTC)
I agree that bold overemphasizes "Synonyms". But it's in the spirit of overemphasizing headings relative to content. DCDuring (talk) 17:44, 16 January 2018 (UTC)
True! But still, I would drop the bolding. - -sche (discuss) 18:47, 16 January 2018 (UTC)
@Rua: I'm not married to the italic suggestion. --Victar (talk) 18:45, 16 January 2018 (UTC)

To broach a larger question, why are we placing {{syn}} under the definition instead of under its own header, like we do Related terms? --Victar (talk) 18:55, 16 January 2018 (UTC)

Because synonyms are sense-specific, related terms aren't. —Rua (mew) 19:07, 16 January 2018 (UTC)
The header format is still allowed, though. I still use it sometimes, when it works for many senses. --Per utramque cavernam (talk) 19:11, 16 January 2018 (UTC)
@Rua: So are translations, but again, their own section. --Victar (talk) 19:12, 16 January 2018 (UTC)
Who says the current placement of translations is a good thing? DTLHS (talk) 02:17, 17 January 2018 (UTC)
I don't particularly like it when "Alternative forms" are regularly placed above "Etymology" by a certain bot. DonnanZ (talk) 10:24, 17 January 2018 (UTC)
@DTLHS, Donnanz There was a vote specifically allowing alternative forms to be placed below the definitions, if the bot is changing that it's in error and should be fixed. —Rua (mew) 20:15, 17 January 2018 (UTC)
@Rua: I can't remember the vote, can you pinpoint it? DonnanZ (talk) 20:25, 17 January 2018 (UTC)
Wiktionary:Votes/pl-2016-09/Placement of "Alternative forms" 2 (weaker proposal). —Rua (mew) 20:43, 17 January 2018 (UTC)
Yeah, I abstained, but that would be preferable to what's happening at the moment. DonnanZ (talk) 20:56, 17 January 2018 (UTC)
(chiming in...)
I agree with DonnanZ.
I missed both votes. For Japanese, neither of the suggested placements (above syns as a POS subsection, or at the top above everything) are appropriate. Alternative forms in Japanese are determined by etymology and pronunciation, not by POS. This is why I (and I believe other JA editors as well) have placed alt forms after the etym and pronunciation, and before POS sections. A single JA spelling might have multiple separate etyms and pronunciations -- see for one such example, showing how alt forms are tied to the etym + pr combination. Native monolingual dictionaries are structured in a similar fashion; I would be happy to supply screenshots. For consistency across JA entries, it makes the most sense to place alt forms in the same location even for JA spellings that only have one etym and pronunciation.
Mandating a single structure for all languages, without properly considering the impacts on all languages, doesn't strike me as the best way forward. ‑‑ Eiríkr Útlendi │Tala við mig 23:00, 17 January 2018 (UTC)
As an aside to that, in "Templates and Headers" we have ===Alternative forms===, not ====Alternative forms====. DonnanZ (talk) 12:48, 19 January 2018 (UTC)

How about:

  1. mad
    Synonyms: angry

Or is that too small? I also think that no matter what format we choose, the nyms should be made collapsible by default (using User:Ungoliant MMDCCLXIV/synshide.js). —AryamanA (मुझसे बात करेंयोगदान) 02:13, 17 January 2018 (UTC)

Looks good to me. DCDuring (talk) 03:41, 17 January 2018 (UTC)
I think it's too small, but definitely agree that it should be collapsed by default, similar to how quotations currently are. --Victar (talk) 06:49, 17 January 2018 (UTC)
I support dropping the bolding and wikification (at least for the well-known names: synonyms and antonyms) from the nym type.
A smaller font doesn’t seem necessary if they are collapsed by default, but it does look nice. — Ungoliant (falai) 21:06, 17 January 2018 (UTC)
You should all look at {{zh-syn}} as well, it looks pretty nice. —AryamanA (मुझसे बात करेंयोगदान) 21:58, 17 January 2018 (UTC)
You and I... have very different aesthetic tastes. --Victar (talk) 15:32, 28 January 2018 (UTC)
@Ungoliant MMDCCLXIV I'd support using your User:Ungoliant MMDCCLXIV/synshide.js script. --Victar (talk) 15:32, 28 January 2018 (UTC)
Thanks for reminding me about that. I still need to fix some things. — Ungoliant (falai) 16:16, 28 January 2018 (UTC)
@Ungoliant MMDCCLXIV: Godspeed. --Victar (talk) 16:19, 28 January 2018 (UTC)

Kazakh romanization

https://www.nytimes.com/2018/01/15/world/asia/kazakhstan-alphabet-nursultan-nazarbayev.html Justin (koavf)TCM 17:53, 16 January 2018 (UTC)

@Koavf: We already had this conversation, at Wiktionary:Beer parlour/2017/October#Kazakh orthography, where we essentially concluded that we will wait for attestation. What do you have to add by posting this? —Μετάknowledgediscuss/deeds 17:58, 16 January 2018 (UTC)
Just, "this is neat, I think you might be interested". —Justin (koavf)TCM 18:23, 16 January 2018 (UTC)

Desysopping for inactivity

Per Wiktionary:Votes/pl-2017-03/Desysopping for inactivity, we can (should?) desysop the following users:

Umm, I'm still here, just below the radar! I'm normally on the site at least once a week, and if I'm needed urgently, my email is monitored at least daily. Generally, if I'm looking for a definition and it's incorrect or missing, I amend it or add it, as I'm about to this evening. I'm amazed I haven't edited anything for 10 months -- a combination of being very busy starting a new business, and the quality/completeness of en.wikt being higher than it used to be, so I haven't felt the need to alter anything. The other thing I tend to do is patrol Recent changes, and occasionally adopt Unwatched pages. But apart from those, as noted, I have not used any restricted tools for many years. Hopefully, one day, once my family is fully grown, I will have time to be more use to you...but I don't expect that this year or next.
I agree with the reasoning behind the vote, and with most of what is said below, so if you wish to remove my admin privileges until I am back more regularly, but leave me approved for rvv rollback and the other patrolling enhancements, I would not be offended, nor much inconvenienced. --Enginear 18:24, 22 January 2018 (UTC)
Having said which, while patrolling Recent Changes tonight, I have given short blocks to one anon who used a page to abuse three different ?friends over a few minutes, and another who wrote a (fairly minor) racist rant, and the delay of reporting those for another admin to deal with would have made the sanctions pointless, so I suppose there is some advantage in me keeping the privilege. --Enginear 05:23, 17 February 2018 (UTC)
Don't de-op Enginear. This is a security measure for people who are never around. Equinox 12:42, 17 February 2018 (UTC)

I suppose we should warn the admins who have been recently active that their status is liable to be removed right now? --Per utramque cavernam (talk) 20:30, 16 January 2018 (UTC)

Why not just get to work on the ones inactive since 2015 or earlier?
Have we had an actual problem with any admin account being hijacked? Have we had any signs of such trouble? Have any wikis, especially Wiktionaries, with our level of activity had such trouble? DCDuring (talk) 20:58, 16 January 2018 (UTC)
I think the results of the vote are pretty clear. Yes, we should de-sysop users who have not used their tools in the past five years. As to whether or not there have been issues, I don't think that matters. - TheDaveRoss 21:05, 16 January 2018 (UTC)
@TheDaveRoss The vote doesn't command us to desysop; it allows us to. I am asking whether there is any compelling reason to do so, especially in the case of those who are recently less active. DCDuring (talk) 02:23, 18 January 2018 (UTC)
@DCDuring I agree, it isn't written as a mandate. As it stands all it does is allow 'crats to change the user rights if they feel like it. I assumed that we would actually make that a practice as well, which I don't think was an unreasonable line of thought. With regards to compelling reason to do so, I think there are lots of good ones, none of them particularly urgent.
It is best if the administrator lists reflect the active administrators on the project. This helps people looking for help to more easily find it. If you leave a message on, say, Conrad.Irwin's talk page he is unlikely to respond quickly to assist you. An active list also helps us keep track of how many people are doing admin work, so that if the number dips particularly low we know to seek out more. There is also the small chance that an account gets compromised. This is unlikely and would not cause lasting harm, but more administrators means more surface area for attack. I don't give too much weight to that argument, but it has been made.
We shouldn't give too little weight to it either. Human factors can defeat most security. Someone did attempt to change my password once, which failed because of the dual-mode security -- the email came to me, and I disowned it. But if I was no longer active on any WMF site, and someone came up with a plausible reason for disappearing and losing access, they might manage to persuade a sysop to bend rules and let "me" back in.
The likelihood may be small, but we have been attacked by a rogue admin before, 11 yrs ago...and that was someone we knew but misjudged. He quickly blocked all the other en.wikt admins, causing a bureaucratic delay in restraining him. He was mischievous rather than malicious -- our misjudgement wasn't that bad. But a malicious person with admin access could do the project significant harm. --Enginear 18:24, 22 January 2018 (UTC)
Finally, there is the question of what user rights represent. I consider user rights to be an expression of trust on behalf of the community to the particular user. After several years of inactivity there is a new community, with new people and practices. This is also an argument in favor of discrete terms in roles, which I would probably support if it didn't mean so much extra overhead in the form of voting and role changing and keeping track of duration. I find the automatic removal after a long period of inactivity to be a low-maintenance method of imposing this sort of term limit. It is not hard to become an administrator, so if a trusted user returns they would almost certainly have no difficulty regaining their rights. - TheDaveRoss 13:03, 18 January 2018 (UTC)
I think this user is a little overzealous. If a user was active last month they are not inactive, whether they use certain tools or not. DonnanZ (talk) 21:16, 16 January 2018 (UTC)
The vote was very specific, the measuring stick is use of tools. Also, if you have had admin rights for five years and have not used them, why do you need them? If someone has not used them but would like to keep them they can use them, there are consistently dozens of pages to be deleted, and there is a person to block every hour or two. - TheDaveRoss 21:47, 16 January 2018 (UTC)
You did yourself vote in favour of that rule, so I'm not sure I follow. --Per utramque cavernam (talk) 21:52, 16 January 2018 (UTC)
I voted in favour of desysopping for five years inactivity, but I glossed over the small print. DonnanZ (talk) 22:00, 16 January 2018 (UTC)
Actually, it's a shame Dvortygirl is no longer doing audio, she has a great voice. DonnanZ (talk) 15:49, 17 January 2018 (UTC)
I think they should be desysoped. Even if there are no immediate concerns about their accounts being compromised, we should practise the principle of least privilege. —Internoob 05:21, 20 January 2018 (UTC)
I agree. --Enginear 18:24, 22 January 2018 (UTC)
In general for people who aren't around I agree. They don't need the tools and if they do come back they can reacquire them easily. It's a "surface area" issue. Equinox 12:44, 17 February 2018 (UTC)
Let's make full use of Wiktionary:Votes/pl-2017-03/Desysopping for inactivity, that is, desysop all who the vote allows to be desysopped. The policy is rather lenient in that it allows full five years with no use of admin tools. If, contrary to the policy, editors want to change the criterion from no use of admin tools to no activity, including editing, let's change the policy. --Dan Polansky (talk) 13:19, 17 February 2018 (UTC)
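(Editorial illustration.) The criterion debated above, five years with no use of admin tools regardless of ordinary editing, can be sketched as a small check. This is only a sketch of the rule as the participants describe it, not the wording of the vote; the names `eligible_for_desysop`, `years_before` and `INACTIVITY_YEARS` are hypothetical.

```python
from datetime import date

# Threshold as described in the discussion: five years without using admin tools.
INACTIVITY_YEARS = 5


def years_before(day: date, years: int) -> date:
    """Return the same calendar day `years` earlier (29 Feb falls back to 28 Feb)."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        # `day` is 29 February and the target year is not a leap year.
        return day.replace(year=day.year - years, day=28)


def eligible_for_desysop(last_admin_action: date, today: date) -> bool:
    """True if the last use of admin tools is more than five years before `today`.

    Ordinary edits are deliberately ignored: per the discussion, the
    measuring stick is use of tools, not general activity.
    """
    return last_admin_action < years_before(today, INACTIVITY_YEARS)
```

Under this sketch, an admin whose last tool use was on 17 February 2018 is not eligible on 23 February 2018, while one whose last tool use was in mid-2012 is, however recently either of them edited.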
@Dan Polansky: I would suggest a new vote to enforce the policy: "making the automatic desysopping agreed upon in the March 2017 vote compulsory". The decision is currently left to the bureaucrats, which somewhat defeats the goal of that vote. --Per utramque cavernam (talk) 10:09, 23 February 2018 (UTC)
The situation is a little silly at the moment with individual votes, witness the current vote on User:Dvortygirl. I think that after five years of total inactivity a user with admin tools should lose them automatically without the need for a vote. DonnanZ (talk) 13:36, 17 February 2018 (UTC)
After five years of total inactivity it can be assumed the user has either (1) died (the worst scenario), (2) found another consuming interest, or (3) just can't be bothered any more. DonnanZ (talk) 14:31, 17 February 2018 (UTC)
@Chuck Entz, SemperBlotto: As active bureaucrats, would you be willing to desysop the above list of admins except for Enginear who now has last admin action from 17 February 2018, making use of Wiktionary:Votes/pl-2017-03/Desysopping for inactivity? Or do you see any objections? From my standpoint, this very discussion is a "further ado" while the vote mandated "without further ado". --Dan Polansky (talk) 16:57, 23 February 2018 (UTC)
Done. Could someone modify Wiktionary:Administrators and move them all to the inactive list please. If they become active again, I think they can be reinstated without a vote. SemperBlotto (talk) 19:29, 23 February 2018 (UTC)
I moved them to a "former administrator" section. - TheDaveRoss 19:54, 23 February 2018 (UTC)
@Chuck Entz, SemperBlotto: Could also the following accounts be desysopped with the use of Wiktionary:Votes/pl-2017-03/Desysopping for inactivity? Or do you object to that? Some of them were active in 2017 but none of them has used the tools for over 5 years.
--Dan Polansky (talk) 20:05, 23 February 2018 (UTC)

I'm surprised it took this long to de-sysop me. I haven't been active and I don't foresee ever being active again. I've got other things in my life, such as a spouse and family, that I didn't have when I spent so much time here. I more than likely have done some minor edits since 2015, (I know I have over on the 'pedia.) but I don't bother logging in as it's an extra step not needed for what I intend to do, and even if I did start editing more, I doubt I'd need the admin tools for what I would be wanting to do. So no hard feelings, folks. — Carolina wren discussió 14:36, 24 February 2018 (UTC)

Wyang playing a Lenin on the whole community

I like to keep bitching at around 1‰ (we don’t need more of that here whenever avertible), but in recent days Mr. Wyang has forbidden me to arrange my talkpage in archival fashion [1], ignored related questions in his [2] [3] and deleted messages from mine [4] (he claimed my arrangement was "vandalism" but he obliterates content therefrom and that's ok with him).

I respect his obsession with me (I know commoners marvel at extraordinaires), but it oughtn’t worsen Wiktionary. He has many times proved capable of more than pettiness, so if he could stop engaging in quarrels which, according to him [5], are a loss of time we will all win in the process. He was restored admin rights because, in his own words “It's incredibly frustrating to not be able to delete new user vandalism or delete the original as I move entries with wrong titles”, and less than 6 months after he has just today banned one of the most knowledgeable users we have here, thus exceeding the scope of his initial admin request.

Thanks in advance for taking the time to read my message!

—This unsigned comment was added by Gfarnabo (talkcontribs).

rofl --Per utramque cavernam (talk) 21:15, 16 January 2018 (UTC)
Hah, funny. Blocked. —AryamanA (मुझसे बात करेंयोगदान) 22:25, 16 January 2018 (UTC)

Quickly, Aryaman, ban this user too to attain Nirvana, this is your opportunity!!! talk

Wyang's edits are completely merited and the blocked user's (Gfarnabo) complaints are not. He continues to use sockpuppets to avoid the block. --Anatoli T. (обсудить/вклад) 23:21, 16 January 2018 (UTC)
A perfect example of why we could use local CheckUsers. —Justin (koavf)TCM 23:24, 16 January 2018 (UTC)
But we already have local CU... Or do you mean more local CU? --Per utramque cavernam (talk) 23:26, 16 January 2018 (UTC)
For what it is worth, I went to look at this and Chuck had already done so. - TheDaveRoss 23:29, 16 January 2018 (UTC)
Sorry if I was unclear here: this is in reference to our recent votes on CheckUsers. Some of the editors here felt the user rights were superfluous. —Justin (koavf)TCM 01:34, 17 January 2018 (UTC)
Wrong religion buddy :) —AryamanA (मुझसे बात करेंयोगदान) 23:34, 16 January 2018 (UTC)
You can spend time (in vain) trying to ban me or answer my grievances with more than adjectives.
"Grievances"? Oh, you mean you adding incorrect information in languages you don't know? —AryamanA (मुझसे बात करेंयोगदान) 23:48, 16 January 2018 (UTC)

I wish you a pleasant night either way! —This unsigned comment was added by 99.194.139.191 (talk).

I saw the deleted revision, and I'll respond. First of all, the actual verse (see Wikisource):
अमुं च रोपितव्रणमिगुदीतैलादिभिरामिषेण शाकेनात्मनिर्विशेषं पुपोष ।
amuṃ ca ropitavraṇamigudītailādibhirāmiṣeṇa śākenātmanirviśeṣaṃ pupoṣa .
Second, I have never made any false claims to how much Sanskrit I know. I don't know enough Sanskrit to translate it. I just see a meaningless translation that probably needs way more context and finesse with Sanskrit than you or I have. So it's better to not have it at all and wait for someone more knowledgeable to deal with it, rather than have low-quality content. —AryamanA (मुझसे बात करेंयोगदान) 21:21, 18 January 2018 (UTC)

"From" in etymologies

I've been meaning to bring this up for a while now, but haven't had much time.

Wouldn't it be great if we didn't have to write "from"? Oh, wait, just don't write "from". An etymology by definition tells you where something comes from. - Equinox

I've always written "From" (until only recently) in etymologies because I've seen it done on so many other entries, and I just wanted to copy what was said. Equinox claims this is redundant. I have vague memories of him complaining about this before, but I can't remember exactly what happened.

I'm starting this topic because I can understand where he's coming from. I more specifically remember him also saying something like "The pronunciation doesn't say 'sounds like' before it, so why should the etymology say 'from' before it?"

So, for those reasons, I'm looking for an explanation of why we do this "from" thing here. Could it be because maybe other dictionaries do it, perhaps? That would be the only reason I can think of. I'd also like to propose to disallow etymologies to be worded this way since it is redundant, unless someone comes up with a good explanation of why to say "from".

And naturally, I should ping you, @Equinox. PseudoSkull (talk) 03:00, 19 January 2018 (UTC)

Etymologies are sometimes English sentences (seize) and sometimes formulas (de- + frog). I don't see why making all etymologies into formulas would be a good thing. DTLHS (talk) 03:07, 19 January 2018 (UTC)
The benefit of etys being templates is that we are then only storing the abstract details (X derived from Y) and we can render those details with or without a "from", depending on this week's whim, or a user's choice. The downside (as DTLHS says) is that you can't be discursive or mention anything quirky. Yeah, I hate the "from". (P.S. I want "I have vague memories of Equinox complaining" as my epitaph.) Equinox 03:20, 19 January 2018 (UTC)
Here is the etymology for English man:
From Middle English man, from Old English mann (“human being, person, man”), from Proto-Germanic *mann- (“human being, man”), probably from Proto-Indo-European *mon- (“man”) (compare also *men- (“mind”)).
Now here it is without "from":
Middle English man, Old English mann (“human being, person, man”), Proto-Germanic *mann- (“human being, man”), probably Proto-Indo-European *mon- (“man”) (compare also *men- (“mind”)).
See why we need "from"? Leasnam (talk) 03:23, 19 January 2018 (UTC)
At one point in time we were using <'s in place of "from", but then it began to feel impersonal and cold, so we reverted to using "from". I think "from" is easier to make sense of, especially in lengthy etymologies. If it's just: be- + glimmer, then it can do without the "from", but using the "from" in such circumstances increases consistency across all etymology formats. Leasnam (talk) 03:26, 19 January 2018 (UTC)
I don't see why we need the first one. As I've also said before, etys are a sort of "family tree" and we don't typically include the entire thing in every entry (e.g. we wouldn't/shouldn't include the entire history of "fragment" at "defrag"). I suppose we await better visualisation technologies where you can scan and zoom through a sea of floating words linked by lines or something. (I am serious.) Type "etymology of car" into Google to see their primitive (but quite nice) attempt, which does not use the word "from" at all. Equinox 03:27, 19 January 2018 (UTC)
I remember those ">" etymologies. Yuck. --Victar (talk) 04:03, 19 January 2018 (UTC)
  • @DTLHS: Just so you know that I amended your comment and added a link to make it clear that there is a real-life example that you are citing rather than a hypothetical. —Justin (koavf)TCM 03:29, 19 January 2018 (UTC)
[edit conflict x3...] Strong oppose. Cutting out technically unnecessary words usually results in something taking more brainpower to read, not less. It would also create new problems. For instance, if I understand the suggestion correctly, this...
Borrowed from French rendez-vous, from rendez, second person plural, imperative, of to go (to) + you.
...would become this...
Borrowed from French rendez-vous, rendez, second person plural, imperative, of to go (to) + you.
...implying that "rendez-vous" and "rendez" are simply forms of the same word somehow, the way various forms of the Middle English ancestor are listed at seize (in the case of rendezvous, it's easy enough to figure out, but there are plenty of cases where it would be more confusing). If this is only about removing the initial "from," I think that's not much of an issue, but I don't think there's any point to banning it. Don't fix it if it ain't broke, as they say.... Andrew Sheedy (talk) 03:30, 19 January 2018 (UTC)
Is it something that can be agreed upon that "From de- + frog." etymologies are not necessary, and should just be said as "de- + frog"? PseudoSkull (talk) 03:35, 19 January 2018 (UTC)
I agree that we should leave it as is. Currently, it is optional to leave off the "From" when it's clearly inferred, but it is in no way illegal to add it, because it really does belong there. Leasnam (talk) 03:38, 19 January 2018 (UTC)
I don't think we should actually ban it. But also it shouldn't be compulsory to stick "from" on the front of every simple templated ety. Someone used to do that; haven't seen it recently and can't remember who. Equinox 03:49, 19 January 2018 (UTC)
Maybe me? I do this sometimes. I think they should be proper sentences, personally, though I wouldn't revert if someone removed a "From". Ƿidsiþ 14:01, 1 February 2018 (UTC)
Personally, I think "from" or "of" at the start of an etymology should be compulsory. I find it makes the most grammatical sense, and I think the etymologies should be proper sentences, not just mechanical hierarchies. --Victar (talk) 04:00, 19 January 2018 (UTC)
But you must admit that if something is compulsory then it might as well be automated (why should users type the same thing every time? we don't have to type the page's HTTP headers). I am sure we can make templates like "compound" and "prefix" say "FROM" at the start of a line if we really want it. How many times do you type "From" in a year? I'm gonna get RSI six months early from typing "===Etymology===" half my life. Equinox 04:05, 19 January 2018 (UTC)
Even having "from" at the beginning doesn't make it a complete sentence. If simple etymologies being complete sentences becomes a thing, check this out: "The word defrog was formed by taking the noun frog and appending the prefix de- to the beginning." That sounds like way too much to write for just an etymology. That looks kind of like how the French Wiktionary does it, btw. PseudoSkull (talk) 04:09, 19 January 2018 (UTC)
If you want fully automated etymologies you should probably write a template that can accommodate all the steps in one go, plus add some JS hooks so we can convert between Leasnam style and Equinox style at a whim. DTLHS (talk) 04:13, 19 January 2018 (UTC)
I'm sure that would be lovely but that would solve the problem "Leasnam and Equinox want to see slightly different etymologies". It wouldn't solve any actual problem that affects most users. I'm also sceptical of "make it a user setting" in general because it tends to indicate some inherent flaw in the design. I could write about five paras on this but it's not necessary yet. Equinox 21:28, 19 January 2018 (UTC)
Actually, replying to Equinox, ===Etymology=== etc. can be added by accessing "Templates and Headers" when editing/creating an entry. Maybe "From" can be added the same way, by adding it to the available templates. DonnanZ (talk) 12:41, 19 January 2018 (UTC)
No, please let's not do that again. We finally got rid of that pesky automated text in front of {{bor}}, so let's not re-add the same kind of crap somewhere else. --Per utramque cavernam (talk) 12:59, 19 January 2018 (UTC)
I dislike {{bor}}, so I don't care what happens to it. DonnanZ (talk) 13:19, 19 January 2018 (UTC)
I would lose my mind if I had to use |nofrm=1 --Victar (talk) 14:01, 19 January 2018 (UTC)
I guess that means you dislike the use of "From", but that shouldn't stop other editors using it. It needn't be made compulsory. DonnanZ (talk) 14:18, 19 January 2018 (UTC)
Nope, the opposite. --Victar (talk) 17:35, 19 January 2018 (UTC)
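To make the templating idea raised above concrete (store only the abstract chain "X derived from Y", then render it with or without "from"), here is a minimal Python sketch. It is not how any Wiktionary template or module works; the `Step` class, the `render` function and the `use_from` flag are all hypothetical names chosen for illustration, and the data is the English man chain quoted in the thread (without the "compare also" note).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One ancestor in an etymology chain (hypothetical structure)."""
    language: str
    term: str
    gloss: str = ""
    qualifier: str = ""  # hedging word such as "probably"


def render(steps: List[Step], use_from: bool = True) -> str:
    """Render a chain as one line, optionally prefixing each step with 'from'."""
    parts = []
    for step in steps:
        text = f"{step.language} {step.term}"
        if step.gloss:
            text += f" (“{step.gloss}”)"
        if use_from:
            text = "from " + text
        if step.qualifier:
            text = f"{step.qualifier} {text}"
        parts.append(text)
    sentence = ", ".join(parts) + "."
    # Capitalise only the first character; the rest is left untouched.
    return sentence[0].upper() + sentence[1:]


# The chain for English "man" as quoted in the discussion.
MAN = [
    Step("Middle English", "man"),
    Step("Old English", "mann", "human being, person, man"),
    Step("Proto-Germanic", "*mann-", "human being, man"),
    Step("Proto-Indo-European", "*mon-", "man", qualifier="probably"),
]

print(render(MAN))                  # styled with "from"
print(render(MAN, use_from=False))  # bare list style
```

The same stored data yields both styles, which is the benefit described above; the cost, as also noted, is that a rigid structure like this has no room for discursive remarks.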

Wayback Machine

I found a discussion from 2012 on the Wayback Machine saying it wasn't durably archived, and I find the reasoning for this flimsy. "The Web Archive is an Internet company that can disappear at any time" - OK, but do you know how many books have been lost to the ages? A library can burn down at any time, taking the one copy of an obscure book along with it, or the book could be stolen, etc. It's entirely possible Usenet could be lost to history. These are all big ifs. How likely is it that the Wayback Machine is going to disappear? It has lasted much longer than GeoCities, and GeoCities was already dying for much of its official lifespan before they decided to put the final nail in the coffin, so it is not a fair comparison. GeoCities was considered to be a big deal back when the Internet was much smaller than it is now, and when the Internet was not nearly as old as it is now, the short time GeoCities was popular seemed longer than it was. Finsternish (talk) 23:09, 19 January 2018 (UTC)

WMF is now actively working with the Internet Archive. See Inviting IABot for a related BP discussion. Jberkel 15:27, 20 January 2018 (UTC)
IIRC sites archived at archive.org could be removed by adding or editing a robots.txt: disallowing archive.org or bots would result in removal of the archived site if it was crawled again. This would also mean that a new site owner could remove another owner's site. heise.org (25.04.2017) mentioned this too (translated from German):
"Internet Archive to ignore robots.txt in future [...] It is also increasingly common for previously archived domains to change owners and for archiving to be prohibited in a new robots.txt. That means the archived versions of a site go offline when the site is taken off the net. [...] Requests sent by e-mail to remove individual content from the archive will, however, still be acted on".
Furthermore, heise.org states that archive.org was going to ignore robots.txt, although it will still be possible to have sites removed. -84.161.43.152 12:44, 31 January 2018 (UTC)
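The robots.txt mechanism described above can be sketched with Python's standard `urllib.robotparser`. This only shows how a crawler that honours robots.txt would evaluate a new owner's file; it does not model the archive's own removal behaviour. The agent name `ia_archiver`, the helper `is_crawl_allowed` and the example URL are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a robots.txt-respecting crawler may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt published by a domain's new owner.
# "ia_archiver" is an assumed agent name for the archive crawler.
NEW_OWNER_ROBOTS = """\
User-agent: ia_archiver
Disallow: /
"""

URL = "http://example.org/old-page"
print(is_crawl_allowed(NEW_OWNER_ROBOTS, "ia_archiver", URL))
print(is_crawl_allowed(NEW_OWNER_ROBOTS, "SomeOtherBot", URL))
```

Under the behaviour described in the comment above, a disallow result like the first one was what caused previously archived copies to go offline; a crawler that ignores robots.txt would simply skip this check.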

Hittite pronunciation

Should we abstain from giving Hittite pronunciations? In that case we should probably delete this category. --Tom 144 (𒄩𒇻𒅗𒀸) 00:56, 20 January 2018 (UTC)

I see no reason to abstain from giving them. Simply use {{a|reconstructed}}, and reference pronunciations where appropriate. —Μετάknowledgediscuss/deeds 19:12, 22 January 2018 (UTC)

IPA letter-spacing

My screen is not ideal, but, would you consider a bit of letter-spacing for IPA? Especially l and i come too close to other symbols. Thanks. sarri.greek (talk) 18:03, 24 January 2018 (UTC)

I feel like that's an issue between you and your browser and maybe your CSS style sheet (User:Sarri.greek/vector.css) rather than something that should be changed Wiktionary-side. —Mahāgaja (formerly Angr) · talk 19:02, 24 January 2018 (UTC)
.IPA { letter-spacing: 1px; } (adjust "1px" to other units as you wish). Also, I think that putting it at User:Sarri.greek/common.css might be better, because it's not tied to a certain skin. —suzukaze (tc) 19:11, 24 January 2018 (UTC)
Thank you @Mahagaja: @Suzukaze-c: I shall try. My hint though was about visitors & the default design. sarri.greek (talk) 00:01, 26 January 2018 (UTC)

Proto-Prakrit

Some of our Proto-Indo-Aryan (inc-pro) reconstructions (*ćʰoṭṭas, *grillas, *kuttas) actually represent Middle-Indo-Aryan rather than the stage of Old-Indo-Aryan preceding Sanskrit. It's not possible to project these reconstructions to actual Old-Indo-Aryan because Middle-Indo-Aryan has simplified consonant clusters and even dropped intervocalic consonants entirely in many cases. So, I (and some others; see User talk:AryamanA) think we should have a code for Proto-Prakrit (the name used in scholarly research) for these kinds of reconstructions.

I eagerly made the code pra-pro and moved two entries to CAT:Proto-Prakrit lemmas, but @Victar thought this kind of change should be discussed, and I guess he's right. Also @JohnC5, माधवपंडित, DerekWinters, Kutchkutch, CueIn, Sagir Ahmed Msa. —AryamanA (मुझसे बात करेंयोगदान) 16:47, 29 January 2018 (UTC)

Very technical and long discussion
I support this, it helps us to represent only Middle and New IA words better without requiring an OIA etymon. The concept of Proto-Prakrit has been used so it's got some ground. -- माधवपंडित (talk) 16:52, 29 January 2018 (UTC)
I think it's a no-brainer that we need a code for this, but probably the only question is if it should be Proto-Prakrit pra-pro or Proto-Middle-Indo-Aryan inc-mia-pro. I don't know enough to say which is most appropriate. --Victar (talk) 17:00, 29 January 2018 (UTC)
Prakrit isn't actually a language, but a bunch of languages. Their common ancestor is already Proto-Indo-Aryan. I'm not convinced that Proto-Prakrit even exists. —Rua (mew) 17:37, 29 January 2018 (UTC)
@Rua: Prakrit isn't a language, I know, I've made a good amount of Prakrit entries. The common ancestor of the Prakrits is Vedic Sanskrit, not Proto-Indo-Aryan. But a word like {{m|pra-pro|*ćʰoṭṭas}} is just not possible at the Vedic Sanskrit level, nor is it attested in any of the Prakrits. Nevertheless, there are reflexes in every family of New Indo-Aryan. So it stands to reason that the stage it can be reconstructed at is Middle-Indo-Aryan, or "Proto-Prakrit". —AryamanA (मुझसे बात करेंयोगदान) 19:48, 29 January 2018 (UTC)
If the issue is that we know that a form existed, but we don't know what form it is, then I don't think making an entirely separate language is the solution.
And yes, Proto-Indo-Aryan is the common ancestor of all Prakrits, because it's the common ancestor of all Indo-Aryan languages by definition. Are you saying there are Indo-Aryan languages not descended from Vedic? —Rua (mew) 19:54, 29 January 2018 (UTC)
@Rua: Dardic languages are thought to be a separate branch of PIA.[12] --Victar (talk) 20:04, 29 January 2018 (UTC)
@AryamanA Perhaps not a "Proto-Prakrit", which implies that this is before the languages were written. Perhaps something like "Common Prakrit" or just reconstruct unattested Sauraseni, Magadhi, and Maharashtri terms themselves, like Sauraseni Prakrit *𑀕𑁆𑀭𑀺𑀮𑁆𑀮 (*grilla), which I think is the most appropriate. DerekWinters (talk) 21:10, 29 January 2018 (UTC)
Or even perhaps a Pre-Prakrit, however, that is giving us the initial problem, where we cannot reconstruct beyond the MIA period due to the massive sound simplifications that had occurred. A Proto-Prakrit may in fact be meaningless, because it would be aiming to again go before the MIA period with a meaningful distinction, but a meaningful distinction would be OIA. I understand that it would facilitate grouping, and I'm not sure how we would go about doing that. DerekWinters (talk) 21:22, 29 January 2018 (UTC)
Vedic was part of a dialect continuum with the ancestors of the MI languages. We know this because the Rig Veda contains forms that appear to have gone through Middle Indic phonology before coming into the Rig Veda. Middle Indic also has reflexes of the thorn cluster that Vedic lacks, implying that the specific dialect of the Vedas cannot be the unitary ancestor of the Prakrits. The question is whether we set up Middle Indic as a dialectal sister of Vedic or just assert that sa contains Vedic Sanskrit, Classical Sanskrit, and the Middle Indic dialects which were also around at the time of the composition of the Rig Veda but which are mostly unattested. I could be convinced either way. —*i̯óh₁nC[5] 21:19, 29 January 2018 (UTC)
Also, does this new protolanguage, whatever its name be, contain Ashokan Prakrit, Gandhari, Pali, and (H)elu (see here)? —*i̯óh₁nC[5] 21:27, 29 January 2018 (UTC)
There still seems to be a hole in the logic here. If Vedic Sanskrit is the common ancestor of the Prakrit languages, then it is itself Proto-Prakrit. Any later language can't be Proto-Prakrit, because there will be dialectal variations. The reflexes of syllabic r come to mind. As far as I know, these reflexes already differed by the time Sanskrit proper was written down, which necessarily makes Proto-Prakrit older than Sanskrit proper. To ask the point in another way, what was the Proto-Prakrit term for Prakrit? —Rua (mew) 21:37, 29 January 2018 (UTC)
The reasoning here is that the different Prakrits underwent the same changes that were, although descended from OIA, not present in it. So a need for an intermediate proto-language becomes necessary. If all the descendants show a particular effect but their ancestor does not have it, do we not assume an intermediate ancestor which underwent this change and from which the new descendants come? Anyway, if "Proto-Prakrit" is still problematic, we can have "Proto-Middle-Indo-Aryan" or something like that. -- माधवपंडित (talk) 01:09, 30 January 2018 (UTC)
Ah, let me clarify my point. If we propose this language, it should not be a descendant of Vedic, because the attested Vedic dialect cannot be its ancestor; it would have to be a sibling of Vedic. For better or worse, we include Vedic in sa which muddies this issue considerably, since sa now contains the entire dialectal continuum. The question is what can be gained by adding this intermediate step between the supercode sa and the Prakrits. This feels a lot like Vulgar Latin, namely the unattested dialect of a super language which spawns all of the languages descendants. —*i̯óh₁nC[5] 21:50, 29 January 2018 (UTC)
@JohnC5 In that case, Common Prakrit, or even just "Reconstruction:Prakrit/blah" would be better. A word like *ćʰoṭṭas is just not possible at the Vedic Sanskrit level, it's clearly at the Middle Indo-Aryan stage.
Pali, Ashokan Prakrit, and Elu should definitely be included, but I'm not sure about Gandhari, since it's supposed to have Dardic affinity.
@Rua The syllabic "r" always became a short vowel (a, i, u) in all of the Prakrits. It's not a meaningful dialect differentiator. As JohnC5 mentioned, the outcome of PIE thorn clusters is the only real difference, and that too was standardized to ch in Maharastri and kh in the other Prakrits. —AryamanA (मुझसे बात करेंयोगदान) 21:55, 29 January 2018 (UTC)
But different dialects have different vowels, right? Therefore, a single common vowel can't be reconstructed for this Proto-Prakrit. And if you instead posit a syllabic r for Proto-Prakrit, then you're already back at Vedic. According to w:Grammar of the Vedic language#Phonology, the development of the epenthetic vowel occurred between Vedic and proper Sanskrit. —Rua (mew) 21:58, 29 January 2018 (UTC)
The attested Prakrits do have their individual differences so this appears to be the reasoning behind keeping them as separate languages rather than a single Prakrit language even if they are sometimes identical. Since Old Indo-Aryan is really old, there must be an unattested stage before all those individual differences developed and after the antiquity of Old Indo-Aryan. How concurrent this unattested stage was with Vedic Sanskrit is not clear, but there was probably at least one unattested Prakrit in the Vedic era parallel to or derived from Vedic Sanskrit. Regardless of how it is implemented, there should be a way to have entries for this intermediate unattested stage regardless of whether it is "Reconstruction:Prakrit/blah" or Proto-Prakrit *blah, but it is definitely not Vedic Sanskrit. Reconstructing unattested Sauraseni, Magadhi, and Maharashtri terms themselves, like Sauraseni Prakrit *𑀕𑁆𑀭𑀺𑀮𑁆𑀮 (*grilla) for Hindi गीला (gīlā), Punjabi ਗਿੱਲਾ (gillā), Nepali गिलो (gilo) is inefficient since there is probably a Maharastri Prakrit 𑀕𑀺𑀮𑁆𑀮 (gilla) that led to Marathi गील (gīl), Marathi गिलगिलीत (gilgilīt), and a Magadhi Prakrit 𑀕𑀺𑀮𑁆𑀮 (gilla) that led to Assamese গেলা (gela) so why should there not be a {{cog|pra-pro|*𑀕𑁆𑀭𑀺𑀮𑁆𑀮|tr=*grilla}} as the ultimate ancestor. Kutchkutch (talk) 05:21, 30 January 2018 (UTC)
As I said, Vedic is already the real Proto-Prakrit. What you reconstructed is just an anachronism, we know it never existed. We should not be creating languages that we know didn't exist, and that consideration should certainly not be subservient to the convenience of reconstruction. —Rua (mew) 11:34, 30 January 2018 (UTC)

@Rua, Kutchkutch: (deindenting) The name "Proto-Prakrit" is a total misnomer, my bad (and you are right, it would really just be Vedic Sanskrit). A better name would just be "Middle Indo-Aryan". That way, we could have a page like Reconstruction:Middle Indo-Aryan/gillas, which is an accurate transcription of the ancestor of those New Indo-Aryan words at that stage. The code inc-mia is a good idea as @Victar suggested. —AryamanA (मुझसे बात करेंयोगदान) 22:21, 30 January 2018 (UTC)

But that runs into the same problem. It's a fictitious language, as the variety of Prakrits attests to. I'm not ok with creating a language that never existed. —Rua (mew) 22:23, 30 January 2018 (UTC)
@Rua: So what should be done with an entry like {{m|pra-pro|*ćʰoṭṭas}}? To be calling it PIA would be committing a graver error than having a Proto-Middle Indic code. Turner's dictionary also occasionally reconstructs MIA proto-forms from attested OIA forms, btw. Although reconstructing a Sauraseni term would be an answer, what about it when all New Indo-Aryan languages have a particular word, not just the descendants of Sauraseni? Are we going to go and reconstruct the same term for Sauraseni, Maharastri and other Prakrits and not acknowledge that they have a common source? -- माधवपंडित (talk) 02:18, 31 January 2018 (UTC)
@AryamanA, माधवपंडित Are there any attestations of what we might identify as proto-Prakrit or proto-MIA? --Victar (talk) 07:16, 31 January 2018 (UTC)
@Victar: There don't seem to be any. The confusion has been created because you have attestations of Sauraseni, Maharashtri etc and at the very next level, it's Sanskrit. So it all comes down to whether we believe that Sauraseni, Maharashtri etc independently developed from OIA. -- माधवपंडित (talk) 08:40, 31 January 2018 (UTC)

I would also like to add that there is an unused and redundant code for "Prakrit" proper. All the entries are either Sauraseni, Maharastri, Ardhamagadhi or Magadhi, but the "Prakrit" code continues to stay. What we can do is move that to a reconstructed namespace and use it only for terms which cannot be traced back to Old Indo-Aryan. In other words, we are not going to reconstruct the descendants of attested Old Indo-Aryan words or earlier Proto-forms (PII, PIE) which are proven to exist using other cognates. It is going to be only for entries like *ćʰoṭṭas and reconstructions which can be found at CAT:Hindi terms inherited from Proto-Indo-Aryan. -- माधवपंडित (talk) 02:28, 31 January 2018 (UTC)


@AryamanA, माधवपंडित Is the following accurate? Proposed Moves:

Reconstruction:Proto-Indo-Aryan/bāppas → Reconstruction:Middle Indo-Aryan/bāppas
Reconstruction:Proto-Prakrit/kuttas → Reconstruction:Middle Indo-Aryan/kuttas
Reconstruction:Proto-Prakrit/ćʰoṭṭas → Reconstruction:Middle Indo-Aryan/ćʰoṭṭas
Reconstruction:Sanskrit/ठग्ग् → Reconstruction:Middle Indo-Aryan/ṭhagg
Reconstruction:Proto-Indo-Aryan/appas → Reconstruction:Middle Indo-Aryan/appas
Reconstruction:Proto-Indo-Aryan/grillas remains for Dardic and create Reconstruction:Middle Indo-Aryan/gillas

Kutchkutch (talk) 07:39, 31 January 2018 (UTC)

@Kutchkutch: Absolutely. -- माधवपंडित (talk) 08:40, 31 January 2018 (UTC)
@Kutchkutch: Please do not move anything yet. This is still an ongoing discussion. I also find your lengthy reply rather disruptive. --Victar (talk) 09:06, 31 January 2018 (UTC)
@Victar: The lengthy reply was necessary to be unambiguous. Nothing will be moved yet. The following is from Linguistic Archaeology of South Asia by Franklin Southworth with MIA * being Proto-Middle-Indo-Aryan:
"Tedesco . . . prefers to call this dialect parallel to Rigvedic ‘archaic Middle Indic’; probably ‘Proto-Middle-Indo-Aryan’"
Marathi बहीण (bahīṇ) and Hindi बहन (bahan) ‘sister’ appear to be derived from an unattested OIA or MIA *baghini, with an irregular metathesis of /h/ in comparison with the attested OIA *bhagini.
OIA intervocalic [non-retroflex] stops are generally lost, and OIA final vowels are lost except to the extent that they contract with preceding vowels *garbhiṇikā → MIA *gabbhiṇi(y)ā- → Gujarati ગાભણી (gābhṇī)
“Genitive” postpositions derive from verbal participles; for example, the Punjabi ਰਾਮ ਦਾ (rām dā) ‘Ram’s’ ← MIA *rāmā diya ← OIA *rāmayā ditaḥ ‘given to Ram’, on the model of constructions like Tamil ரமன்-உதைய (ramaṉ-utaiya) ‘Raman’s’ (literally “Raman-owned”).
Marathi वाट (vāṭ)’s meanings of ‘enclosure’, ‘garden’, ‘compound’ etc. derive from MIA *var-tra
Origin and development of language in South Asia by Michael Witzel:
Sanskrit अस्ति (asti) → MIA *asati ‘he is’ → Hindi है (hai)
Indo-Iranica, Volume 10, Issue 1
Old Indo-Aryan *jyotsnā → proto-Middle Indo-Aryan *jyosṇā → Pali dosinā, Prakrit 𑀚𑁄𑀡𑁆𑀳𑀸 (joṇhā)
Old Indo-Aryan *mr̥tsná- → proto-Middle Indo-Aryan *mr̥sna- → Middle Indo-Aryan 𑀫𑀲𑀺𑀡 (masiṇa) (Sanskritised into 𑀫𑀲𑀾𑀡 (masṛṇa))
Indo-Aryan 'six' by Alexander Lubotsky
'6' Sanskrit षट् (ṣáṭ) Proto-Middle-Indic *ṣvaṭ
'6th' Sanskrit षष्ठ- (ṣaṣthá-) Proto-Middle-Indic *ṣvaṣtha-
'16' Sanskrit षोडश (ṣoḍaśa) Proto-Middle-Indic *ṣoḍaśa
'60' Sanskrit षष्टि (ṣaṣṭí) Proto-Middle-Indic *ṣaṣti Kutchkutch (talk) 11:19, 31 January 2018 (UTC)
@Rua: Those are the middle Middle Indo-Aryan languages that have dialectical differences. The early middle Indo-Aryan languages (Pali, Ardhamagadhi, and Ashokan Prakrit) are way closer to each other. It's not crazy to say that they had a common ancestor post-Sanskrit. And why are you saying "fictitious language"? This isn't a new idea in literature at all.
I also think you are misled about the variety of the Dramatic Prakrits (Sauraseni, Maharastri, Magadhi). They're all mutually intelligible (we know this because they were used within the same play by different characters depending on their social class). —AryamanA (मुझसे बात करेंयोगदान) 12:34, 31 January 2018 (UTC)
No, it was unnecessary, both in content and formatting, so thank you for removing some of it. --Victar (talk) 15:38, 31 January 2018 (UTC)

@Kutchkutch: Honestly, that amount of evidence should convince anyone. Can we settle on "Proto-Middle-Indo-Aryan" now? It's a bit long, but at least it's accurate. @Rua, Victar, माधवपंडित —AryamanA (मुझसे बात करेंयोगदान) 00:06, 1 February 2018 (UTC)

I'm still not convinced that this newly proposed language actually ever existed, so I oppose until it's shown that it is. Either Vedic is the common ancestor of these languages, or it isn't. You can't have it both ways. It has already been established that Vedic is the common ancestor, implying that this other language never existed. —Rua (mew) 00:14, 1 February 2018 (UTC)
@Rua: Okay, so at what stage does *ćʰoṭṭas belong? Hint, it's not Vedic Sanskrit.
We need a code for common Middle Indo-Aryan, and you're really misunderstanding what we mean. This language would not be the common ancestor of the NIA languages, rather it would be a common transcription of the word form in the various MIA dialects. In this case, *ćʰoṭṭas is the same in all the dialects, so yeah, it is "Proto-Middle-Indo-Aryan". —AryamanA (मुझसे बात करेंयोगदान) 00:27, 1 February 2018 (UTC)
It would belong to the various individual Middle Indo-Aryan languages, not to one single language. There was no single language spoken at the time, and to invent one is linguistically unsound. Yes, it may be convenient to have common ancestral forms, but that is not nearly as far as creating an entire new language out of thin air. I'm not disputing that something like *ćʰoṭṭas existed as a word spoken in the area somewhere, what I dispute is that there was a single language. It's like inventing something like Proto-Scandinavian *dag as a common form of Danish dag, Swedish dag and Norwegian dag. It's basically a conlang. So I remain opposed. —Rua (mew) 13:56, 1 February 2018 (UTC)
@AryamanA: I support this. They are descended from Vedic but they have a more recent common ancestor. -- माधवपंडित (talk) 01:11, 1 February 2018 (UTC)
@AryamanA, माधवपंडित, Rua, Kutchkutch We already agreed that we would take Sanskrit as meaning the collection of Old Indo Aryan dialects, not just "Classical Sanskrit". We also have noted that Ashokan Prakrit seems to be a common language across India, after Sanskrit stopped being spoken natively. The Dramatic Prakrits (and all their dialectal forms, attested or not) would then understandably derive from Ashokan. Since we are trying to reconstruct past the Dramatic Prakrits, would it be fair to perhaps give these as reconstructions of Ashokan Prakrit? This would require a fairly expansive view of Ashokan Prakrit, but considering that our source material is only several pillars, I think it is a fair assumption. Especially because Ashokan Prakrit seems to be on the cusp of much of this simplification, note Ashokan Prakrit 𑀥𑀁𑀫 (dhaṃma) and 𑀥𑁆𑀭𑀫 (dhrama). I think this would be a fair and partially attested language to reconstruct up to. DerekWinters (talk) 01:18, 1 February 2018 (UTC)
@DerekWinters: I'm not sure if that's a good idea. Ashokan Prakrit was already breaking up, and there are already dialectal variations showing (e.g. 𑀥𑁆𑀭𑀫 (dhrama) is only attested on the Shahbazgarhi and Mansehra edicts that are in Northwest India; they have a Dardic affinity). I think the current grouping of Ashokan Prakrit as a single language is merely out of convenience since the attested corpus is so small. I totally agree though that any hypothetical Proto-Middle-Indo-Aryan would not be very far from Ashokan Prakrit, since that is the earliest attested MIA language (only some early Ardhamagadhi texts are even close to that level). —AryamanA (मुझसे बात करेंयोगदान) 01:23, 1 February 2018 (UTC)
@AryamanA I think especially because of the stronger dialectal variation it might fit exactly what we're looking for. And this way, we have a more seamless reconstruction path. Otherwise, would Proto Middle Indo Aryan derive from Ashokan Prakrit? Would it be parallel? If so, wouldn't that then not catch all Prakrits as a common ancestor (excluding the Gandhari and Elu)? I think it would be safer logistically, and also more sound in general, to use Ashokan Prakrit as our basis. Otherwise I feel there might just be a bit too much clutter. I wouldn't be opposed to something like *𑀕𑁆𑀭𑀺𑀮𑁆𑀮 (*grilla), and I think it's easy to figure in to our scheme. DerekWinters (talk) 01:29, 1 February 2018 (UTC)
Having checked Masica, 1991:
Aśokan Prākrits: various regional dialects of the third century BC (eastern, east-central, southwestern, northwestern), with the notable exception of the midland, recorded in the inscriptions of the Emperor Asoka on rocks and pillars in various parts of the subcontinent.
So I guess treating it as a language with many dialects is okay. And *𑀕𑁆𑀭𑀺𑀮𑁆𑀮 (*grilla) would actually make sense with Ashokan Prakrit's phonology, so that's definitely a plus. I think that's not a bad idea. —AryamanA (मुझसे बात करेंयोगदान) 01:34, 1 February 2018 (UTC)


@माधवपंडित, DerekWinters, Kutchkutch, Victar, Rua Making Ashokan Prakrit (inc-ash) an ancestor of Sauraseni, Maharashtri, Ardhamagadhi, Magadhi, and Elu Prakrits and moving the later-stage PIA lemmas to "Reconstruction:Ashokan Prakrit/foo" is the best solution in my opinion. Ashokan constitutes an earlier stage of Middle Indo-Aryan (see e.g. Masica 1991) and also has well-documented dialectal variation that matches with the later Prakrits (e.g. Eastern dialects using "l" for "r", just like Magadhi). I think this discussion has dragged on too long; hopefully this is acceptable to everyone. —AryamanA (मुझसे बात करेंयोगदान) 02:49, 5 February 2018 (UTC)

Support -- माधवपंडित (talk) 02:56, 5 February 2018 (UTC)
Support for all but Elu. Is there evidence of it? (I'm willing to be convinced). Also, would a Proto-Dardic derive from here? If not, then something like *𑀕𑁆𑀭𑀺𑀮𑁆𑀮 (*grilla) would no longer be valid, as we would only be able to reconstruct gilla. DerekWinters (talk) 03:27, 5 February 2018 (UTC)
@DerekWinters: Actually, we can remove Elu for now; I assumed it is descended from Maharashtri but I'm not sure. We should add Gandhari (pgd) though because it corresponds perfectly with the Mansehra and Shahbazgarhi inscriptions (comprising the Northwest dialect of Ashokan Prakrit). —AryamanA (मुझसे बात करेंयोगदान) 16:41, 5 February 2018 (UTC)
@AryamanA: There is definitely a need for reconstructed middle Indo-Aryan lemmas. So if this is a solution that everyone can agree on, it is certainly better than no solution at all. There are a few more things to consider that have not been mentioned yet. Has anyone ever made a case or provided evidence for Ashokan Prakrit being the ancestor of other Prakrits? Reconstructed lemmas, even if they are not really Ashokan Prakrit, would need to be sufficiently distinguished from attested Ashokan Prakrit lemmas. Perhaps Reconstruction:Ashokan_Prakrit/blah is sufficient for now.
Southworth: “It is reasonable to assume that along with the attested literary Prakrits there were also ‘colloquial Prakrits’ which never appeared in writing”
The reconstructed lemmas would be an attempt to represent these unattested Prakrits. These reconstructions would be merged with the language attested as Ashokan Prakrit. In this example: proto-Middle Indo-Aryan *jyosṇā → Pali dosinā reconstructed Ashokan Prakrit would be the ancestor of Pali, which is not currently proposed. I also always assumed Elu Prakrit descended from Maharashtri Prakrit as the ancestor of Insular Indo-Aryan languages such as Sinhala and Maldivian but there is still uncertainty.
Chandralal: “Even after extensive research, scholars have acknowledged that there remains much uncertainty as to the exact location of the geographical places relating to the first Aryan settlements in Sri Lanka…there is the possibility of two streams of immigration, one from Gujarat and the other from Bengal…the early settlers got their first batch of wives from South India, which led the way in bringing the first Dravidian Influence…the introduction of Buddhism to Sri Lanka gave a strong Aryan character to the language and motivated some scholars to assume that the Prakritic dialect originally brought to Sri Lanka was Magadhi…Up to the end of the Eighth Century the Sinhalese had free communication with the North Indians…thereafter such communication began to decline gradually”
Kutchkutch (talk) 02:23, 6 February 2018 (UTC)
@Kutchkutch: Ashokan Prakrit was indeed a (somewhat) colloquial Prakrit. The edicts were meant to be read and understood by people, unlike the dramatic Prakrits which were purely stylistic. I forgot Pali, but yes, it should be a descendant of Ashokan.
So if everyone agrees, we can start moving stuff. —AryamanA (मुझसे बात करेंयोगदान) 02:34, 6 February 2018 (UTC)
I defer to everyone else's expertise. --Victar (talk) 03:20, 6 February 2018 (UTC)

It is done. —AryamanA (मुझसे बात करेंयोगदान) 01:11, 7 February 2018 (UTC)

Translations for a template at Latin Wikisource[edit]

I was recently doing some technical "fixes" to

https://la.wikisource.org/wiki/Formula:Navigatio

I'd appreciate being able to add some suitable wording to account for unsupplied information, and suitable categories to track that data.

Is anyone here able to assist in providing suitable translations for these English phrases?

  • " Navigato template used with no author supplied."
  • " Navigato template used with no chapter or section specified."
  • " Navigato template used with no work or parent work specified."
  • "Pages using this (Navigato) template"

I'm not fluent in Latin at all and so would appreciate the community being able to assist in getting good translations rather than the mediocre ones from Google.

Longer term I'd also like to have some typography templates with suitable name equivalents for:

  • Italic block
  • (Small, smaller, tiny, really tiny etc) block
  • (large, larger, huge, really huge etc.) block

So that certain obsolete HTML tags can be replaced appropriately... ShakespeareFan00 (talk) 14:33, 30 January 2018 (UTC)

Have you asked at Latin Wikipedia? They probably have more people used to Wikimedia-related Latin prose composition than we do. —Mahāgaja (formerly Angr) · talk 15:02, 30 January 2018 (UTC)

Categorizing place names into CAT:Geography[edit]

The description for CAT:Geography is quite ambiguous: "X terms related to geography". Should/can place names be placed into the category? I find it quite unnecessary since we have other categories for place names. I think we should be a little more specific in terms of what this category should include. — justin(r)leung { (t...) | c=› } 02:10, 31 January 2018 (UTC)

A "See also" link to the corresponding parent placename category would do a lot. DCDuring (talk) 02:33, 31 January 2018 (UTC)
I added such a link, like you could've just done yourself. Anyway, our longstanding practice is not to put places in this category, but the editor you've been having issues with does not seem disposed to respecting our practices. —Μετάknowledgediscuss/deeds 03:07, 31 January 2018 (UTC)
Others might disagree. In fact, if it were so obvious, then it would be part of the category-creation modules/templates/protocol to insert such links in all topical categories. DCDuring (talk) 12:00, 31 January 2018 (UTC)

Wiktionary Discord server[edit]

For those of you who would use Discord, I have made a Discord server for Wiktionarians to chat and whatnot. Discord is a rather new software primarily meant for gaming communities, but is very often also used for other sorts of communities, such as even other wikis. Here is the invitation to the server. The link is permanent, so don't worry about it expiring. I'm also more than willing to make Wiktionary administrators who join the server into administrators on the server. I hope to see some Wiktionarians there. That'd be pretty cool. Ciao! PseudoSkull (talk) 04:41, 31 January 2018 (UTC)

I've never used Discord--what is better or different about it versus IRC? —Justin (koavf)TCM 09:11, 31 January 2018 (UTC)

Western Armenian[edit]

(moved from Grease pit, where I started the discussion by mistake)

SIL has published a new set of changes to ISO 639 codes. Some have been deprecated, some have been added. Among the new codes added is hyw for Western Armenian. Shall we follow suit and treat Western Armenian as a separate language from Armenian? If so, we already have a CAT:Western Armenian ready to go, and probably some of the other subcategories of CAT:Regional Armenian belong there too. But of course we're not required to slavishly follow SIL in all things. I know nothing about Armenian and have no opinion on the matter myself; I just wanted to bring this to the attention of all editors. —Mahāgaja (formerly Angr) · talk 09:09, 31 January 2018 (UTC)

This [i.e. Grease pit, since moved to Beer parlour] isn't the best place to discuss it, but now that you've brought it here, so be it. AFAICT, we have never felt a lexicographical need to separate the Armenian lects. @Vahagn Petrosyan can speak more at length about that. —Μετάknowledgediscuss/deeds 09:13, 31 January 2018 (UTC)
We have evidence now that languages and their variety thrive under one L2 header. Of course, they could be split but Western Armenian is handled under "Armenian" just fine, like Chinese, Albanian or Serbo-Croatian. We can't say the same about Norwegian or Arabic varieties, even if the situations are different with availability of resources and editors and language complexities, of course. Urdu, partially borrowing the logic of Hindi templates and transliterations got a serious boost just thanks to Hindi-savvy editors. Look at Chinese lects. From nothing or a couple of hundred of entries to many thousands. Repeat ping for @Vahagn Petrosyan, since the topic has moved. --Anatoli T. (обсудить/вклад) 09:59, 31 January 2018 (UTC)

The ISO 639-1 code hy and the ISO 639-3 code hye cover all varieties of modern Armenian, including the standard Eastern and Western literary languages and the many dialects. The vocabulary of both literary languages is based on Old Armenian and is largely the same. They have converged even more after independence. The main differences are in grammar and pronunciation. Lexicographically, all varieties are easily handled under the same ==Armenian== header. The differences are taken care of by Module:hy:Dialects, context labels, accent qualifiers and different inflection templates. As far as I know the push for the new code came from Wikipedia editors who wanted a separate Western Armenian version. In Wiktionary, we do not share their concerns. --Vahag (talk) 13:11, 31 January 2018 (UTC)

merlin[edit]

In a complicated entry like this, where should those alternative forms go? Usually we put them at the top of entries, no? ---> Tooironic (talk) 10:59, 31 January 2018 (UTC)

When there are two etymologies, and the alternative forms apply to only one, it's OK to put them inside the relevant etymology section. You can list them horizontally rather than vertically by using {{alter}}: {{alter|en|foo|bar|what|ever}}. —Mahāgaja (formerly Angr) · talk 12:26, 31 January 2018 (UTC)

Entries for Japanese verb and adjective forms[edit]

As discussed before in 2014 and 2017:

If you don't mind, please figure out the best way to create entries for Japanese verb and adjective forms and then create those entries.

Sometimes I've been trying to read Japanese text. With my limited knowledge of this language, I have to seek the lemma of conjugated words somehow and then navigate conjugation tables to find the correct form.

If I didn't know the meaning of a conjugated English word like ate, Wiktionary would do that work and present the information in a separate entry. Please do the same for Japanese. Thanks in advance. --Daniel Carrero (talk) 12:01, 31 January 2018 (UTC)

@Eirikr, suzukaze-c, TAKASUGI Shinji, Wyang. --Daniel Carrero (talk) 12:09, 31 January 2018 (UTC)
I think the best approach, as mentioned in the July BP thread, is to start by ensuring that the tables include the forms needed by learners. We would then use the table forms as the basis for botting verb-form entry creation as appropriate.
@Daniel, for what you specifically are looking for, do the current verb conjugation / adjective inflection tables include the forms you need? If not, what about suzukaze's mock-ups? ‑‑ Eiríkr Útlendi │Tala við mig 17:01, 31 January 2018 (UTC)
Thanks for the questions. My opinion is this: Overall, I prefer the table design that is already in entries (e.g. 書く). The mock-up #1 seems to be just a stub. I don't like the mock-up #2 design with a mixture of labels and actual conjugations everywhere; I find it distracting. I believe I found one big problem with the mock-up #2: sometimes a reader could be confused as to whether a label applies to what is above or below it. For example, in reality "formal" applies only to "書きます (kakimasu)" below it, but the table design makes it seem that "formal" maybe could apply to "書き (kaki)" above.
Aside from that, I would suggest editing the table currently used in entries by changing "Volitional" to "Volitional (let's...)" and adding other English notes like this, based on the mock-up #2.
I like that the mock-up #2 has some conjugations not currently found in the entry, like 書いている. Overall, I suggest expanding the table currently found in entries to cover these additional conjugations, but I don't have the expertise to know if some of these should or shouldn't be there for some reason. I would even add the dictionary form (書く) itself to the table because it's a valid verb "form". Conjugation tables in most or many languages already include the dictionary form, like in the multiple languages of amar (apparently Ido is an exception for some reason).
I don't recall any specific conjugation that I didn't find in the tables. I'll let you know if there's any in the future. --Daniel Carrero (talk) 18:09, 31 January 2018 (UTC)
Thank you Daniel, that's good input. As time allows, I will try my hand at another mockup incorporating your comments -- though as busy as I've been lately IRL, suzukaze may well beat me to it.  :) ‑‑ Eiríkr Útlendi │Tala við mig 01:55, 1 February 2018 (UTC)
1 is stub-like but it's the same information that is in the first half of our existing conjugation tables. (An idea I haven't implemented yet is that we include forms that incorporate these central six forms, making it less stubbish, like our current conjugation tables but sorted.)
Confusion over the label in layout 2 is definitely possible. I'll think about what I can do (if we even want to use that layout).
@Eirikr: Maybe, maybe not :p
suzukaze (tc) 17:59, 1 February 2018 (UTC)

The conjugation tables seem to be missing the "I want..." form. For example: 触る -> 触りたい. --Daniel Carrero (talk) 21:10, 13 February 2018 (UTC)
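As a rough illustration of how a table generator or bot could derive the "I want..." form mentioned above, here is a minimal Python sketch. It assumes the verb class is supplied by the caller, covers only regular godan and ichidan verbs, and uses a hypothetical function name (`tai_form`); it is not the code behind the existing conjugation templates.

```python
# Map the final kana of a godan verb's dictionary form to its i-row
# counterpart, which is the stem that the -tai ending attaches to.
GODAN_I_ROW = {
    "う": "い", "く": "き", "ぐ": "ぎ", "す": "し", "つ": "ち",
    "ぬ": "に", "ぶ": "び", "む": "み", "る": "り",
}


def tai_form(dictionary_form: str, verb_class: str) -> str:
    """Return the desiderative ("I want to ...") form of a regular verb.

    verb_class must be "godan" or "ichidan"; irregular verbs such as
    する and 来る are deliberately not handled in this sketch.
    """
    stem, last = dictionary_form[:-1], dictionary_form[-1]
    if verb_class == "ichidan":
        if last != "る":
            raise ValueError("ichidan verbs end in る")
        return stem + "たい"
    if verb_class == "godan":
        if last not in GODAN_I_ROW:
            raise ValueError("unexpected godan ending: " + last)
        return stem + GODAN_I_ROW[last] + "たい"
    raise ValueError("unsupported verb class: " + verb_class)


print(tai_form("触る", "godan"))
print(tai_form("書く", "godan"))
```

The verb class has to be passed in because the spelling alone does not always reveal it: a verb ending in る may be either godan or ichidan.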

February 2018

Incorrect by extensions at hack[edit]

"Hack" in the computer sense initially meant a creative solution. It was only later that it came to mean compromising someone's Hotmail account or stealing files from three-letter agencies. Since we have it backwards, I'm soliciting feedback for how to fix it. Thoughts? —Justin (koavf)TCM 02:51, 1 February 2018 (UTC)

I fear this is a part of a larger project. The relationships among the various senses and etymologies of hack and hacker are not at all settled. See, for example, “hack” in Douglas Harper, Online Etymology Dictionary, 2001–2018. Also, is hack ("cough") onomatopoetic? DCDuring (talk) 05:58, 6 February 2018 (UTC)
To me, the verb hack in this sense is odd. I only recall hearing it used as a noun: 10 hacks to improve something or other (where hack means "clever idea"). If someone wants to hack their love life, I would probably understand it to mean they want to stop it. —Stephen (Talk) 10:34, 6 February 2018 (UTC)
See this "hack your love life" Google Search. DCDuring (talk) 15:48, 6 February 2018 (UTC)
@Koavf, one solution would be to put the computing definitions (and their extensions) in chronological order of their development, grouping related terms together. So first "hacking" for expert coding (and thence optimising daily processes), and then "hacking" for breaching security.
Current order of verb:
  1. (transitive, slang, computing) To hack into; to gain unauthorized access to (a computer system, e.g., a website, or network) by manipulating code; to crack.
  2. (transitive, slang, computing) By extension, to gain unauthorised access to a computer or online account belonging to (a person or organisation).
  3. (computing) To accomplish a difficult programming task.
  4. (computing) To make a quick code change to patch a computer program, often one that, while being effective, is inelegant or makes the program harder to maintain.
  5. (transitive, colloquial, by extension) To apply a trick, shortcut, skill, or novelty method to something to increase productivity, efficiency or ease.
  6. (computing, slang, transitive) To work with something on an intimately technical level.
Improved order (in my opinion):
  1. (computing) To make a quick code change to patch a computer program, often one that, while being effective, is inelegant or makes the program harder to maintain.
  2. (computing) To accomplish a difficult programming task.
  3. (computing, slang, transitive) To work with something on an intimately technical level.
  4. (transitive, colloquial, by extension) To apply a trick, shortcut, skill, or novelty method to something to increase productivity, efficiency or ease.
  5. (transitive, slang, computing) To hack into; to gain unauthorized access to (a computer system, e.g., a website, or network) by manipulating code; to crack.
  6. (transitive, slang, computing) By extension, to gain unauthorised access to a computer or online account belonging to (a person or organisation).
Current order of noun:
  1. (computing, slang) An illegal attempt to gain access to a computer network.
  2. (computing, slang) A video game or any computer software that has been altered from its original state.
  3. (computing) An interesting technical achievement, particularly in computer programming.
  4. (computing) An expedient, temporary solution, such as a small patch or change to code, meant to be replaced with a more elegant solution at a later date.
  5. (colloquial) A trick, shortcut, skill, or novelty method to increase productivity, efficiency or ease.
Improved order (in my opinion):
  1. (computing) An expedient, temporary solution, such as a small patch or change to code, meant to be replaced with a more elegant solution at a later date.
  2. (computing) An interesting technical achievement, particularly in computer programming.
  3. (colloquial) A trick, shortcut, skill, or novelty method to increase productivity, efficiency or ease.
  4. (computing, slang) An illegal attempt to gain access to a computer network.
  5. (computing, slang) A video game or any computer software that has been altered from its original state.
-Stelio (talk) 10:31, 21 February 2018 (UTC)
@Stelio: This is beautiful and probably better than anything I could have made. May I suggest that in the future, you don't use green/red in case any of the readers out there are color blind? —Justin (koavf)TCM 10:38, 21 February 2018 (UTC)
Indeed yes, I'm aware of colour blindness as a barrier for visual comparison; I take pains to distinguish colours on graphs I put in professional presentations and fully support the Web Accessibility initiative. The headers I used are meant as the main differentiators; the colouring was just for some quick visual impact. Red-blue is a safer combination, and one I usually use; mea culpa for publishing before thinking deeper. Fixed! -Stelio (talk) 10:48, 21 February 2018 (UTC)

February LexiSession: radio[edit]

This month, we suggest focusing on words used to talk about radio.

Well, for those who do not know LexiSession yet, it is a collaborative transwiktionary experiment. You're invited to participate however you like and to suggest next month's topic. The idea is to look at other communities' improvements on the selected topic to improve our own pages. It has already brought new collaborators to contribute for the first time on a suggested topic. If you participate, please let us know here or on Meta, to keep track of the evolution of LexiSession. I hope there will be some people interested this month, and if you can spread it to another Wiktionary, you are welcome to do so. Ideally, LexiSession should be a booster for every Wiktionary on the same agenda, to give us more insight into the ways our colleagues work in the other projects. Noé 09:45, 1 February 2018 (UTC)

We started with the creation of a thesaurus of radio in French and a thesaurus of waves in French. I'm eager to compare them with the local Thesaurus:radio and Thesaurus:waves! For the last one, I am pretty sure we will not collect the same kind of vocabulary, and that can be very interesting. Noé 21:00, 5 February 2018 (UTC)

Middle Japanese[edit]

A new user, Kakiyomi2 (talkcontribs), has been creating entries for Middle Japanese (or Classical Japanese as the user calls it), but we don't have a code for the language. --Lo Ximiendo (talk) 16:37, 2 February 2018 (UTC)

That's me. I've added entries for かはす, かはる, かへる, かへす, かふ, and かめ, partly just to get interest going in this. I intend now to leave it some months to see what feedback it generates before starting to add more from my extensive materials.--Kakiyomi2 (talk) 17:18, 2 February 2018 (UTC)

@Eirikr, suzukaze-c, TAKASUGI Shinji, Wyang, Nibiko IMO they're beautiful entries. I would prefer the name Classical Japanese, as that's what much of the literature seems to call it, perhaps a code jap-cls? DerekWinters (talk) 18:14, 2 February 2018 (UTC)
For historical and political reasons, jap is generally eschewed in favor of ja or jp where a two-letter code might suffice, and jpn, where a three-letter code is needed.
In monolingual Japanese sources, the stage of the written language from roughly the Heian period (800s) through to the Meiji period (late 1800s) is broadly described as 文語 (bungo, literally literary language), in contrast to 口語 (kōgo, spoken language, vernacular, literally mouth language). There is precedent for using some variation of the term literary, as in the three-letter code ltc for Middle Chinese / Classical Chinese (presumably derived from literary Chinese). By extension, I'd prefer ltj if we can use just a three-letter code. If we need a 3-3 code, I'd suggest jpn-ltj. ‑‑ Eiríkr Útlendi │Tala við mig 19:24, 2 February 2018 (UTC)
PS: To my knowledge, the ISO only has codes for Old Japanese (ojp) and Japanese (ja). I'm not aware of any extant standardized codes for anything in between circa 800 and the modern era. ‑‑ Eiríkr Útlendi │Tala við mig 19:30, 2 February 2018 (UTC)
We do not get to make up two or three letter codes. Japanese is ja as a two-letter, ISO 639-1 code, and jpn as a three-letter, ISO 639-2/3 code. "jap" is an obsolete code for Madi, now merged into Yamamadi (jaa).
If we need a three-letter code, qaa–qtz are reserved for local use. ojp covers "7th-10th centuries AD", according to the Linguist List, which basically controls the extinct section of ISO 639-3. Japanese is listed as the child of Old Japanese, so presumably it covers everything from then until now.--Prosfilaes (talk) 21:50, 2 February 2018 (UTC)
Right, we can't make up a two- or three-letter code (because the ISO might later assign it, and besides it'd be confusing). If a code is needed, the customary naming scheme, described in Wiktionary:Languages, is to use the nearest ISO family code and then three letters that approximate the language named, so the code should be "jpx-ltj" if we call it "Literary Japanese", or "jpx-mja" if we call it "Middle Japanese", or something else starting with "jpx-". - -sche (discuss) 23:04, 2 February 2018 (UTC)
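The two mechanisms described above (the qaa–qtz local-use range and the family-prefix scheme) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the sketch assumes a three-letter family prefix such as jpx, which is what this thread uses.

```python
import re


def is_local_use_code(code: str) -> bool:
    """True if code falls in the qaa-qtz range reserved for local use."""
    return (
        re.fullmatch(r"[a-z]{3}", code) is not None
        and "qaa" <= code <= "qtz"
    )


def make_exceptional_code(family: str, suffix: str) -> str:
    """Build a code of the form '<family>-<three letters>'.

    Assumes (for this sketch only) that both parts are exactly three
    lowercase ASCII letters, as in 'jpx-ltj' or 'jpx-mja'.
    """
    for part in (family, suffix):
        if re.fullmatch(r"[a-z]{3}", part) is None:
            raise ValueError("expected three lowercase letters: " + repr(part))
    return family + "-" + suffix


print(make_exceptional_code("jpx", "ltj"))
print(is_local_use_code("qaa"), is_local_use_code("jap"))
```

A hyphenated code cannot collide with a future ISO 639 assignment, since those are plain two- or three-letter strings.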
Re: codes, thank you both for the pointers. I dimly remembered that there was a mechanism for creating our own codes (the prefix of three letters from qaa through qtz), but I encounter such issues so rarely that I couldn't recall any useful details. I'm happier with the jpx- prefix, as that's a lot easier to remember than anything beginning with q.
Re: dating, there's some terminology confusion. OJP is variously described in English as including everything textual prior to the Heian period (i.e. 794 and before), or up through the end of the Heian period (1185), or until some relatively arbitrary point in the middle of the Heian period (probably where Linguist List gets its dating). For EN WT purposes, so far as I've understood it, we're using the earlier dating, in alignment with Japanese sources. The main inflection point in the development of the language is the loss of certain vowel distinctions recorded using w:Jōdai Tokushu Kanazukai, which shift was apparently complete by the start of the Heian period. The EN WP article on w:Early Middle Japanese describes some of this in more detail. (NB: Anything pre-historic, i.e. before the first texts, is usually described as Ancient Japanese, Proto-Japanese, or Proto-Japonic.) ‑‑ Eiríkr Útlendi │Tala við mig 00:32, 3 February 2018 (UTC)
It is not bad at all to use ojp if there is no other appropriate code. In the Middle Ages, spoken Japanese changed but written Japanese stayed similar. — TAKASUGI Shinji (talk) 01:12, 3 February 2018 (UTC)
@Shinji: If ojp implies Jōdai Tokushu Kanazukai and the underlying vowel distinctions, then using ojp for later developments of Japanese could be unnecessarily confusing, no? ‑‑ Eiríkr Útlendi │Tala við mig 22:24, 5 February 2018 (UTC)
@everyone who finds this thread relevant: I did some reformatting of the provisional entry at かへる, to: 1) bring the formatting more in line with WT entries in general and other JA entries more specifically; 2) add in kanji usage information in a way that mirrors JA WT and other monolingual dictionaries. I'm less certain about #2, since what I added isn't deeply researched (mostly I wanted to provide a quick-and-dirty visual example), and it's based on modern sources and historical kanji usage can be very divergent. Thoughts? ‑‑ Eiríkr Útlendi │Tala við mig 22:24, 5 February 2018 (UTC)
Also, I was just looking at Okinawan today, and the literature suggests an Old (first documented until early 1600s), Middle (1600s to 1800s), and Modern stratification. If you all think it's appropriate, we could add those languages as well. DerekWinters (talk) 18:16, 2 February 2018 (UTC)

Related: Status of hiragana entries[edit]

For modern Japanese, hiragana entries are only ever soft redirects to the kanji spellings (except for those words that have no associated kanji).

The かへる entry in its current state is laid out as the lemma for the classical form of modern 帰る (kaeru, to return, to go back to one's starting point, to go home, intransitive).

I see a few problems with this.

(For our readers unfamiliar with the Japanese language, I hope the above makes clear some of the lexicographical challenges inherent in the language and its writing system.)

I feel rather strongly that we should have the same policy for both modern Japanese and older stages of the language with regard to choosing lemma spellings.

I'd suggest one of two approaches:

  • Align Middle / Classical Japanese practice with modern Japanese, and use the kanji spellings for the lemma, with hiragana entries as soft redirects.
Pros:
  • We already have this practice, editors are used to it, and we can likely repurpose a good bit of the supporting infrastructure (templates, modules, etc.).
Cons:
  • As illustrated above, using kanji for the lemmata obscures the fact that, in many cases, we have one word spelled in multiple ways, with each spelling imparting a shade of meaning, but not fundamentally altering the basic theme.
  • We must also either duplicate a lot of data, or arbitrarily choose one kanji as the "main" and create the others as soft-redirect "alternative form" entries. This can also obscure relationships between senses and spellings.
  • When one kanji spelling has multiple readings, and more than one reading belongs to the same category, only the last reading on the page actually gets added to the category. This appears to be a fundamental flaw in the underlying MediaWiki database software. See 避く (saku, yoku, to dodge, to avoid) as one such example -- although both readings are marked for inclusion in [[Category:Japanese_shimo_nidan_verbs]], only the yoku reading actually appears on that page.
  • Drastically rework our approach to Japanese to use hiragana spellings as the lemma, breaking each derivation out under its own ===Etymology=== section, indicating on each sense line which kanji spelling is most commonly used. Kanji-spelling entries would instead be stubs redirecting to the hiragana entries.
Pros:
  • This aligns with the common practice of monolingual Japanese dictionaries, including JA WT, and is also closer to how many bilingual dictionaries function.
  • This is easier for learners, who may know how a word sounds (and can thus work out the hiragana), but might not know how to spell it in kanji.
  • This is easier for learners when looking for the various meanings that might apply to a particular verbal or otherwise-non-kanji context. In our current setup, unless the hiragana entries include glosses for all the kanji spellings, users have to click through each separate spelling to try to find the appropriate meaning. Maintaining glosses in multiple places can be difficult.
  • Categories will index more appropriately. While a single kanji spelling might have multiple readings that must all be indexed within the same category (but cannot be due to the software), a single hiragana spelling is already the reading, and will thus only need to be indexed once within the same category.
Cons:
  • We'd need to rework all of our existing entries and infrastructure.
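The category-indexing point in the lists above (a kanji page with two readings in the same category versus one hiragana page per reading) can be illustrated with a toy Python model. This is not MediaWiki's actual code; it only assumes, as described above, that a page can hold one sort key per category, so a later key replaces an earlier one. All names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

CategoryIndex = Dict[Tuple[str, str], str]


def index_page(index: CategoryIndex, page: str,
               entries: Iterable[Tuple[str, str]]) -> None:
    """Record (category, sort_key) pairs for one page.

    The index is keyed by (page, category), so a second sort key for
    the same category on the same page overwrites the first.
    """
    for category, sort_key in entries:
        index[(page, category)] = sort_key


CAT = "Japanese shimo nidan verbs"

# Kanji lemma: both readings live on one page and compete for one slot.
kanji_index: CategoryIndex = {}
index_page(kanji_index, "避く", [(CAT, "さく"), (CAT, "よく")])

# Hiragana lemmas: each reading is its own page, so nothing is lost.
kana_index: CategoryIndex = {}
index_page(kana_index, "さく", [(CAT, "さく")])
index_page(kana_index, "よく", [(CAT, "よく")])

print(kanji_index)
print(kana_index)
```

In the kanji case only the last reading survives in the index, which mirrors the 避く example; in the hiragana case both readings appear.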

In terms of simple numbers of pros versus cons, it seems clear that hiragana spellings would be the better choice. However, that one con is a huge one. If we were starting from scratch, I'd definitely argue wholeheartedly that we go that route. Given the current state, I still argue in favor of hiragana spellings, for both modern Japanese and older forms, albeit with an awareness of the enormousness of the work required to convert our existing entry base.

Thoughts? ‑‑ Eiríkr Útlendi │Tala við mig 20:41, 2 February 2018 (UTC)

If hiragana spellings are the lemmas in monolingual Japanese and many bilingual dictionaries, then I think that already speaks in favour of that approach. —Rua (mew) 21:04, 2 February 2018 (UTC)
Some past suggestions:
 :) Wyang (talk) 07:20, 3 February 2018 (UTC)
I also like hiragana entries as far as Classical Japanese is concerned. Modern Japanese officially uses mixed spellings and we can very easily check real usage. — TAKASUGI Shinji (talk) 07:41, 3 February 2018 (UTC)

POS headers[edit]

In Wiktionary:Entry_layout#Part_of_speech it says "Some POS headers are explicitly disallowed:" which includes "Abbreviation" and "Initialism" but it doesn't suggest what should be used instead in those cases. And of course they end up in a cleanup category. DonnanZ (talk) 14:04, 3 February 2018 (UTC)

Please use the actual part of speech as if it were a normal word or phrase used in the same context, like PC (personal computer) is "Noun". SNAFU has both "Phrase" and "Noun". --Daniel Carrero (talk) 14:12, 3 February 2018 (UTC)
OK, I guess that can be done, but a guidance note there wouldn't go amiss. Another odd one is "Symbol" which isn't even mentioned, but is used at Translingual CH. DonnanZ (talk) 14:20, 3 February 2018 (UTC)
Absolutely, I support having some guidance note. But actually "Symbol" is mentioned, in the part that says "Symbols and characters: Diacritical mark, Letter, Ligature, Number, Punctuation mark, Syllable, Symbol". --Daniel Carrero (talk) 14:25, 3 February 2018 (UTC)
Heh, so it is, even though CH may be an abbreviation.... DonnanZ (talk) 14:37, 3 February 2018 (UTC)
I think it makes sense to say that "CH" is a "Symbol", because chemical elements and formulae are not phrases and so don't use nouns; they use letters as symbols, which may be in diagrams as opposed to text. That's just my personal interpretation. Feel free to disagree if you want. Aside from that, it seems normal in the English Wiktionary to use "Symbol" for chemical elements and formulae, so at least it's consistent if nothing else. --Daniel Carrero (talk) 16:10, 3 February 2018 (UTC)
Another one: Even though "Idiom" is specifically disallowed, it can still be selected if you use NEC (new entry creator). I didn't check if there are other disallowed ones there, I think there are. DonnanZ (talk) 14:42, 4 February 2018 (UTC)
What should be used instead of Idiom? There are many such lexical items in Japanese that don't fit into POS categories (they aren't nouns, verbs, adjectives, etc, but rather four-character set phrases in some cases, or whole sentences in others). ‑‑ Eiríkr Útlendi │Tala við mig 18:20, 7 February 2018 (UTC)
Many entries use "Definitions", which doesn't have a voted consensus. What should be done with those? —Rua (mew) 18:25, 7 February 2018 (UTC)
So far I've only encountered a Definitions header in Chinese entries, for which the Chinese editor community has made a strong argument (as I've understood it, largely based on Chinese terms not fitting nicely into POS categories). Is this header in use in the entries of any other languages? ‑‑ Eiríkr Útlendi │Tala við mig 18:29, 7 February 2018 (UTC)

Cuneiform and Unicode[edit]

I tried looking up a sentence in Hittite, but I've had trouble finding it. I've begun to suspect that the signs don't match. Look at the thirty-third line in KBo 6.3 i.

The text is transliterated as:

ku-iš-ma-kán ke-e-el tup-pí-aš 1-an-na me-mi-an wa-aḫ-nu-zi na-an-kán ku-u-uš li-in-ki-ya-aš DINGIR.MEŠ-eš ar-ḫa ḫar-ni-in-kán-zi.

If we use the characters provided by Unicode we get:

𒆪𒅖𒈠𒃷 𒆠𒂊𒂖 𒁾𒁉𒀸 𒁹𒀭𒈾 𒈨𒈪𒀭 𒉿𒄴𒉡𒍣 𒈾𒀭𒃷 𒆪𒌋𒍑 𒇷𒅔𒆠𒅀𒀸 𒀭𒈨𒌍𒌍 𒅈𒄩 𒄯𒉌𒅔𒃷𒍣.

Apparently the cuneiform characters given by Unicode are wrong. I've recognized some of the erroneous signs, including "e", "ku", "an", "kan". I've also noticed that the Neo-Assyrian sign for "an" mentioned here matches perfectly the Hittite "an" sign, while here in Wiktionary we write "an" as "𒀭". It looks as if Hittite was written in Neo-Assyrian, but instead we are writing it in some earlier stage. As far as I've read, it seems that Unicode doesn't have signs for the Neo-Assyrian cuneiform.

The site assyrianlanguages.org also offers Hittite texts with their corresponding transliteration. I don't know what we're supposed to do in this kind of situation. --Tom 144 (𒄩𒇻𒅗𒀸) 15:05, 4 February 2018 (UTC)

Btw, Wikipedia's article w:Hittite cuneiform is also based on Unicode. --Tom 144 (𒄩𒇻𒅗𒀸) 15:24, 4 February 2018 (UTC)

I am afraid this is a problem with all paleoscripts. There is not a Hittite Unicode block but a cuneiform script block not specific to any alphabet. The Unicode block for cuneiform script does not cover all variants of different alphabets nor their allographs, only a standard representation of each glyph based on the most common language. Generally it is hard to reproduce an original inscription in any paleoscript with Unicode. --Vriullop (talk) 12:27, 5 February 2018 (UTC)
@Vriullop: In that case, should we use pictures as in Egyptian?--Tom 144 (𒄩𒇻𒅗𒀸) 01:42, 6 February 2018 (UTC)
Probably. Doing so should be easier for Hittite than for Egyptian, if the Hittite signs are not stacked in various ways like hieroglyphs are: we could probably just get images of every variant sign and make a template which would convert input text like "fu bar2" into the sign for "fu" and the second of two variant signs for "bar". Although he's busy (aren't we all?), I think @JohnC5 had interest in doing something similar for Italic languages and might be interested in this idea. Wikimedia Commons hopefully already possesses all the needed images. - -sche (discuss) 01:51, 6 February 2018 (UTC)
FWIW, I am just doing it for Iberian script: ca:Viccionari:Escriptura ibèrica. --Vriullop (talk) 09:43, 6 February 2018 (UTC)
I'm afraid we don't have them all. I've looked up some online sources but they contradict each other. I'm thinking of trying to contact some known author, or would this be too much?--Tom 144 (𒄩𒇻𒅗𒀸) 23:20, 6 February 2018 (UTC)
You could try to make them yourself if you have the necessary expertise. DTLHS (talk) 23:25, 6 February 2018 (UTC)
@DTLHS: You mean drawing and uploading them? --Tom 144 (𒄩𒇻𒅗𒀸) 15:17, 7 February 2018 (UTC)
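
The conversion suggested above (input like "fu bar2" becoming one image per sign) could be sketched roughly as follows. This is only an illustration under stated assumptions: the table SIGN_IMAGES, its file names and the function names are hypothetical placeholders, not existing templates, modules or Commons files, and a real implementation would more likely be a Lua module behind a template.

```javascript
// Hypothetical sign table: each reading maps to a list of image file names,
// one per variant sign. The file names are placeholders, not real files.
const SIGN_IMAGES = {
  fu: ["Sign-fu-1.svg"],
  bar: ["Sign-bar-1.svg", "Sign-bar-2.svg"],
};

// Split a token such as "bar2" into a reading ("bar") and a 1-based variant
// number (2). A token without a trailing number means variant 1.
function parseToken(token) {
  const match = /^(\D+?)(\d*)$/.exec(token);
  if (match === null) {
    throw new Error("Unrecognised token: " + token);
  }
  const variant = match[2] === "" ? 1 : parseInt(match[2], 10);
  return { reading: match[1], variant: variant };
}

// Convert whitespace-separated input into a list of image file names,
// failing loudly when a reading or variant is missing from the table.
function signsToImages(input) {
  return input.trim().split(/\s+/).map(function (token) {
    const parsed = parseToken(token);
    const known = Object.prototype.hasOwnProperty.call(SIGN_IMAGES, parsed.reading);
    const variants = known ? SIGN_IMAGES[parsed.reading] : [];
    if (parsed.variant < 1 || parsed.variant > variants.length) {
      throw new Error("No image for: " + token);
    }
    return variants[parsed.variant - 1];
  });
}
```

Throwing on an unknown sign, rather than silently skipping it, is a deliberate choice in this sketch: as noted in the thread, the available images are incomplete, so gaps should be visible.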

Wikimania 2018 call for submissions now open

On behalf of the program committee of Wikimania 2018 - Cape Town, we are pleased to announce that we are now accepting proposals for workshops, discussions, presentations, or research posters to give during the conference. To read the full instructions visit the event wiki and click on the link provided there to make your proposal:

https://wikimania2018.wikimedia.org/wiki/Submissions

The deadline is 18 March. This is approximately 6 weeks away.
This year, the conference will have an explicit theme based in African philosophy:

Bridging knowledge gaps, the ubuntu way forward.

Read more about this theme, why it was chosen, and what it means for determining the conference program at the Wikimedia blog. Sincerely, Wittylama 08:22, 5 February 2018 (UTC)

News from French Wiktionary

Hello!

The January issue of Wiktionary Actualités just came out in English!

Actualités reach the stars! Despite a missing admin, this edition offers you four articles: about thesauri in the French and English Wiktionaries; new words coined by the French government; a funny dictionary and the suffix -gate. Surrounded by shiny pictures are the shorts, galactic stats, nice videos and a note about the last LexiSession. Big news: we include the stats for the quantity of pictures included in French Wiktionary and we plan to reach 100,000 this year!

This issue was written by eight people and was translated for you by Pamputt and me. This translation may be improved by readers (wiki-spirit) like it was last month by Stephen G. Brown and Xbony2 (thanks mate!). We still receive zero money for this publication and your comments are welcome. To celebrate this new year, I worked on a description of our workflow to explain how we do our journal, and I translated it into English for you! I'll be happy to help if you want to start your own journal here in the future. Noé 20:53, 5 February 2018 (UTC)

Wiktionary:Criteria_for_inclusion#Formatting

Currently Wiktionary:Criteria_for_inclusion has a "Formatting" section, but its content is unrelated to whether a term should be included. I propose to remove this section.--Zcreator (talk) 22:31, 5 February 2018 (UTC)

I agree- does this need a vote? DTLHS (talk) 23:08, 5 February 2018 (UTC)
Do we want to move it anywhere, like WT:ELE? - -sche (discuss) 01:38, 6 February 2018 (UTC)

User:Rua removing information from Portuguese entries

About a week ago, Rua tried to remove the distinction between “epicene” nouns and those with sociolinguistic variation in gender usage from the Portuguese noun headline module. This caused {{pt-noun}} to display incorrect information and filled Category:Portuguese nouns with varying gender with thousands of entries that were never intended to be there. She had already tried to do this many times, and as before I had to stop what I was doing to write a hasty fix.

More recently, she speedied {{pt-noun-form}}. The reason given was “Deleted per RFD, RFDO”. Where is the RFD? I recall that some HWL templates that were redundant to {{head}} were RFDed, but pt-noun-form was not completely redundant: it had a parameter that made it display information about metaphonic plurals and add the entry to the appropriate category. Rua had her bot convert the template, in some cases manually removing said parameter. As a result, Category:Portuguese metaphonic plurals is now empty and the information about metaphonic plurals is gone.

As many here may remember, Rua (then CodeCat) was pulling the same crap on our Thai content a while back, trying to meddle with languages that she doesn’t contribute to nor understands. I hope I can avoid a drama as big as that one, but I confronted her about the removal of information and she didn’t respond, so I have to post here. The bot guidelines say that an operator must undo damage caused by their bot, and that’s what she should do. — Ungoliant (falai) 13:34, 6 February 2018 (UTC)

Rua has committed three fouls, by my count. She has removed valuable information from entries, which is easily solved by doing a bot run to undo what she did. Secondly, she deleted a template that does not seem to have been RFDed (or at least it was not linked to in any RFDO discussions) as having failed RFD, which is a misrepresentation. Thirdly, she did not respond to Ungoliant's question, which is irresponsible both as an admin and as a conscientious fellow editor. These are all related to problems in the past, which she swore would be nullified by her seeking consensus. @Rua, Chuck Entz —Μετάknowledgediscuss/deeds 17:19, 7 February 2018 (UTC)
The edit summaries on past revisions of MOD:pt-noun are definitely troubling. —AryamanA (मुझसे बात करेंयोगदान) 18:03, 7 February 2018 (UTC)
Spite fights like with "Redoing the work that was too hard for poor Ungoliant" are not what edit summaries are for, that's my two cents. mellohi! (僕の乖離) 20:57, 7 February 2018 (UTC)
I would like to update that she has edited this discussion page after this was posted, and therefore is actively ignoring this thread. —Μετάknowledgediscuss/deeds 21:50, 7 February 2018 (UTC)
I have no interest in participating in a show trial. I can only make it worse, so I'm staying quiet and awaiting the inevitable storm. —Rua (mew) 23:05, 7 February 2018 (UTC)
@Rua: No, you can make it better. I listed the solutions in my earlier comment. To make it abundantly clear: if you restore the template and do a bot run to replace it where you removed it (or choose another solution for denoting metaphonicity in those entries, it need not be that particular template), and acknowledge that you made a change without consensus and will avoid that in the future, I (and I expect everyone else, including Ungoliant) will be satisfied. The storm is not inevitable unless you choose it to be. There would never have been a BP thread if you had responded to the message on your talk page, and the thread need not continue if you choose to fix the problem. —Μετάknowledgediscuss/deeds 00:44, 8 February 2018 (UTC)
This is what I mean. All eyes are on me, I'm the only one who does everything wrong. —Rua (mew) 13:23, 8 February 2018 (UTC)
That's a straw man. You removed lexicographical information from dictionary entries, so this is a clear-cut issue. You can choose to fix it, or indulge in an ill-advised persecution complex. —Μετάknowledgediscuss/deeds 15:29, 8 February 2018 (UTC)
Category:Portuguese nouns with metaphonic plurals. Happy now? If you want me to cooperate in the future, I suggest cooperating with me, too, rather than working against me all the time. Good will is earned. —Rua (mew) 15:37, 8 February 2018 (UTC)
No, I'm not happy. There are entries where you removed the information and have not replaced it. It used to display (show the reader the information) on the plural form, which is the actual one affected by the process; now all we have is categorisation of the lemma (with no display there either). Indeed, it is true that you have to earn good will. —Μετάknowledgediscuss/deeds 15:55, 8 February 2018 (UTC)
@Rua, you are ignoring me. You have not fixed the problem, but instead did a little bit of work toward fixing it. That is not acceptable. —Μετάknowledgediscuss/deeds 01:55, 10 February 2018 (UTC)
@Rua, this won't just go away. I am concerned by your behaviour here. —Μετάknowledgediscuss/deeds 17:03, 13 February 2018 (UTC)
I'd like to mention that both Rua and Ungoliant engaged in wheel-reversion far more times before seeking community help than good practice would permit. Please try to be more aware about becoming trapped in unproductive behaviour. Korn [kʰũːɘ̃n] (talk) 10:31, 8 February 2018 (UTC)
What reason would CodeCat/Rua have to make a corrective action? Similar faits accomplis have worked for them in the past. After Wiktionary:Votes/sy-2017-11/Desysopping CodeCat aka Rua failed spectacularly, why should CodeCat/Rua bother? The best strategy for them is to do nothing, and continue in the same vein as they did in the last multiple years. I do not remember CodeCat ever fixing after themselves anything for which it turned out there was no consensus, but my memory may fail me. I mean, if you misbehave for multiple years, and after that, the community gives you a resounding approval, why would you, at that point, introduce a behavior change? --Dan Polansky (talk) 10:32, 18 February 2018 (UTC)

Talahedeshki and Tudeshki

Does anyone know if the Iranian dialects Talaxedeškí[1]/Talakhedeshk[2]/Talahedeshk[3] and Tūdeški[4] are one and the same? --Victar (talk) 10:52, 7 February 2018 (UTC)

OK, I figured out Talahedeshki is an old dialect of Gurani, and not the same as Tudeshki, a Kermanic dialect. Mystery solved. --Victar (talk) 17:03, 7 February 2018 (UTC)

Proto-Munda

It would be great if we had a code for Proto-Munda, the common ancestor of the Munda languages (mun). There are a lot of reconstructions here. @-sche. —AryamanA (मुझसे बात करेंयोगदान) 16:16, 8 February 2018 (UTC)

Yes; added. - -sche (discuss) 16:25, 8 February 2018 (UTC)

Learned borrowing category and template

How long has this been around? I just noticed it on an Armenian entry only now. Someone should have told the word dewd about this. This can actually be a useful distinction in the case of some languages, like Romance ones, and I've been particularly interested in using something like this for Albanian to distinguish between terms it borrowed/took from Vulgar Latin in ancient times as a natural process of prolonged interaction versus the much later learned borrowings from Classical Latin in the last couple of centuries. One reason why I've been using the der template for ancient Latin loans as opposed to calling them explicitly borrowings, since the process was much different. But now I don't have to do that. Word dewd544 (talk) 22:22, 10 February 2018 (UTC)

@Word dewd544: Which entry? —Justin (koavf)TCM 08:43, 11 February 2018 (UTC)
Like կիթառ (kitʿaṙ), for example. Word dewd544 (talk) 16:27, 11 February 2018 (UTC)
@Word dewd544: We'd have thousands of entries to fix, that's why I've never bothered with it... But yes, it could be an interesting distinction. I wish it were developed a bit more in the documentation page though. --Per utramque cavernam (talk) 11:57, 14 February 2018 (UTC)
Yeah, I know. I don't have the stamina to start using it for many languages. There's too much to do. For most other languages it's understood that borrowings from Latin were learned. Albanian was just a unique case since there were at least two distinct "layers" or periods of borrowing/incorporation, the first of which happened organically in the distant past, sometimes from vulgate terms that weren't even fully attested. And Armenian can be an applicable language too, I guess, although just using the regular 'borrowed' from Old Armenian wouldn't really be that different. Same can go for modern Greek words borrowed from its Ancient counterpart. I also agree that it should've been described in more detail in the doc page; that could have been useful. I guess it never really took off. Word dewd544 (talk) 22:02, 14 February 2018 (UTC)

Allowing automatic transcription of Khmer terms

Can an administrator or template editor please add ["km"] = "Module:km" to the list phonetic_extraction on Module:links (with a comma after the Thai line)? The relevant discussion can be found at Wiktionary talk:Khmer romanization. Thanks! Wyang (talk) 13:04, 13 February 2018 (UTC)

You know you're an administrator right? DTLHS (talk) 16:56, 13 February 2018 (UTC)
@DTLHS: I believe that Wyang is hoping to avoid reigniting conflict with Rua. —Μετάknowledgediscuss/deeds 17:00, 13 February 2018 (UTC)
Perhaps one way to do that: @Rua, any problems with this change? - TheDaveRoss 19:03, 13 February 2018 (UTC)
@TheDaveRoss: Rua has not been active for a couple days, their last edit was on this page in the section "User:Rua removing information from Portuguese entries".
I think the edit request just preserves the status quo, so it should be okay. —AryamanA (मुझसे बात करेंयोगदान) 23:25, 13 February 2018 (UTC)
I agreed not to edit the relevant modules, so I was hoping another admin or template editor could make the change. Hopefully it doesn't take too long. Wyang (talk) 21:51, 13 February 2018 (UTC)
Can this please be added? Wyang (talk) 07:15, 17 February 2018 (UTC)
Resolved. Wyang (talk) 06:26, 18 February 2018 (UTC)

Western Canadian Inuktitut (ikt)

Can we rename this to Inuvialuktun? This is the name that I believe is more commonly used throughout the literature (especially modern literature) and is simpler than Western Canadian Inuktitut. DerekWinters (talk) 08:32, 15 February 2018 (UTC)

Well-spotted. I agree it should be renamed. It looks like about 70 entries (translations tables, modules, etc) will be affected. I can rename it in a day or so, if no-one wants to beat me to it (feel free to!), or raise objections. - -sche (discuss) 09:51, 15 February 2018 (UTC)
@-sche: Thank you! DerekWinters (talk) 10:11, 15 February 2018 (UTC)
@-sche Just pinging as a reminder, for once you're done with all the Polynesian mess that's going on. DerekWinters (talk) 22:29, 15 February 2018 (UTC)
Done. - -sche (discuss) 13:46, 16 February 2018 (UTC)

Proto-Central Malayo-Polynesian

In accordance with this discussion from three years ago, I've changed all languages formerly called Central Malayo-Polynesian languages to being Central-Eastern Malayo-Polynesian languages. But we do still have CAT:Proto-Central Malayo-Polynesian language with several lemmas and many words in various languages said to be derived from those lemmas. So my questions:

  1. Do we want to eliminate Proto-CMP (plf-pro) and replace it with Proto-CEMP (poz-cet-pro)? Some of the Proto-CMP entries already have identically spelled Proto-CEMP correspondents; but the others would have to be moved.
  2. If so, is someone with a bot willing to change all instances of plf-pro to poz-cet-pro in mainspace?

Pinging @Amir Hamzah 2008, Chuck Entz, Metaknowledge, -sche, Tropylium. —Mahāgaja (formerly Angr) · talk 14:46, 15 February 2018 (UTC)

Merging the ones that are spelled identically (and not just changing the code in mainspace entries but deleting then-redundant links like so) is a no brainer; I'll take a go at mainspace entries where CMP can be merged into CEMP that way with AWB. The other (Proto-CMP) entries should, I think, be moved, per the linked-to discussion. - -sche (discuss) 21:32, 15 February 2018 (UTC)
OK, since -sche removed the links, I've deleted all the categories and removed plf-pro from Module:languages/datax. —Mahāgaja (formerly Angr) · talk 09:34, 16 February 2018 (UTC)
As an aside, our reconstruction pages in this area have a lot of overlap, e.g. a lot of descendants are listed manually on both Reconstruction:Proto-Malayo-Polynesian/əpat and Reconstruction:Proto-Austronesian/Səpat (which were among the last remaining instances of plf, which I just removed since they were now categorizing into CAT:E). - -sche (discuss) 13:43, 16 February 2018 (UTC)
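
For the bot step asked about in point 2, the code swap itself could look something like the sketch below. It is only an illustration: the function name is made up, it assumes the code appears as a whole positional template parameter (it does not handle named parameters such as |lang=), and it does nothing about moving entries whose spellings differ between Proto-CMP and Proto-CEMP.

```javascript
// Replace a language code only where it stands as a whole template
// parameter, i.e. between "|" and the next "|" or the closing "}}".
// Occurrences in running prose, or inside longer codes, are left alone.
function replaceLangCode(wikitext, oldCode, newCode) {
  // Escape regex metacharacters in the code being searched for.
  const escaped = oldCode.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(
    "\\|(\\s*)" + escaped + "(\\s*)(?=\\||\\}\\})",
    "g"
  );
  // A replacer function is used so that "$" in newCode is never
  // interpreted as a replacement pattern.
  return wikitext.replace(pattern, function (whole, before, after) {
    return "|" + before + newCode + after;
  });
}
```

For example, replaceLangCode(text, "plf-pro", "poz-cet-pro") would rewrite {{der|id|plf-pro|*foo}} but leave a bare mention of plf-pro in a comment untouched.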

Subgroupings of Polynesian

@Amir Hamzah 2008, Chuck Entz, Metaknowledge, -sche, Tropylium again: the same discussion I linked to above also points out that we have CAT:Proto-Nuclear Polynesian language and CAT:Proto-Eastern Polynesian language but no corresponding CAT:Nuclear Polynesian languages or CAT:Eastern Polynesian languages, thus we have two proto-languages without any descendants. How do we want to resolve this? I see two options: (1) We make Proto-NP and Proto-EP into etymology-only synonyms of Proto-Polynesian (which entails moving the existing PNP and PEP lemmas to PP), or (2) We recognize NP (poz-pnp) and EP (poz-pep) as families and start sorting languages into them in accordance with w:Polynesian languages#Languages. What do you think? —Mahāgaja (formerly Angr) · talk 20:03, 15 February 2018 (UTC)

It's never been clear to me what our end goal is in terms of grouping languages. If we want to eventually provide a code for every well demonstrated monophyletic grouping of languages, then #2 is the way to go. —Μετάknowledgediscuss/deeds 23:59, 15 February 2018 (UTC)
Providing a code for every well demonstrated and widely accepted monophyletic grouping seems to definitely be our goal for the Indo-European languages, so why not for other families? I guess what my question amounts to is this: are EP and NP well demonstrated and widely accepted as being both monophyletic and clearly distinct from general Polynesian, with the members listed on Wikipedia? —Mahāgaja (formerly Angr) · talk 09:54, 16 February 2018 (UTC)
Having codes for groupings is fine, but my opinion on the proto-languages is the same as three years ago: we do not need proto-languages with only minuscule differences from their parent as separate languages, and they are probably best treated as simply dialect labels of their parent. --Tropylium (talk) 00:32, 18 February 2018 (UTC)
I agree, but that's beside the point at this stage. We do currently have the proto-languages but not the groupings; I'm looking for agreement to add the groupings. Whether we want to remove the proto-languages is a different issue, and one I don't know enough about Polynesian linguistics to weigh in on. —Mahāgaja (formerly Angr) · talk 15:11, 18 February 2018 (UTC)
OK, I've added the language families poz-pnp (Nuclear Polynesian) and poz-pep (Eastern Polynesian). —Mahāgaja (formerly Angr) · talk 21:54, 18 February 2018 (UTC)

Need help with using javascript

Hello. I am an admin from Turkish Wiktionary. I need some help with storing javascript arrays in data files, just like we do with Lua modules. So, we have this (tr:MediaWiki:YeniMadde.js) js file which helps users who don't know how to create a new entry, but in it, there are some arrays used. I have also created this page: tr:MediaWiki:YeniMadde.js/Menüler.js to store all arrays. But I couldn't manage to access them from the main js file. I have read the mw:Manual:Interface/JavaScript page, which has useful information, but I still do not understand how I can access an array from an external js file. If anyone could help me, I would appreciate it. Thanks! ~ Z (m) 10:23, 16 February 2018 (UTC)

@HastaLaVi2: The way in which I transfer items between scripts is, in script 1, placing the items in the window object, then in script 2 loading script 1 with jQuery.getScript and using the items in a callback: jQuery.getScript(/* script URL */, function () { /* code that uses the items in this script */ }). You can see an example of this technique in MediaWiki:Gadget-AcceleratedFormCreation.js, where User:Conrad.Irwin/creationrules.js is loaded and its function window.generate_entry is used. (That's where I got the technique originally.) Maybe there is a more elegant way to do this, I don't know. I like the Lua way, in which modules don't write to the global object. 19:39, 16 February 2018 (UTC)
Thanks a lot for your response! Now I see it, actually using the window object is a good way of doing this. I agree with you on that now. I am really new to coding extensions for the wiki, but I hope to get better in time. So thanks again for your help! :) ~ Z (m) 10:30, 17 February 2018 (UTC)
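
The technique described in the reply above (script 1 places its items on the window object, script 2 loads script 1 with jQuery.getScript and uses the items in the callback) might be sketched as follows. The names publishMenus, useMenus and yeniMaddeMenus, and the sample array, are hypothetical. The global object and the loader are passed in as parameters only so that the sketch can be tried outside a browser; on a wiki, script 1 would simply assign to window at top level.

```javascript
// Script 1, the data file: publish the arrays on the shared global object
// so that other scripts can read them. On a wiki this would be the whole
// body of the data page, with "globalObject" being window.
function publishMenus(globalObject) {
  globalObject.yeniMaddeMenus = {
    partsOfSpeech: ["Ad", "Eylem"], // placeholder data
  };
}

// Script 2, the main file: load script 1, then use the data inside the
// callback, once loading has finished. "loadScript" stands for any loader
// with the same (url, callback) shape as jQuery.getScript.
function useMenus(loadScript, url, globalObject, onReady) {
  loadScript(url, function () {
    onReady(globalObject.yeniMaddeMenus);
  });
}
```

In a browser this would be called along the lines of useMenus(function (u, cb) { return jQuery.getScript(u, cb); }, /* script URL */, window, function (menus) { /* build the form from menus */ }). The point of the callback is that the arrays do not exist until script 1 has actually run, so any code reading them has to wait for the load to complete.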

Requesting rollback

Hi. I am already a rollbacker on Simple English Wiktionary. I am also autopatrolled here. I am trusted here and I regularly look into recent changes and revert vandalism. Therefore, I would like to request the rollback right. Pkbwcgs (talk) 15:11, 16 February 2018 (UTC)

@Pkbwcgs: Looking at your edits, I don't actually see many which are undoing edits other than your own; most of your work seems to be fixing systematic formatting problems, which is still very helpful, thank you! Still, you've been around here and around Simple for a year and you are a rollbacker there, and I see no reason to deny this request (as another admin pointed out once, people can just undo edits or write js to acquire the same one-click functionality as the rollback feature; it's not a restricted ability the way being able to delete things or block people is), so I have granted it. - -sche (discuss) 15:49, 18 February 2018 (UTC)
Thanks. Pkbwcgs (talk) 16:01, 18 February 2018 (UTC)

Ancestor of Middle Indo-Aryan

I'd like to start a discussion about the ancestor of the various Middle Indo-Aryan lects. As stated by {{R:inc:Kobayashi:2004}} "Vedic was probably a specific dialect of Old Indo-Aryan; it was quite close to, but not identical with the language from which Middle Indo-Aryan developed." This is clearly illustrated by various archaisms found in MIA, such as no *gẓʰ-*kṣ merger in Gandhari and Pali, so to say they are descended from Vedic is demonstrably inaccurate:

What are people's thoughts on this? @AryamanA, माधवपंडित, JohnC5, Rua, -sche --Victar (talk) 03:49, 17 February 2018 (UTC)

@Victar: This issue has been raised before. Like last time, I think we should treat Vedic as representative of all Old Indo-Aryan dialects (which is the status quo now); it's just a technicality, and in 99% of cases the Sanskrit and MIA forms match perfectly. And if we take Vedic as representing all OIA dialects, it's not "demonstrably false" at all. Furthermore, MIA languages underwent later standardization where the thorn cluster Sanskrit क्ष् (kṣ) was standardized to kh (ch in Maharashtri Prakrit). For example we have Sindhi [script needed] (khã̄iṇu) and Kashmiri [script needed] (chawun) for the word you give as an example. (and oh look the Dardic matches the Sanskrit, how interesting)
Also, the layout you gave does not reconcile the Sanskrit dialects. How can *झापयति (jhāpayati) lead to क्षापयति (kṣāpayati)? The example also completely ignores the Prakrits, which are IMO equally if not more important than the languages here. It is also generally accepted that Sauraseni Prakrit is a direct descendant of Rigvedic Sanskrit. Ashokan Prakrit is missing too, which is of much greater antiquity than either Gandhari or Pali, comprising the "Early Middle-Indo-Aryan" stage. They're important if we intend to discuss the ancestor of all MIA languages.
I totally refuse to format etymologies in this manner. I make a *lot* of Hindi entries, and I am not changing anything to reconstructed Sanskrit unless it is necessary (like at Hindi झरना (jharnā), where Proto-Indo-Aryan is more than enough).
Anyway, I'll find a link to the old discussion ASAP. —AryamanA (मुझसे बात करेंयोगदान) 05:54, 17 February 2018 (UTC)
Here it is: Wiktionary:Beer_parlour/2017/August#Sanskrit_vs._Old_Indo-Aryan and Category talk:Hindi Tadbhava. I don't know why so few people responded. Anyways, given the history of our discussion of this topic, this discussion will drag on forever. Honestly, I think the status quo is good, so I'm going to be less willing to change stuff at this point. Also @DerekWinters, Kutchkutch. —AryamanA (मुझसे बात करेंयोगदान) 06:04, 17 February 2018 (UTC)
Woah, @AryamanA, slow your roll. No one is telling anyone to do anything -- I was just opening it up to dialog. I'm totally fine having "Sanskrit" represent all dialects of OIA; it's only when we start calling it "Vedic" that I find we run into a problem, which current literature would agree with. And I'm not "ignoring" Prakrits in my example. My intention wasn't to detail the whole of the IA tree; I was simply illustrating the *gẓʰ-*kṣ merger discrepancy I mentioned above it. If we ever added reconstructions for these unattested Sanskrit forms, I haven't even put thought into the transcription of it. I'm also well aware of the Sanskritization process.
I certainly would be opposed to creating a bunch of reconstructed Sanskrit entries that are identical to Vedic Sanskrit, but I don't see a problem with creating Sanskrit reconstruction entries for differing ancestral dialectal forms. I also don't see a problem reflecting this dialectal form in descendant trees, if not as a separate level, perhaps on the same line, e.g. Sanskrit: kṣāpayati, *jhāpayati. All in all, it's not very different from what we already do for Latin. What are your thoughts on that? --Victar (talk) 07:04, 17 February 2018 (UTC)
@Victar: Sorry if my response was too aggressive, I'm just putting all my cards on the table so this discussion doesn't drag on like our previous discussion on this topic. I think Sanskrit *झापयति (jhāpayati) is unnecessary if we already have Proto-Indo-Aryan *gẓʰāpa­ya­ti. Maybe we could keep it unlinked or something, but I feel that having a full-blown entry for Sanskrit *झापयति (jhāpayati) is redundant.
That's what I would propose. —AryamanA (मुझसे बात करेंयोगदान) 14:56, 17 February 2018 (UTC)
@AryamanA: No worries. If I was to sum up your previous discussion on this topic, it was that we're treating Sanskrit as Latin, placing all forms in a developmental and dialectal continuum. I'm on board with that, but then, like with Latin, we need to address even the unattested forms. Compare *accatto to *झापयति (jhāpayati). I still take issue with calling *झापयति "Vedic" because it nullifies the whole advantage of the temporal and dialectal vagueness of a unified Sanskrit. Why not just K.I.S.S., as we do for Old French and Anglo-Norman French, and simply keep them all on the same line, as so? --Victar (talk) 18:57, 17 February 2018 (UTC)
@Victar: That works perfectly! I can get on board with that. —AryamanA (मुझसे बात करेंयोगदान) 21:16, 17 February 2018 (UTC)
@AryamanA: Happy to be on the same page. =) --Victar (talk) 02:34, 18 February 2018 (UTC)
@Victar: yeah I too feel the actual ancestors of IA languages were so close to Sanskrit that distinguishing between them is often pointless. I don't oppose reconstructing Sanskrit terms if someone can. A slight problem may be posed if there's an IIR/IE etymon and we use {{inh|sa}} or {{der|sa}} in the reconstructions as it's going to cause CAT:Sanskrit terms derived from Proto-Indo-European to display unattested words. It can be resolved by entering "see kṣā­pa­ya­ti" in the etymology. -- माधवपंडित (talk) 09:49, 17 February 2018 (UTC)
As to the matter of chronology to keep in mind, Middle Indic dialects existed at the same time as Vedic Sanskrit. Even the Rig Veda has many words that clearly come from synchronic basilects spoken daily (as opposed to the conservative, ceremonial acrolect used in the Rig Veda). These dialects gave vocabulary, phonology, and morphology which appear all over the Rig Veda. It's a very frustrating issue, since Vedic Sanskrit cannot be their ancestor but existed within a dialectal continuum with them at the time of the composition of the hymns. Our lexicographical issue stems from the fact that only one dialect is recorded from this period. I'm not proposing a solution to this issue, but merely ensuring that when we talk about MI potentially “coming from Vedic,” we realize that this is deceptive because MI already existed by then. —*i̯óh₁nC[5] 11:10, 17 February 2018 (UTC)
I thought that Sanskrit is only an excellent proxy for the ancestor of Indo-Aryan languages and not the thing itself, so that we use it as such for convenience. Making reconstructed Sanskrit entries seems to me both inconvenient and technically incorrect. I feel the same way about reconstructed Ashokan Prakrit, but ultimately I believe that decisions like these should belong to those who do the work. Crom daba (talk) 13:54, 17 February 2018 (UTC)
@Crom daba: Who's to say the term Sanskrit can't refer to the collection of OIA lects, of which only one was standardized and made the prestige dialect. If we look at it that way, the Sanskrit reconstructions of other dialectical forms are perfectly correct. DerekWinters (talk) 15:39, 17 February 2018 (UTC)
We could say that if it pleases us. But there does seem to be an understanding philologically that when we speak of Sanskrit we mean a specific corpus of texts (especially when we talk of Vedic Sanskrit and so) and a certain usage of the language (as a language of Religion and higher learning), if I'm not mistaken its very name refers to this.
We could also reconstruct Old Church Slavonic or 18th-century Slaveno-Serbian or Classical Mongolian or Old Turkic, but it seems inconvenient and not necessarily correct. Crom daba (talk) 16:38, 17 February 2018 (UTC)
@माधवपंडित: If we're calling Sanskrit a OIA continuum, a Pali etymology with from {{inh|pi|sa|*झापयति|tr=jhāpayati}}, {{m|sa|क्षापयति|tr=kṣāpayati}} would be just fine. --Victar (talk) 19:41, 17 February 2018 (UTC)
One could also do something like from dialectal {{inh|pi|sa|*झापयति|tr=jhāpayati}} (compare {{m|sa|क्षापयति|tr=kṣāpayati}}), from... --Victar (talk) 22:49, 17 February 2018 (UTC)
@Victar: That's fine. However in the reconstructed Sanskrit entries, the user may be directed to the attested variation for further etymology. -- माधवपंडित (talk) 02:31, 18 February 2018 (UTC)
@माधवपंडित: Yep, see the example entry I created earlier, *झापयति (jhāpayati). --Victar (talk) 02:34, 18 February 2018 (UTC)
Shouldn't "Proto-Indo-Aryan" already cover this distinction? Or is there a language that descends from PIA, but not (pre-Vedic) Sanskrit? Crom daba (talk) 09:54, 17 February 2018 (UTC)
Also, just had a thought this morning. Ashok became Buddhist. Does that mean Pali predated Ashokan Prakrit o_O? DerekWinters (talk) 12:32, 17 February 2018 (UTC)
@DerekWinters: I think the Buddhist Canon was transcribed during or after the time of Ashoka, so it didn't really "predate" it, but was probably only a little later. An interesting thing to note is that the Girnar dialect of Ashokan Prakrit and Pali share a lot of features. —AryamanA (मुझसे बात करेंयोगदान) 14:56, 17 February 2018 (UTC)
@AryamanA: But didn't Buddha speak Pali natively (or a very similar version)? Also yeah, the Gujjars came in from the northwest so I wonder how much of the Girnar dialect they absorbed. DerekWinters (talk) 15:29, 17 February 2018 (UTC)
@DerekWinters: Hmm, according to Masica both Pali and Ashokan Prakrit are of the same stage, the "Early Middle Indo-Aryan", along with Old Ardhamagadhi. I guess they were both spoken at the same time. And anyways, Ashokan Prakrit is not really one language, more of a pan-India group of early Prakrits that were mutually intelligible. —AryamanA (मुझसे बात करेंयोगदान) 16:07, 17 February 2018 (UTC)
@DerekWinters: We have Mitanni-Aryan. --Victar (talk) 18:57, 17 February 2018 (UTC)
If we want to have various PIA dialects as "Sanskrit", I agree with Victar that the sub-label "Vedic Sanskrit" needs to be limited to the actual attested Vedic, and not for other early forms from the same period. Merging things as Proto-Indo-Aryan instead of Sanskrit would probably work. I do not think Mitanni Aryan is a major issue here, since last I checked, the evidence to consider it a part of Indo-Aryan specifically at all, instead of simply early Indo-Iranian NOS, is pretty weak. It's clearly neither Nuristani nor Iranian, but that doesn't mean it has to be IA.
Also, minor historical phonology note: Sanskrit kṣ versus MIA jh implies PII *ĵž, not *gž (which I believe should result in kṣ ~ gh, not that I know of any examples OTTOMH). --Tropylium (talk) 00:27, 18 February 2018 (UTC)
@Tropylium: It would seem so, but Sanskrit kṣárati (flows) corresponds to Pali gharati; cf., however, Prakrit jhara-i. -- माधवपंडित (talk) 02:31, 18 February 2018 (UTC)
Huh, I guess that requires an OIA dialect that merged the "thorn" clusters by POA, but not by voice (*kš, *ĉš > *ch, but *gž, *ĵž > *jh)? Of course ch as the usual correspondence/reflex of kṣ in parts of MIA already suggests something of the sort. --Tropylium (talk) 10:17, 18 February 2018 (UTC)

Transliteration modules and 몽골어 물리

As of late, 몽골어 물리 has been editing the following obscure transliteration modules and policies:

Can anyone confirm any of this business? Also, could I get you to look at this user, @Chuck Entz? —*i̯óh₁nC[5] 21:22, 17 February 2018 (UTC)

This is a long-term problematic editor; see Special:Contributions/키르기즈스탄_공화국 for a previous incarnation. I don't know enough about these Turkic languages, although we should probably revert them. Pinging Turkic editors: @Curious, Anylai, Borovi4ok, Atitarev —Μετάknowledgediscuss/deeds 21:46, 17 February 2018 (UTC)
I'm sorry, Metaknowledge, I don't know much about these languages, so I can't check their edits. -- Curious (talk) 11:32, 18 February 2018 (UTC)
They seem somewhat negligent, but put in work. The Kumyk and Karakalpak changes are correct as far as I can tell (although the gsub function doesn't work that way, so they aren't working as intended currently). The Tofalar module change (I didn't think we had an automatic Tofalar transliteration, weird) was consistent with WT:Tofa language (which was apparently made by an earlier incarnation of the user still), but seems to have deleted the h character, probably by accident.
Proto-Turkic entries they made seem basically correct but also riddled with mistakes. This user requires cleaning after, but I'm hoping they can evolve into an asset. Crom daba (talk) 22:56, 17 February 2018 (UTC)
The problem is that we don't know what sources they're using, and they won't communicate with us. They're still making lots of mistakes after (at least) two years, and they've been quite disruptive at times, even apparently creating a throw-away new account to continue an edit-war (see the revision history at Module:ba-translit). They won't improve if they refuse to listen to us. Chuck Entz (talk) 02:34, 18 February 2018 (UTC)
I can only look at edits in the last 90 days, so I can't say anything about 키르기즈스탄_공화국, but 몽골어 물리 and Örümçek are technically indistinguishable, and all the edits in the IP ranges 218.158.179.0-218.158.179.255 and 175.115.110.0-175.115.110.255 for the past 90 days are technically indistinguishable from one or the other of the devices used by both of the logged-in accounts. I'll let everyone else decide what to do about it. Chuck Entz (talk) 02:34, 18 February 2018 (UTC)
He is trying to be useful, after a fashion, but leaves a lot of mess for others to deal with. I believe there are a couple of accounts linked to him, along with various IP addresses that do the same. Recently I created (plant) and mentioned its possible relation to *ïgač (tree). He immediately created two entries for one root: one as "ïgač" (what I mentioned) and the other as "ɨ(ń)gač" (from Starling), probably not realizing they are the same. Part of the mess he leaves behind is related to orthography: in the entry "ɨ(ń)gač", he transliterated the Old Uyghur word as "îġać", which is amusing, along with other orthographies. It is rather interesting to see such dedication to adding material that is so wrong. I witnessed him trying to add transliterations for Old Uyghur just by looking at some words I listed on PT pages; it seems that he has no idea and is trying to come up with something that might resemble the transliteration. He created *jāg (fat) and immediately decided that *jagɨ (enemy) should have a long /a/ as well and created that entry. He put "Starostin, Sergei; Dybo, Anna; Mudrak, Oleg (2003), “*jāgɨ”" in his reference, not even bothering to notice that the source has the /a/ short.
A lot of the time he is just inventing stuff and being annoying to deal with as he seems to be running alternative accounts. --Anylai (talk) 09:04, 18 February 2018 (UTC)
Maybe we need a Korean regular to communicate some rules to them. If they keep making more mistakes than we have the manpower to handle, blocking them might be a better option. Crom daba (talk) 11:51, 18 February 2018 (UTC)
I really hope this doesn't reflect their actual views. —suzukaze (tc) 19:28, 18 February 2018 (UTC)
As long as they preface it with "According to the controversial Anglo-Uralic theory" I have no problem with this. Crom daba (talk) 20:20, 18 February 2018 (UTC)
Reminds me of this "journal" article. I love the disclaimer: "Individual authors are responsible for facts included and views expressed in their articles". So much for peer review... Chuck Entz (talk) 07:18, 19 February 2018 (UTC)
Let me voice my opinion.
This user leaves behind mess that needs cleaning. For some time now, each of my sessions has begun with looking at my Watchlist and cleaning up what this user has done recently.
This user does not appear to consult dictionaries, and invents stuff, e.g. based on cognates.
This user lacks some of the fundamentals of Turkology - this is unfortunate, as s/he often edits the Etymology section.
This user won't communicate, although I have suggested that he/she register so we could communicate.
All of this is a pity. It would be nice to have a communication with this user. Would be ideal to see this user grow into a reliable and responsible editor — every contributor can potentially make a difference. Borovi4ok (talk) 08:41, 19 February 2018 (UTC)
Anglo-Uralic theory is a thing??? But why??? —AryamanA (मुझसे बात करेंयोगदान) 00:07, 20 February 2018 (UTC)
Other theories can't explain how Old English had front rounded vowels, or how Russian English uses Cyrillic the same as Komi-Permyak. Crom daba (talk) 11:32, 20 February 2018 (UTC)

Appendix for English anagrams?

Although some users don't care for the anagram sections in entries, I've gotten so that I rather enjoy them (just cracked a smile, for example, when I saw that gone to the dogs and get the goods on are a pair). This is one of those quirky extras that contributes to Wiktionary's thoroughness and uniqueness.

Word lovers might also appreciate a system-maintained list of all the anagrams in English Wiktionary, perhaps in the form of an alphabetical appendix containing 2 entries for each anagram (1 for each member of the pair). I'm not a programmer, but expect that bots could probably build and maintain it. Does anyone else like this proposal? -- · (talk) 00:31, 18 February 2018 (UTC)

The gone to the dogs and get the goods on coupling is cool, admittedly. It would be equally cool to see other long anagrams. --Otra cuenta105 (talk) 21:14, 18 February 2018 (UTC)
The list is too big for one page. How should it be organized? DTLHS (talk) 22:22, 18 February 2018 (UTC)
Puzzlers' books typically order them by alphagram. Equinox 22:35, 18 February 2018 (UTC)
Perhaps we could get a new boring namespace: Anagrams:hist would include the alphagram for hits, shit, this. --Otra cuenta105 (talk) 13:29, 19 February 2018 (UTC)
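
For anyone wondering what ordering by alphagram would involve in practice, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the word list would really come from a dump or a bot query, and nothing here reflects how an existing Wiktionary bot works.

```python
from collections import defaultdict


def alphagram(term: str) -> str:
    """Return the letters of `term`, lowercased and sorted.

    Spaces and punctuation are dropped, so multiword terms can be compared.
    """
    return "".join(sorted(ch for ch in term.lower() if ch.isalpha()))


def group_anagrams(terms):
    """Map each alphagram to the sorted list of distinct terms sharing it.

    Only groups with at least two members are kept, and the result is
    ordered by alphagram, as puzzlers' books typically do.
    """
    groups = defaultdict(set)
    for term in terms:
        groups[alphagram(term)].add(term)
    return {
        key: sorted(members)
        for key, members in sorted(groups.items())
        if len(members) > 1
    }


if __name__ == "__main__":
    # Hypothetical sample input; a real run would read every English lemma.
    sample = ["hits", "shit", "this", "gone to the dogs", "get the goods on", "beer"]
    for key, members in group_anagrams(sample).items():
        print(key, "->", ", ".join(members))
```

Each key could then serve as a page title in the proposed namespace (Anagrams:hist in the example above), which also answers the "too big for one page" concern, since every group gets its own page.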

Ligurian orthography

I've been adding a number of Ligurian entries as of late, and it occurred to me that there should probably be some kind of consensus on which orthography to use for the entries. The "official orthography" I use does not have an actual "official" status (it is promoted by the Académia Ligùstica do Brénno, which deals with Genoese, which - AFAIK - is the variant Ligurian is based on), and uses different levels of "accuracy".

  • Using only a few "mandatory" accents, in a colloquial context
    • on the final vowel of oxytone words of 2+ syllables
    • on the final vowel of monosyllabic words ending in two vowels, where the second one is stressed (e.g. nuâ)
    • on the verbal inflections ê ([you] are, singular), é ([he/she/it] is), ò ([I] have), à ([he/she/it] has)
  • Using only certain accents, in a formal context:
    • using all the "mandatory" accents
    • marking on all monosyllabic words ending in a long vowel (e.g. )
    • marking every ö and ò
    • marking every stressed òu diphthong
    • marking every long vowel (except for the stressed ones, unless they fall into one of the above cases)
  • Using every accent, in a didactic context (which is what I do, so that a word's phonemic realization is clear)

Since no official orthography exists, I wanted to see if anyone has any thought on this. Also, summoning @Lo Ximiendo, as a seemingly active user regarding the Ligurian language -- GianWiki (talk) 14:20, 18 February 2018 (UTC)

@GianWiki Could we make a survey for speakers of Ligurian in order to see what they think of this? --Lo Ximiendo (talk) 18:07, 18 February 2018 (UTC)
A survey is not reasonable. The best course of action is to take the standard you've selected and document it at Wiktionary:About Ligurian so other people can find it. @Ungoliant MMDCCLXIV may also find this of interest. —Μετάknowledgediscuss/deeds 18:28, 18 February 2018 (UTC)
@Metaknowledge So stick with the "didactic" level of accents, then? — GianWiki (talk) 18:46, 18 February 2018 (UTC)
@GianWiki, I'd choose the one that most closely approximates how people actually write and add more diacritics in the headword line, but I don't know anything about Ligurian. —Μετάknowledgediscuss/deeds 19:14, 18 February 2018 (UTC)
@GianWiki, in any case, we'll let you decide what to do. --Lo Ximiendo (talk) 20:51, 18 February 2018 (UTC)
@GianWiki, for etheric guidance, pray to either the Christian God, or even to Indo-European gods and goddesses such as Wotan or Freyja or whatever is appropriate. --Lo Ximiendo (talk) 20:53, 18 February 2018 (UTC)
Most Ligurian entries I added came from A Compagna, the magazine published by Académia Ligùstica do Brénno. I recall that the majority of articles were written without the extra accents but, unlike in Italian, it was not rare to find running text with didactic accents.
I have no preference. Whatever GianWiki supports I will support. — Ungoliant (falai) 11:18, 19 February 2018 (UTC)

Standardizing declension tables

Hey all, I wanted to start a discussion about the formatting of declension tables across PIE descendants. Right now, they're very unstandardized, as demonstrated here. I'm wondering if it wouldn't be a good idea to write a unified module that all these languages can piggyback on instead. What are people's thoughts on that? @Erutuon, JohnC5, Rua, Metaknowledge, Mahagaja, Per utramque cavernam --Victar (talk) 18:14, 18 February 2018 (UTC)

I don't see any need to standardise them, unless we intend to standardise all inflection templates across all languages (which I think would meet with a great deal of resistance). They have different aesthetics, many of which were created in concert with other templates for those languages, and that's just fine. —Μετάknowledgediscuss/deeds 18:26, 18 February 2018 (UTC)
I think if you go beyond PIE descendants, it becomes more difficult to standardize. Also, I'm just talking about declension tables, not verbal inflection tables, etc. Looking past just formatting, many languages lack declension tables, and creating a sort of plug-and-play module would help remedy that. --Victar (talk) 18:52, 18 February 2018 (UTC)
This seems like a conspiracy to introduce the barbaric practice of placing the accusative before the genitive to more languages, I oppose it totally. Crom daba (talk) 19:04, 18 February 2018 (UTC)
NAGD is very reasonable for West Germanic, I'll have you know. Glory to its creator, boo Germany for not picking up on it. Jokes done, I don't see a need for unification either. The current practice allows for tailored tables and shows no major detriments. Writing some IE-module can be done without changing the current tables and the covens of the individual languages then can decide whether to migrate. Korn [kʰũːɘ̃n] (talk) 19:59, 18 February 2018 (UTC)
I think looking gross *is* a detriment. Drop shadows were cool in the 90s. —AryamanA (मुझसे बात करेंयोगदान) 22:36, 18 February 2018 (UTC)
NAGD isn't very reasonable for West Germanic. NADG would be reasonable. Accusative and dative (sometimes merged into a single accu-dative case) are more similar than genitive and accusative or genitive and dative. Instrumental and vocative however are another thing.
For Latin something like NV[Acc]-G(L)[Abl]D and at the same time NV[Acc]-GD[Abl](L) might be more reasonable: The optional locative is somewhere between genitive and ablative, and dative is between genitive (1st and 5th declension sg.) and ablative (1st and 3rd to 5th declension pl., 2nd declension). Considering Vulgar Latin and Romance languages, ablative should be near accusative like NV[Acc][Abl]D(L)G. This would even fit with the basic West Germanic NADG. For all of PIE, however, a sorting based on tradition such as NGD[Acc]V[Abl]LI might be less controversial and might make more sense. -80.133.110.226 20:46, 18 February 2018 (UTC)
LOL, I laugh because I think you're joking, but I honestly don't know. You could always make order an option of the module. --Victar (talk) 22:04, 18 February 2018 (UTC)
I am joking (mostly, NAGD does irk me), but it was meant to point out that standardization may be incompatible with the respective grammar traditions of languages (such as ordering).
As far as diversity of table styles is concerned, I like it, but maybe it could be seen as unprofessional, no strong feelings either way. Crom daba (talk) 23:22, 18 February 2018 (UTC)
I would support any kind of push towards standardization. —AryamanA (मुझसे बात करेंयोगदान) 22:00, 18 February 2018 (UTC)
Agreed. Why not do the same for conjugation, pronoun tables etc? The latter are nicely done in Wiktionnaire, for example wikt:fr:Modèle:pronoms_personnels/fr (/es, /pt etc). – Jberkel 23:02, 18 February 2018 (UTC)
I agree too. About the ordering of cases, this shouldn't be a problem. Erutuon has already written a script for rearranging them in one's preferred order. --Per utramque cavernam (talk) 23:13, 18 February 2018 (UTC)
It should be mentioned that the context for this debate is this discussion. Part of the issue is whether to display transliterations in Sanskrit on a separate line as we do in Russian, Arabic, and Ancient Greek, which I feel is clean, clear, and allows you to read the table either in the native alphabet or in transliteration easily. Victar feels that we should have each transliteration follow every term. That should certainly be part of this discussion, though there does not seem to be much impetus to standardize them at this point. —*i̯óh₁nC[5] 05:22, 19 February 2018 (UTC)
That discussion did rejog my interest in this, but how to display non-Latin text next to transliterations is a conversation to be had down the road. The more important discussion at hand is the technical feasibility of the idea and community support for it, and I'd rather not muddy things by interjecting my personal formatting opinions, but people are welcome to chime in at the other discussion. --Victar (talk) 05:37, 19 February 2018 (UTC)
There was a giant programming project undertaken by someone to make a general inflection table interface module (maybe for Uzbek?) that could be used for all languages. @Erutuon, do you remember what I'm talking about? —*i̯óh₁nC[5] 05:53, 19 February 2018 (UTC)
Module:inflection Chuck Entz (talk)
I don't care to bikeshed it, especially not the order of the cases, which can remain unstandardized for all I care, but something a little less random in the colors and typesizes would be nice.--Prosfilaes (talk) 23:04, 19 February 2018 (UTC)
A simpler first step towards standardisation would be to use one CSS style for all of these templates (so that the look at least is consistent). Any future changes can then be made in one place instead of having to maintain them all separately. This has the additional advantage in that users could then choose to override the style formatting in their personal CSS file, and it would cascade through to all tables in all languages. The wikitable class is one example standard. -Stelio (talk) 16:06, 22 February 2018 (UTC)
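
On the point made above that case order need not block a shared module, here is a rough Python sketch of treating the order as a display option. It is not Erutuon's script and not an existing module interface; the function name, the row representation, and the placeholder forms are all assumptions made for illustration.

```python
def reorder_cases(rows, preferred):
    """Return table rows sorted according to a preferred case order.

    `rows` is a list of (case_name, forms) pairs as they appear in a table.
    Cases not named in `preferred` are kept after the named ones, in their
    original relative order, because sorted() is stable.
    """
    rank = {case: position for position, case in enumerate(preferred)}
    return sorted(rows, key=lambda row: rank.get(row[0], len(preferred)))


if __name__ == "__main__":
    # Placeholder forms only; no real paradigm is implied.
    table = [
        ("nominative", "form-N"),
        ("genitive", "form-G"),
        ("dative", "form-D"),
        ("accusative", "form-A"),
    ]
    for case, form in reorder_cases(table, ["nominative", "accusative", "dative", "genitive"]):
        print(case, form)
```

With something like this, the data for a language could be stored once while each language community (or each reader) picks NAGD, NADG, NGDA, or anything else at display time.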

500,000 English lemmas

The 500,000th English lemma is motlopi. Congratulations and here's to the next 500,000. DTLHS (talk) 21:03, 19 February 2018 (UTC)

Exciting! And (on the subject of milestones) before the month is out, we should make it to 5.5 million entries. We also have entries from about half of the world's languages at this point. - -sche (discuss) 21:29, 19 February 2018 (UTC)
I just dun a graph.
[Figure: line graph plotting number of en.wikt entries against time]
Equinox 23:52, 19 February 2018 (UTC)
I'm surprised it's linear. Are we never going to run out of words? (Also Hindi just hit 10,000, but that's hardly anything) —AryamanA (मुझसे बात करेंयोगदान) 00:05, 20 February 2018 (UTC)
Obviously, the pool of English lemmas in existence has an input rate lower than our rate of adding English lemmas to Wiktionary, but not all such lemmas are equally easy to add, so we're not going to run out so much as come up against words that are increasingly difficult to define and cite (and you could say that we're already seeing the first signs of that). We're currently mainly limited by effort, so the number of lemmas in existence is irrelevant; while growth is linear, we can't "feel" the ceiling. —Μετάknowledgediscuss/deeds 00:12, 20 February 2018 (UTC)

Still a lot to do before reaching the ceiling for foreign languages...

(taken from Wiktionary talk:Statistics#Lemmas pie chart)

(all the foreign-language ones still have much room for improvement)

Plot of #gloss definitions over the last few years (using sequential data from Wiktionary:Statistics):

[Charts, one per language: English, Chinese, Mandarin, Cantonese, German, Spanish, Italian, Hindi, Thai]

Wyang (talk) 01:48, 20 February 2018 (UTC)

lol, I can see my editing hiatus in the Hindi graph.

It would be nice to have lemma figures to go with the pie chart. DonnanZ (talk) 13:30, 20 February 2018 (UTC)

Congrats! That's great! It's a beautiful milestone! To compare, French Wiktionary only has 360,000 lemmas for French (but about a third as many contributors, and the project is a year and a half younger). @Wyang I am very interested if you can make a chart for French here, to discuss it with my colleagues :) Noé 12:38, 21 February 2018 (UTC)
@Noé Sure :), here you go:
[Chart: French]
Wyang (talk) 12:57, 21 February 2018 (UTC)
(applause) :) Noé 13:01, 21 February 2018 (UTC)
  • I had suggested some time ago that we have something like a "language of the month" where we pick a language that could use a lot of expansion and focus on that for a month. I still think it would be an interesting thing to try. I am curious as to whether we have exhausted the supply of translation dictionaries in the public domain. bd2412 T 14:27, 20 February 2018 (UTC)
    • @BD2412: No way, there is so much literature we haven't even touched, coming from the context of Indian languages. But I totally agree that "language of the month" would be a good idea. —AryamanA (मुझसे बात करेंयोगदान) 21:00, 20 February 2018 (UTC)
      • I am rather worried about what might result if people all started adding entries to a language with few native speaker editors, leaving them swamped with work just to catch our inevitable errors. —Μετάknowledgediscuss/deeds 21:02, 20 February 2018 (UTC)
Well Wyang you put my crappy graph to shame :D Thanks for your nice chart. Let us fight over the sweet sugar of the 5,500,000 milestone, I WANT IT. Equinox 15:58, 21 February 2018 (UTC)
lol, I didn't realize we were so close to 5.5 mil. —AryamanA (मुझसे बात करेंयोगदान) 22:15, 21 February 2018 (UTC)

Renaming {{PIE root}} to {{root}}

What do people think about renaming {{PIE root}} to {{root}} and changing the format to {{root|ine-pro|iir-pro|*h₃er-}}? That seems more in line with the other etymology templates, and we can potentially use it for other languages, like Sanskrit, i.e. {{root|sa|hi|घट्}}. @Rua, Erutuon, AryamanA, JohnC5 --Victar (talk) 02:02, 20 February 2018 (UTC)

@Victar: Sounds great to me. Also something like {{root|sa|sa|धे}} (or {{root|sa|धे}} ideally) to categorize within a language. —AryamanA (मुझसे बात करेंयोगदान) 02:06, 20 February 2018 (UTC)
@AryamanA, actually, I think the best method, assuming it's possible, would be through {{head}}, i.e. {{sa-noun|root=घट्}}. --Victar (talk) 02:27, 20 February 2018 (UTC)
Hmm, maybe. Then {{root}} would not be needed I guess. —AryamanA (मुझसे बात करेंयोगदान) 02:33, 20 February 2018 (UTC)
@AryamanA: Yeah, {{root}} would only be used on child language entries, like Hindi entries with {{root|sa|hi|घट्}}. --Victar (talk) 02:39, 20 February 2018 (UTC)
@Victar: I'd be interested in this proposal, but @Rua is the one to convince. —*i̯óh₁n̥C[5] 03:13, 20 February 2018 (UTC)
I might suggest that the first and second params be switched, to better match the current behavior of {{bor}}, {{inh}}, etc. ‑‑ Eiríkr Útlendi │Tala við mig 02:07, 20 February 2018 (UTC)
@Eirikr, if we could do the same thing in PIE using {{ine-noun|root=*h₃er-}}, then it wouldn't be a problem switching the |lang= order, but otherwise it might be confusing, having to also use {{root|ine-pro|*h₃er-}}. Maybe not. --Victar (talk) 02:27, 20 February 2018 (UTC)

I created {{root}} to get the ball rolling for Sanskrit and PII roots. @Erutuon could I ask you the favor of making Module:category tree/PIE root cat work for both {{PIE root}} and {{root}}, since you created it and are most familiar with it? I took a whack at it and it did not go well. --Victar (talk) 01:33, 23 February 2018 (UTC)

Wiktionary:Votes/2018-02/Moving Lojban entries to the Appendix

I have created this vote concerning Lojban entries. I would appreciate it if other editors could look it over and offer their thoughts. —Μετάknowledgediscuss/deeds 05:07, 20 February 2018 (UTC)

Categories for intentionally nonstandard terms

I’m thinking we could have a separate category for deliberate/satiric misspellings and typos (Amerikkka, Grauniad, Micro$oft, etc.) and one for deliberately ungrammatical terms (who'd have thunk it, you pays your money and you takes your chances, no can do, etc.). These two along with Category:Pronunciation spellings by language and Category:Eye dialect by language can be added to a supercategory encompassing all intentionally nonstandard terms. — Ungoliant (falai) 12:49, 20 February 2018 (UTC)

I agree. I would split the two classes you mentioned: Category:Deliberate misspellings by language and Category:Deliberate grammatical errors by language. (The latter one probably needs a better name.) —Μετάknowledgediscuss/deeds 20:53, 20 February 2018 (UTC)
I broadly agree that deliberately nonstandard spellings are worth categorizing, but I'm not sure it's right to call "Amerikkka" (for example) a "misspelling", since it is deliberately/intentionally used to invoke "KKK"; it's not an error made out of ignorance. Deliberately nonstandard grammar seems harder to separate from dialectally typical grammar (of e.g. an immigrant community) from which I would expect most examples to derive; it also seems similar to uses of fossilized / no longer standard grammar ("if need be", etc), so the criteria for inclusion in such a category seem fuzzier, although they may not be a barrier to having such a category. - -sche (discuss) 21:08, 20 February 2018 (UTC)
Wikipedia calls Amerikkka a satiric misspelling, I don’t know what else to call it.
Category:Leet by language could probably be added to this category too. — Ungoliant (falai) 14:42, 21 February 2018 (UTC)

Transcription parameter again

I've just noticed that @Victar has been using the |tr= parameter to enter both transliteration and transcription of a term in Proto-Iranic entries (for example Sogdian [script needed] (ʾʾsʾwk’ /āsūk/, gazelle)).

I like how this looks and it satisfies the need I've been talking about in previous discussions on transliteration and transcription. I would suggest that an additional parameter is added, for example |tsc=, that will produce the same formatting while allowing transliteration to be automatically generated.

This could be used for: languages written in cuneiform, sparsely attested languages written in abjads (Middle Iranian and Middle Turkic languages, Arabic Middle Mongol), Old Turkic, Khitan and Jurchen (once they're properly encoded), and even Kalmyk (phonemic schwas are unwritten).

Crom daba (talk) 15:09, 20 February 2018 (UTC)

A recent discussion about this: Talk:-ւ. --Per utramque cavernam (talk) 15:27, 20 February 2018 (UTC)
The takeaway is that Isomorphyc has been working on it (at User:Isomorphyc/Sandbox8), and will work on it again at some point, I think. --Per utramque cavernam (talk) 20:58, 20 February 2018 (UTC)
Support. At times I've had to do the same thing for Hittite. --Tom 144 (𒄩𒇻𒅗𒀸) 16:00, 20 February 2018 (UTC)
Support. --Vahag (talk) 19:03, 20 February 2018 (UTC)
Support. It's remarkable that we've had so many discussions and conflicts regarding this, but it still has not come to pass. —Μετάknowledgediscuss/deeds 20:56, 20 February 2018 (UTC)
Support. I've been doing the same thing as Victar for Middle Persian and Old Persian. —AryamanA (मुझसे बात करेंयोगदान) 21:02, 20 February 2018 (UTC)
Support: My only stipulation is that it be made clear that it shouldn't be used for IPA pronunciations. --Victar (talk) 00:32, 21 February 2018 (UTC)
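
To make the proposal concrete, here is a small Python sketch of the display logic being asked for: a transliteration, an optional transcription between slashes, then the gloss. It is only an illustration; |tsc= is the name suggested above rather than an existing parameter, the function name is made up, and the real change would live in the Lua link modules.

```python
def format_annotations(tr=None, tsc=None, gloss=None):
    """Build the parenthesised annotation shown after a term.

    `tr` is the transliteration (in practice it could be generated
    automatically), `tsc` is the hypothetical transcription parameter,
    and `gloss` is the translation. Missing pieces are simply skipped.
    """
    reading_parts = []
    if tr:
        reading_parts.append(tr)
    if tsc:
        # The transcription is wrapped in slashes to set it apart from tr.
        reading_parts.append(f"/{tsc}/")

    parts = []
    if reading_parts:
        parts.append(" ".join(reading_parts))
    if gloss:
        parts.append(gloss)
    return f"({', '.join(parts)})" if parts else ""


if __name__ == "__main__":
    # Values taken from the Sogdian example quoted at the top of this section.
    print(format_annotations(tr="ʾʾsʾwk’", tsc="āsūk", gloss="gazelle"))
```

Keeping the two values in separate parameters is what would let the transliteration be filled in by a module while the transcription stays hand-entered.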

Speak-only languages

What is this wiki's policy on including spoken-only (unwritten) languages? Must their entries be made in Latin script or IPA, or not at all? --Octahedron80 (talk) 20:42, 20 February 2018 (UTC)

You should follow the conventions of any published materials that exist, such as scholarly papers or dictionaries. DTLHS (talk) 20:55, 20 February 2018 (UTC)
(e/c) We have to select an orthography for unwritten languages, and document it on the relevant About page. If only linguists have documented it, and those linguists use IPA, then it may be most appropriate to follow their lead and add entries in IPA. If they have a working orthography (maybe Latin script, to make documentation work easier), or if any orthography is being taught to the native speakers (maybe using a modification of Thai script if this is a regional language in Thailand), that should be selected instead. —Μετάknowledgediscuss/deeds 20:56, 20 February 2018 (UTC)
You may be interested in Wiktionary:About_Western_Yugur. Crom daba (talk) 01:21, 21 February 2018 (UTC)

Discord

The Discord server that was discussed last month is up and running thanks to @PseudoSkull; click here to join. Hopefully this can replace our inactive #wiktionary IRC channel. —AryamanA (मुझसे बात करेंयोगदान) 20:58, 20 February 2018 (UTC)

I dropped in to look around. Glad it's got a way to join without "signing up" or installing anything, but I do feel that a free open project like Wiktionary should use free open tech like IRC, not a closed-source thing that may require a specific client, or be restricted by the makers in future (this does happen, e.g. LogMeIn). Equinox 21:42, 20 February 2018 (UTC)
Discord is nearly malware in my view, it is almost impossible to remove once you have installed it on PC. IRC, with all of its problems, is my preference. - TheDaveRoss 21:49, 20 February 2018 (UTC)
Yeah, before someone accuses me of being a Luddite who hates graphics, sounds, etc.: I'd be fine with those things built optionally on top of free open tech. (Imagine the uproar if Wikimedia replaced e-mail as a contact medium with Facebook. Heh.) I'm gonna go on IRC RIGHT NOW and raise a ruckus <3 Equinox 21:57, 20 February 2018 (UTC)
Dave, if you're afraid of installing it, you can use Discord in your browser. Eq, I can't remember whether you hate xkcd or not, so have this. —Μετάknowledgediscuss/deeds 00:06, 21 February 2018 (UTC)
That comic accuses an open-source fan of being smug and autistic for wanting free open tech. I am the exact opposite of that, which is why I dislike xkcd, which is reliably smug and autistic. The comic doesn't even have a point. Huh! (P.S. The Wiktionary IRC is still pretty good, though I go there less than once a month and see the same four or five faces. Smart faces. Heheh.) Equinox 00:42, 21 February 2018 (UTC)
I would suggest that a character in the comic makes those accusations and that character is the butt of the joke, but that isn't really important. Also, it would be great if more people used the IRC if only on a semi-regular basis like Equinox. I think the Wiktionary community was closest back when a good chunk of regular contributors (20%?) were frequently able to engage in casual conversation. - TheDaveRoss 13:05, 22 February 2018 (UTC)
There's also an "official" English Wikipedia Discord FWIW. —AryamanA (मुझसे बात करेंयोगदान) 22:06, 20 February 2018 (UTC)
"Esperanza was a Wikipedia project founded on 12 August 2005..." Equinox 00:41, 21 February 2018 (UTC)
That's a really weird story, but I suppose it works as a cautionary tale too. —AryamanA (मुझसे बात करेंयोगदान) 22:18, 21 February 2018 (UTC)
There are rather a few self-hosting alternatives, w:Mumble (software) and w:TeamSpeak (proprietary iirc, but freeware) being two off the top of my head. (Also, Nextcloud's Talk, FLOSS implementation of Spreed, allows video conferencing - but my server is bandwidth limited to 9Mbs) The problem with Discord is the time-worn observation that if anything on the internet is free, you are the product being sold. Quite a few of us maintain servers of varying abilities on the internet, which we prefer to "free" services. - Amgine/ t·e 04:26, 21 February 2018 (UTC)

Category:Thesaurus

Is there a reason why we don't separate English thesaurus entries from other languages? E.g. thesaurus:die and thesaurus:死亡 are put in the same category. Seems strange to me. ---> Tooironic (talk) 04:38, 21 February 2018 (UTC)

This was discussed several times (for example here, see Dan Polansky's vote and the talk page; or here), but no solution has been reached yet. --Per utramque cavernam (talk) 15:22, 21 February 2018 (UTC)

Books of the Apocrypha or Catholic deuterocanon

The stance on their inclusion in Biblical canons varies across definitions. Baruch and 1 and 2 Maccabees only mention being apocryphal, Sirach and Wisdom don't mention it at all, Tobit only mentions being in the Catholic canon, not Eastern Orthodox ones, and only Judith actually seems unbiased. I propose using the definition given for Tobit for all seven, but with the additional mention of the Eastern Orthodox. (Sirach is especially interesting, because the synonym Ecclesiasticus mentions some groups not considering it canonical)

As a tangential note, I also updated Appendix:Books of the Bible to include them in the Catholic canon listed. That one was also odd because even though our listed source, Catholic Online, includes them now, I checked the Wayback Machine, and it didn't when the list was previously said to have been retrieved.

--RoseOfVarda (talk) 16:13, 22 February 2018 (UTC)

Be bold and have at it! :) And I suggest you link to this discussion in your edit summaries so that anyone who wants to propose some other wording can do so in this central discussion. - -sche (discuss) 19:06, 23 February 2018 (UTC)

Wikidata and CC0 licence for lexicographical data

Hello,

As you may have heard already, the Wikidata people are very interested in the Wiktionaries' data. They are now at the stage of creating a dedicated Lexeme namespace in Wikidata. Lydia, who is in charge of this project, has called for a vote on the licensing of this new namespace. I think we Wiktionarians are concerned by this vote, because it may change the kind of connections we can make between the Wiktionaries and Wikidata. Lydia only offered arguments in favour of CC0, but there are a lot of arguments against it too. I summed up some there, but I call on your expertise and judgment in this matter. I think it is not so much on the legal side as on the psychological and ethical aspects that we can give a different perspective, as we are, and we know, people who have lexicographical data to share and people who reuse the Wiktionaries' data.

I think we need to imagine some scenarios, because they may have worked some out but they didn't share the potential consequences of each possibility, and I am quite worried about their agenda. With this in mind, the Wikidata team asked for a Wikilegal note about lexicographical data, but it is a draft that needs to be substantially improved, as so far it doesn't cover some fundamental aspects of the Wiktionaries. Your comments on this essay are welcome too.

Well, sorry if you feel this does not concern you. I think it can't hurt to know more, so that we are able to collaborate rather than be notified of an undesired change too late. :) Noé 08:35, 23 February 2018 (UTC)

I'm curious how this will play out in practice. I'm all pro-sharing and making data available as widely as possible, but this basically means that Wikidata has to start from scratch, and that the collaboration between the projects will always be complicated (at least in one direction; taking data from Wikidata is fine). But if I write a bot to update Wikidata items from Wiktionary I would technically violate licensing terms. – Jberkel 10:24, 23 February 2018 (UTC)
For the last part, yes, and I am curious to know how they will prevent the violation of SA in CC BY-SA. As for the first part, having some pieces of a page written under CC0 but displayed as CC BY-SA may also be considered copyfraud, I think. So the two projects may end up independent and not compatible in any way. Strange. :) Noé 12:47, 23 February 2018 (UTC)
There would be no issue with using CC0 content within any other context. - TheDaveRoss 14:10, 23 February 2018 (UTC)

Another paper about Wiktionary

Automatic Generation of Wiktionary Entries for Finno-Ugric Minority Languages. —AryamanA (मुझसे बात करेंयोगदान) 18:39, 23 February 2018 (UTC)

Interesting, thanks for sharing. They mention that they asked for permission to upload the terms they generated; does anyone know where that request is? - TheDaveRoss 18:44, 23 February 2018 (UTC)
I don't think the entries were created on the English Wiktionary. DTLHS (talk) 18:57, 23 February 2018 (UTC)
It's mainly in the Hungarian and Finnish Wiktionaries; see Global edits of Finnotka and fi:Keskustelu käyttäjästä:Finnotka. Wyang (talk) 22:16, 23 February 2018 (UTC)

Plurale tantum vs. pluralonly

When should we mark a term with plurale tantum as opposed to using {{en-plural noun}} (which produces the gloss "plural only")? I tend to prefer the latter as it avoids jargon. Is there a real difference? Equinox 13:13, 24 February 2018 (UTC)

I don't think there is a real difference. For English, at least, our users would probably prefer "plural only", which doesn't need much explanation. DCDuring (talk) 16:32, 24 February 2018 (UTC)