Wiktionary:Beer parlour

Welcome to the Beer Parlour! This is the place where many a historic decision has been made, and where important discussions are being held daily. If you have a question about fundamental aspects of Wiktionary—that is, about policies, proposals and other community-wide features—please place it at the bottom of the list below (click on Start a new discussion), and it will be considered. Please keep in mind the rules of discussion: remain civil, don’t make personal attacks, don’t change other people’s posts, and sign your comments with four tildes (~~~~), which produces your name with timestamp. Also keep in mind the purpose of this page and consider before posting here whether one of our other discussion rooms may be a more appropriate venue for your questions or concerns.

Sometimes discussions started here are moved to other pages for further development. In particular, changes to a major policy or guideline may be discussed on the corresponding talk page and “simple votes” (as opposed to drawn-out discussions) can be conducted on our votes page.

Questions and answers typically remain visible on this page for one to two months, but they can always be found in the appropriate monthly archive (based on the date the discussion was started). While we make a point of preserving all discussions that were started here, talk that is clearly not appropriate for this page may be deleted. Enjoy the Beer parlour!

January 2021

Transclude thesaurus pages to display synonyms

Module:thesaurus provides a tool to transclude thesaurus pages to display synonyms in all languages (hopefully). With this, editors will no longer need to repeat adding the same synonym list to each individual entry. They can just keep the list in a single thesaurus page. (This idea was inspired by Template:zh-syn-saurus.)

An example of how to use it is given here:

Please share your opinion of whether this new tool is worth a go or how to improve it. give up -- Huhu9001 (talk) 10:35, 1 January 2021 (UTC)

  • I prefer the one line format including the most common or closest synonyms, optionally ending with a link to the thesaurus. The examples take too much space. Vox Sciurorum (talk) 11:14, 1 January 2021 (UTC)
  • I agree with Vox, this is a step backwards. – Jberkel 09:39, 4 January 2021 (UTC)

Old Latin entries

Per Wiktionary:Votes/2019-08/Abolish the Old Latin header, the “Old Latin” header was abolished on 14 September 2019. Nevertheless, there are still Old Latin entries (see Category:Old Latin language). J3133 (talk) 15:36, 1 January 2021 (UTC)

@J3133, Fay Freak In order to get rid of them, how should we do it? I propose at least (a) Old Latin should be an etymology-only language; (b) there should be an 'Old Latin' label. Words like duenos and deivos are pretty far from standard Latin and should clearly be distinguished as Old Latin, IMO. Benwing2 (talk) 04:50, 5 January 2021 (UTC)
Is 𐌃𐌖𐌄𐌍𐌏𐌔 seen in the head right? If the letters are mirrored, shouldn’t this be written right-to-left 𐌔𐌏𐌍𐌄𐌖𐌃?  --Lambiam 11:34, 5 January 2021 (UTC)
@Benwing: The label is already used, e.g. in Divana, and entries are added to Category:Old Latin, similar to how there's Category:Medieval Latin. 11:55, 5 January 2021 (UTC) —⁠This unsigned comment was added by 2003:de:373f:4000:4d99:f4d5:92d:4559 (talk).
Should Old Latin entries not have {{la-IPA}} because the pronunciation was different (most in Category:Old Latin use {{la-IPA}}, whereas those in Category:Old Latin lemmas have different pronunciations)? J3133 (talk) 13:07, 5 January 2021 (UTC)
 @Benwing2: The main reason I moved to abolish the Old Latin header was the noxious definition. Plautus is Old Latin, and what was aimed at was even older Latin, pre-Livius Andronicus Latin, Early Latin, mainly found in now hard to understand inscriptions, and calling this just Old Latin, as all pre-Classical Latin is called, is an understatement. I have no objection to correct labelling. Whereas I am not sure that Old Latin and Early Latin etymology-only language codes would be useful: For what would be derived specifically from the Latin when the Empire of Rome was still small? Nor would one use the designations Early Latin and Old Latin in descendant trees – although on the other hand, since we already have many codes for later Latin, Late Latin, Medieval Latin, Renaissance Latin and so on, it is logical to have codes for Classical Latin, Old Latin and Early Latin – but meseems nobody has deemed them needed.
@J3133: I can’t think of notable pronunciation differences from Early Latin to Classical Latin standard pronunciation; you would need to elaborate on that. Of course one could restrict the template to not spit out ecclesiastical pronunciations, yet I have never found such pronunciation issues in Latin particularly pressing, since I have always held that the pronunciation section tells the reader how a word would be pronounced according to a certain imagined standard, not how it actually was pronounced – the reader would have to derive from the date of use of the word which pronunciation applies, and it’s patronizing to rule out certain pronunciations in which a 21st-century reader would render a word just because they do not fit the period of use. We also have Modern Egyptological pronunciations in Egyptian entries, you see … Fay Freak (talk) 16:08, 5 January 2021 (UTC)
@Fay Freak: E.g., the entry linked above (duenos) has {{IPA|itc-ola|/ˈdwe.nos/}} (IPA(key): /ˈdwe.nos/) in its pronunciation section. Using {{la-IPA}} ({{la-IPA|duenos}}) instead would result in a different pronunciation:
J3133 (talk) 16:20, 5 January 2021 (UTC)
@J3133: I deem /dw/ correct; otherwise we would not obtain /b/ in Classical forms. So you pass dvenos or dwenos to {{la-IPA}} to get this result, since {{la-IPA}} treats u as a vowel and not as a semivowel. Fay Freak (talk) 16:29, 5 January 2021 (UTC)
@Benwing2: I have now merged the bloodclat mess of entries. Some 400 uses of the code itc-ola are left, deployed mostly for etymologies copied without a sense of proportion and probably without understanding what Old Latin would mean – this bit of removal has only strengthened my observation that the deletion of Old Latin as a header and as a code is right – and these should rather end at Latin. So you could perhaps replace these occurrences by bot, ending them at Latin with an added {{dercat}}, whereas where it occurs in Latin entries one can replace the coded wording “from Old Latin blabla” with “from older {{m|la|}} …”; we may then check the bot’s edited pages for nonsense, and then one can remove the language code from the language module data, perhaps catching the rest with a list of all occurrences or with Cat:E. Fay Freak (talk) 20:33, 5 January 2021 (UTC)
@Fay Freak I would prefer to make itc-ola be an etymology-only language with Latin as its parent. No need to eliminate it entirely. Benwing2 (talk) 05:19, 6 January 2021 (UTC)
@Benwing2: Why, there is insufficient terminology. I distinguished between Old Latin and Early Latin when replacing, but now I find Early Latin also includes even Terence. But it would be more important to single out that hard-to-understand inscriptional Latin than the 3rd to 1st century BCE, which is not all that different from the Imperial-era Latin that sometimes even imitated this Old Latin (Sallust). The intended thing is called Frühlatein in German, as opposed to Altlatein, which runs from the mid 3rd century to the beginning of the 1st century BCE. If you don’t have definite names for the language stages, then you can’t have any codes. The Wikipedia article Old Latin is not helpful here, and we have to wean ourselves off it. As always, Wikipedia has mushed together heterogeneous stories from different authors under one headword without understanding, because thinking about the definitions would be POV or something like that – you know the thing: Wikipedia is not a dictionary and is concerned more with pointing at the objects than with the correct or unequivocal ways of designating them. I think that people cling to this Wikipedia picture too much, and I didn’t, as we should first know which lects we want to describe and only then name them. I judged the Aramaic question similarly. It’s technically too much of a lumping header, but any proposed splits must be maintainable consistently. Fay Freak (talk) 06:18, 6 January 2021 (UTC)
@Fay Freak As usual your writing is not a beacon of clarity :) but I think you're proposing splitting "Old Latin" into two periods, which I agree with. I would call them "Old Latin" and "Early Old Latin", which makes it clear that "Early Old Latin" precedes Old Latin. We can have two etymology languages, itc-ola and itc-eol, both of which have Latin as the parent. I can fix the inherit code so that a Latin term can inherit from itc-ola or itc-eol. We'd have labels Old Latin and Early Old Latin which categorize respectively into Category:Old Latin and Category:Early Old Latin. Benwing2 (talk) 07:25, 6 January 2021 (UTC)
@Benwing2: That sounds better already, although it would still be a problem that you don’t have a proper name for Altlatein, or do you actually call it Old Latin? I am not entirely sure what you mean. Either way, it’s ambiguous whether by Old Latin you mean the mid 3rd century to the beginning of the 1st century BCE, or anything from the 1st century BCE backwards, of which Early Old Latin would then be a part (which is suggested by the wording “Early Old Latin” containing “Old Latin”, so it is not clear that Early Old Latin precedes Old Latin). I am honestly not trying to be obtuse, but it is important that other editors are not confused by what we come up with. You would of course add a description to Category:Old Latin of what it is supposed to be used for, but then people would reasonably complain that it’s arbitrary and opaque, because the language names on the dictionary pages (which most readers read on their own) shouldn’t be misleading in the first place.
I still uphold the notion that many etymologies now containing “Old Latin” should end at Latin, because that’s the first blue link and because the wording “Old Latin” was unreliable in what is being meant. Fay Freak (talk) 14:25, 6 January 2021 (UTC)

Extended mover right

Hello and a happy new year to all. Lately I have been seeing many entries with wrong titles. I move them (leaving redirects), bring them to WT:RFD, and tag the leftover redirects for imminent deletion. If I were given this right, it would really help, as I would be able to move a page straight away without leaving a redirect, instead of moving it first and then requesting deletion of the redirect. Thanks and regards - द्विशकारःवार्त्तायोगदानानिसंरक्षितावलयःविद्युत्पत्त्रम् 16:51, 1 January 2021 (UTC)

I suggest that you personally address this matter to some administrator. inqilābī inqilāb·zinda·bād 19:48, 6 January 2021 (UTC)
@AryamanA Can I be given this right? Thanks, 🔥शब्दशोधक🔥 16:32, 7 January 2021 (UTC)
@शब्दशोधक: Sure, I don't see the harm. —AryamanA (मुझसे बात करेंयोगदान) 02:08, 9 January 2021 (UTC)
  • @AryamanA: I asked you about this privately; would you consider re-granting me the right if I don't abuse it? Also, I didn't really do anything wrong, because I only deleted pages that had been marked by patrollers. Regards. 🔥𑀰𑀩𑁆𑀤𑀰𑁄𑀥𑀓🔥 05:35, 10 January 2021 (UTC)
@AryamanA, शब्दशोधक: I am seeing posts on शब्दशोधक’s user talk page indicating that the user was recently blocked by @Chuck Entz (the reason was not evident from what was on the talk page), as well as concerns over a number of the user’s edits. I think further inquiry is required before any rights are granted. — SGconlaw (talk) 06:32, 10 January 2021 (UTC)
@Sgconlaw: You can, of course, check the block log for the reason, but I will state it straight away: "Abusing multiple accounts, block evasion: Impersonating an IP vandal". However, you should read User:शब्दशोधक/पुराचर्चापृष्ठम्#Block? to understand what actually happened; in short, someone else accessed my account and did it. As for the concerns about my edits, they are only about my Prakrit entries. 🔥𑀰𑀩𑁆𑀤𑀰𑁄𑀥𑀓🔥 07:31, 10 January 2021 (UTC)
Both the block logs for Dviśakāra and शब्दशोधक showed "No matching items in log" the last time I checked. Since @Chuck Entz knows more about what is happening, I will leave it to him to comment. I was just flagging up what I saw on your talk page. — SGconlaw (talk) 07:35, 10 January 2021 (UTC)
@Sgconlaw: See [1]. 🔥𑀰𑀩𑁆𑀤𑀰𑁄𑀥𑀓🔥 07:59, 10 January 2021 (UTC)
Thanks. I see that I filled in the log request form wrongly. — SGconlaw (talk) 09:38, 10 January 2021 (UTC)
I'm afraid I don't see any clear resolution of the matter from the discussion. I'll defer to @Chuck Entz who is more familiar with it. — SGconlaw (talk) 13:29, 10 January 2021 (UTC)
Well, in itself it's not much, but it's part of a long pattern of variations on "I'll never make that mistake again". This user tends to blithely launch out into new things without any thought about what could go wrong. What's more, they don't seem to notice any problem until it's pointed out to them. It's true that they will then take it to heart and very rapidly correct how they do things, but I simply don't trust them when they say they know what they're doing. Yes, they won't make the same mistake again, but there are an infinite number of ways to do things wrong, and they seem to have a knack for finding new and creative ways to unintentionally wreak havoc... Chuck Entz (talk) 06:31, 11 January 2021 (UTC)
@Chuck Entz: Would you please consider granting me this right? Thanks. 🔥𑀰𑀩𑁆𑀤𑀰𑁄𑀥𑀓🔥 17:14, 15 January 2021 (UTC)
@Sgconlaw: Please consider re-granting me this right, it won't be abused. 🔥𑀰𑀩𑁆𑀤𑀰𑁄𑀥𑀓🔥 11:16, 21 January 2021 (UTC)
No. You are not getting the right back. —Μετάknowledgediscuss/deeds 17:09, 21 January 2021 (UTC)
@Metaknowledge: The right would not be abused again. As it was, I did not know that using it to delete pages was abusing the right. Also, I did not delete pages just because I felt like it: only pages tagged with {{d}} (and only by autopatrollers) were deleted. I assure you that nothing like this will happen again if I am given the right. Thanks. 🔥शब्दशोधक🔥 06:42, 25 January 2021 (UTC)
@शब्दशोधक: Maybe ask AryamanA again, as he was the one who gave you this right? -- inqilābī inqilāb·zinda·bād 21:29, 25 January 2021 (UTC)
@Inqilābī: Well, maybe, but I don't think I'll get it since almost every admin is opposing. I think I'll request a bit later (a month or two?). 🔥शब्दशोधक🔥 03:34, 26 January 2021 (UTC)

I, on the other hand, would like to get that extended mover tool. Seems practical. (AND no one's ever complained over any pages I've moved) Allahverdi Verdizade (talk) 17:24, 21 January 2021 (UTC)

@शब्दशोधक: I don't think you need the right right now--not really because of your actions but because we will have enough South Asian-language editing admins to manage this kind of stuff soon. I think some wait is best. —AryamanA (मुझसे बात करेंयोगदान) 21:02, 26 January 2021 (UTC)

Simplified form of 嗰 (and simplification by analogy in general)

@H2NCH2COOH has recently changed the simplified forms of words with 嗰 from 𠮶 to 嗰, making 𠮶 a nonstandard simplified form of 嗰. Their argument is that 𠮶 is not standard because it is simplified by analogy even though it is not allowed by 简化字总表 (predecessor of 通用规范汉字表). This has led to a bit of discussion on their talk page and mine. I'm wondering how we should proceed: (1) keep it as it is, with 嗰 being the "standard"/default simplified form and 𠮶 labelled as nonstandard, or (2) have both be shown as "valid" by having 嗰 in |s= and 𠮶 in |s2= of {{zh-forms}}. Pinging @RcAlex36, Suzukaze-c, Mar vin kaiser, Atitarev, 沈澄心, 恨国党非蠢即坏 for thoughts. — justin(r)leung (t...) | c=› } 08:29, 2 January 2021 (UTC)

I think both characters (嗰 and 𠮶) should be given equal, valid status as alternative simplified forms. --Anatoli T. (обсудить/вклад) 08:41, 2 January 2021 (UTC)
No particular opinion. —Suzukaze-c (talk) 08:46, 2 January 2021 (UTC)
I am the one who made this change. Explanation of why 𠮶 is nonstandard can be found on . --H2NCH2COOH (Talk) 09:16, 2 January 2021 (UTC)
Pinging @RcAlex36, Mar vin kaiser, 沈澄心 for response. If you don't have an opinion, like Suzukaze-c, please say so, so we kind of know what the community thinks. @H2NCH2COOH has been continuing to make other similar edits like at 𡅏. As I've said at before, the issue here is that we are privileging analogized simplified forms allowed by 简化字总表 even if they are not part of 通用规范汉字表, which may be problematic until we indicate in {{zh-forms}} whether a form is in 通用规范汉字表. — justin(r)leung (t...) | c=› } 05:15, 14 January 2021 (UTC)
@Justinrleung: I'm leaning towards option (1). However, we would have to add a special note explaining why these "non-analogizable" simplified forms are considered nonstandard on Wiktionary. We would also have to note that 𠮶 is attested despite the fact that 嗰 cannot be simplified by analogy per 简化字总表. As I understand it, 無限類推 (unrestricted simplification by analogy) is undesirable and messy. RcAlex36 (talk) 06:28, 14 January 2021 (UTC)
No particular opinion too. -- 08:48, 14 January 2021 (UTC)
@RcAlex36, 沈澄心: Thanks for responding. When we come to some consensus, I think it's best that we put guidelines at WT:AZH so that we know what to do with characters that aren't found in 通用规范汉字表. — justin(r)leung (t...) | c=› } 22:42, 14 January 2021 (UTC)

"Schröderization" as WOTD?

The word Schröderization was nominated as a Word of the Day by @Illegitimate Barrister. I've done a Google Books search and it is verifiable. However, is it too controversial to be a WOTD? It has a derogatory sense, and refers to a living former politician (though it appears that his actions that led to the word have been widely criticized – and so tough luck to him?). If it is not too controversial, would it be inappropriate to feature the word on Gerhard Schröder's birthday, or should we definitely feature it on a different date? I look forward to your comments. — SGconlaw (talk) 08:31, 3 January 2021 (UTC)

I like featuring fringe words, but this one is a bit too obscure and specific, it doesn't seem to get used much outside of the initial context (Schröder's deals with Russia). I don't think there's a problem with it being too controversial. – Jberkel 09:33, 4 January 2021 (UTC)
Well, we have been featuring other words which are somewhat obscure (in the sense of not being in very common use). Since there aren’t many objections, I’ve gone ahead and listed the word as a WOTD on Schröder’s birth anniversary. — SGconlaw (talk) 06:35, 10 January 2021 (UTC)

German 2nd person past subjunctives

(Notifying Matthias Buchmeier, Kolmiel, -sche, Atitarev, Jberkel, Mahagaja): Not sure if this belongs here or elsewhere. I am cleaning up the German verb templates, which are a mess. So far I have created Module:de-verb and {{de-conj-table}} to replace the old {{de-conj}}. Eventually I will expand Module:de-verb to do automatic conjugation and replace the old flawed, half-written Module:de-conj. I have a question though about 2nd person past subjunctives. The forms as listed are e.g. du wärest, ihr wäret and du begäbest, ihr begäbet but I have come across also du wärst, ihr wärt and du begäbst, ihr begäbt. Are these latter forms standard? Should they be listed (possibly with a footnote)? Also what about composed forms like du wärest losgeworden and ihr wäret losgeworden? Should forms like du wärst losgeworden and ihr wärt losgeworden also be listed, possibly with a footnote (or listed as the only possibilities)? Benwing2 (talk) 19:39, 3 January 2021 (UTC)

One other question: For senden and derivatives, does the imperative send (du) exist and should it be listed (possibly with a footnote)? Duden says yes it exists, but our old templates sometimes purposefully omitted it. Benwing2 (talk) 19:41, 3 January 2021 (UTC)
To my fluent but nonnative ear, du wärst and ihr wärt sound completely normal (although I know Duden doesn't prescribe them), but du begäbst and ihr begäbt sound rare and rather poetic. If we're going to be writing footnotes about the past subjunctive anyway, though, we should probably mention that it is very rare in the colloquial language except for wäre, hätte and the modals (würde, könnte, wollte, dürfte etc.) and is usually replaced with the periphrastic construction with würde. So even du begäbest and ihr begäbet sound odd to me, as the usual construction would be du würdest begeben/ihr würdet begeben. —Mahāgaja · talk 20:00, 3 January 2021 (UTC)
@Mahagaja Thanks. Let's see what others say about adding a general past subjunctive note. One other question, about the new (post-1996) spellings kennen lernen, spazieren gehen and similar: Currently the tables have subordinate clause dass ich kennen lerne and zu-infinitive kennen zulernen. Are these correct (especially the zu-infinitive)? I learned German with the old spellings so I have no intuition here. Benwing2 (talk) 20:10, 3 January 2021 (UTC)
dass ich kennen lerne is right, but the zu-infinitive is kennen zu lernen. —Mahāgaja · talk 20:19, 3 January 2021 (UTC)
The past subjunctive is indeed used very rarely in spoken German, mostly by older speakers or with some auxiliary verbs; this should be mentioned in the footnotes. It still gets some use in news reporting as a marker of indirect speech: "sie sagten, sie begäben sich..." The forms begäbt / wärt are acceptable and probably more common than the -et variants. – Jberkel 23:15, 3 January 2021 (UTC)
wärest/wäret/begäbest/begäbet should be labeled as archaisms. When these forms are used they mark a deliberate deviation from everyday language. --Akletos (talk) 09:14, 4 January 2021 (UTC)
@Akletos Thanks, I'll add that when I have a chance. Benwing2 (talk) 04:31, 5 January 2021 (UTC)
(Notifying Matthias Buchmeier, Kolmiel, -sche, Atitarev, Jberkel, Mahagaja): How are the following verbs conjugated? gegenbeschuldigen (ich gegenbeschuldige or ich beschuldige gegen?), endlagern (ich lagere end or ich endlagere? ich habe geendlagert per Collins? ich habe endgelagert per Wiktionary? ich habe endlagert?) Benwing2 (talk) 04:41, 5 January 2021 (UTC)
(Notifying Matthias Buchmeier, Kolmiel, -sche, Atitarev, Jberkel, Mahagaja): Another question, maybe more relevant, concerns stems ending in -s, -x, -z or -ß. dewikt [2] lists the 2sg preterite of e.g. lassen as either ließest or ließt, but the 2sg past subjunctive as only ließest. (We list the 2sg preterite as only ließt.) Are the two preterite forms ließest and ließt equally common, or is one archaic? Does a 2sg past subjunctive ließt also exist in this case, and if so, is the form ließest archaic per User:Akletos? BTW, the same appears to apply to strong verbs in -sen, e.g. preisen, and in -zen, e.g. schmelzen. Benwing2 (talk) 06:36, 5 January 2021 (UTC)
Verbs that are back-formations from nouns like gegenbeschuldigen (< Gegenbeschuldigung) and endlagern (< Endlager) are often variable in German, with native speakers themselves sometimes being uncertain. Sometimes such verbs show separable-prefix behavior in only some forms (e.g. the past participle babygesittet is OK) but not others (*Ich sitte heute Abend baby is definitely not, it has to be Ich babysitte heute Abend). Sometimes the forms just don't exist: I read a linguistics article once about how the finite forms of uraufführen (< Uraufführung) simply cannot be used in main clauses: *Wir uraufführen heute Abend, *Wir aufführen heute Abend ur, *Wir führen heute Abend urauf are all equally ungrammatical, but Ich hoffe, dass wir heute Abend uraufführen is OK. —Mahāgaja · talk 08:19, 5 January 2021 (UTC)
@Akletos, Benwing2: wärest/wäret/begäbest/begäbet should not be labelled as archaisms, they are the current standard written forms. wärst/wärt/begäbst/begäbt are colloquial, of which wärst/wärt is less marked, as of an irregular verb anyway, but begäbst/begäbt could be marked as errors (A, Ausdrucksfehler) by schoolmasters (more likely than not). Also contrary to what @Jberkel says the past subjunctive is used often enough in spoken German, and I think particularly of moderately formal spoken German e.g. in university, before court and the like – this is the norm to be reflected. I guess in Berlin they don’t learn German anymore, they do Schreiben nach Gehör. Fay Freak (talk) 22:53, 5 January 2021 (UTC)
(Notifying Matthias Buchmeier, Kolmiel, -sche, Atitarev, Jberkel, Mahagaja): @Fay Freak OK, one more question about past subjunctives. I'm writing a module to do automatic German conjugation and I have added a footnote to this module for past subjunctives. It appears there are three categories of past subjunctives:
  1. Verbs where the synthetic form is preferred over würde + infinitive: haben, sein, können, müssen, dürfen, mögen, sollen, wollen, werden
  2. Verbs where both the synthetic form and würde + infinitive forms are frequent: brauchen, finden, geben, gehen, halten, heißen/heissen, kommen, lassen, stehen, tun, wissen
  3. Verbs where the synthetic form is rare compared with würde + infinitive, and highly formal: all the rest.
My questions are (1) are any verbs miscategorized here? (2) are any verbs missing from categories (1) or (2)? (3) what about compounds of the above verbs, e.g. spazieren gehen/spazierengehen, abkönnen, auslassen, wehtun, etc.? Do they behave the same as the base verb, or are they in category (3)? Benwing2 (talk) 05:29, 6 January 2021 (UTC)
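The three-way classification proposed above could be expressed in a conjugation module as a simple lookup that decides which footnote to attach. This is a minimal Python sketch, not the Lua code of Module:de-verb. It assumes the verb lists are exactly as given in the post. The function name past_subjunctive_category is hypothetical, and compounds are deliberately not handled, since that is the open question (3).

```python
# Hypothetical sketch: NOT the code of Module:de-verb, only the three
# categories proposed above expressed as a lookup table.

# Category 1: synthetic form preferred over "würde" + infinitive.
SYNTHETIC_PREFERRED = frozenset({
    "haben", "sein", "können", "müssen", "dürfen",
    "mögen", "sollen", "wollen", "werden",
})

# Category 2: synthetic form and "würde" + infinitive both frequent.
BOTH_FREQUENT = frozenset({
    "brauchen", "finden", "geben", "gehen", "halten", "heißen",
    "heissen", "kommen", "lassen", "stehen", "tun", "wissen",
})


def past_subjunctive_category(infinitive: str) -> int:
    """Return 1, 2 or 3 for a bare (uncompounded) infinitive.

    Compounds such as "auslassen" fall through to category 3 here;
    whether they should inherit the base verb's category is
    question (3) in the post above.
    """
    if infinitive in SYNTHETIC_PREFERRED:
        return 1
    if infinitive in BOTH_FREQUENT:
        return 2
    # Category 3: synthetic form rare and highly formal.
    return 3


if __name__ == "__main__":
    for verb in ("haben", "gehen", "erklimmen", "auslassen"):
        print(verb, past_subjunctive_category(verb))
```

If the reply below is adopted (compounds behave like their base verb, and strong versus weak matters more than the listed verbs), the default branch would need a strong-verb check and a prefix-stripping step instead of a fixed fallback.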
@Benwing2: 3) They behave the same. 1) mögen because the meaning is different in the subjunctive which replaces wollen as a more polite form, so “würde mögen” is totally common. “würde haben” also does not sound odd, apparently only with the modal verbs it sounds unusual, brauchen, dürfen, sollen, wollen, müssen, and with werden because of the duplication; and even with those it is not too odd to use würde, so perhaps the distinction between (1) and (2) is unneeded. The synthetic forms are just rare and formal when they are identical to the preterite forms, that is in all weak verbs (also including those with Rückumlaut like kennen and brennen). Berliners might not believe it, but we Westphalians also use hülfe as well as stürbe (the latter, of sterben, does not sound archaizing at all; only hülfe might, but it is occasionally used in speech). erklimmen, called highly formal here, would be well understood in the subjunctive II erklömme and not frowned upon, and so on with everything in Category:German strong verbs: the subjunctive II can be used just like the würde periphrasis, and even more, in formal speech the periphrasis would be doubtful style. So actually your category 2 should contain most or all strong verbs and category 3 all weak ones, as stilted forms, but none are archaic; in formal speech and writing every subjunctive II is expected. Fay Freak (talk) 15:51, 6 January 2021 (UTC)
@Fay Freak: Should we mention that bräuchte, though widespread in everyday use, is proscribed by prescriptivists and probably better avoided in formal written German? We do already have a note more or less to this effect at brauchen#Usage notes. —Mahāgaja · talk 16:24, 6 January 2021 (UTC)
@Mahagaja: Hö, this prescription is completely unreal. Maybe you read too many prescriptivists? I am not sure I have ever heard brauchte as a subjunctive II, though I have probably read it on some blogs – it’s so rare that one might think somebody missed the Ä key, and it is likely to be mistaken for just the preterite –, and it is now unlikely that bräuchte is ever marked as an error (if only because, luckily, correctors don’t read as much trashy language material as you, as a non-native speaker, naturally have to encounter?). The usage without “zu”, however, is likely to be marked as wrong, in spite of being consistent with its status as a modal verb, so that I prefer to view the use with zu as pretentious. Fay Freak (talk) 16:41, 6 January 2021 (UTC)

compound vs. affix vs. univerbation

I created یڭیچری (janissary) as a {{compound}} of یڭی (new) + چری (soldier). @Fenakhay changed the template to {{af}}. Then @PUC changed the template to {{univerbation}}. Are there any guidelines on when to use each template? They all seem to mean the same thing in this case. Vox Sciurorum (talk) 12:56, 6 January 2021 (UTC)

@Vox Sciurorum: I probably shouldn't have meddled, as I don't know anything about Turkish or Ottoman Turkish. I simply followed the lead of @Fay Freak, who wrote that یڭیبهار (yeñibahar) is a univerbation. What does @Lambiam think?
However, I was actually just discussing this issue with @Inqilābī at Talk:double penetration. They mentioned the case of wasteland, which is currently categorised as a compound; but seeing that waste is an adjective (per [3]), and that [adjective + noun] phrases don't usually solidify in this way in English (as opposed to [noun + noun] phrases), I think it is best described as a univerbation. PUC – 13:07, 6 January 2021 (UTC)
Thanks for drawing me to this discussion.
@Vox Sciurorum: I think that it is indeed possible to have compound words having an adjective as a component word, and I support categorizing words like یڭیچری‎, wasteland, double uncle etc. as compounds. inqilābī inqilāb·zinda·bād 13:22, 6 January 2021 (UTC)
{{af}} categorizes things as compounds if none of the arguments contains a hyphen (which would indicate an affix), so the choice between {{compound}} and {{af}} is without significance in this case. The question is really whether this is a compound or a univerbation. Without knowing anything about Turkish, Ottoman or otherwise, I'd be inclined to call this a compound. For me a univerbation, unlike a compound, doesn't have a head, i.e. the part that determines both the part of speech and the semantics of the whole: in yeñiçeri, the head is çeri, as yeñiçeri gets both its noun status and its semantics (a kind of soldier specified by the adjective portion of the compound) from çeri. And wasteland is also a compound for the same reason; where did anyone get the idea that we don't have [Adj + Noun] compounds in English? —Mahāgaja · talk 13:30, 6 January 2021 (UTC)
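The categorization rule described in the comment above (hyphenated arguments signal an affix; otherwise the result is a compound) can be paraphrased as a tiny function. This is a simplified Python sketch of the rule only as stated here, not the actual logic of {{af}} / Module:affix, which may handle further cases. The function name classify_parts is hypothetical.

```python
from typing import Sequence


# Simplified paraphrase of the rule stated above; not Module:affix itself.
def classify_parts(parts: Sequence[str]) -> str:
    """Classify the arguments passed to an {{af}}-style template.

    Any part containing a hyphen is taken as an affix marker, so the
    term is categorized as affixed; otherwise it counts as a compound.
    """
    if any("-" in part for part in parts):
        return "affix"
    return "compound"


if __name__ == "__main__":
    print(classify_parts(["یڭی", "چری"]))  # no hyphens -> compound
    print(classify_parts(["waste", "land"]))
    print(classify_parts(["un-", "do"]))
```

As the comment notes, this rule cannot tell a compound from a univerbation. It only separates affixed forms from everything else, so that choice has to be made by the editor.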
@Mahagaja: I have reverted myself on that Ottoman Turkish entry.
As for [Adj + Noun] compounds in English: I may have been mistaken / not thought this through. But would you really describe items such as genuine article and double penetration as "compounds"? PUC – 13:42, 6 January 2021 (UTC)
This is more like blackbird and headland than those examples. It's not that you can't have Adj + Noun compounds, but that those particular examples aren't Adj + Noun compounds. Chuck Entz (talk) 14:25, 6 January 2021 (UTC)
@Chuck Entz: So what do you think of words like double dribble, double Dutch, true blue (noun), true love etc. These should certainly qualify as compound words. @Mahagaja. inqilābī inqilāb·zinda·bād 20:13, 6 January 2021 (UTC)
Well, it probably was at one point just “new soldiers”, not “new-soldiers”. It’s even written apart, or with a zero-width joiner or non-breaking half-space, in Redhouse’s dictionary linked under چری, as opposed to together at یڭیچری (janissary); both are equally common. So if at some point it was adjective + noun, what it later became is not a compound but a univerbation – that’s my understanding of a univerbation, at least …
You are right to observe that in Turkish the distinction between derivation, compounding and univerbation is formally particularly intransparent; only luckily there are only few proper Turkish prefixes, but in Ottoman there must have been more Persian ones autonomously employed, where with some then one could doubt whether it is rather a compound with an adjective or adverb or an univerbation so there are three possibilities, and the varying spelling یكیچری‎ ~ یكی‌چری‎ seems to show that the Ottomans themselves looked varyingly upon this word. Fay Freak (talk) 14:49, 6 January 2021 (UTC)
The components are not of Persian origin. Originally, the term yeni çeri (“the new military corps”) must have been a completely transparent adj + noun combination, also (in spoken form) to illiterate speakers of kaba Türkçe. (For a collective sense of çeri, see e.g. here. The term probably underwent a similar sense development as English police, in which the collective term was re-interpreted as an unmarked plural of a count noun.) Neither component can reasonably be considered an affix. The primary stress on the first component (/jeˈni.t͡ʃe.ɾi/) is typical for adj + noun combinations that solidify to lexical units; cf. karakuş /kɑˈɾɑ.kuʃ/ vs. kara kuş /kɑˌɾɑ ˈkuʃ/.  --Lambiam 17:08, 6 January 2021 (UTC)
Let me add that in the Redhouse Çağdaş Türkçe-İngilizce Sözlüğü (Redhouse Contemporary Turkish–English Dictionary) (1983) the entry for çeri is simply “çeri  (hist.)  army, troops.”  --Lambiam 16:00, 7 January 2021 (UTC)

Yeniçeri is a compound, not a univerbation. {{af}} handles it correctly. On another note, I do not think diacritics such as three dots over the kaf should be used in the pagename to indicate the /ŋ/. This kind of spelling was found almost exclusively in dictionaries. Such forms can be added as alternative spellings. The transliteration should feature /ŋ/, though. Allahverdi Verdizade (talk) 10:41, 13 January 2021 (UTC)

I followed the policy in Wiktionary:About Ottoman Turkish both for consistency and because I have no specialized knowledge of my own to bring. (I only add entries in obsolete languages when they are particularly interesting to me as sources of etymology, like the origin of the widespread word janissary, or due to subject matter, like the Germanic words for squirrel I added recently.) By that policy, "the ڭ (U+06AD ARABIC LETTER NG) is used in the place of the old /ŋ/." But checking Redhouse's 1890 dictionary, I see he spells yeñiçeri without the dots. And I've seen vowel marks in some places but not in others. I don't have an opinion on what the rule on Wiktionary should be. Vox Sciurorum (talk) 14:24, 13 January 2021 (UTC)
That page is Fay Freak's essay on his views. Some things there are fine, others are just that, his personal views. Allahverdi Verdizade (talk) 14:40, 13 January 2021 (UTC)
@Allahverdi Verdizade: No, it isn’t. I continued what was started before. There had been entries with گ‎ and ڭ‎, and the page informs about a uniform practice. If I had made the first Ottoman entry on Wiktionary, everything would look different. It is incorrect to claim that I am responsible for Wiktionary’s coverage not being “in accordance with the current state of the field”, and if you want a specific thing then you should go for it and say so. There is nothing to “expect”. Fay Freak (talk) 15:21, 13 January 2021 (UTC)

Suggestion to deprecate Template:ko-syllable-hangul

An example at 강#Etymology 1.

These are not dictionary material, and they are not necessary because all of the (very little) relevant information is contained in Template:character info anyways. I think they should be automatically removed.--Karaeng Matoaya (talk) 03:00, 8 January 2021 (UTC)

(Notifying TAKASUGI Shinji, HappyMidnight, LoutK, Karaeng Matoaya, B2V22BHARAT, Quadmix77): In general, I agree with @Karaeng Matoaya's proposal. The info, which is somewhat useful, is covered by {{character info}}.
Does it actually mean that all Korean entries which don't have any PoS sections but only ====Syllable==== will also be deleted? That should be covered by the proposal, because there may be objections, or they could potentially be useful. I actually think that perhaps we should have some (very basic) Translingual sections for each Hangeul syllable with {{character info}} at the very top where there is no other sense (no word exists). --Anatoli T. (обсудить/вклад) 05:44, 8 January 2021 (UTC)
I agree that they should be removed. They are not useful anyways. — LoutK (talk) 18:41, 8 January 2021 (UTC)
@Atitarev, I think we can leave definitionless Hangul entries as they are for the time being.
@Benwing2, do you think this could be done automatically (delete all etymology sections which include {{ko-syllable-hangul}} in pages with multiple Korean etymology sections, and renumber the etymology labels)? Or should this be done manually?--Karaeng Matoaya (talk) 01:39, 14 January 2021 (UTC)

Are there any cases where Template:es-IPA shouldn't be transcluded?

I started adding it to some entries and I was concerned that maybe it wouldn't be able to handle ceceo/seseo or lleísmo/yeísmo, etc. but it seems like a pretty good, flexible template. Since Spanish has a pretty phonetic and reliable pronunciation system, this is going to be accurate 99.99%+, so can anyone tell me why a bot shouldn't add this to every Spanish entry? —Justin (koavf)TCM 09:43, 8 January 2021 (UTC)

Too soon: see Module:es-pronunc/testcases. PUC – 10:20, 8 January 2021 (UTC)
I added a few South American words borrowed from indigenous languages and declined to use {{es-IPA}} for two reasons. I don't know if the Spanish word retains any trace of the pre-Spanish pronunciation. I don't know how to suppress the Castilian pronunciation. Vox Sciurorum (talk) 12:01, 8 January 2021 (UTC)
Why suppress it? Surely even Castilian speakers are allowed to utter words that entered the language in Latin America, just as British speakers have their own pronunciations of English words borrowed from Native American languages. —Mahāgaja · talk 12:35, 8 January 2021 (UTC)
I prefer not to create unattested forms. If somebody from Spain knows how an originally Quechua word is pronounced there, go ahead and add the pronunciation. Vox Sciurorum (talk) 13:41, 8 January 2021 (UTC)
As someone who is not a native, nor a bilingual anglo/hispano, but who knows more than the average person about Spanish and other Romance languages, I've never experienced a Hispanic pronouncing an indigenous term with a phonology outside of standard Spanish. The only loanwords I know of that break this are a few aspirated hs, like in hip hop or sometimes pronouncing a w in words like Kuwaiti (rather than "koo-bay-tee"). Very anecdotal but I'd be interested in knowing if South American Spanish that butts up against living indigenous languages (e.g. Jopara Guarani) have non-Hispanic phonemes. —Justin (koavf)TCM 07:22, 9 January 2021 (UTC)
  • @Jonely Mash: Does the pronunciation that I just added at a dolor seem incorrect to you? —Justin (koavf)TCM 07:27, 9 January 2021 (UTC)
    a dolor's pronunciation is fine. That's why I weasel-worded "possibly" into my sentence :). Jonely Mash (talk) 11:02, 9 January 2021 (UTC)
  • But this does bring up a general consideration which I don't have the solution for: how we could include LOADS of pronunciations in the template, from Cádiz, Buenos Aires, Uruguay, Chile, Asturias, Cuba, the different parts of México etc., which all have notably different pronunciations. Jonely Mash (talk) 00:51, 9 January 2021 (UTC)
  • While there are differences in pronunciation other than just ceceo/seseo and lleismo/yeismo (e.g. dropped -s in some Central American varieties), I think that Spanish has some fairly predictable pronunciations, and the thing that would probably make this much easier than (e.g.) English would be the smaller number of vowels. Even attempting this for English seems pretty daunting, but for Spanish it seems doable. Maybe I'm just too ignorant of Lua or Spanish phonology. —Justin (koavf)TCM 07:24, 9 January 2021 (UTC)
This module still has many problems. ununquadio is still wrong after I pointed out the error over 2 years ago on the talk page, as are many other issues raised there. I'm happy to do research and work with anyone who offers to refine the module. Ultimateria (talk) 07:55, 9 January 2021 (UTC)
Well, ununquadio was a simple fix. I'd also say we should be careful using the IPA template for obsolete terms, like huviesse, çarça, ortographía. Can we know for sure how they were pronounced? Do we care enough? Jonely Mash (talk) 11:07, 9 January 2021 (UTC)
Only if we take care of everything else first. That's still a temporary solution on ununquadio, but thank you for fixing it. Ultimateria (talk) 18:55, 9 January 2021 (UTC)
Would one of you like to add ununquadio to Module:es-pronunc/testcases so it can be tracked? Or is its issue already covered by an existing testcase? - -sche (discuss) 17:35, 10 January 2021 (UTC)
Done. Ultimateria (talk) 19:05, 10 January 2021 (UTC)

@Benwing2: I know you already have a lot on your plate, but would you be interested in getting this module up to par? PUC – 17:39, 10 January 2021 (UTC)

@PUC I can take a look. I'm not sure some of the existing pronunciations e.g. of accidental are wrong, though. Wikipedia specifically gives [oβtiˈmista] for optimista, for example. Benwing2 (talk) 17:58, 10 January 2021 (UTC)

Some antonyms seem like a stretch or outright wrong

E.g. misandry and misogyny are listed as antonyms. Hating men is not "the opposite" of hating women. In addition to the fact that this seemingly ignores the existence of anyone who is intersex, the opposite of hating [group] is loving [group]. It's not clear to me that these bigotries are complementary, graded, or relational antonyms. Am I just being dense here? —Justin (koavf)TCM 07:48, 9 January 2021 (UTC)

I agree they're not antonyms. I'd call them ====Coordinate terms====. —Mahāgaja · talk 08:24, 9 January 2021 (UTC)
Antonymy is a particularly easy concept to misapply, but coordinate terms isn't so easy either. In both cases there is the question of opposite/coordinate with respect to what attribute(s) of the target term. Is brush, mop, or vacuum cleaner the best coordinate term of broom? What is the antonym of broom? Why not vacuum cleaner? It draws dirt in, instead of spreading it. DCDuring (talk) 21:02, 9 January 2021 (UTC)
Can't they all be coordinate terms? —Mahāgaja · talk 22:53, 9 January 2021 (UTC)
Or else cohyponyms: men-hate and women-hate are forms of hate. broom and vacuum are both cleaning utilities. -- 23:10, 9 January 2021 (UTC)
As are dust cloth, soap, sponge, washing machine, dishwasher, buffer, polish, detergent, lint brush, face cloth, carpet sweeper, whisk, swab, nail brush, pressure washer, feather duster, etc. DCDuring (talk) 00:57, 10 January 2021 (UTC)
"What is the antonym of broom?" It has no antonym; not every word has an antonym any more than every word has a synonym. —Justin (koavf)TCM 00:51, 10 January 2021 (UTC)
I think it was a rhetorical question. PUC – 11:54, 13 January 2021 (UTC)

Use ISO 15919 for Hindi transliteration

There's an ongoing discussion here regarding updating the transliteration scheme that Wiktionary uses for Hindi (and Urdu) to be compliant with ISO 15919, which is the international standard and the one used on other Wiki sites. Let's continue the discussion there. Getsnoopy (talk) 01:21, 10 January 2021 (UTC)

Affected project page: Wiktionary:Hindi transliteration
Possibly affected project page: Wiktionary:Urdu transliteration
(Notifying AryamanA, Benwing2, DerekWinters, Kutchkutch, Bhagadatta, Msasag, Inqilābī): Notifying Hindi editors.
(Notifying Taimoorahmed11, RonnieSingh, AryamanA): Notifying Urdu editors.
Please respond to Getsnoopy's proposal. --Anatoli T. (обсудить/вклад) 23:25, 7 February 2021 (UTC)
@Atitarev As an Urdu editor, I would have to say ISO 15919 isn't really compatible with Urdu, as it's quite Indic-based and doesn't really represent Urdu spelling/pronunciation, but having said that, I don't think there are any transliteration standards which accurately represent Urdu. I was actually going to propose that we use a custom/modified transliteration standard that reflects Urdu spelling and takes into consideration things like 'loan letters', Fatha/Kasrah/Dammah Majhool and other forms of diacritics/pronunciations in Urdu; see here as well.
-Taimoor Ahmed(گل بات؟) 00:27, 8 February 2021 (UTC)
@Taimoorahmed11: I see Urdu may further deviate from Hindi. It's probably fine - I saw discussion on Module talk:ur-translit and your talk page. I think you need to start a discussion on Wiktionary:Urdu transliteration, if you're proposing a change to match your module (it is a policy page). Pls note ISO 15919 isn't just for vowels but also for consonants, which would be easier to automate than vowels, but we need an agreement on what the differences are going to be, for example, the Urdu خ‎ is "x" but the Hindi equivalent ख़ is "k͟h", etc. --Anatoli T. (обсудить/вклад) 01:15, 8 February 2021 (UTC)
@Atitarev: When it comes to consonants, I prefer it to be transliterated as "x" in Urdu, and "k͟h" in Hindi, just to highlight the spelling difference in Hindi and Urdu.
-Taimoor Ahmed(گل بات؟) 01:35, 8 February 2021 (UTC)
@Taimoorahmed11: While that is somewhat true, the ISO 15919 standard has specific pages dedicated to transliterating from Urdu consonants, which ends up accounting for most (if not all) characters in the Urdu alphabet, meaning that it supports Urdu to as large an extent as it does any other Indic script. You can see this on the ISO 15919 wiki page. The diacritics, as you pointed out, are another matter, so an "ISO + Urdu diacritical customizations" strategy would probably be a good way to go to keep its compatibility with other Indic languages, since Urdu is an Indic language after all. Getsnoopy (talk) 01:59, 8 February 2021 (UTC)
@Atitarev, Getsnoopy: I don't really feel strongly about this. On one hand I think the macron on e and o are redundant because their short versions don't exist in Hindi and there is no need to distinguish the two. But then I'm also aware that transliterating ख़ as x does look strange to a reader who is not familiar with our transliteration conventions and may interpret the word as having an actual "ks" sound.
Here you say this is supposed to cover all the languages of India (not just "Indic" but Dravidian as well). Does this mean that this standard should also be applied to other languages that use the Devanagari script and the ones that use other scripts viz. Punjabi, Gujarati, Bengali etc.? If that is the case then the already pinged editors here viz @Kutchkutch for Marathi, @Inqilābī for Bengali, @Msasag for Assamese, @RonnieSingh and @Taimoorahmed11 for Punjabi and Urdu should know what they are voting for. -- 𝓑𝓱𝓪𝓰𝓪𝓭𝓪𝓽𝓽𝓪(𝓽𝓪𝓵𝓴) 03:22, 8 February 2021 (UTC)
@Bhagadatta: Yes, that's correct. ISO 15919 covers all Indic (being used in the script sense here, so it includes all Brahmic scripts including Dravidian scripts) scripts. This is exactly why the ē and ō characters exist: most (if not all) Dravidian scripts distinguish between short and long versions of those vowels, so ISO distinguishes between the two since it's meant to be one standard to rule them all. The point about "x" being read as "ks" is a good one and is exactly one of the reasons I'm proposing sticking strictly to the ISO standard: it works for almost all intents and purposes as far as Indic transliteration is concerned. Getsnoopy (talk) 05:40, 8 February 2021 (UTC)

I do not think ISO 15919 is a good romanisation, considering the fact that different IA and Dravidian phonologies are different and trying to merge them under a single romanisation is ridiculous. I don't think we should transliterate at all, rather transcribe them phonemically. I would suggest getting rid of length in transcriptions in languages that don't make a length distinction for certain vowels. Moreover, I'm completely against using ⟨k͟h⟩ for ⟨خ⟩ and ⟨ख़⟩. Like I said, transcribe, don't transliterate. And there's no point distinguishing Urdu and Hindi transcriptions or having different letters for the same sound that's written with different letters. People can already see the spelling there. As for ⟨x⟩ being read as [ks], Persian transcription still uses ⟨x⟩, even though laypeople wouldn't directly make an association in chats and stuff and read it as [ks]. Wiktionary and Wikipedia are academic spaces, albeit for common people. If they come to Wiktionary enough, they'll get used to the usage of ⟨x⟩. Let's not unify all IA and Dravidian transliterations under one umbrella. They have differences and those differences matter. RonnieSingh (talk) 08:15, 8 February 2021 (UTC)
@Atitarev, Getsnoopy: The primary issue appears to be to have some way to compare the romanisation of characters in one script and the comparable characters in another script, particularly those at Brahmic_scripts#Comparison. Perhaps it would be better to satisfy this need for comparability in other ways. For example, the appendix could be used to compare Wiktionary's language-specific romanisation schemes with other schemes such as ISO 15919, Hunterian, IAST, etc. Kutchkutch (talk) 10:00, 8 February 2021 (UTC)
(Thanks for the ping.) While we the normal editors are busy with serious editing, Getsnoopy is fruitlessly asking for something we know is never going to happen. Even if they start a vote on this, I would call that illegitimate. The ISO standard is good only for Sanskrit, also MIA languages. For any NIA language that is phonologically innovative (or any non-IA language for that matter), we must only use appropriate phonemic transliteration schemes, these obviously being Wiktionary’s modifications of the ISO standard, to varying extents, per the phonology of the language in question. As others have already pointed out, we are not here to simply transliterate the orthography of the language, but rather to transcribe it phonemically so as to faithfully represent the phonology of the language through our transliteration scheme. -- inqilābī inqilāb·zinda·bād 18:53, 8 February 2021 (UTC)
@RonnieSingh: Out of curiosity, could you elaborate on what differences you're referring to and why they wouldn't be accurately captured via ISO 15919?
@Inqilābī: Yes, apparently seriously editing entries written for people who can't seem to understand what they're reading. Perhaps tone down the sarcasm and actually bring up criticism of the proposal? As for “use appropriate phonemic transliteration schemes”: that is somewhat of an oxymoron; transliteration does not relate directly to phonemics or phonetics, transcription does. Regardless, somewhere around 90% of the phonemes used in IA and non-IA Indian languages are shared, so suggesting that ISO isn't a great fit for unifying the transliteration of these languages is dubious. Could you provide examples of where the problem occurs? Also, as @RonnieSingh suggests, doing away with transliteration entirely is a respectable position to have, as then we're dealing simply with transcription (which is what you seem to be fundamentally in favour of). But then again, we would all be using IPA (an international standard) for that and not arbitrary conventions developed here and there. The point is about following international standards that everyone (not just the editors of Wiktionary) can understand and follow. Getsnoopy (talk) 22:13, 9 February 2021 (UTC)
@Getsnoopy: The length distinctions that the proposed ISO standard makes aren't all made by the languages. It proposes <ē> for Gujarati <એ> which is never long in Gujarati. Similarly, even though distinguished in the spelling, Gujarati and Bengali and many other IA languages do not distinguish length for /i/ and /u/ in their phonologies. Moreover, it suggests use of <ê> for Devanagari <ऍ> which is used to represent /æ/ in English loan words in Marathi and sometimes also in Hindi. Also, when I talk about doing away with transliteration and transcribing instead, I'm still talking about Roman transcription and not IPA, because IPA isn't Roman. On the English Wiktionary, it's customary to have both a romanisation of the non-Latin script word and a phonetic transcription in a separate pronunciation section. —⁠This unsigned comment was added by RonnieSingh (talkcontribs) at 10:08, 10 February 2021 (UTC).
@Getsnoopy: We aren't doing transliteration to begin with since all our IA langs have a transcription that includes schwa deletion. Transliteration is not useful for a reader (that's why the native script is given!), but a transcription is because it represents how a word is said in a language-suitable consistent manner.
I think this argument is not going anywhere, no one besides yourself has said anything positive about the ISO system, and I see no problem in our current systems. The fact is, a dictionary is only going to be useful to someone who has a basic knowledge of the language at hand, and at that level of knowledge one knows that Hindi "x" is not /ks/ because such a cluster doesn't exist in Hindi. Plus, we provide IPA for Hindi. —AryamanA (मुझसे बात करेंयोगदान) 19:13, 11 February 2021 (UTC)
@RonnieSingh: I actually just reviewed the standard, and section 9.1 states that there is a "non-uniform vowels option" where you can transliterate "Bengali, Devanagari, Gurmukhi, Gujarati, Oriya, or scripts whose character repertoires fall within the character repertoires of these scripts" with long e as "e" and long o as "o", so this seems to cover those cases you mentioned. As for lengths of /i/ and /u/, it seems like the sounds are allophonic with their short versions in those languages, but the distinctions are still maintained in certain edge cases and for etymological reasons. I don't see how showing them as such where they're used would be bad. I don't know what problem you're suggesting there is with ê representing ऍ. It seems like the primary issue you have is with vowels, but I'm more focused on the consonant transliterations. Either way, I think this discussion has become something larger than what it was originally intended to be: relevant only to Hindi transliteration, as the title suggests.
@AryamanA: Like I've said already, schwa deletion is a special case. As for “in a language-suitable consistent manner”: I can say with high confidence that people reading "x" would not read it as "ख़" unless they are familiar with IPA or with Perso-Arabic transliterations, so it's not suitable for the language at all. I'm not aware how much time you've spent in India, but there are plenty of places where words like लक्षमि are written as "Laxmi". As for the cluster not existing in Hindi, a counterexample is अक्स. And similarly, no one (and I'm actually willing to bet on this) would know how to read "ŕ" other than probably you and the other maintainers of the project; it's ironic that even one of the editors called that one out as perplexing. I don't know why you keep trying to rationalize such an obviously absurd transliteration. It's important to remember that these transcriptions are for ordinary people, not for the people who edit regularly using the very transliteration system they created. Given that there's a lot of contention over vowel transliterations and my responses about how ISO allows for wiggle room there, I suggest we at least fix the transliterations for ख़ and ऋ so that we can restore some rationality to the pre-existing scheme. Getsnoopy (talk) 06:44, 15 February 2021 (UTC)
@Getsnoopy: The only reasonable change is r with ring below for that grapheme. I haven't seen the Wiktionary type r with acute anywhere else. And I'd hardly say Metaknowledge "called" anyone "out" there. kh with a line below is not an epitome of rationality (That thing is not an aspirate! Why the digraph? We're not underlining anything else! It's just so inconsistent with all the other fricatives if we do that).
And yeah, I speak Hindi natively and am well aware of ad-hoc romanisations of it. We need consistency so we have to sacrifice some simplicity: च and छ are usually ch and chh, श is sh, फ is f, etc. in everyday use but we don't do that and reserve digraphs for aspirates. Making a consistent romanisation is a decision that had to be made for every dictionary. These transcriptions are indeed for ordinary people, but they're meant to be consistent and scientific and this modified IAST is one such system that we all have maintained a consensus on for many years. I'm hardly imposing anything here. —AryamanA (मुझसे बात करेंयोगदान) 09:06, 15 February 2021 (UTC)
(perhaps I could be pinged earlier as I'm a Hindi editor) @Getsnoopy, I'm also not in support of changing the transliteration system for Hindi here, and I can say for sure that this is not going to happen, seeing that almost all editors are opposing. I think you better give up, so that we don't waste time for useless things and focus on improving Wiktionary! 🔥शब्दशोधक🔥 07:05, 16 February 2021 (UTC)
@शब्दशोधक: The ultimate goal of Wiktionary's existence is to be accessible to ordinary people, and improving Wiktionary involves any effort toward that goal. Using cryptically transliterated characters flies in the face of that goal; you could have the most comprehensive dictionary in the world, but it would be useless if people couldn't access it and understand what it says. I don't understand why people keep parroting this tired idea that transliteration (or any other such fundamental item) is somehow useless. If it's useless, why do it at all? Getsnoopy (talk) 23:34, 16 February 2021 (UTC)
@Aryamanarora: Then I propose changing the transliteration of ऋ to r̥ (and likewise of ॠ to r̥̄); I'll take some change over none at all. Getsnoopy (talk) 23:34, 16 February 2021 (UTC)
@Getsnoopy: Well, actually for native Hindi speakers who may search for some word of Hindi, we don't need transliteration at all. For some non-native speaker, transliteration is necessary. s with diacritics to make श and ष is confusing again, since it is not used commonly. Daily life transliterations are very unstable. I've seen many write नहीं as nhi, even dropping the a which they do pronounce. sh as श & ष, f as फ and फ़ (regardless of it having nuqta or not), a as अ & आ (sometimes also aa or A but that's rare) and so many more. About ऋ, I asked User:Bhagadatta earlier, but he said that it is to avoid confusion between ड़ and ऋ, and I'm sticking to that - ŕ may be strange, but it does differentiate itself from ṛ. On the other hand, r̥ is a bit too similar to ṛ (although I don't deny dot != ring). Our transliteration system seems perfectly fine to me, and I oppose any change to it. 🔥शब्दशोधक🔥 03:10, 17 February 2021 (UTC)

t:ja-compound auto-categorize affixes

Just like t:affix, it can check whether a constituent begins or ends with a hyphen and then add categories like cat:Japanese words suffixed with める. Would you like this feature?

(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Suzukaze-c, Poketalker, Cnilep, Marlin Setia1, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233): -- Huhu9001 (talk) 13:03, 11 January 2021 (UTC)
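To make the proposal concrete, here is a minimal Python sketch of the category names such a check might emit. It is not the actual Lua implementation of {{ja-compound}} or {{affix}}. The suffix category name follows the example above ("Japanese words suffixed with める", with the hyphen dropped); the "prefixed with" name is an assumption by analogy, and `affix_categories` is a hypothetical helper. The sample inputs are illustrative only.

```python
def affix_categories(lang: str, parts: list[str]) -> list[str]:
    """Return affix category names for hyphen-marked parts.

    Hypothetical rule: a leading hyphen marks a suffix, a trailing hyphen
    marks a prefix; the hyphen is stripped from the category name.
    Parts with no hyphen (or hyphens on both ends) add no category here.
    """
    categories = []
    for part in parts:
        leading = part.startswith("-")
        trailing = part.endswith("-")
        if leading and not trailing:
            categories.append(f"{lang} words suffixed with {part[1:]}")
        elif trailing and not leading:
            categories.append(f"{lang} words prefixed with {part[:-1]}")
    return categories


print(affix_categories("Japanese", ["強", "-める"]))
# ['Japanese words suffixed with める']
```

Keying the behaviour to a hyphen at either edge means plain kanji or kana constituents are unaffected, so existing {{ja-compound}} calls without hyphens would categorize exactly as before.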

Support. Suzukaze-c (talk) 13:24, 11 January 2021 (UTC)
Support ‑‑ Eiríkr Útlendi │Tala við mig 08:13, 16 January 2021 (UTC)

Added to Category by Mistake?

@Tommassammot [4] and [5] added Kyrgyzstan and Nepal to Category:en:Places in China --Geographyinitiative (talk) 01:04, 13 January 2021 (UTC)

I fixed those, but there are lots more with the same problem. Code like <<c/...>> is only for holonyms, that is, places that the subject of the entry is part of, not for coordinate terms, which are things of the same type and part of the same larger grouping. Neighboring countries should be made into simple links with [[...]] or a linking template like {{l}}. It looks like most of @Tommassammot's edits will have to be either fixed or reverted. Chuck Entz (talk) 04:34, 13 January 2021 (UTC)
After going through all of the ones with {{place}}, it turns out to not be as bad as I thought. They mentioned neighboring countries in only a few of them, and those are now fixed. Chuck Entz (talk) 15:10, 13 January 2021 (UTC)

Codifying Western Yughur

I would like to propose a codification of Western Yugur in order to enable its lemmatization. Western Yugur is not written and there is no standard orthography.
Right now, we have 6 entries in the language, but it could be expanded using Roos (2000)[1]. It is a great source, containing texts and a wordlist, and also providing etymologies and cognate lists for most items. The entire work is written in a phonological notation. Using it to lemmatize WY is less desirable for the following reasons: 1) some characters are not really found in Unicode, such as LATIN SMALL LETTER S WITH CURL, and aspiration is indicated IPA-style, with a superscript h; 2) some characters have no upper-case form (true for most IPA-inspired characters); 3) some characters are not conventional from the IPA point of view either, such as <ɕ> for /c͡ç/ or <c̨> for /ʈ͡ʂ/.

Therefore, I propose the following system for lemmatization of Western Yugur. There you find a comparative table of characters in Roos (2000), their phonetic values and the characters under proposition. There is also a list of sample words in all three notations and a sample text in the proposed orthography. The benefits of the proposed system are the following:

  • Full phonematicity, which will make the construction of an IPA-module very easy.
  • Closeness to the "Common Turkic alphabet" where possible (this will make comparisons in etymology sections and reconstruction space more easily accessible).

I am open to discuss individual solutions to the different aspects of the system under proposition. If we can agree on a standard spelling, I will take on creation of a core bulk of entries. —⁠This unsigned comment was added by Allahverdi Verdizade (talkcontribs) at 11:12, January 13, 2021 (UTC).


The issue arises for many LDLs. I think that Wiktionary is not, and should not be, in the business of devising or promoting systems of orthography; if we include a term, it should normally be in a spelling that is attested (where the bar for attestation is lower for LDLs).  --Lambiam 16:33, 16 January 2021 (UTC)
As there is allegedly no spelling for Western Yughur - it is not written - we cannot record spellings for this language. The best we can do is record pronunciations! For pronunciations, we should use (uncased) IPA. Recording the lemmas in the IPA seems the most neutral option for recording an unwritten language. --RichardW57 (talk) 00:22, 17 January 2021 (UTC)
How are Wiktionary users most likely to encounter this language? If Roos's system is the most likely, then perhaps we should use his system, and naturally exclude words that cannot be written with the current repertoire of Unicode. --RichardW57 (talk) 00:22, 17 January 2021 (UTC)
I don't know how Wiktionary users are most likely to encounter this language. Roos (2000) is perhaps the most known work, but there is an important PhD-thesis on the language from 2019[2]. There are earlier dictionaries and collections of texts as well. If you visit glottolog, you will see a more complete list of references. All of them use different notations, even the several papers co-authored by Roos. Allahverdi Verdizade (talk) 08:13, 17 January 2021 (UTC)


  1. ^ Roos, Marti (2000). The Western Yugur (Yellow Uyghur) Language: Grammar, Texts, Vocabulary. Leiden: University of Leiden.
  2. ^ Zhong, Yarjis Xueqing (2019). Rescuing a Language from Extinction: Documentation and Practical Steps for the Revitalisation of (Western) Yugur. Doctoral dissertation, Australian National University. xxxi+467 pp.

Pali Pronunciation

I am not keen on attempting to record this, but @Octahedron80 has requested it for citta. Firstly, the ancient pronunciation seems not to be certain - were orthographic clusters of resonant + h murmured or simply clusters as written? Secondly, there are a lot of present-day regional variations, and nowadays even variations due to inconsistent attempts to achieve a more authentic pronunciation.

For example, how do we accommodate the Sinhalese and English failure to sound aspiration and distinguish dentals and retroflexes?

Should we tie pronunciations to scripts, or, where possible, tie them all to the trans-script (in this case, Latin-script) form? For traditional Tai pronunciations, Pali is a tonal language. Phonemically, it should have two tones (etymologically voiceless v. voiced), but the allophones will belong to different tonemes in the speakers' mother tongues, and irregular pronunciations may make some of these differences locally phonemic. How do we handle regional variation in the tones within countries - and do they occur?

We also have the issue that the citation forms do not always occur in the language, or, I suspect, in the script. --RichardW57 (talk) 15:09, 13 January 2021 (UTC).

Appendix:Japanese counters page

An apparently well-meaning user, Zenkaino_lovelive (talk · contribs), has created this page. I find it bafflingly organized. I think what they're trying to do already exists at Category:Japanese_counters.

@Zenkaino, could you please have a look at Category:Japanese_counters and see if it might already list the various entries you appear to be collating at Appendix:Japanese counters? If the Category page already does what you need, we should delete your Appendix page. If the Category page doesn't do what you need, could you please explain what you were trying to accomplish with your Appendix page?

‑‑ Eiríkr Útlendi │Tala við mig 08:19, 16 January 2021 (UTC)

Likewise, I think the Appendix:Jōyō kanji by Kanten degree page might be re-creating content we already have somewhere else. @Suzukaze-c, Huhu9001, TAKASUGI Shinji, other Japanese editors, could you check? My bandwidth lately is very restricted, and I likely won't be able to spend any appreciable time on Wiktionary for a while. ‑‑ Eiríkr Útlendi │Tala við mig 08:23, 16 January 2021 (UTC)

@Eirikr: I made the appendices because I think that combining numbers and counters is necessary for foreigners. Also, I think that we should divide Japanese kanji by Kanten degree. These are just my thoughts, though. Zenkaino lovelive (talk) 08:37, 16 January 2021 (UTC)

@Zenkaino lovelive: No, understanding and learning numbers and counters is necessary. Simply throwing them together in one place without explanation just creates an incomprehensible wall of Japanese characters. I challenge anyone not fluent in Japanese to figure out what information is being presented. Chuck Entz (talk) 08:57, 16 January 2021 (UTC)

Then what is the best? Zenkaino lovelive (talk) 09:00, 16 January 2021 (UTC)

I personally think that the best location is the entry for the counter itself ([6]), but I can't imagine what header would be good for it. —Suzukaze-c (talk) 09:02, 16 January 2021 (UTC)
I think that adding information (when to use the counters) is the best. BTW, @Suzukaze-c, could you correct my errors in Appendix:Japanese counters? I think that there is something incorrect. Zenkaino lovelive (talk) 09:05, 16 January 2021 (UTC)
Well, both "Classifier" and "Counter" are listed as permissible POS headers at WT:EL. Barring those, I would argue against "Noun", which we use for English measure words, because the classifier system is semantically more like the noun class system in many African languages. I would note that we tend to use "Particle" when we can't think of anything else. If you're talking about the header to use to house a list, that's trickier. They're definitely not "Derived terms" or "Coordinate terms", and none of the nyms fit. I guess we would be stuck with "See also". That said, maybe it would be better to have something along the lines of the subcategories of Category:Chinese nouns by classifier, except with information on the semantic characteristics the classes have in common. Chuck Entz (talk) 02:12, 17 January 2021 (UTC)

What's "Kanten degree"? Can't find it on Google. -- Huhu9001 (talk) 12:13, 16 January 2021 (UTC)

Kanten is "漢字検定". Degree is "級". Zenkaino lovelive (talk) 12:48, 16 January 2021 (UTC)
@Zenkaino lovelive: What? Romaji for 漢字検定 is Kanji Kentei. Where is this Kanten stuff? -- Huhu9001 (talk) 00:41, 17 January 2021 (UTC)
@Huhu9001: 漢字検定 -> 漢検 -> Kanken. I've corrected it. BTW, could you correct my errors in Appendix:Japanese counters? Looks like there is something incorrect. Zenkaino lovelive (talk) 00:44, 17 January 2021 (UTC)
  • It is totally incomprehensible and meaningless. I would just delete it. SemperBlotto (talk) 12:19, 16 January 2021 (UTC)
Good intentions and efforts but it's not a very useful page. Delete. --Anatoli T. (обсудить/вклад) 05:02, 17 January 2021 (UTC)
Keep. —Suzukaze-c (talk) 07:02, 17 January 2021 (UTC)
Neutral. -- Huhu9001 (talk) 11:43, 17 January 2021 (UTC)
I think that if I add when to use the counters, the appendix will be useful. Zenkaino lovelive (talk) 05:41, 17 January 2021 (UTC)
I've written up when to use all the counters. See: Appendix:Japanese counters. Zenkaino lovelive (talk) 08:02, 17 January 2021 (UTC)


Is the label uncountable applied correctly in the def 1.2.1 ("(uncountable, measure word for livestock and game) A single animal."; usage example: 200 head of cattle) in head? The label seems to be used to indicate that head doesn't take a plural-s. The question arose over German Stück, but could be similarly asked for other German measure words (Glas, Meter, and other units of measurement). How should these forms be classified? --Akletos (talk) 07:54, 18 January 2021 (UTC)

I see it as a plural, not uncountable. "100 million head of cattle are..." Equinox 08:22, 18 January 2021 (UTC)
Snow (the state of water) is an example of something uncountable: there's no 1 snow, 2 snows, 3 snows. (Well, in technical language there could be such a plural (Artenplural or Sortenplural) when there are different types of snows, but then it also has another meaning.) --幽霊四 (talk) 11:18, 18 January 2021 (UTC)
Would it be better to write {{lb|en|singular}} => (in the singular)? Vox Sciurorum (talk) 16:11, 18 January 2021 (UTC)
But Equinox's example shows it to agree with a plural verb. I would call it invariant, i.e., plural form = singular form.
Snow is both countable and uncountable: "We only got three significant snows last year.". I suppose one could argue that the countable use is informal. DCDuring (talk) 16:19, 18 January 2021 (UTC)
[[head#Noun]] is a great example of the variety of English plurals and (un)countability. It has a conventional plural and an invariant plural for at least one sense. It has senses that are almost always countable and some that are both countable and uncountable. It is probably a mistake to put (un)countable labels on senses rather than subsenses. I am not so sure about the inflection line either, but we do have (un)countability labels there for single-definition nouns and for multi-definition words that are only one or the other. The prospect of reforming the presentation of countability and uncountability for polysemic terms that have some countable and some uncountable definitions seems unlikely to generate a consensus or enthusiasm. DCDuring (talk) 17:48, 18 January 2021 (UTC)

User:Liywy is mass-removing Japanese terms written in katakana

@Eirikr, TAKASUGI Shinji: Hi. User:Liywy is mass-removing Japanese terms written in katakana from translations. Please review. --Anatoli T. (обсудить/вклад) 09:48, 18 January 2021 (UTC)

(copying response here:) I for one agree with Liywy that most of these katakana words can be omitted (perhaps not all of them— but some of these are obviously marginal/marked, like エンゼル and フッテージ). —Suzukaze-c (talk) 09:51, 18 January 2021 (UTC)
Individual terms can be disputed but mass-removing valid terms is bad. Even フッテージ (futtēji, footage) or エンゼル (enzeru, angel) can simply be labelled as rare. --Anatoli T. (обсудить/вклад) 10:37, 18 January 2021 (UTC)
Labelling marked/marginal/rare terms seems like the better way (if the terms do exist, of course). Alternatively, the usual term could give uncommon terms as synonyms, but firstly, that doesn't seem like a better way, and secondly, currently 天使 doesn't give エンゼル or エンジェル (both are in アルゼンチン) as synonyms. --幽霊四 (talk) 11:28, 18 January 2021 (UTC)
The ban seems a bit over-hasty. — Mnemosientje (t · c) 14:56, 18 January 2021 (UTC)
Well, they are given as synonyms for "angel", not for "Argentina" on that page. You can also view some Japanese dictionary entries: https://ejje.weblio.jp/content/エンゼル and https://kotobank.jp/word/エンジェル-1847#E7.B2.BE.E9.81.B8.E7.89.88.20.E6.97.A5.E6.9C.AC.E5.9B.BD.E8.AA.9E.E5.A4.A7.E8.BE.9E.E5.85.B8 I only mentioned エンゼル on my talk page as an example of a word that does exist; it's not the edit that I have undone. --Anatoli T. (обсудить/вклад) 22:25, 18 January 2021 (UTC)
These links are dishonest. Note that for the Weblio link, there are 26 example sentences in total, with the vast majority of them being about baseball and the rest featuring "エンゼル" in a compound word, while the Kotobank link primarily features encyclopedia pages on proper names "Angel", or angel investors, with only one (1) dictionary definition featuring 2 literary quotes from 1891 and 1907, and I encourage everyone to use Google Translate on the latter link to personally verify. —Suzukaze-c (talk) 02:36, 19 January 2021 (UTC)
The deletion of エンゼル is not the edit that I have reverted. Not sure about dishonesty. The examples given in both dictionaries are unhelpful for our CFI, but the definitions/translations are helpful. --Anatoli T. (обсудить/вклад) 04:16, 19 January 2021 (UTC)
@Atitarev, I also think it was a bit hasty to ban @Liywy, especially when they're a native speaker. If these gairaigo removed by Liywy are anything like their equivalents in Korean (and I freely admit they might not be, since I don't speak Japanese), many of them would be marked or unusual forms—purposeful Anglicisms, to some extent used precisely because of their foreignness—and if so, I see why Liywy might have wanted to remove them from the translation tables.
In addition, many of Liywy's edits were adding non-gairaigo equivalents into translation lists without removing established English loans (1, 2), which I don't see as disruptive. I think this should have been discussed more.--Karaeng Matoaya (talk) 15:16, 18 January 2021 (UTC)
Liywy has removed many attestable terms, I have only undone some of them. The disruptive edits were in the edit-warring that followed on some entries, not the edits themselves. --Anatoli T. (обсудить/вклад) 22:16, 18 January 2021 (UTC)
I can understand why we would want to stop @Liywy from making mass changes until they're discussed, but a sitewide block is overkill. They're not being disruptive in general, but (allegedly) in a very narrow context. I changed it to mainspace-only so they can participate here, and I think we can reduce it or remove it once we understand better what's going on. Chuck Entz (talk) 15:28, 18 January 2021 (UTC)
I have invited User:Liywy to this discussion on their talk page. DCDuring (talk) 17:55, 18 January 2021 (UTC)
A Japanese speaker should try to engage this user (by e-mail, on other projects, or with an invitation to WT:AJA) in case their Babel box accurately characterizes their English skills as non-existent. DCDuring (talk) 18:07, 18 January 2021 (UTC)
Their English is apparently fine judging by what they told me on my talk page.
My block decision was based not on the edits themselves or [opinion] but because of the edit warring. I have reverted some of their removals with an explanation and a link to Google books but they have reverted again with no explanation. --Anatoli T. (обсудить/вклад) 22:10, 18 January 2021 (UTC)
Their validity is highly disputable, and casually presenting them in a translation table on par with common terms is misleading. —Suzukaze-c (talk) 02:38, 19 January 2021 (UTC)
Which particular translations are highly disputable? So far I have reverted removal of Japanese loanwords from English for the following words: discount, pill, boyfriend, girlfriend, rival, sportsman, electronics. The existence and use of valid corresponding katakana words are easy to prove. I haven't gone through all removals, though. --Anatoli T. (обсудить/вклад) 04:16, 19 January 2021 (UTC)

Chagossian Creole

What could be the code for the Chagossian Creole? --Apisite (talk) 12:33, 19 January 2021 (UTC)

I'd think it's classified as part of the Mauritian Creole on Wiktionary, so mfe. Thadh (talk) 12:43, 19 January 2021 (UTC)
@Thadh: How different are the creoles from each other? --Apisite (talk) 12:54, 19 January 2021 (UTC)
@Apisite: No idea, I'm hearing of Chagossian for the first time; I just noted the probable answer to your question, since Wikipedia (and ISO apparently) classifies it as a dialect of Mauritian. Thadh (talk) 13:06, 19 January 2021 (UTC)

Turkish words derived from Old Turkic

I have seen Old Turkic used with {{etyl}} as if it were an ancestor of Turkish, but it is not configured to be one in the module data. It is unclear if the intention is to indicate inheritance from Old Turkic, borrowing from Old Turkic, or mentioning of the word as a contemporary cognate of an unattested ancestor. I wonder if Category:Turkish terms derived from Old Turkic and Category:Ottoman Turkish terms derived from Old Turkic should have as many entries as they do, or if the Old Turkic words should generally be listed as cognates. Vox Sciurorum (talk) 19:44, 19 January 2021 (UTC)

@Vox Sciurorum: Quote Allahverdi Verdizade at Wiktionary:Etymology scriptorium/2020/December § Moving Proto-Turkic words on /*g-, *d-/ to /*k-, *t-/: Nişanyan doesn't deal with Proto-Turkic (only Common Turkic) and obviously equals Old Turkic with Common Turkic, and views it as an ancestor of all modern Common Turkic languages […]. He has also said in one of his streams (in a series of youtube-videos called Dilbilim ve etimoloji, if the memory serves) that "Old Turkic is the ancestor of all modern Turkic languages", a view that we of course cannot support. Quote end.
And Nişanyan is the most accessible etymological dictionary of Turkish, which is why Turkish editors treated the Old Turkic cognates given by Nişanyan as ancestors. In relation to Turkish, Old Turkic words are only cognates, barring rare cases where Old Anatolian Turkish may have borrowed from Old Turkic. So the derivations are wrong altogether, and the Old Turkic words are cognates. Fay Freak (talk) 20:15, 19 January 2021 (UTC)
I remove them when I see them, but I haven't gone through all entries systematically. Someone should do it, but it's a hell of a lot of work: 377 entries. Moreover, many Old Turkic cognates are entered in Latin script... Allahverdi Verdizade (talk) 21:48, 19 January 2021 (UTC)
I suspect that Sevan Bey has a different understanding of the term Eski Türkçe (literally “Old Turkish”) than what we call “Old Turkic”. The latter name is a bit confusing; one would expect it to mean “an older version of Turkic”, a progenitor of the various branches of Turkic, instead of referring to merely one among several older versions of Turkic languages.  --Lambiam 01:02, 20 January 2021 (UTC)

Are there any cases where Spanish terms shouldn't have a link to the DLE?

Many of our entries have links to the DLE using {{R:RAE}} but some don't, and I am trying to add them whenever I see one missing (e.g. https://en.wiktionary.org/w/index.php?title=byte&diff=prev&oldid=61611329). Is there any good reason not to add this link, as long as there is a DLE entry to point to? This seems like almost bot-like work, and it seems obvious that the link should be included without any real human discrimination, but maybe there's something I'm missing. Thanks. —Justin (koavf)TCM 08:14, 20 January 2021 (UTC)

I think {{R:RAE}} should normally be used. I expect there are a few entries where it will not match our definition, maybe because there is a new or distinctly American sense that hasn't been recognized (yet) by the authorities in Spain. A bot would need to parse the HTTP response from dle.rae.es to see if it contains a definition, a pointer to an alternative form, or an error. Vox Sciurorum (talk) 15:10, 20 January 2021 (UTC)
This was done several years ago with links in French entries to the TLFI. I believe it was considered a success, even though there are still entries being discovered with bad links. It might help if we could track down discussions on that operation to see what lessons were learned (here is a search for mentions of it in the Wiktionary, Talk and User Talk namespaces). For reference, it was done by @Kc kennylau using User:Kennybot. Chuck Entz (talk) 15:37, 20 January 2021 (UTC)

Interface admin rights

I'd like to get the interface admin bit to add a bit of a WikiHiero related kludge to common.js. Since I am a bureaucrat, I'm technically able to add the right myself, but I'd like to first make sure that the community would agree with that decision. — surjection??⟩ 20:07, 20 January 2021 (UTC)

Before the introduction of the interface admin right, all admins could edit those pages. I oppose any bureaucracy limiting admins from becoming interface admins, and I would support a measure to give those rights to all admins by default. —Μετάknowledgediscuss/deeds 20:14, 20 January 2021 (UTC)
Seconded. If you don't trust admins' judgement, then why are they admins? Everyone makes mistakes but I think the community here is generally pretty conservative in its editing anyway and tends to stick to areas of competence as it is. —Justin (koavf)TCM 21:40, 20 January 2021 (UTC)
I’ll throw in a third support for extending this to admins in general. The current situation was only intended as a stopgap solution to begin with. Also chiming in with support for Surjection’s IA-ship (and WikiHiero-related changes) in any case. — Vorziblix (talk · contribs) 05:50, 21 January 2021 (UTC)
Makes sense to me. --{{victar|talk}} 08:12, 21 January 2021 (UTC)
It makes sense, and of course I agree to this individual request. But the Wikimedia developers made a general decision that admin rights should not automatically entail interface admin rights. So at least we had this small hurdle of somebody requesting the right, so that, for security reasons, not every admin who doesn't even need it has it. One could argue that the supposed danger on en.Wiktionary in particular is not that great, because there are so few admins, so Wiktionary could give it to every admin; but this is difficult to argue if even en.Wiktionary, with its now seemingly few editors, is among the most-edited Wikimedia wikis. I don't know which are the most-edited wikis: when I search for this I only get endless articles about the most-edited Wikipedia articles, which is of zero interest, as I'd like to know the largest wikis by edit frequency. But at least we have lists for total size, and en.Wiktionary takes tenth place by total admin count, with only an eleventh of English Wikipedia's number of admins, so the assumption is likely that we are not allowed to just give interface admin to every admin. Fay Freak (talk) 17:32, 21 January 2021 (UTC)
  • You're too honest, Surjection! I would have just given it to myself, and if anyone noticed and thought I did wrong I would plead insanity. Alexfromiowa (talk) 21:27, 24 January 2021 (UTC)

A book that may interest Wiktionarians

https://www.npr.org/2021/01/11/955480798/the-liars-dictionary-is-a-clever-delight-for-language-lovers Justin (koavf)TCM 09:46, 24 January 2021 (UTC)

A second: https://www.npr.org/2021/01/25/960299623/voice-author-explores-accents-language-and-what-makes-a-tone-sexy Justin (koavf)TCM 04:19, 26 January 2021 (UTC)


I'm not sure whether this should be moved or deleted, but the name is incorrect: it's a Hindi headword template, and it's definitely not ready for prime time. At the very least it should be moved to a subpage of "Module:User:Kushalpok01". Pinging @Kushalpok01, who may not have seen my message on their talk page yet. Chuck Entz (talk) 21:20, 24 January 2021 (UTC)

Category:Requests for quotation/Johnson

I started working on Category:Requests for quotation/Johnson, and after the first dozen or so, I considered this a useless pursuit. For all those I checked, the only Johnson quotations supposedly requested were actually from Samuel Johnson's dictionary. Perhaps Samuel Johnson in his day was requesting quotations? Perhaps whoever first added {{rfquotek|Johnson}} was confused and wanted to add a reference tag instead, like "hey, this term appears in SJ's dictionary!"? Even so, I kinda just wanna delete all instances of {{rfquotek|Johnson}}. Unless someone gives me a good reason not to (specifically the Johnson rfqs), that's exactly what I propose. Alexfromiowa (talk) 21:23, 24 January 2021 (UTC)

  • I think these are words that were known only from Johnson's dictionary. At least two distinct people added the request. For example, @Equinox when importing opprobry from Webster's 1913. But acervation was created long ago with "[R] Johnson" at the end of the definition and years later converted to have a request for quotation. Vox Sciurorum (talk) 21:44, 24 January 2021 (UTC)
    • Another option not to be discarded is that Samuel Johnson was trolling us with these entries, like in that recent lexi-book, or like approximately 0.2% of Wonderfool's edits. Alexfromiowa (talk) 23:41, 24 January 2021 (UTC)
Yes, they are references to entries in Johnson's dictionary. Some might have quotations there though. Equinox 09:10, 25 January 2021 (UTC)

Close WT:TRREQ?

I just took a moment to look at WT:TRREQ. Of the roughly 90 requests on the page, only one has actually been answered. We even have this BJAODN-worthy request, which I guess I am "answering" by means of this BP post.

The incoming traffic presumably is coming from this Quora answer with more than a million views (it is the fourth or fifth result for this Google search if you can't read it via the direct link).

Since nobody is actually translating anything on this page anymore, I'd like to suggest marking the page archived, closed, historical, or whatever the practice is on this site. It is pretty misleading to give people an expectation that their phrase will ever be translated, when in fact only 1 or 2 percent of requests are fulfilled. This, that and the other (talk) 04:11, 25 January 2021 (UTC)

If we retire this and if it also has a lot of incoming traffic, we should have a big banner at the top explaining that it was retired and why. Maybe even edit-protect the page and offer alternative suggestions. Frankly, I'm surprised that anyone would even come here to ask how to translate from English to Spanish "The girls that were following u yesterday was trying to fight u" since this is a phrase that online translators should easily be able to resolve. —Justin (koavf)TCM 05:10, 25 January 2021 (UTC)
Might as well. It was mostly Stephen G Brown who used to fulfil the requests. Machine translation such as Google's free service has improved a lot in recent years too. Equinox 05:13, 25 January 2021 (UTC)
I also support closure. Koavf's plan (edit-protection with an explanatory banner) sounds like the right call. —Μετάknowledgediscuss/deeds 05:23, 25 January 2021 (UTC)
Ditto. Let's close it. --Anatoli T. (обсудить/вклад) 05:40, 25 January 2021 (UTC)
Support. PUC – 12:06, 25 January 2021 (UTC)
I'll add that I support closing this page because I don't see how it relates to the work we do here. We're not a forum for learning languages or for providing free translation services. If it helped fill gaps in our coverage (in our translation tables, for example) I would keep it, but it clearly doesn't do that. PUC – 22:51, 25 January 2021 (UTC)
Support closure, the addition of a banner explaining what and why, and edit protection of the page. ‑‑ Eiríkr Útlendi │Tala við mig 19:15, 25 January 2021 (UTC)
Support closing with a banner. Andrew Sheedy (talk) 21:57, 25 January 2021 (UTC)
Why not start translating the requests again (which would be the obvious solution) instead of closing? J3133 (talk) 12:24, 25 January 2021 (UTC)
Because 95% of the requests are utterly inane. I used to follow that page and provide translations wherever I could, but I eventually got fed up with the ridiculousness of the requests. —Mahāgaja · talk 13:19, 25 January 2021 (UTC)
  • Oppose resuming translation. Ditto Mahāgaja's comment. Most of the requests are patently garbage. Many aren't even understandable in the source text. I have better things to do with my time. Plus, as noted above, machine translation has improved to the point of at least marginal usability, so the value of us even having this page is quite low (at least, for the kinds of requests we've been getting). ‑‑ Eiríkr Útlendi │Tala við mig 19:15, 25 January 2021 (UTC)
    @Eirikr: I'm pretty sure based on your comments that you intended to support, rather than oppose. —Μετάknowledgediscuss/deeds 20:07, 25 January 2021 (UTC)
I think he means to oppose J3133's suggestion. But either way, I think you would be willing to support the above proposal, Eirikr? Andrew Sheedy (talk) 21:57, 25 January 2021 (UTC)
  • OK, I added a lame sentence to Wiktionary:Translation requests/header. If I were an admin I'd just protect the page, thereby avoiding all future translations of some random bullshit. Alexfromiowa (talk) 22:19, 25 January 2021 (UTC)
  • Support closure. At this point AI has taken over, and we can use the occasion to encourage people to learn the target languages, for example with the help of the dictionary, insofar as that is not true. Fay Freak (talk) 22:20, 25 January 2021 (UTC)
    Long live AI! We just need it to be able to add definitions to words, and we can all go and spend our time outside getting tanned. Alexfromiowa (talk) 22:23, 25 January 2021 (UTC)
    Yes! While at it, the AI should also create the matching quotation material, Devil's dictionary style. – Jberkel 23:25, 25 January 2021 (UTC)
  • I've now closed and edit-protected it per the community consensus. —Μετάknowledgediscuss/deeds 19:02, 2 February 2021 (UTC)

Adding surface analyses to fill in gaps in suffix categories

Most of the suffix category pages (e.g. Category:English_words_suffixed_with_-ation) are missing many words that could be placed in those categories. I have an interest in making those categories as complete as possible, so I have started adding suffix templates where appropriate.

For example, I recently added

"Morphologically combine + -ation"

at the end of the Etymology section of the combination page.

I eventually hope to be making many such edits, so I wanted to drop a note here in case anyone had any comments or objections.

Jonathanbratt (talk) 22:38, 25 January 2021 (UTC)

This type of information is more common in Romance entries for some reason. We typically use “corresponding to ... + ...” or “equivalent to ... + ...”. — Ungoliant (falai) 22:44, 25 January 2021 (UTC)
As long as you are analyzing your edits by hand and not just automatedly adding the categories to any word ending in "ation" I have no problem with it. DTLHS (talk) 22:48, 25 January 2021 (UTC)
I object. Doing so clutters up a category that should show actual historical derivation (and mostly does so) with Johnny-come-lately morphological reanalysis. DCDuring (talk) 22:49, 25 January 2021 (UTC)
I am indeed analyzing by hand, though I have been submitting the edits via the API to streamline the process. And I am happy to use "Equivalent to..." or whatever preface is preferred. I've seen several varieties in use, and don't have a strong opinion about it. I sympathize with the desire to avoid cluttering up the Etymology section with "surface" morphological reanalyses, but there is also value in providing such surface analyses. If there were a section besides Etymology to place these edits in, I would be happy to make them there. Jonathanbratt (talk) 23:00, 25 January 2021 (UTC)
Would combine (/komˈbaɪn/) + -ation really give combination (/kombɪˈneɪʃən/) and not *combination (/komˈbaɪneɪʃən/)? --幽霊四 (talk) 23:07, 25 January 2021 (UTC)
Also, regarding historical etymology vs. surface analysis, I had based my approach on the guidelines at Wiktionary:Etymology#Surface_etymologies. Jonathanbratt (talk) 23:21, 25 January 2021 (UTC)
I agree with DCDuring. I'm not against showing surface etymologies, but I think they should use |nocat=1 in the affix template. Ultimateria (talk) 04:07, 26 January 2021 (UTC)
It probably should be discussed if |nocat= is to be made mandatory. I don't think everyone would agree with that. --{{victar|talk}} 05:10, 26 January 2021 (UTC)
I'm ok with using |nocat=1 in the templates, if that's the preferred approach. But if it is, I suggest that the guidelines (linked above) be updated accordingly. Jonathanbratt (talk) 05:05, 26 January 2021 (UTC)
The existence of a productive suffix facilitates the borrowing of foreign terms that show the respective equivalent in the foreign language; that's a self-reinforcing process. So this productive suffix plays a major role in the borrowing of terms, as you can see e.g. in the way these words are pronounced. You can't explain the pronunciation of -ation in combination without taking -ation into account. Compounds are a parallel case: even if there was a Middle English predecessor of barnyard, the word's still a compound of barn + yard and should be categorised as such. --Akletos (talk) 07:20, 26 January 2021 (UTC)
I think we need to have clean diachronic (historical) etymological derivation categories. If the cost of that is having synchronic (morphological) categories also, that would be fine with me. It might take quite a while before the two are properly populated, but this is a giant work in progress anyway. DCDuring (talk) 17:09, 26 January 2021 (UTC)

Add a note on pages with Lua memory errors

Which would link to Wiktionary:Lua memory errors, in order that readers be informed about this problem and that it is known (i.e., we are aware of it; see Wiktionary:Feedback § a, § i). J3133 (talk) 12:06, 26 January 2021 (UTC)

Wiktionary:Grease pit/2021/February § Lua error: not enough memory, Talk:a § Lua error. J3133 (talk) 19:50, 8 February 2021 (UTC)

Deprecating bbz

Babalia Creole Arabic language is considered a spurious language, and its ISO 639 code bbz was retired in 2019. Can we deprecate it at WT:LT and delete the code from Module:languages/data3/b? —Mahāgaja · talk 18:17, 26 January 2021 (UTC)

@Mahagaja: Did we miss a set of code changes? We should probably be more on top of that... —Μετάknowledgediscuss/deeds 18:30, 26 January 2021 (UTC)
@Metaknowledge: I dunno; I didn't check the others at w:Spurious languages#Retired 2019. —Mahāgaja · talk 18:37, 26 January 2021 (UTC)
Of those, we still seem to be using ayy, bbz, cca, lmz, tbb. —Mahāgaja · talk 18:50, 26 January 2021 (UTC)
I noted some of these (e.g. lmz, tbb) at Wiktionary:Beer parlour/2020/October#2019-2020_ISO_code_changes, but missed bbz; I wonder whether it was part of a different set of changes? (Or I just missed it.) I held off on retiring "lmz" (and evidently no-one else did anything, either) because there's been controversy over recognizing the Lumbee as an Indian People or Tribe, and I wanted to research whether controversy over recognizing a Lumbee lect was connected to that; it appears no Lumbee language is attested, and w:Lumbee#Language asserts that one never existed. I suppose we could retire the code and later re-add it as an ety-only code if there are words in Lumbee English that sources have suggested might derive from "Lumbee". - -sche (discuss) 23:24, 8 February 2021 (UTC)

Unnecessary sidebar links

I think that the "Create a book" and "Download as PDF" links in the sidebar are pretty useless for a dictionary. Any objections to just removing these? --Yair rand (talk) 06:34, 27 January 2021 (UTC)

The PDF one might be very useful for somebody using an e-reader for offline reading (or maybe printing it out; Web pages don't print well). Certainly at least get usage statistics before killing the working feature. Equinox 06:44, 27 January 2021 (UTC)
I'm not sure those stats are available anywhere, unfortunately. It strikes me as unlikely that one would save a dictionary entry for offline reading in the first place, though. --Yair rand (talk) 07:07, 27 January 2021 (UTC)
What's the harm in leaving 'Download as PDF'? This seems like a solution in search of a problem. —Μετάknowledgediscuss/deeds 07:10, 27 January 2021 (UTC)
Do we get any specific benefit from removing either of these or is it just a rage for tidiness? DCDuring (talk) 14:44, 27 January 2021 (UTC)
The latter, kinda. Having unnecessary components of the interface is bad for the user experience. --Yair rand (talk) 02:10, 28 January 2021 (UTC)
If it is a question of whether it is worth the effort to determine whether it is worth repairing the Book gadget, I could see the point of putting the matter to a vote or asking whether anyone would be willing to work on it. The pdf creator doesn't seem to have any problem. The book gadget might be useful to create first drafts of specialized glossaries, though I don't know of anyone trying to do such a thing. Also there is a supposed 'partner' that would print up such books. Do they still offer the service? Would they be willing to work to repair, enhance, or replace the gadget? DCDuring (talk) 21:48, 28 January 2021 (UTC)
I am against a removal like this one proposes. Some entries have multiple definitions or have a lot of example sentences. Someone might want to use this information in an offline presentation like a 4-H meeting or some kind of small gathering, or maybe want something to read or confer with while deprived of internet access on a plane or on a bus or something like that. I remember downloading a pdf of a Wikipedia page or two a long time ago. However, I think it would be good to get usage stats if they can be gotten. If really no one is using it, then I think trashing it becomes very reasonable. --Geographyinitiative (talk) 22:10, 28 January 2021 (UTC)
I have no objection to removing "create a book". "Download as PDF" might be useful, as described above. We could see if the devs could give us stats on how often the features are used. - -sche (discuss) 02:20, 9 February 2021 (UTC)

Moving Wikimania 2021 to a Virtual Event

Hello. Apologies if you are not reading this message in your native language. Please help translate to your language. Thank you!

Wikimania will be a virtual event this year, and hosted by a wide group of community members. Whenever the next in-person large gathering is possible again, the ESEAP Core Organizing Team will be in charge of it. Stay tuned for more information about how you can get involved in the planning process and other aspects of the event. Please read the longer version of this announcement on wikimedia-l.

ESEAP Core Organizing Team, Wikimania Steering Committee, Wikimedia Foundation Events Team, 15:16, 27 January 2021 (UTC)

Arabic script isolated forms

Should ﺝ (U+FE9D "Arabic letter jeem isolated form") be a hard redirect to ج (U+062C "Arabic letter jeem")? The latter page lists U+FE9D among the possible forms. There may be others. This is one I was confused by when I cut-and-pasted a character that was a non-combining form. (@Atitarev) Vox Sciurorum (talk) 21:00, 27 January 2021 (UTC)

Yes, all the special isolated, initial, medial, and final forms should be redirected. —Μετάknowledgediscuss/deeds 21:08, 27 January 2021 (UTC)


Is it allowed to use this template directly under an etymology heading to point to an etymology rather than to a specific sense? If not, is there any other template for that? — surjection??⟩ 22:18, 27 January 2021 (UTC)

  • Use {{senseid|tag=p}}. Vox Sciurorum (talk) 23:24, 27 January 2021 (UTC)
    Maybe it'd be worth it to have a separate {{etyid}} template, even if internally it just uses {{senseid}}? — surjection??⟩ 10:17, 30 January 2021 (UTC)
  • @Surjection, do you have any examples? I've used {{anchor}} for incoming links to various parts of an entry, and then formatted the targeting links accordingly. Not sure I'm accurately understanding the use case you're describing? ‑‑ Eiríkr Útlendi │Tala við mig 01:29, 28 January 2021 (UTC)
    I'm planning to add a whole bunch of |id= tags to disambiguate affix/compound etymologies so that it's possible even for machines to tell e.g. which kuusi the one in lehtikuusi is referring to. In some cases, it isn't clear which exact sense is being meant, even if the etymology is clear (and I prefer the browser jumping to the etymology rather than directly to the definition anyway). — surjection??⟩ 08:29, 28 January 2021 (UTC)

Project Grant Open Call

This is the announcement for the Project Grants program open call that started on January 11, with the submission deadline of February 10, 2021.
This first open call will focus on Community Organizing proposals. A second open call, focused on research and software proposals, is scheduled to open on February 15 with a submission deadline of March 16, 2021.

For the Round 1 open call, we invite you to propose grant applications that fall under community development and organizing (offline and online) categories. Project Grant funds are available to support individuals, groups, and organizations to implement new experiments and proven ideas, from organizing a better process on your wiki, coordinating a campaign or editathon series to providing other support for community building. We offer the following resources to help you plan your project and complete a grant proposal:

Program officers are also available to offer individualized proposal support upon request. Contact us if you would like feedback or more information.

We are excited to see your grant ideas that will support our community and make an impact on the future of Wikimedia projects. Put your idea into motion, and submit your proposal by February 10, 2021!

Please feel free to get in touch with questions about getting started with your grant application, or about serving on the Project Grants Committee. Contact us at projectgrants@wikimedia.org. Please help us translate this message to your local language. MediaWiki message delivery (talk) 08:01, 28 January 2021 (UTC)

PageNotice extension again

In my reading of the matter, the community intends that the {{reconstruction}} template currently transcluded at the top of every page in the Reconstruction: namespace be displayed automatically using the PageNotice extension. See Wiktionary:Beer parlour/2019/January#PageNotice extension.

In order for this to be done, evidence of community consensus needs to be shown to Wikimedia developers. This doesn't need to be a formal WT:VOTE, but can be a discussion or vote here showing that the community is in favour. Pinging @Erutuon and @Victar.

This, that and the other (talk) 09:48, 29 January 2021 (UTC)

AWB access redux

I'm planning to do a whole bunch of edits using AWB; to avoid clogging up my contribs, I've created an alt account (User:Citrarta); could that account have AWB access? Hazarasp (parlement · werkis) 01:59, 30 January 2021 (UTC)

@Hazarasp: Done. Will you need a flood flag on that account? Also, if you plan on bulk edits, have you considered starting a vote to get a bot flag on that account? —Μετάknowledgediscuss/deeds 02:17, 30 January 2021 (UTC)
If all goes smoothly with the first batch of edits I do, I'll consider both of those. (I don't want to enable the flood flag if the edits end up being prone to mistakes.) Hazarasp (parlement · werkis) 02:43, 30 January 2021 (UTC)
@Metaknowledge Probably going to make another bunch of edits with AWB; it'd be good if the flood flag was enabled on User:Citrarta. Hazarasp (parlement · werkis) 22:56, 8 February 2021 (UTC)
@Hazarasp I noticed that no one had done anything about the flood flag, so I went ahead and took care of it. Chuck Entz (talk) 04:47, 9 February 2021 (UTC)
@Hazarasp: Note that pings only work if they are added in the same edit as your signature. —Μετάknowledgediscuss/deeds 05:11, 9 February 2021 (UTC)

Merging of U+34A8 㒨 and U+20457 𠑗

The character 20457 𠑗 appears to be a duplicate of 34A8 㒨, but there is a page for each of them, with different references in Kangxi Zidian (where supposedly only 20457 𠑗 is present but 34A8 㒨 would be on the next page) and Hanyu Dazidian (where 34A8 㒨 comes immediately before 20457 𠑗). I do not have a copy of HYDZD with me, and while Wiktionary supposedly uses the first edition, all online resources I have seen so far point to the second edition. Could someone with a copy confirm that both these characters (if they are different at all) appear?

The Chinese Text Project has also normalized 20457 𠑗 to 34A8 㒨, but I am not certain of its authoritativeness.

Compare the duplicates 3DB7 and 2420E 𤈎 which both use one page with a note that 2420E 𤈎 was encoded as a duplicate by mistake.

If there is no evidence that 34A8 㒨 and 20457 𠑗 are different characters in HYDZD, I propose that these two pages get merged into one. OosakaNoOusama (talk) 00:06, 1 February 2021 (UTC)

@OosakaNoOusama: The Hanyu Dazidian data is according to Unicode's Unihan Database. —Suzukaze-c (talk) 19:09, 2 February 2021 (UTC)
As for duplicate-ness, GlyphWiki seems to be treating them differently (different default [un-suffixed] glyphs). —Suzukaze-c (talk) 19:11, 2 February 2021 (UTC)
@OosakaNoOusama: While this blog by Andrew West does say the two characters are unifiable, characters 5 and 6 in HYDZD, volume 1, page 239 are different. Character 5 does not have 囟 in between 𦥑, but something similar to 同 with 丿, except the 口 is stuck to the left (sorry for the bad description). It seems like this character was removed from the second edition of HYDZD, though. — justin(r)leung (t...) | c=› } 21:09, 2 February 2021 (UTC)
This is how it looks. — justin(r)leung (t...) | c=› } 01:03, 3 February 2021 (UTC)
FWIW, Wikipedia states: "U+20457 is the same as the China-source glyph for U+34A8, but it is significantly different from the Taiwan-source glyph for U+34A8."  --Lambiam 11:19, 4 February 2021 (UTC)

February 2021

Splitting WT:RFVN

This page is getting close to unworkable due to its size. It takes a long time to make any edits to it. I think we need to split it by month, or maybe by language family. Benwing2 (talk) 06:26, 1 February 2021 (UTC)

Latin script and non-Latin script?--Karaeng Matoaya (talk) 06:31, 1 February 2021 (UTC)
I think language family is a good idea, or even just geographical area, e.g. /European_languages, /African_languages, /Asian_languages, /Oceanian_languages, /indigenous_American_languages. —Mahāgaja · talk 10:43, 1 February 2021 (UTC)
Support by language family, absolutely not by geography. DTLHS (talk) 18:04, 1 February 2021 (UTC)
OK, except there really shouldn't be 150 different subpages. I'd support making subpages for 4 or 5 of the largest families (e.g. Indo-European, Afroasiatic, Sino-Tibetan, Austronesian) and then having one subpage for everything else. —Mahāgaja · talk 20:01, 1 February 2021 (UTC)
I suggested splitting RFD last year: Wiktionary:Beer_parlour/2020/October#Splitting_RFD_non-English. Vox Sciurorum (talk) 17:57, 1 February 2021 (UTC)
The real solution is to close old RFVs, which is a task that often needs to be left to specialists. Let's make an effort to clean it out, by doing what we can and pinging knowledgeable people for what we can't, and then reassess. —Μετάknowledgediscuss/deeds 19:22, 1 February 2021 (UTC)
I don't think language family is a good way to do it. We should look at the natural divisions in the community, regardless of genetic relationships. For instance, the CJKV languages are all unrelated to each other, but there's lots of overlap (at least for the CJK part). What's more, the communities for those languages have their own way of doing things and it's very hard for anyone who doesn't have a background in those languages to contribute anything useful to the discussions and workflow. If you think about it, the CJKV part of RFV is already separate for all practical purposes because almost no one outside of the CJKV communities can even understand what the discussion is about, let alone contribute to it. I think splitting off CJKV would make a substantial dent in the overload.
There are probably a couple of other natural divisions: the Turkic, Iranian and Arabic languages are also unrelated and also have a considerable amount of overlap, and then there's South Asia. I'm not sure if those have enough volume to make much of a difference though. Of course, there are languages that have ties to more than one area: Urdu is both a Middle Eastern and a South Asian language, for instance. Even so, that kind of thing happens with just about any criterion you might use. Chuck Entz (talk) 04:18, 2 February 2021 (UTC)
And what’s with threaded discussions as on User talk:Rua? Isn’t en.Wikipedia using something similar for frequently edited project pages, apart from cutting them into multiple pages? Fay Freak (talk) 18:38, 2 February 2021 (UTC)
Splitting is unavoidable. But maybe we could also start closing old entries that have not reached a consensus while leaving the RFV notice in the actual entry so that readers know the validity of the word has been contested? This is somewhat similar to the system that some wikis employ that shows the last verified version of an article. Dixtosa (talk) 17:57, 4 February 2021 (UTC)
Related discussion: Wiktionary:Grease pit/2021/February § Out of memory!. J3133 (talk) 06:13, 5 February 2021 (UTC)
I agree with User:Chuck Entz here. From looking at the recent entries I think we could make things a lot better just by making two splits: (1) CJK languages, (2) Italic (i.e. Latin + Romance languages). There's no reason we need to split everything at once; we can proceed in several stages as needed. Splitting in a way that lines up with communities helps minimize the number of different pages that need monitoring. Benwing2 (talk) 23:33, 7 February 2021 (UTC)
@Benwing2, Chuck Entz, Fay Freak, Mahagaja, Karaeng Matoaya, DTLHS, Metaknowledge, Dixtosa: So, how many groups are proposed?
Perhaps, we could suggest (for a vote) just the three for now: 1. CJK (minus roman-based Vietnamese, just CJK) + the whole of Sino-Tibetan. 2. Everything Roman-based, 3. Everything in non-Roman scripts. Potentially, Cyrillic, Greek, Armenian and Georgian script languages to be grouped together with the Roman script languages. Need to make sure that each group has enough languages and users, though. What do you think? --Anatoli T. (обсудить/вклад) 00:06, 8 February 2021 (UTC)
@Atitarev: I think this is so finely arbitrary and unambiguous (save Serbo-Croatian) that nobody wants to disagree because of deeming it POV. It is a bit unnatural though, to separate Polish and Ukrainian etc., and Turkic languages (I guess you mean Turkmen will be at the Roman side while Kazakh will be at the Cyrillic side). Fay Freak (talk) 00:36, 8 February 2021 (UTC)
@Fay Freak: Yes, it is arbitrary and this is just a discussion. We're open to other suggestions. Do you have any? I also suggested possibly grouping easy (?) scripts like Cyrillic and Greek together with the Roman-based languages (+ Armenian and Georgian but less sure about this part).
For many users and editors, a foreign script is a hurdle they won't even try to overcome, so they skip or ignore such words, even though texts in Roman scripts can be full of diacritics. So yes, for them Polish would be OK but not Ukrainian. It is not my opinion. --Anatoli T. (обсудить/вклад) 00:54, 8 February 2021 (UTC)
(edit conflict) Turkish is in Roman script, Ottoman Turkish is in Arabic script, and most of the rest of Turkic is in Cyrillic. Similar problems with Javanese. Then there are the Gothic and the Italic scripts alongside all the Roman-script German and Latin/Romance languages. Greek really shouldn't be separated from all the European Roman-script languages, nor should the Slavic languages be separated by script (what would you do with Serbian?), so including Greek and Cyrillic with Roman scripts is not optional. Southeast Asia is likely to be a train wreck: Burmese and Tibetan would be lumped with CJK, while Vietnamese would be lumped with Spanish and Lithuanian, and Thai and Khmer would be lumped with Arabic and Ethiopian. Chuck Entz (talk) 01:15, 8 February 2021 (UTC)
An interesting project might be to look at the revision history of RFVN and figure out who contributes to discussions on which languages, then which groupings of languages have the largest numbers of contributors in common. IMO it's all about overlapping of expertise.
I think that any global criterion is going to fail spectacularly in some cases. There's no way we can split all the languages of the world cleanly at one go. Let's find a large grouping that seems natural and split it off. Later, after discussion, we can find another one and split it off, etc. Whatever we do, it should be split by language codes so a module can generate a link to the correct one without some IP or first-time logged-in user having to read up on the descriptions of all the different choices. Chuck Entz (talk) 02:11, 8 February 2021 (UTC)
Just looking at the existing entries in WT:RFVN, I bet it would be sufficient, at least at first, to split off CJK (with "C" construed broadly to contain all Sinitic languages but not the rest of Sino-Tibetan) and leave everything else in the main group. If that turns out not to be enough, we can discuss another split later. —Mahāgaja · talk 07:35, 8 February 2021 (UTC)
I agree that grouping non-Sinitic Sino-Tibetan with CJK would not be of benefit, since mostly the contributors to these languages (save perhaps Tibetan and Burmese) aren't CJK editors.
Perhaps another (or a complementary) solution would be to split LDLs from WDLs; any thoughts on that? LDL RFVs generally get little to no input, so the difference in the number of languages shouldn't be an issue. Thadh (talk) 21:57, 8 February 2021 (UTC)
@Thadh: That might be hard to implement given that Chinese (among other languages) is only partially well-documented (only Standard Chinese is considered well-documented). — justin(r)leung (t...) | c=› } 02:26, 11 February 2021 (UTC)
@Mahagaja, Chuck Entz, Dixtosa, Metaknowledge, Vox Sciurorum, Fay Freak, Atitarev This discussion seems to be petering out. I would like to poll people to see what people think about the following two splits: (1) CJK, i.e. all varieties of Chinese, Japonic (= Japanese + Ryukyuan), Korean only, not including other Sino-Tibetan languages; (2) Latin + Romance. Please respond to the following ("support" means you would like to see the relevant split happen now; "oppose" means you would not like it to happen now, not committing yourself one way or another to a later split): Benwing2 (talk) 03:13, 14 February 2021 (UTC)
@Benwing2: Thanks, I have voted for (1). Please give more detail re (2) Latin + Romance, which languages or groups are included, any exceptions for languages written in multiple scripts? --Anatoli T. (обсудить/вклад) 03:20, 14 February 2021 (UTC)
@Atitarev To me, Latin + Romance simply means Latin + all Romance languages. It's rare to have Romance languages written in any script but Latin letters, although it occasionally happened with Mozarabic and maybe Ladino. I don't think we should make an exception for such cases; they are very rare in any case. I don't have any opinion as to whether we should include other Italic languages (Oscan, Umbrian, Faliscan). Benwing2 (talk) 03:46, 14 February 2021 (UTC)
@Benwing2: Yes, we already have over 200 Ladino entries written in the Hebrew script. The other non-Latin script Romance language that occurs to me is Romanian in Cyrillic, which was common first in the 19th century, then in the Moldavian SSR and up to today in Transnistria. —Mahāgaja · talk 07:13, 14 February 2021 (UTC)

Option 1: Split off CJK from WT:RFVN now (support/oppose)

  • Support. Benwing2 (talk) 03:13, 14 February 2021 (UTC)
  • Support.--Tibidibi (talk) 03:17, 14 February 2021 (UTC)
  • Support --Anatoli T. (обсудить/вклад) 03:20, 14 February 2021 (UTC)
  • Support Chuck Entz (talk) 05:51, 14 February 2021 (UTC)
  • Support. Mahāgaja · talk 07:13, 14 February 2021 (UTC)
  • Support. Thadh (talk) 11:32, 14 February 2021 (UTC)
    I think Meta is right; Abstain. Thadh (talk) 09:42, 15 February 2021 (UTC)
  • Support. Vox Sciurorum (talk) 13:31, 14 February 2021 (UTC)
    OK, there seems to be consensus for this option. I think in a couple of days we can split the page, if that's OK with everyone. I'm thinking the abbreviation for the new page should be called WT:RFVCJK (even though this is a bit of a consonant soup); we can also create WT:RFV/CJK as another alias that's a bit more readable. What about the full page name? Should it be just Wiktionary:Requests for verification/CJK (CJK is a well-known abbreviation, see the Wikipedia entry on CJK characters), or spelled out as Wiktionary:Requests for verification/Chinese-Japanese-Korean? Also, should WT:RFVN (full name Wiktionary:Requests for verification/Non-English) remain as such or be renamed? My thought is to leave it as-is for now and just update the text at the top of the page describing what it does; we can rename it later if need be. An alternative is to call it something like Wiktionary:Requests for verification/Miscellaneous or Wiktionary:Requests for verification/Misc-Non-English. Benwing2 (talk) 21:25, 14 February 2021 (UTC)
    I think the actual sub-page name should be "CJK", but the header at the top of the page and any references to it in running text (backed up with redirects) should be something like "Chinese, Japanese and Korean"- in other words, the reverse of the usual shortcut/full-name relationship. That's because the URLs are starting to get fairly long. Not that I'm dogmatic about that- it should be whatever everybody thinks will seem most clear and natural.
    I don't think anyone has mentioned the Ryukyuan languages and Ainu, but I think they should be included in the new page because they're usually in the same writing systems as the CJK languages. Dungan is another edge case, but it's really more a dialect of Mandarin that has developed out of the Sinosphere in a way that's analogous to Urdu and Hindi, but with Cyrillic writing and influenced by completely different neighboring languages.
    As for RFVN: it's still the page for most non-English pages, and the nature of what's left after the split is the same as it was before- we're better off leaving it alone for the time being. The CJK page is really a subpage of it, but spelling that out risks setting a precedent for some really long page names and URLs.
    On an unrelated note, we might think about splitting off Reconstruction entries, as well. They're a completely different animal from mainspace entries as far as the request pages go- they're really more related to the Etymology scriptorium. Chuck Entz (talk) 22:20, 14 February 2021 (UTC)
    @Chuck Entz FWIW, I did mention above that Ryukyuan languages should be included in CJK; my reasoning was the same as yours, that they are written with Japanese characters. I agree with you about Ainu as well. For Dungan, I dunno but it will be fairly rare so it's not clear it matters that much. Benwing2 (talk) 01:02, 15 February 2021 (UTC)
    @Benwing2: I find this whole process to be overly rushed. Do any of the people who voted even regularly help in closing CJK RFVs? Claiming that there is consensus after one day of voting without getting any input from the people who will actually do the work is not the way forward. Let's see what someone like @Justinrleung has to say before making a major change. —Μετάknowledgediscuss/deeds 03:43, 15 February 2021 (UTC)
    @Metaknowledge Fine. Pinging Chinese editors (Notifying Atitarev, Tooironic, Suzukaze-c, Justinrleung, Mar vin kaiser, Geographyinitiative, RcAlex36, The dog2, Frigoris, 沈澄心, 恨国党非蠢即坏, Michael Ly): , Japanese editors (Notifying Eirikr, TAKASUGI Shinji, Atitarev, Suzukaze-c, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233): , Korean editors (Notifying TAKASUGI Shinji, Atitarev, HappyMidnight, LoutK, Karaeng Matoaya, B2V22BHARAT, Quadmix77): . Apologies for the spam. Benwing2 (talk) 06:52, 15 February 2021 (UTC)
    I don't typically close RFVs, but sure, I don't mind splitting this off to a separate page. Makes things easier to navigate. The dog2 (talk) 07:25, 15 February 2021 (UTC)
    I doubt whether this really solves the problem. This split doesn't seem likely to stop the page from growing to an unworkable size again. -- Huhu9001 (talk) 09:04, 15 February 2021 (UTC)
  • Abstain. ---> Tooironic (talk) 11:08, 15 February 2021 (UTC)
  • Support, but we really should split it by month soon. — TAKASUGI Shinji (talk) 11:19, 15 February 2021 (UTC)
  • Abstain I don't see the 'big picture' well enough to give a productive viewpoint on this issue. --Geographyinitiative (talk) 12:49, 15 February 2021 (UTC)
  • Support. Suzukaze-c (talk) 12:53, 15 February 2021 (UTC)
  • Support. RcAlex36 (talk) 12:56, 15 February 2021 (UTC)
  • Support. Shen233 (talk) 14:33, 15 February 2021 (UTC)
  • Support, just to make the pages more navigable. — justin(r)leung (t...) | c=› } 15:17, 15 February 2021 (UTC)
  • Abstain. Cnilep (talk) 22:52, 15 February 2021 (UTC)
  • Support. -- 06:10, 16 February 2021 (UTC)
@Metaknowledge Are you now satisfied that there is enough support to split? Benwing2 (talk) 02:32, 21 February 2021 (UTC)
@Benwing2: Of course. But you didn't need to satisfy me — my point was that you needed to satisfy the people who will actually be doing the work. —Μετάknowledgediscuss/deeds 02:51, 21 February 2021 (UTC)

Option 2: Split off Latin+Romance from WT:RFVN now (support/oppose)

  • Support. Benwing2 (talk) 03:13, 14 February 2021 (UTC)
  • Support.--Tibidibi (talk) 03:17, 14 February 2021 (UTC)
  • Support --Anatoli T. (обсудить/вклад) 03:55, 14 February 2021 (UTC)
  • Oppose. —Μετάknowledgediscuss/deeds 05:21, 14 February 2021 (UTC)
  • Abstain. Not sure if this is a natural way to split, or as urgent as the above. I'd also like to hear from the community- you know, the people who are doing the actual work... Chuck Entz (talk) 05:51, 14 February 2021 (UTC)
  • Abstain. I suspect splitting off CJK will be sufficient to solve the memory problems. If not, we can do Latin+Romance later. —Mahāgaja · talk 07:13, 14 February 2021 (UTC)
  • Oppose. Thadh (talk) 11:32, 14 February 2021 (UTC)
  • Abstain, agreeing with Mahāgaja. Vox Sciurorum (talk) 13:32, 14 February 2021 (UTC)
  • Support, since these (well, also German) are pretty well the only languages in RFV that I'm able to help out with. When I occasionally peruse RFVN, these are the languages I keep an eye out for. It's also a pretty natural grouping. However, I don't feel strongly either way (and I would oppose if the split makes it harder for other users who contribute to RFV more than I do). Andrew Sheedy (talk) 01:26, 15 February 2021 (UTC)
  • Abstain. ---> Tooironic (talk) 11:07, 15 February 2021 (UTC)
  • Abstain. — justin(r)leung (t...) | c=› } 15:18, 15 February 2021 (UTC)

Hyphens for Korean affixes on Wiktionary

I want to request that hyphens be added to Korean affix entries. The removal of such hyphens was previously discussed in the Beer parlour in 2011, but I think not using them, for Korean at least, is wrong. The first issue is consistency. For other languages on Wiktionary, marking an affix page with a hyphen is standard, and nothing about Korean prevents us from using them. Secondly, the main dictionary used for Korean terms, pyojun-gugeodaesajeon (Dictionary of Standard Korean), uses hyphens to distinguish between suffixes and words. For example, 어서 (eoseo) has two entries in that dictionary: 어서 (quickly) and -어서 (because). As for the issue of redirecting, I think it's possible to create hyphenated versions of affixes on the Korean Wiktionary to cause less trouble for learners and searchers, as -sche already mentioned in the previous discussion:

"We also have the option (if our Japanese and Korean editors prefer to include the hyphens in the page titles) of creating unhyphenated pages as redirects, and asking the Japanese and Korean Wiktionaries to create hyphenated versions as redirects. This is how en.Wikt and de.Wikt (which use l') link to and from fr.Wikt (which uses l’)."

I personally believe this is a great idea. It would keep things consistent and reduce needless overcrowding on single pages. I do not know how Japanese works, but for Korean, I think the two main reasons, consistency and the source of terms, are enough to bring the hyphens back. Yes, Korean may not use hyphens in running text, but in many Korean learning resources, suffixes are still marked off by either a hyphen or a tilde; the latter, however, isn't really used for other languages. Please let me know what you think. This has already been discussed with @Karaeng Matoaya and, as of now, he agrees with the proposal. -Solarkoid (talk) 18:04, 3 February 2021 (UTC)

@LoutK, what do you think? I think this will help remove clutter in entries like 이 (i), which currently has twenty etymologies; creating -이 (i) and moving the particles and verbal suffixes there would reduce that to a "mere" thirteen, once we get rid of the useless "Hangul syllable" entry. On the other hand, this would be less convenient for suffixes with clear etymological connections to free morphemes, e.g. Sino-Korean ones. A solution (one I might personally prefer, though I'll have to think a bit more) might be to move all the verbal suffixes and case-marking noun particles to hyphenated lemmas while keeping noun suffixes with clear equivalent free morphemes together with their free forms in non-hyphenated lemmas.--Karaeng Matoaya (talk) 01:23, 4 February 2021 (UTC)
  • For JA entries, I'm not a fan of hyphenating -- no other resource that I'm aware of uses hyphens, and I don't think they're necessary.
That said, I have no objection to hyphenation for KO entries -- written Korean has many more homographs than Japanese, and thus much more potential for huge polysemic entries like Karaeng's 이 (i) example above. (FWIW, I like the idea of keeping standalone entries and affixes together under hyphen-less spellings, and moving those affix entries that are etymologically distinct off to hyphenated spellings.)
‑‑ Eiríkr Útlendi │Tala við mig 01:44, 4 February 2021 (UTC)
I wouldn't mind the creation of hyphenated lemmas. But I really like Karaeng's solution. If we were to unconditionally move everything, including Sino-Korean, it would actually complicate things more than necessary. For example, would be split between (Han, historical dynasties) and -한 (han, man; person), a suffix not used in isolation. And this would be the same for with (mu) and 무- (mu-), which is already neatly organized under (mu). I would much prefer both definitions be under one entry as both share the same etymology. — LoutK (talk) 18:05, 4 February 2021 (UTC)
@LoutK Ah... Okay, now that's another obstacle to be tackled. What do you think of this: If a 한자어 exists on its own and as an affix, let's keep it under the same etymology and create a redirect link from that affix with a hyphen to the 한자어 Korean reading page, as in: Create 무- (mu-) and make it be redirected to (mu)'s second etymology. If that is not the case, then why not just make the hyphenated page be the main one? Thank you :D (Actually to add, in Georgian we have არა- (ara-, negative prefix) and არა (ara, no) though they have the same etymology). -Solarkoid (talk) 19:17, 4 February 2021 (UTC)
I'd prefer the status quo. Apart from the reasons stated above, it doesn't seem to be a Korean convention (it is not used in Korean dictionaries), and the affix marking in the headword and the automated transliteration are already in place, e.g. the suffix 이시여 (isiyeo) is displayed as —이시여 (isiyeo) and is transliterated as "-isiyeo".
We had similar discussions regarding Arabic somewhere. If I am not mistaken, the agreement was not to include hyphens in prefixes, articles and suffixes but to use a taṭwīl (an elongation symbol, ـ‎) instead, e.g. the definite article ال (al-) (always spelled together with the related word) is optionally displayed as الـ (al-), and a hyphen can be used in transliterations. --23:52, 7 February 2021 (UTC)

Word of the Day theme for April Fools' Day 2021

Hello, all. For April Fools' Day 2021, I'm thinking of featuring six pairs of interesting words which are anagrams of each other. We already have three pairs at "Wiktionary:Word of the day/Nominations#Other"; please nominate three more. Bonus points if you can find anagrams of words already in the nomination list! Longer words are probably more interesting (suggestions like bat and tab are too dull). — SGconlaw (talk) 22:00, 3 February 2021 (UTC)


Some time ago, the labels derogatory and pejorative were merged, because it was argued that there isn't a clear distinction between them. Maybe, but I don't think it was a good move.

IMO, derogatory carries a nuance of belittling or insulting that pejorative doesn't necessarily have. When I say that someone has a mémoire sélective (selective memory) (maybe not the best example), I'm pointing out an attribute that is generally disliked and frowned upon, but I'm not "belittling" or "insulting" the person for it. PUC – 22:53, 3 February 2021 (UTC)

I think the distinction is too subtle to make having separate labels worth it. If it’s really necessary to make this distinction for a particular entry, put it in a usage note. — SGconlaw (talk) 18:37, 4 February 2021 (UTC)
Prior discussion was WT:BP/2018/July. I thought "derogatory" was stronger, harsher than "pejorative", like "rare" vs "uncommon", but this doesn't seem to be reflected in how other dictionaries define the two terms; they do not seem to bear out the idea that "pejorative" is weaker, and in this wordreference thread someone says they thought "pejorative" was stronger and "derogatory" was euphemistic! It seems like there is not a consistent / maintainable difference. - -sche (discuss) 05:37, 8 February 2021 (UTC)
The Collins and AHD thesauruses show these terms as synonymous. Whatever subtle differences in connotation there may be or have been, they are not something we could rely on now. DCDuring (talk) 22:15, 8 February 2021 (UTC)

Separate entries for reflexive verbs

Suppose a verb X carries some distinct meaning(s) when used reflexively. Right now, this seems to be handled inconsistently in English, with a mix of the following options:

  1. Add a definition for the reflexive sense at X
    Examples: pride, acquit, trouble, lower
  2. Create a separate entry at X oneself
    Examples: help oneself, occupy oneself, kick oneself, shit oneself, piss oneself, express oneself, sun oneself (though, in the last two cases, there are generic transitive senses at express and sun which arguably subsume the reflexive configurations)
  3. Both
    Examples: top / top oneself, soil / soil oneself, carry / carry oneself

For more examples, see Category:English reflexive verbs, and intitle:oneself.

I found that this has been discussed before. That thread itself links to 7 earlier discussions of the topic. It seems like there was at least tentative support for this policy described by Mahagaja: "regardless of semantics, we only have separate entries for reflexive verbs in languages where the reflexive particle is written together".

But, to quote another participant, it continues to be unfortunate that "rules and decisions are lost and forgotten from one generation of contributors to the next". I'm wondering how we could go about codifying a consensus on this question (if not globally, at least for English). I noticed that Mahagaja's rule is reflected at Wiktionary:About French, and Wiktionary:About Czech, but we have nothing about reflexive verbs at Wiktionary:English entry guidelines. I'm not familiar with the processes around policy here. Would it be acceptable in this case for me to BOLDly add this to WT:AEN and see if it sticks? Or would this need to go through some RfC process? Also, are there other policy pages where this could be codified?

I'd be eager to work on cleaning up the inconsistency around these entries, but it's unclear to me at the moment which direction to standardize on. Colin M (talk) 22:28, 4 February 2021 (UTC)

I'd be in favour of not having separate entries for reflexive forms, which is currently the case for most languages. The one potential problem with this is that a lot of English speakers probably don't know what "reflexive" means in a label, since it's not as common in English as in, say, Romance languages. The solution, I guess, would be to make sure they all have usexes. Andrew Sheedy (talk) 04:18, 5 February 2021 (UTC)
You can’t decide it globally, because of polysynthetic languages and because of different lexicographic traditions. Macedonians and Bulgarians find it consistent with usual practice to add reflexive verbs as page titles with separately written reflexive pronouns, whereas for Serbo-Croatian this is unacceptable, and for Russian the reflexive verbs are considered more fused, being written together, and deserve entries as derivations. For German it is generally unexpected to have entry titles for reflexive verbs, barring longer idioms. I don’t know what you have in English. The meaning of “reflexive” not being understood in labels is, at any rate, a poor reason for any decision: there is no need to cater to a lack of basic education. Fay Freak (talk) 20:09, 6 February 2021 (UTC)
We already have long-established conventions for languages where the reflexive particle is separate from the verb (and can often be written far away from it), but what I find missing is information for users about what that particle IS. E.g. I don't find "befinden: (reflexive) to occupy a place" a very helpful definition. Where is the reflexive particle sich? The actual term is sich befinden, not befinden.
Now, compare with the Bulgarian ка́звам (kázvam) (sense# 2). Displaying it as (reflexive) (~ се) to be called is much more useful, IMO. It's achieved with the template: {{bg-reflexive}}. --Anatoli T. (обсудить/вклад) 01:24, 8 February 2021 (UTC)

Wiki Loves Folklore 2021 is back!

Please help translate into your language


You are humbly invited to participate in Wiki Loves Folklore 2021, an international photography contest organized on Wikimedia Commons to document folklore and intangible cultural heritage from different regions, including folk creative activities and much more. It is held every year from the 1st to the 28th of February.

You can help enrich the folklore documentation on Commons for your region by taking photos and making audio and video recordings, and submitting them to this Commons contest.

Please support us in translating the project page and a banner message to help us spread the word in your native language.

Kind regards,

Wiki loves Folklore International Team

MediaWiki message delivery (talk) 13:25, 6 February 2021 (UTC)

Time to retire Appendix:List of protologisms?

If you look at the history for the individual pages (A-F, G-P, Q-Z) you will see that there was really very little activity in 2020 (despite everybody being stuck at home). I think the likes of Urban Dictionary are now popular enough for people inventing words to go there instead. The protologisms don't really serve any purpose for our project and are mostly not even interesting or funny. Equinox 18:21, 6 February 2021 (UTC)

Support. As I understand it, LOP was originally intended to be a kind of shunt for people who would otherwise make a mess in the main namespace. Honestly, I don't see the need for that, and I don't like the idea that the appendix is a dumping ground for crap nobody actually wants in a dictionary. It seems worth one last scan for anything that might genuinely have become attestable, and then it can be deleted. —Μετάknowledgediscuss/deeds 19:14, 6 February 2021 (UTC)
Vote created: Wiktionary:Votes/2021-02/Retire_the_Protologisms_appendix. Equinox 05:36, 14 February 2021 (UTC)
Vote moved to Wiktionary:Requests for deletion/Others#Wiktionary:List of protologisms. —Μετάknowledgediscuss/deeds 21:42, 15 February 2021 (UTC)

Wiktionary cited in legal proceedings

Citing the crowdsourced website Wiktionary, they argued "the 2000s" could refer to "the period from 2000 to 2999," and that Maxwell couldn't possibly see into the future.

The 2000s page could be improved. DTLHS (talk) 15:13, 7 February 2021 (UTC)

I've RfVed the century and millennium senses. For all we know, the anon edits could have been made by parties to the lawsuit (or their agents). DCDuring (talk) 17:51, 7 February 2021 (UTC)
Quite funny, but sad that legal representatives are permitted to make such obviously bad-faith claims. Equinox 17:53, 7 February 2021 (UTC)
They can make the claims, but they are likely to be laughed out of court. It is embarrassing that we have had both of these definitions since 2014 with no citations. DCDuring (talk) 17:57, 7 February 2021 (UTC)
Making citations mandatory would greatly improve the quality and reliability of Wiktionary. A start could be making it mandatory to give sources in the edit summary, or requiring them for new entries. As it is now, en.wt will never be reliable or trustworthy. (Even with WT:RFV, hoaxes can remain in en.wt without being noticed by anyone.) --幽霊四 (talk) 02:34, 8 February 2021 (UTC)
It would also be unworkable in general and quite pointless for a lot of senses. ←₰-→ Lingo Bingo Dingo (talk) 12:48, 14 February 2021 (UTC)

Limits of Old Spanish

I am trying to do a mass cleanup of {{etyl}} for Spanish entries. Essentially, I load the 2100 or so pages using this tag into a single file, then make all the edits needed, then push the results back using my bot. I have done this a lot in the past; for edits of this sort, I always add "(manually assisted)" in the bot changelog message. I'm running into a few issues, however:

  1. Where is the boundary between Spanish and Old Spanish? For example, can Spanish directly borrow a term from Andalusian Arabic (which must have happened pre-1492), or is there always an Old Spanish intermediary? Similarly with Classical Nahuatl (I think these borrowings generally happened in the 1500s). Wikipedia says Old Spanish lasts until the early 15th century, but it also places the end of Old Spanish "before a consonantal readjustment gave rise to the evolution of modern Spanish"; this sound change occurred c. 1550-1600.
  2. Does anyone know of an Old Spanish dictionary? I'm having a hard time even finding a reference to one, much less an online source.
  3. What are the best sources of Spanish etymology? There don't seem to be very many good online sources.

Thanks! Benwing2 (talk) 23:21, 7 February 2021 (UTC)

First question: I have usually set the line at 1492, to avoid difficulties with borrowings from the Americas’ earlier languages, and because this is similar to the 1500 line used for other European languages. Judged by the mutual intelligibility of the chronolects, the line might be placed one or two generations earlier. Moroccan Arabic كابوس (kābūs) was borrowed right at the line. It is true that most borrowings from Andalusian Arabic into Iberian Romance must have happened in Old Spanish in any case; note, however, that Andalusian Arabic was spoken in Spain until the early 17th century: its speakers didn’t just vanish with one expulsion. In the early 1600s Spain still needed court interpreters for Arabic, until the language was completely eradicated due to intolerance. Sound changes are overrated for language division.
Third question: I am not sure about the best or all-encompassing sources, since I deal with Spanish only incidentally, coming from specific onomastic topics or related words (this is also an approach that works, after all). There is Corominas, and recently Edward A. Roberts, but all have flaws. Spanish, like English though less severely, has obtained its loans from so many directions that it is difficult for any individual to be good at all of them. For Arabisms there is comprehensive coverage in Corriente, Federico (2008) Dictionary of Arabic and Allied Loanwords: Spanish, Portuguese, Catalan, Galician and Kindred Dialects (Handbook of Oriental Studies; 97), Leiden: Brill, and, more recently, for all of Ibero-Romance, Corriente, Federico; Pereira, Christophe; Vicente, Angeles, editors (2019) Dictionnaire des emprunts ibéro-romans. Emprunts à l’arabe et aux langues du Monde Islamique (in French), Berlin: De Gruyter. For Americanisms there are some little-known works; as an example of what there is, I have Diccionario de Americanismos by Marcos A. Morínigo on my shelf. Fay Freak (talk) 01:00, 8 February 2021 (UTC)

Durable archiving

If we take screenshots of e.g. websites, and upload to Wiktionary/Wikimedia, does that count as "durably archived"? (Yes, I am aware that screenshots can be faked fairly easily.) Mihia (talk)

In practice, "durably archived" seems to refer to the type of source rather than anything to do with how it is archived. I think it is high time that this language was relitigated. Perhaps we should have a more positive criterion; for example, a list of sources that are deemed acceptable that can be easily amended. DTLHS (talk) 01:35, 8 February 2021 (UTC)
Pictures on Wikimedia can be altered or deleted as well, can't they? (Even though alterations are recorded in a version history.) And the part you mentioned in brackets also means it's unreliable, not trustworthy. One could easily provide three fakes and claim they're real pictures and that the original source is gone. --幽霊四 (talk) 02:34, 8 February 2021 (UTC)
Is that really that hard to do now? I could easily make up three books and claim that none of them are listed in Google books. I know I've read books that are nowhere to be found on the Internet. Would anyone really check? (The trick might not work in RFV, but it wouldn't be hard to create an entry like this that no one ever caught.) Andrew Sheedy (talk) 05:48, 8 February 2021 (UTC)
Indeed, fakes aren't a high concern. They're easy to accomplish, if you know what you're doing, but if you know what you're doing around here, you clearly care enough about the dictionary that you're very unlikely to add fake cites. (I'm sure WF will claim he's added fake cites, but he cares more than he lets on...) —Μετάknowledgediscuss/deeds 06:30, 8 February 2021 (UTC)
Sure, I've added fake cites before, because I'm a freaking anarchist. Sadly, I don't keep a list of them, but occasionally some fake stuff I added gets picked up - a recent one found was cuntbutt. MM0898 (talk) 14:53, 8 February 2021 (UTC)
Screenshots can be rather awkward to work with, and may be rendered wrongly by various browsers etc. It seems it would be preferable to extract the actual text or HTML in some way. There are copyright issues in either case. Equinox 02:45, 8 February 2021 (UTC)
One side effect of the "durably archived" rule is to give heavier weight to professionally edited material than to a typo-laden rant somebody banged out on a keyboard before rushing out the door. We shouldn't try to systematically preserve stuff we like without considering how to change the CFI rules. This also applies to, for example, comments sections preserved by search engines and archive.org. Vox Sciurorum (talk) 09:41, 8 February 2021 (UTC)
For some days I have had a rule formulation floating in my head, and I am finally expressing it, since the desire for it is growing: the old rules are increasingly viewed as antiquated, the virus is keeping the libraries closed, etc.
As a third point in WT:CFI, after “‘Attested’ means verified through”:
“consistently appearing on the internet.”
It means (as some have realized before, though I do not want to dig it out of the archives now) that if a hundred pages use a term, it does not matter that those pages could all vanish, as long as in some years other pages replace the vanished ones because the term has become recurring internet vocabulary. It is a dynamic concept of durability. It excludes protologisms because they do not appear consistently, and it excludes typo-laden rants because typos are not that consistent. And most importantly, we can thus keep 🦀 used to convey joy or excitement without further ado, we can have Anitwitter, we can have glownigger. What we can’t have is what just X and his friends use (→ protologisms); it is not carte blanche for private language. Procedurally, if during a request for verification such a term is assessed as appearing consistently, then it does not matter if the term’s uses later vanish from the internet “completely” because it has become outdated etc., for Wiktionary then archives traces of the past in its files; since only the ex ante view matters, there is no danger of being contradicted by reality. Still, it is not fine to have linkrot in the mainspace, so one would avoid links there that aren’t intended to be durable in principle (this linkrot is, correct me if I am wrong, the main reason for the durability barrier in the first place), but there is agreement that the citation namespace can document low-durability quotes, showing the basis the editors have worked on (heck, our best people have even quoted some Twitter utterances by philologists for etymologies). Fay Freak (talk) 13:02, 8 February 2021 (UTC)
  • There is already provision in the CFI for attestation through "clearly widespread use", which would apparently include the case where the term is used on "enough" websites, even if the web pages are not individually durable. However, I would like us to be able to attest usage by reference to an individual web page, provided the content is "clearly sensible", however we can best define this, e.g. not random gibberish, non-native gibberish, highly ephemeral/casual content, etc. One way to bolster the evidence of usage, make it slightly harder work for people to fake citations, and provide some degree of permanence would be to take a screenshot and upload it. I note the copyright concern raised by Equinox. Would we be able to claim some sort of "legitimate use" exemption for these purposes? Another thing has also occurred to me. Do we consider content on Internet archive services to be "permanently archived"? I'm not very familiar with using these. Are they sufficiently reliable and complete in their coverage for our purposes? If we do allow them, do we have any particular preferences for one over another? Mihia (talk) 18:24, 8 February 2021 (UTC)
    If only this was how people interpreted that line. That's what I always thought it meant, but then I've frequently seen people saying it doesn't apply to non-durably archived material. If so, then what on earth is the point? I think clearly widespread use should be good enough, regardless of whether it's archived or not.... Andrew Sheedy (talk) 23:54, 8 February 2021 (UTC)
Then what people frequently say is certainly not what the text of the CFI actually says. Mihia (talk) 10:12, 9 February 2021 (UTC)
  • An archive.org backup of a website is superior to a screenshot in just about every way. It's much quicker than taking a screenshot and uploading to Wikimedia Commons, it's not susceptible to forgery, and it can capture the full context (whereas, for a page with lots of content, you're only going to be able to capture a screen's worth of it in a screenshot). You can proactively force their crawler to take a snapshot by going to https://archive.org/web/ and entering the URL under "Save Page Now". Also, templates like Template:quote-web include an "archiveurl" parameter so you can provide the original url and an archive link. The only case where this won't work is pages that aren't accessible to the crawler, for example a forum post that can only be read if you register an account. I think such sources should be avoided if at all possible. Colin M (talk) 20:25, 8 February 2021 (UTC)
So (question to all) is it, then, generally accepted that an archive.org link to "sensible" web content counts for attestation purposes? If it does, somehow I have never known that. Mihia (talk) 20:41, 8 February 2021 (UTC)
No! Equinox 20:43, 8 February 2021 (UTC)
So why not? Mihia (talk) 20:44, 8 February 2021 (UTC)
In addition to the quality problem I mentioned earlier, archive.org could go away at any time. For example, they got sued over massive copyright violation last year. That could put them out of business. Or a change to copyright law or online liability law ("Section 230" in the USA) could cause content to disappear. Vox Sciurorum (talk) 21:12, 8 February 2021 (UTC)
There would need to be some regulation about the quality of content, I agree, but I don't see why the fact that there is a lot of crap on the web should prevent us from using the "sensible" stuff. We allow Usenet, after all, which is unedited/unregulated, and also contains a lot of crap (in fact, to my eye, the CFI gives Usenet a peculiarly high prominence, even ahead of printed books). In respect of quality, why would we allow Usenet and not "sensible" web content? Mihia (talk) 21:57, 8 February 2021 (UTC)
Seems like an argument that could be used to disqualify any online source. WT:ATTEST approvingly mentions "Usenet groups, which are durably archived by Google". And yet, Google could shut down its Google Groups service at any time. The whole company could be sued out of existence. But I think it's very unlikely either Google Groups or Archive.org is going to disappear overnight. Colin M (talk) 21:58, 8 February 2021 (UTC)
Indeed. I think it's a horrible argument, to be perfectly frank. There are plenty of books, movies, etc. out there that could theoretically be completely destroyed. The Internet could be destroyed and then nothing on the Internet would count as durably archived (obviously, this would negate the need for any Wiktionary CFI, but hopefully my point is clear). There are books I have that are not fully searchable anywhere online and would be very hard to find in a library. Are they not durably archived? It seems that CFI is a bit behind the times, intended for a time when the Internet was new and its future not certain. It looks like it's here to stay, short of a global catastrophe, and I don't think we should be so restrictive about our CFI.
Here's another thought: an RFV discussion that passes is usually a good indication that the cites were findable at the time of the discussion. Could a word not be kept on the basis of, say, 5 tweets, and then the discussion archived? The discussion would serve as much for attestation purposes as the cites themselves. If necessary, the RFV could be renewed if the tweets no longer existed, and new cites found. But if a word or emoji never makes it into a book, it seems really strange that we'd exclude it on those grounds alone. Andrew Sheedy (talk) 23:54, 8 February 2021 (UTC)
The CFI in this regard seems at once up with the times in its comment that "Wiktionary is an online dictionary" and behind the times by saying that "this naturally favors media such as Usenet groups". Mihia (talk) 00:22, 9 February 2021 (UTC)
As we move away from the concept of "durable archiving" (whatever that means), it means the obligation of editors to faithfully record what they see is higher, since Wiktionary itself is now the "durable archive". What does it mean when we have a citation from a website from 2020 that no longer exists in any form in 2030? Do we have to go back to the editor that originally added that citation and make some judgement on their reliability? IMO there now needs to be a higher emphasis on reviewing citations as they are added. DTLHS (talk) 20:27, 9 February 2021 (UTC)
What I had in mind is that non-durably archived cites could still be challenged in RFV. So, if I create an entry with two cites from Twitter and one from a book, it could still be brought to RFV. If those cites no longer existed, they could be removed or made less prominent. If three new cites were found on, say, Twitter and Reddit, they could be added, and the RFV would serve as proof that the cites were at one point authenticated, even if they disappear later. So the only thing that would change from the way things are now (aside from allowing non-durably archived cites in the first place) would be that an entry with three cites could still be challenged once in RFV, and the original cites wouldn't count for anything unless they still existed. Andrew Sheedy (talk) 20:57, 9 February 2021 (UTC)
We'd probably prefer to know when a word first came into use, if possible. Equinox 21:02, 9 February 2021 (UTC)
That's a good point, but I think that just makes the outdatedness of the current CFI even more apparent, given that so many words take off on the Internet before becoming common in print. I think that if one cite from a given year was deleted, it wouldn't be hard to find another from the same time anyway, so I don't think this would be much of an issue in practice. Andrew Sheedy (talk) 05:36, 10 February 2021 (UTC)
I wasn't active when the language approving of Google groups was written. In the old days Usenet was highly redundant and it wasn't hard to find a new source of news. Over time it became more dependent on a few large servers. Google groups shows why we can't trust one company: several editors here have complained that it can no longer be searched easily, or at all. The NNTP server I used to use shut down a decade or more ago and I never bothered to find another. The Usenet services I found last year did not offer free public searching. (Perhaps there is one that does; I didn't look hard.) I wouldn't object to saying modern usenet posts, say after 2000 or 2010, don't count as durably archived. Vox Sciurorum (talk) 13:50, 9 February 2021 (UTC)


It's obvious that there's a lot of inconsistency in en.wt:

  • Taxonomic adjectives are sometimes entered as Translingual (mul) and sometimes as Latin (la), sometimes even when they were never used in Latin.
  • Terms derived from Latin, like law terms, are sometimes entered as Latin (ex turpi causa non oritur actio), English (hostis humani generis) or Translingual (ius cogens).
  • Species names are sometimes mentioned with an internal Wiktionary link, and sometimes with an external Wikispecies link ({{taxlink}}), even though there is not necessarily a Wikispecies entry.
  • Constellations: Translingual Andromedae is given as genitive of English Andromeda. And in English Andromeda Translingual Andromedae is given as a derived term. Translingual And links to both Translingual Andromedae and English Andromeda. Translingual Piscium is given as genitive of Translingual Pisces (no entry), while English Pisces gives English Piscium (no entry) as derived term.

It's also obvious that to some extent the handling isn't based on attestation or usage, and is hence contrary to WT:CFI.

  • Latin pneumophilus even states: "Used exclusively as a taxonomic epithet and thus not inflected except in the nominative singular; other inflections are theoretical." A note which would be incorrect for Latin, even for scientific taxonomic Latin.
  • Latin albifrons and Latin iudex non calculat failed WT:RFVN (i.e. they were created without Latin attestation), while Translingual albifrons and English iudex non calculat do exist.

It's also obvious that to some extent people do what they like or prefer:

  • Tyrannosaurus: Uses italics for the head and has a † before the name. WT:Taxonomic names however still has it as a question, as an undecided matter.
  • English jus cogens was moved to Translingual ius cogens, and as far as I can see without any discussion or community approval.
  • Translingual Homo sapiens contains information about inflection, including a Latin inflection template totally unfitting for it, and as far as I can see too without any discussion or community approval.

Things which have to be considered:

  • Pronunciation, inflection, gender - cp. WT:About Translingual#Rejected.
    • Gender:
      • Translingual Nix Olympica, albeit feminine in origin as can still be seen in Olympica, is used as masculine in German in "der Nix Olympica" (maybe because of Berg m, Krater m, Vulkan m).
      • Translingual ius cogens, albeit neuter in origin, is used as masculine in French (French has no neuter).
      • (German) uses Felis with articles and as feminine (because of felis f or Katze f). (German, without proper noun capitalisation) has masculine or neuter "namen des felis catus", which is from Latin cattus (catus) m (male cat) in apposition and not from an adjective *catus (-a, -um). (same) is similar. (German) has masculine/neuter "des Felis pardus" (genitive). (French) has "le Felis spelaea", from spelaeus (-a, -um), that is: The Latinate/internal gender expressed through the adjective is feminine but the French real/external gender expressed through French articles/pronouns is masculine. And the German examples hint that the German real gender is feminine and masculine/neuter (from the examples above it can't be decided).
    Was the issue of Latinate/internal vs. real/external gender ever discussed anywhere, or did English editors, since English lacks grammatical gender, never consider it?
    • Latinate/internal gender could also be given in the etymology section.
    • Lack of real/external gender can make entries somewhat useless.
    • Inflection:
    Basically there are three ways: Inflect it as Latin, as somewhat native or mixed.
    • Latin inflection (6 cases, 2 numbers) was common in German too, and can also be seen in English homo sapiens (Citations:homo sapiens) which probably is an exception.
    • In English and modern German, both a mixed and a somewhat native inflection are common, like Homo sapiens in the singular regardless of case, and Homines sapientes (from the Latin nominative plural) or Homo sapiens (unchanged plural, which is common in German and still present in English) in the plural regardless of case.
    • Macra (not talking about possible attested spelling variants):
      • Translingual Homo sapiens currently gives a Latin inflection with macra: Homō sapiēns etc. The German pronunciation [ˈhoːmo ˈzaːpi̯ɛns] (not: [ˈhomoː ˈzapi̯ɛːns]) shows that the macra don't make sense translingually, and google books:Homō sapiēns hints that this is not a spelling in common use (if it's even attestable).
  • Alternative forms, spelling conventions:
    • Is a capitalised German term like Jus cogens a translingual alternative form of Translingual ius cogens?
    • There's English ie/i.e./i. e., and German i. e.: for German, dots and a space are prescribed (Duden), so even if German i.e. exists, it's proscribed (for example by Duden). For English, there's i.e.#Usage notes regarding the use of commas and italics in English.
  • Constellations and entries' correctness:
    • Latinate genitives (Andromedae, whether English, Translingual or even Latin) are not derived from English.
    • Does Translingual And really abbreviate English Andromeda? If And is used in multiple languages, then probably also the full form, though possibly in different registers (scientifically using the Latinate term, commonly using a native term similar to Big Dipper).

Things which should be voted upon:

  • (Fancy Style)
    Should some entries have special styling, like italics or † in the head?
    • Are taxonomic terms always used in italics? No. Should a non-italic taxonomic term be considered as an italic taxonomic term which by default is placed in italics (e.g. Homo sapiens = italics of Homo sapiens)? That's complicated and moreover ridiculous, isn't it?
    • † in the headword is ambiguous: does it mean the species is extinct or that the term is obsolete? Both are better explained in the usual way: "(obsolete) a species" or "an extinct species" or combined "(obsolete) an extinct species".
    • If † is used for species names, why not use 卐 for nazi terms (卐Führer), ✡︎ for Jewish things (✡︎Torah)? It's ridiculous too, isn't it?
  • ({{taxlink}})
    Should the template be used inside of entries in sections for hypernyms/hyponyms (..regna/../genera/species/..), or should it be limited to Further reading?
  • (Translations)
    Should translingual terms have translations sections?
    As of WT:About Translingual#Under discussion it's undecided;
    WT:Entry layout#Translations however also permits it for taxonomic terms;
    Translingual ius cogens, as a law and not as a taxonomic term, has a translation section too.
    • The awkwardness and inconsistency of constellation terms might be caused by some (old) rule like "only English entries can have translations".
  • (Macra)
    Should macra be added on Translingual (mul) terms based on the Latin origin even though they are not spelled this way and it makes no sense? (This is not about actual spellings with macra, if they exist.)
  • (Attestation)
    Should terms really have to be attestable (WT:CFI)?
    For the beginning, asking more specifically:
    • Should taxonomic terms have to be attested?
      If a taxonomic term is only mentioned once ("We discovered a new subspecies and called it Fish and chips"), does it deserve an entry? Or does it need more, like usages, a certain number of usages, usages in multiple languages?
      Currently "Translingual" is not a WT:WDL. However, it's also not really a language, and it can be argued that for being translingual, a term must be attested in at least two languages.
      If taxonomic terms have to be attested, what's sufficient? Regular attestation in at least two languages; three usages in one or in multiple languages; ...?
      • Nix Olympica passed RFV with three usages in two languages being added to the entry.
  • (Gender)
    Which gender should be given? The real/external, the Latinate/internal or both?

--幽霊四 (talk) 02:39, 8 February 2021 (UTC)

Answers: One thing is that terms are both Latin and translingual. The statement that “ius cogens” is Latin and the statement that “ius cogens” is translingual are both true. Due to the nature of translinguality, it is also true to say it is Ukrainian and Romanian, but not on the same level – it does not mean we should have Danish duplicates of such phrases. Your delatinizing of translingual entries is therefore a spectacular failure to realize this close connection between translingual and Latin. The translingual is an ideal entity, therefore we want macrons and inflection tables (the Latin inflection table templates should account for this placement) and the gender should be the Latin one, irrespective of how it is used in French. If ius cogens has nominal class 35 in some Bantu language this does not mean the head template should include it; although the usage note can if the nominal class is hard to guess, which is not the case for the French masculine, so you see why I gave but the neuter.
Then, you are fanciful in wanting it all so exact; the style is not fancy. Italics and † before a taxon are standard under certain circumstances in the taxonomical sciences. If you don’t know the circumstances and find it special, then why do you bloviate about it? Again here, nobody would see a problem short of you. Because it is clear in those sciences how things should be italicized, it is not plausible that anyone would see a need to vote upon it, as we shan’t have any vote contrary to science.
Attestation: Strange question. Apparently not three times as translingual is not mentioned in WT:WDL. Which is reasonable because if somebody reclassifies something this year it is likely it will be used by others, or if not then it is still inclusionworthy because somebody might stumble upon it and try to look up here what it means or what synonyms there are.
Alternative spelling conventions: Please avoid capitalised German terms like Jus cogens, because this is already not based on a translingual rule but on a German rule, which even differs between the various orthographical frameworks (and is often not adhered to, e.g. by authors who use the Neue Rechtschreibung because they think German rules have nothing to say about the writing of foreign terms).
“Should translingual terms have translations sections” – why not, if it is the best place. It is specifically mentioned under Wiktionary:About Translingual § Under discussion, and that is not even by me; I added the caveat that Wikispecies also allows translations, so for the specific matter of taxa it is avoidable on Wiktionary. But ius cogens is a translingual term with no corresponding English term, though there are native terms in other languages, so we have translation sections; this is unavoidable unless one argues, in order to have translation sections only under English, that the translingual entry should be duplicated in such a fashion that we also have an English section under it. But no, I reasoned that a constellation is possible in which English uses only a term that we should treat as translingual while other languages use native formations. And yes, ius cogens was moved to translingual because, with respect to serving all languages, it appears to serve users best and to depict the usage most accurately (regarding the question “which language is it?”); you can still add local pronunciations under the translingual pronunciation section (native pronunciations are always an argument people proffer).
BTW theoretically, to solve the ever-arising question whether something is Latin or translingual, it might be possible to merge translingual into Latin and present Latin, that is even the Roman Latin, before English, but I think, apart from the loss of distinction in that case, it is easier to just categorize by practicality like I do. If a term is devised as translingual then it is translingual, and it is irrelevant in which languages the term is used. Radical and consistent, as well as unintuitive to monolinguals, but intuitivity shan’t assert itself, as translingual terms are presented at the top of pages, nor against objectivity. See, it all has a system. Fay Freak (talk) 12:08, 8 February 2021 (UTC)
So if you really want votes, to formally amend WT:CFI or WT:EL or other acts, abstract-general formulations of the rules may be the following:
1. “Without prejudice to the requirement of having been used, a term is translingual if devised as translingual.”
2. “If a lexical unit is not used in an individual language but as translingual then it only belongs to the latter.” (this solves the cases when something is formally Latin but is not used in Latin but as translingual; it also excludes Danish entries for translingual bonmots.)
3. “A translingual term may present diacritics, inflection tables, and similar grammatical information particular to an individual language if the term is manifestly closely connected to it.” (then it is also no undue hardship to not have Latin entries for certain words because the translingual term can have all the macrons and inflections)
This is without decision for the question about capitalization inside Latin. Fay Freak (talk) 12:32, 8 February 2021 (UTC)
Do we want to be the best English/German/French/etc. dictionary we can be? Then any system that prevents us from pointing out that "jus cogens" is far more common in English, and whatever forms are usual in those languages are usual in those languages, is a bad one. Certainly anyone searching for jus cogens in an English text is ill-served by being told it's an alternative spelling of iūs cōgēns.--Prosfilaes (talk) 07:47, 10 February 2021 (UTC)
It’s not any preventing system. 幽霊四 added something about gender in individual languages in the usage notes, and like that it is possible to add something about preferred spellings – though, in English there seems to be a regular adaptation towards ⟨j⟩ spellings; however, capitalization of such terms in German is also regular and I would not see a need for it to be mentioned. Poor argument in any case to devise a system where one cannot see the forest for the trees because of individual language information.
The question always comes up under which headings content has to be sorted. If the editors only want to state something is used in a particular language, then this fact alone may not be enough to warrant a whole language section, if in the same fashion it can be stated for many languages (after all, that is the very idea of “Translingual” sections); if it is about pronunciation, then it still can be avoided if the pronunciation of “Latin” terms follows the usual measures (it is still within the concept of “Translingual”); it is similar to why the existence of {{ar-IPA}} does not at all compel Arabic editors to sort everything under Pronunciation N sections, for it would be disproportionately more cluttered than when one just ignores the pronunciations – which are obvious enough from the transcriptions and the spellings for anyone who repeatedly deals with the language. Fay Freak (talk) 10:54, 10 February 2021 (UTC)
Gender matters for Translingual terms when they are names of genera that might be combined with a species name treated as a Latin adjective. Sciurus niger but Polietina nigra. Offhand I can't think of any other cases likely to come up in modern writing. Often one finds phrasing like die Gattung Sciurus allowing a native word to control the grammar. Once upon a time people wrote species descriptions in Latin and there is still a rule of zoological nomenclature allowing names to be corrected to the nominative singular when they were first mentioned in a different case. Vox Sciurorum (talk) 15:51, 9 February 2021 (UTC)
@幽霊四 Please do NOT move Latin taxonomic etc. terms to Translingual, as there is no consensus for doing this. Benwing2 (talk) 05:10, 10 February 2021 (UTC)
Wiktionary includes far too much stuff under 'Translingual' simply out of a distaste for having multiple language entries for 'the same thing'. I think a radical reduction of allowed content for Translingual, that for instance includes taxonomic names but excludes legal, grammatical and musical jargon, would be a vast improvement. Language-specific comments about gender, semantic relations, variants and language-specific senses are better explained in the sections of the individual languages. The current setup is also immensely Eurocentric because the majority of Translingual cruft is Latinate. ←₰-→ Lingo Bingo Dingo (talk) 13:04, 14 February 2021 (UTC)
@Lingo Bingo Dingo: It is a vast improvement when we have moved the Latinate legal, grammatical and musical jargon to translingual. The distaste is not wrong but grounded, and it is sophistry on your part to pretend by this wale of wordhoard that I have appealed to emotion, which you alone do. Your contradistinction of taxonomic names is completely arbitrary and particular to your own taste, which particularity you fail to admit. Anatomical terms are standardized just like taxonomic names. Hence we come to other medicinal terms, names of pathological conditions with the body areas they occur in – a lot of names of muscles and bones have to be moved – as well as the organisms which are their agents. It is just a little but consequential step to declare grammar terms linguists and philologers made up to be used in all languages as well translingual. Music also distinguishes itself in its international character. The legal terms are translingual in so far as they usually do not refer to anything specific to one legal system but are supranational topoi. Obviously you know nothing about comparative law and private international law. The Eurocentrism is a false claim. If Japan is taking over German dogmatics then it is a skew to make it look somehow less European. Only, ironically, thou, not knowing any languages but a few Latin-written ones, art Eurocentrist, failing to consider the absurdity of creating lots of Latin phrases like genitivus absolutus as Russian or Hindi.
A professor consciously uses international terms independently of whether he is a botany or medicine or law professor. Making the categorization depend on some code of nomenclature is an irrational appeal to authority.
It is a misconception that terms are not translingual by default and only described so out of convenience. On the contrary, the circumstance that any word belongs to a particular language is a claim that requires evidence and, like any claim accepted at first glance, can also be disproven. By default, utterances do not belong to languages. Fay Freak (talk) 13:53, 17 February 2021 (UTC)
@Fay Freak It is ironic that you lie that I use appeals to emotion, whereas there isn't a single emotional statement in my previous comment and most of your comment is an unpleasant screed. Your claim that I only have knowledge of Latin-script languages is mendacious, for one I mention Ancient Greek, Biblical Hebrew, Coptic, Yiddish and Syriac on my user page; there are also others but I don't expect you to read minds. Perhaps you should learn to read carefully before you write. Dutch genitivus absolutus, which I made, is a good example of something that should never be considered Translingual; but thank you for demonstrating that you never took the effort to check for actual attestation because it is only widely attested in a limited number of languages. ←₰-→ Lingo Bingo Dingo (talk) 17:53, 17 February 2021 (UTC)
@Lingo Bingo Dingo: It is ironic that you pretend that I use appeals to emotion, whereas there isn't a single emotional judgment in my previous post but I was concerned to debunk all emotions. There is no such thing as “emotional statements”. The sharpness and harshness does not correspond to emotionality; whereas behind your displayed impartiality there is nothing but emotion leading to your decision, not being informed and thinking through.
You should learn to read carefully before you write. I already argued that the number of languages it has been used in is irrelevant. There are translingual terms only attested in one language – including many taxa. The actual attestation of Dutch genitivus absolutus is as translingual. You are completely missing the point. That “it is attested in Dutch” does not serve to distinguish whether a word is translingual or Dutch, nor any arbitrary number of languages a word is attested in. You do not shed light on any criteria to determine whether something is translingual. Fay Freak (talk) 22:06, 17 February 2021 (UTC)

Saterland Frisian orthography[edit]

@Leasnam, Apisite: I think it's time to make this official: which orthography do we want on en.wiktionary? I propose the one handled by {{R:stq:SW}} (which matches the one portrayed by en.wikipedia). Other attested spellings could be given as "alternative spelling of". Any objections? If not, I'll update WT:ASTQ accordingly. Thadh (talk) 11:50, 9 February 2021 (UTC)

Moldovan vs. Moldavian varieties of Romanian[edit]

In the Republic of Moldova there are some words that are not in use in Romania, for instance rutieră instead of microbuz (meaning minibus) or bătută instead of șnițel (meaning schnitzel).

We currently have a category Category:Moldovan Romanian that includes words from both the Republic of Moldova and the Moldavia region of Romania, which is making things a bit confusing. There is no way to see the words in use in that country, like we have for instance for Category:Australian English.

Since these varieties of Romanian are separate, should we divide them into Category:Moldovan Romanian (for the words from the Republic of Moldova) and Category:Moldavian Romanian (for the words from the Romanian Moldavia)? Bogdan (talk) 20:13, 9 February 2021 (UTC)

Seems sensible to me. Due to the political border, this is useful for navigation and in terms of actual linguistic shift, political borders (particularly international ones) will inevitably lead to some changes in the language itself. —Justin (koavf)TCM 21:16, 9 February 2021 (UTC)
@Bogdan The main risk I see is that most people won't notice the subtle difference in spelling between the two, and we'll end up with the same uninformative hodgepodge, but now randomly distributed in two different categories. Worse, it may not even be obvious that there are two categories. I think we need to make the names longer and more distinct, perhaps something like "Moldova Republic Romanian" and "Moldavian Region Romanian". I realize those names are long and awkward, but it's no good to have short and sweet names that end up meaning nothing. Chuck Entz (talk) 03:30, 10 February 2021 (UTC)
@Bogdan, Chuck Entz: Therefore, and combined with a diachronic view, it may be advised to keep it ambiguous. For if one deals with a term that was used in Moldavia – and one will only roughly know which Moldavia – two hundred years ago, one cannot work with the current political distinction. For the time being, to save regiolect information that you possibly have, you can employ an insider distinction, Bogdan: You can label explicitly with something like “Moldavian region of Romania” while categorizing the same as with Republic of Moldova terms. The same the labels {{lb|ar|al-Andalus}} and {{lb|es|Andalusia}} categorize as Category:Andalusian Arabic and Category:Andalusian Spanish while displaying differently (which is also devised in view of potential Aramaic or Hebrew usages in the region). Then you might see if you have enough and sufficiently unambiguous label uses to turn on separate categories. Fay Freak (talk) 11:17, 10 February 2021 (UTC)
My problem is not just with the ambiguity of what's displayed, but also that we cannot see a list of all the words that are in use only in the Republic of Moldova. Bogdan (talk) 11:29, 10 February 2021 (UTC)

My fear is that we yet again start moving towards recognising Romanian spoken in the Republic of Moldova as a separate language. It goes against what we voted for all those years ago. To the best of my knowledge, the Moldovan regiolect does not have rigid borders, hence keeping to one category isn't wrong. If there are in fact words only used in the Republic of Moldova, can't we just add usage notes? Seems more reasonable that way. --Robbie SWE (talk) 13:33, 10 February 2021 (UTC)

Regardless of the merits of a split, I am not a fan of the two names chosen. These two terms are too similar and not very clear in what they indicate. Something like "Republic of Moldova Romanian" for the former category would be much clearer. —Rua (mew) 13:40, 10 February 2021 (UTC)
I guess "Republic of Moldova Romanian" is the least ambiguous version. Bogdan (talk) 07:09, 11 February 2021 (UTC)
I agree with Chuck and Rua that "Moldovan" / "Moldovian" / "Moldavian" are too similar, so people are unlikely to understand or make or maintain the desired distinction, so there will just be two categories with words from each region instead of one. I suppose the categories could be split if this is really necessary; if they are split, and we use more distinguishable names like Chuck proposes, the existing "Moldova" category could be retained (under some spelling...) as a parent category, to contain both subcategories and to contain terms used in both regions and/or terms where it is not possible to determine exactly which of the two subregions they are used in, or terms which predate the split; this would address the issue Fay Freak mentions. (On some level this reminds me of the discussion of whether the "Canadian English" category should allow an overview of terms used only in Canada, with terms that are also used in Scotland or Maine or wherever split off or removed.) - -sche (discuss) 01:15, 12 February 2021 (UTC)

Cannot open collapsible sections (translations, etc) in Kiwix[edit]

(Initial discussion was at sv:Wiktionary:BB#Translations_missing_in_the_official_.zim_dumps)

A bug in the MWoffliner .zim creator makes it impossible to view the content of collapsible sections in the official .zim dumps for Wikimedia's offline project w:Kiwix. This is not specific to English Wiktionary and has been reported with respect to other wiki languages. For example, translations and inflections are not displayed.

The issue seems to be entirely on MWoffliner's side: Wiktionary's CSS collapses the sections iff JavaScript is available, and MWoffliner adds its own JavaScript, which interferes with the ability to expand them.

Should the Wiktionary/ies do anything to work around this until MWoffliner is fixed? Is there a wiki page for a real cross-wiki report/discussion? -- 07:30, 10 February 2021 (UTC)

You speak as if people know (/ care about) what MWoffliner is.
Wiktionary (-ies) is (are) the data origin, and it is the responsibility of 3rd party parsers to ensure that their parsers work (and you already say that it "seems to be entirely on MWoffliner's side"). :s —Suzukaze-c (talk) 11:28, 10 February 2021 (UTC)

CFI for place names[edit]

I've drafted a vote at Wiktionary:Votes/2021-02/Expanding CFI for place names for expanding and clarifying CFI for place names. I appreciate any input left on the vote's talk page! Ultimateria (talk) 22:09, 10 February 2021 (UTC)

Guanche terms in Spanish etymology sections[edit]

I have been encountering several cases of confidently cited Guanche terms in Spanish etymologies, written in Tifinagh (a script used in writing Berber languages). Examples: Beneharo, Derque, Echedey, Firjas, gofio, Hañagua, Itahisa, Kebehi, Meagens, and several others listed in Category:Spanish terms derived from Guanche. Some of these use an asterisk by the Tifinagh original and/or the transliteration. (None of the above-cited terms star the original but many of them put an asterisk on the transliteration; see tajinaste for a term with an asterisk by the original.) I am highly skeptical of the accuracy of these terms. In general they are cited to a certain Ignacio Reyes Garcia, who is not in Wikipedia and who seems to have written an obscure book called "Nombres Personales de las Islas Canarias" that is out of print. They also cite a blog that I am guessing copied the book, but the blog no longer exists. (Some of the pages are archived in the Wayback Machine, but I tried opening the link on gofio and just get a blank page: [7].) Per Wikipedia, although Tifinagh did in fact exist in certain inscriptions in the Canary Islands, the variant that was used is not well deciphered, and it isn't even known for certain that Guanche is a Berber language (as Wiktionary claims). I suspect that Reyes Garcia's transliterations and definitions are largely fanciful (similar to supposed "decipherments" of the Indus Valley and Rongorongo scripts), and he may have even taken reconstructed Guanche terms and transliterated them into Neo-Tifinagh in order to generate the supposed originals. The etymology on gofio was added by User:Jberkel, while many of the others were added by User:JaS, who I am not familiar with and who appears no longer active. The one on tajinaste, which is not cited, was added by User:Smettems, another user I don't recognize. Benwing2 (talk) 02:12, 11 February 2021 (UTC)

Yes, they are fanciful and should be removed. No Guanche entries should use Tifinagh script, and without highly compelling evidence, no reconstructed entries should be created. The existing entries (by @Tibidibi) also have major problems, but as they haven't fixed them, I guess I'll have to get around to it myself. —Μετάknowledgediscuss/deeds 18:45, 11 February 2021 (UTC)
@Metaknowledge, sorry about that! I'd kind of forgotten about cleaning up after myself :( You can delete them all if it's necessary.--Tibidibi (talk) 05:40, 12 February 2021 (UTC)

What's the best way to search/link to Usenet?[edit]

Google Groups has made its interface less useful, as you now need to be logged in to search all groups, and many have been blacklisted for containing spam. Usenet Archives doesn't allow searching and doesn't appear to provide metadata (and is mostly incomplete to boot). Is there a better Usenet archive? Even if Google Groups is ideal for permalinks, is there any other place to search, especially if Google decides to stop providing this service? grendel|khan 02:45, 12 February 2021 (UTC)

How should we handle Late Common Slavic loans into Romanian?[edit]

Romanian has a large number of Slavic words that were borrowed from the local Slavic population during the 8th to 10th centuries. They spoke a language that was still mutually intelligible with the rest of the Slavs, and innovations were still spreading throughout the area, but some dialectal differences had already begun to arise.

From what language should we say they were borrowed? If I say "Proto-Slavic", 95% of the time the form of the word is identical, but there are exceptions.

Romanian linguists typically use "Old Slavic", but that is ambiguous because it includes everything: actual Proto-Slavic (6th–7th century), Old Church Slavonic, and these words as well.

I see that Slavicists often use "Late Common Slavic", which is good in setting the timespan, but even that is a bit ambiguous geography-wise, as South Slavic already began to be separate from the rest. Bogdan (talk) 11:04, 12 February 2021 (UTC)

I assume that's why we just give Common Slavic as the source – since we don't actually know when and where the loan was made, due to a complete lack of sources, it's better not to guess. --Robbie SWE (talk) 20:23, 12 February 2021 (UTC)

Relocating Japanese historical hiragana[edit]

Currently historical hiragana are displayed like this:

える • (kaeru) transitive ichidan (stem え (kae), past えた (kaeta), historical kana かへる)

But seemingly that bracket is intended for inflected forms. This looks somehow illogical. How about this?

える • (kaeru)←かへる (kaferu, hist.) transitive ichidan (stem え (kae), past えた (kaeta))

-- Huhu9001 (talk) 11:54, 12 February 2021 (UTC)

No objection. I definitely agree that the old location is awkward. —Suzukaze-c (talk) 12:08, 14 February 2021 (UTC)
Nitpicking on presentation: I don't think that <small> is necessary, and it also makes clicking on the question mark (enclosed in sup→small→sup) hellish. —Suzukaze-c (talk) 12:11, 14 February 2021 (UTC)
<small> removed. -- Huhu9001 (talk) 14:02, 14 February 2021 (UTC)

Should "Middle Korean adjectives" be abolished?[edit]

@LoutK, Suzukaze-c,

Currently, Wiktionary distinguishes adjectives from verbs in Middle Korean. However, the distinction between the two is much less clear in MK. Examples:

  • 性이 서르 갓가오나 ᄇᆡ호ᄆᆞ로 서르 머ᄂᆞ니 (their nature is close to each other, but by learning they become distant)
    • The ModK equivalent is 性이 서로 가까우나 배움으로 서로 멀어지니
  • 피리 부로매 셴 머리 도로 검ᄂᆞ니 (in playing the flute, the white hair becomes black again)
    • The ModK equivalent is 피리 부르니 센 머리가 다시 검어지니
  • 愛水 흐르디 아니케 ᄒᆞ면 湛性이 ᄆᆞᆰᄂᆞᆫ 젼ᄎᆞ로 (because the tranquil mind becomes clear if one makes the sexual fluids not flow)
    • The ModK equivalent is 愛水가 흐르지 않게 하면 湛性이 맑아지기 때문에
  • 蓮ㅅ 고지 븕고져 ᄒᆞ놋다 (How the lotus flower seeks to be red!)
    • The ModK equivalent is 연꽃이 붉어지려 하는구나
  • 君子이 싁싁ᄒᆞ고 공경ᄒᆞ면 나날 어디러 가고 (the junzi who is strict and reverent becomes more benevolent day after day)
    • The ModK equivalent is 군자가 엄하고 공경하면 나날로 어질어져 가고
  • ᄀᆞ장 모딘 罪 다 업서 버서나니라 (their most grave sins shall all vanish and they shall shed [their agony])
    • The ModK equivalent is 가장 심한 罪가 모두 없어져서 [이를] 벗어나리라; an adjectival interpretation doesn't make any sense.

In effect, most adjectives could also be considered verbs with the meaning of "to become [STATE]", with e.g. 블근 곳 (pulkun kwos, red flower) literally being "flower that has become red".

There are also very rare cases of transitive use of adjectives:

  • 갈 바ᄅᆞᆯ 아득ᄒᆞ야 머리ᄅᆞᆯ 돌아보니 (turning the head to look back, considering as distant the way that I must take)

There are still convincing morphological arguments for differentiating verbs from adjectives, and scholars do usually still consider "adjectives" a discrete category. For example, only adjectives in hota can be adverbialized with hi. The distinction is especially clear in the semantic field of emotion. 젛다 (cehta) is specifically "to fear", i.e. "to become scared", and the derived 저프다 (cephuta) is "to be scared"; 슳다 (sulhta) is "to grieve" and the derived 슬프다 (sulphuta) means the same thing it does now. These derived adjectives of emotion almost never show verbal usage.

Still, it's my opinion that Wiktionary would be better served if we abolished Category:Middle Korean adjectives entirely, because as it stands most of the definitions are incomplete (they're missing the verbal "to become [STATE]" definition), there is effectively no difference for the purposes of conjugation templates, and it better shows the ways in which MK differed grammatically from ModK.

Thoughts?--Tibidibi (talk) 10:28, 13 February 2021 (UTC)

No opinion. But I note that Chinese has the same feature, where some adjectives can mean 'become [adjective]', as in . I think we only record the adjective sense. —Suzukaze-c (talk) 23:04, 13 February 2021 (UTC)
Support If you think the dictionary is better served by this change, then I'm definitely on board. — LoutK (talk) 01:38, 14 February 2021 (UTC)

alternative pronunciations vs forms[edit]

Halp, am of doubt as to whether or not to list alternative pronunciations on the lemma page for languages that pretend to be phonemic (with a beast like English, it's a no-brainer). Seems intuitively obvious, but then in many cases there's still no corresponding spelling; it would also imply that people who pronounce it alternatively also use the alternative spelling, and pronounce the standard spelling using spelling pronunciation, which is rarely true.

Secondary question: would it not be better to give a truncated transcription of only the segment that differs instead of the whole word, where no brackets are possible (e.g. presence/absence of nasalisation: [õ])? Can this be easily done with the pronunciation template, or is it better to use manual IPA?

Tertiary question: do or don't list alternative forms on other alternative forms' pages? Brutal Russian (talk) 20:06, 15 February 2021 (UTC)

Here's my thoughts; they don't necessarily correspond to community consensus:
  • The first question doesn't have an easy answer; things are done differently by different editors and in different contexts. However, for languages where the spelling doesn't correspond well to pronunciation (e.g. English), I prefer to centralise most pronunciations at the lemma, because speakers often like to use standard spellings even when it doesn't correspond to their own pronunciation (for example, I say /ˈɛk(t͡)ʃɫi/ and /pɹəˌnæɘ̜nsiˈæɪʃən/, but write actually and pronunciation, not akshly and pronounciation).
  • However, some (particularly prominent) alternative forms can have their own pronunciation sections. Alternative forms that are particularly divergent or exotic (in meaning etc.) maybe shouldn't go on the lemma.
  • For the second question, practice varies here as well, but I generally prefer to have each alternate pronunciation written out in full unless there are too many alternate pronunciations to make that practical. I don't edit languages that have their own pronunciation templates, so I wouldn't know how well
  • For your third question, it's better to avoid listing alternative forms on other alternative forms' pages, as that increases the amount of maintenance to do (e.g. if somebody wants to add a new alternative form).
Hazarasp (parlement · werkis) 06:05, 17 February 2021 (UTC)
  • For #3, I'll ditto Hazarasp's comment. Japanese alternative form entries are kept as simple stubs acting as soft-redirects back to the lemma form. We've been able to use templates and modules to keep these very streamlined. See, for example, lemma entry あなた (anata), and alternative-form entry 貴方. The latter is complicated due to the oddities of Japanese orthography -- the pronunciation anata for that spelling is the alt form, pointing the user back to the lemma at あなた (anata), while the pronunciation kihō is an entirely separate term with its own separate etymology and other details.
For a simpler example, see also lemma (sakura) and alternative (hiragana) form at さくら (sakura). There's a lot of detail visible at さくら (sakura), but if you open the page in edit view, you'll see that this is all pulled from the lemma entry using the template and module. This approach allows us to keep all the important lexicographic detail in one place -- in the lemma entry. This keeps maintenance much simpler, since we don't have to update the same information in multiple places. ‑‑ Eiríkr Útlendi │Tala við mig 19:32, 19 February 2021 (UTC)

Monthly Community Project to Import a Public Domain Dictionary or Glossary to Wiktionary

I've noticed that Project Gutenberg has multiple dictionaries [8] [9] whose addition would greatly benefit Wiktionary. Could we set up a monthly contest to have their data imported into Wiktionary? Perhaps at the start of the month, we could set up a list of words contained in the dictionary and then have individuals check them off once they have imported them? These public domain dictionaries are such a rich source of knowledge and would greatly enhance Wiktionary. Languageseeker (talk) 20:36, 15 February 2021 (UTC)

etymology: bor template doubts

Halp, more confusenings. Special:Diff/60429477 that User:Ultimateria made a while ago warns against using {bor} unless there are no previous links in the etymological chain. But my intention there was precisely to categorize the word as an Occitan term borrowed from Latin, even if the borrowing first happened in Old Occitan. Firstly, the time of borrowing, or the more appropriate periodisation of the language, is often unclear (Basque marti). Secondly, if one wants to find all the Latin borrowings in Occitan, the "no previous link" approach only makes this possible by going to the earlier stages of the language, and in some cases by having to check two or more language stages, because the "Derived from Latin" category will show inherited and borrowed terms promiscuously. I specifically wanted the word to show up as a borrowing.

Additionally, there are many European words of Latin origin that have spread via the mediation of some prestige language, most often French. And at times it's impossible to tell whether the word was borrowed directly from Latin or rather the-French-borrowed-it-and-so-did-we, i.e. technically the borrowing is from Latin, but could be treated as a relatinisation of a French word borrowed from Latin (a calque-borrowing? 😔). Case in point: Dutch unie, which can't be directly from either French or Latin, but is a Romance/Medieval Latin-style adaptation into unia, also visible in many Slavic languages: unia/уния/унија. Dutch dictionaries don't know what to make of it: some say from French, others say from Latin. In my opinion both are justified and should be templated as such. Brutal Russian (talk) 20:37, 15 February 2021 (UTC)

Exclusively focusing on the Dutch example: It can be a direct borrowing even if the ending has been changed, -ion < -io endings are practically never borrowed as -io(n) in Dutch because the type is considered foreign. In this case the more recent etymological dictionaries agree that it is from French and they date it to Middle French, so considering it a borrowing from Middle French should be uncontroversial. ←₰-→ Lingo Bingo Dingo (talk) 18:54, 16 February 2021 (UTC)

learned borrowings template

This template is severely underused even for languages like Dutch, where a clear separation between learned and non-learned Latin borrowings is desirable. In Romance languages, where basically any term borrowed from Latin is almost by definition learned (exceptions would mostly look like this - check it out, entertaining)... in these languages it's unused presumably because nobody feels the need for it. What do?

Corollary: this question extends to the {desc} template's lbor=1 parameter. Are there clear guidelines on when to use it? A whole tree of borrowings with (learned) added looks quite ugly and unnecessary. My current thinking is to use it when learned and non-learned borrowings appear in the same {top2} field; when I separate inherited and borrowed descendants, I use bor=1 instead to get the nice-looking arrow. Brutal Russian (talk) 21:00, 15 February 2021 (UTC)

Because the template isn't used much, if you're wanting to massively expand its use, you're in the (un)enviable position of setting the standard here - do whatever you feel comfortable with, as long as it doesn't contradict existing practice and makes some kind of sense. Hazarasp (parlement · werkis) 06:09, 17 February 2021 (UTC)
I only recently became familiar with the template, but I will use it from now on where I think it is appropriate. ←₰-→ Lingo Bingo Dingo (talk) 17:31, 17 February 2021 (UTC)
@Hazarasp, Lingo Bingo Dingo: So given what I said above about Romance languages, I've been thinking: is it possible or desirable to automatically convert all borrowings from a parent language (English borrowing from Old English etc) into learned borrowings? With an override parameter for the sicutera's? Do we even have an appropriate template/category for such folksy corruptions/Chinese telephone items? Brutal Russian (talk) 02:49, 20 February 2021 (UTC)
@Brutal Russian I'm not really sure; a large proportion of borrowings from Latin will be learned borrowings, but I can't really judge the extent that it is possible that terms in some registers were e.g. picked up by members of the general public when in contact with Latin jargon (not necessarily with irregular secondary changes). ←₰-→ Lingo Bingo Dingo (talk) 08:20, 20 February 2021 (UTC)

cognates vs related terms

Special:Diff/61781951&oldid=61781606 by User:Rua warns against listing as cognates in the Etymology section those terms that already appear under Related. This slaps some further questions into me: which section to prefer for listing these words? Do we reserve Etymology for outside cognates and list internal ones in Related? What if mentioning some or all of these while discussing Etymology is beneficial - do we now forego listing them in Related? Any insights appreciated. Brutal Russian (talk) 21:07, 15 February 2021 (UTC)

Related terms is definitely the preferred section. "Do we reserve Etymology for outside cognates and list internal ones in Related?" Yes. If for example a word is inherited from a word in the parent language but is influenced by another word in the child language, it should be mentioned in the etymology, but I think it's fine to also link to it in related terms in this case. In Romance languages at least there's not much reason for same-language links in etymologies. The kind of redundancy I often remove is when a word is simply "foo" + "-bar" and "foo" is listed under related terms. Ultimateria (talk) 21:55, 15 February 2021 (UTC)
I wonder whether it might not be a good idea to have Related terms be an L4 heading under ===Etymology=== rather than under the part of speech, since Related terms is for etymologically related words, regardless of the POS of the lemma. —Mahāgaja · talk 12:22, 17 February 2021 (UTC)
I wouldn't like that because it would push the start of the main content down even more. --{{victar|talk}} 06:33, 18 February 2021 (UTC)
Ignoring the likelihood that most monolingual users and many English learners want definitions more than any other content, we often have ridiculously long alternative form, pronunciation, and etymology sections. Alternative forms can be shown in a comma-separated, horizontal list. Regional pronunciation variations can appear under show-hide bars, as can long lists of cognates and alternative speculations about etymology. We could probably go further by hiding IPA. Registered users can set preferences to display hidden content. Related terms seem likely to be helpful for reasons other than interest in etymology. DCDuring (talk) 14:30, 18 February 2021 (UTC)
Man, some etymology sections for English words read like novels (see: thou, plat). I don't know what the solution is aside from maybe separating the omnipresent lists of 25+ cognates into another section/line. (Or...just not listing so many damn cognates) DJ K-Çel (contribs ~ talk) 20:17, 18 February 2021 (UTC)
See thou. I didn't change plat because the longest etymology section was for obsolete definitions. The entire content of etymology section 3 is of solely historical interest. DCDuring (talk) 13:46, 19 February 2021 (UTC)
Perhaps we need some kind of weasel-proof guidelines for which cognates to use and which ones to leave out. The problem is all the IP editors that insist on adding their languages whenever they see someone else's language being used: you can't have just a Swedish cognate, because that would be unfair to Danish, Norwegian, Faroese and Icelandic. Spanish? What about Asturian? Portuguese? What about Galician and Mirandese? Bulgarian? What about Macedonian? The languages I'm mentioning may not be the actual ones that get the most of this, but it should give an idea. The worst part is that the ones added by the "me too" folks are usually closely related to the ones already there, and thus mostly useless. Then there are the cases where there are a few representative cognates illustrating some finer point regarding the local development of, say, West Germanic, and someone decides to add Albanian or Persian instead of letting the Proto-Indo-European entry tell that part of the story. Chuck Entz (talk)
I'm for showing a few (3 or 4, preferably from different branches of the language family) cognates even when a parent entry exists, because users can make some helpful comparisons without having to chase links around in truncated etymology sections. But I agree that more than that is undesirable. I'd also support showing cognates in a collapsible box if the clutter on the page bothers people too much (we have that already on some entries). Mahagaja's idea of having Related terms under Etymology is a good one. Altforms, if shown using {{alter}}, are already listed horizontally instead of vertically. Maybe we can have the POS header first, followed by pronunciation, etymology and descendants... -- 𝓑𝓱𝓪𝓰𝓪𝓭𝓪𝓽𝓽𝓪(𝓽𝓪𝓵𝓴) 11:02, 19 February 2021 (UTC)
Just hide them. DCDuring (talk) 13:46, 19 February 2021 (UTC)

Side note, I've been too lazy to put this together, but really the cognates list should have the option to be automated much like we do with {{suffixsee}}, perhaps in a template called {{cognatesee|lang|root|family}} -- working in conjunction with {{root}} -- and thrown under a ====Cognates==== header. @Erutuon --{{victar|talk}} 21:33, 19 February 2021 (UTC)

WordHippo potential violation

Where do we report potential copyright violations? Compare this revision of spew to Word Hippo's definitions, including "(Can we verify(+) this sense?)". I've saved a screenshot as well. There doesn't seem to be any attribution on the page. DAVilla 11:57, 16 February 2021 (UTC)

@DAVilla: Wikipedia:Mirrors_and_forks#Non-compliance_process. —Suzukaze-c (talk) 12:01, 16 February 2021 (UTC)
Ah, in that case I can't proceed, at least not with that word. I wasn't an editor for that page. DAVilla 14:03, 16 February 2021 (UTC)
How does that matter? The standard letters just point out the copyright violation.  --Lambiam 08:55, 17 February 2021 (UTC)
He wouldn't have standing to press the matter legally. I don't understand why WMF wouldn't accumulate complaints against offenders and then make the license-violation/copyright complaint. It will never be efficient for the responsibility to remain with each individual copyright holder. DCDuring (talk) 16:14, 17 February 2021 (UTC)
I'm tempted to suggest that we set up a Wall of Shame calling out blatant unattributed plagiarism like this, but it probably would be more trouble and bad publicity than it's worth. Chuck Entz (talk) 04:26, 19 February 2021 (UTC)
definitiondb.com is another one. There are tons. Equinox 16:47, 19 February 2021 (UTC)
It is highly unlikely that these are one-offs. I expect that each such site took content, probably everything available in a dump that met some criteria. If so, it would seem that WMF should be made aware, take whatever action they deem appropriate, and let us know what is the upshot. We could ask them what they are likely to do before we waste time on documenting the problem for the various sites that are guilty of this unacknowledged downloading. DCDuring (talk) 17:01, 19 February 2021 (UTC)
WMF will never win. Spammers always win. That's why the beautiful pioneering Internet of the '90s is now trash TV. Equinox 22:27, 19 February 2021 (UTC)

Possible case of admin abuse

See Infodesk. Can we get some more input on this?__Gamren (talk) 13:58, 19 February 2021 (UTC)

The long block of the user seems wrong on its face. The user seems to have provided evidence showing usage that fits the labels and definitions involved. The admin's position seems at best prescriptivist and possibly PoV. DCDuring (talk) 14:34, 19 February 2021 (UTC)
On Wikipedia some policies allow action only by "uninvolved" admins. I don't think admin participants in a dispute should be handing out long blocks to their unarmed adversaries. A few hours to a day is plenty long enough for an uninvolved admin to make a long term decision. Vox Sciurorum (talk) 15:14, 19 February 2021 (UTC)

Cognates for borrowings

Wiktionary:Etymology#Cognates says that cognates should be listed only for inherited words. This is generally a sensible policy, as the fact that a Romanian word borrowed from Ottoman Turkish has a Swahili cognate is irrelevant trivia.

However, there is the case in which the identification of the source language is not straightforward.

For instance, Romanian borrowed a few thousand words from a plethora of Slavic languages: Proto-Slavic, Old Church Slavonic, Old Bulgarian, modern Bulgarian, Serbian, Russian, Ukrainian and Polish, etc. and many words look very similar in all these languages.

Identifying which is the source language is not easy and comparing the forms in the possible candidates is part of this process.

Any thoughts about this? Bogdan (talk) 20:32, 19 February 2021 (UTC)

@Bogdan: I can give you the definite answer that the section you refer to is blatantly false, apart from the document being an old compilation that hasn’t seen our recent best practices and architecture, so naturally contradicting itself and established later schemes. Obviously with borrowed terms, wanderworts, the origin of which is dubious, we list cognates. Only if one restricts the meaning of “cognate” to “related as sister languages from a common ancestor” is that statement valid. But it is even less true for inherited words than for any other words, because then we can regularly reconstruct an ancestor where we list cognates in a descendants list, making the cognate lists superfluous.
The document reads like one tried an introduction into historical linguistics. But the matter you describe, the information you have, already suggests you how to write etymology, and hopefully you know yourself from outside how etymologies should be written! Fay Freak (talk) 21:21, 19 February 2021 (UTC)
I would say "from a {{der|ro|sla}} language. Compare " followed by Slavic terms that seem to be related to whatever the donor term was. If appropriate, I might add "Ultimately from {{der|ro|sla-pro|[...]}}" with the Proto-Slavic term replacing the [...] in this example. Another option would be to substitute the "Ultimately from {{der|ro|sla-pro|[...]}}" part for the "Compare" part; that is, let them find all the presumably related Slavic forms in the Descendants section of the Proto-Slavic entry. Chuck Entz (talk) 21:51, 19 February 2021 (UTC)
In most cases, it is possible to tell which language was the source due to phonetic differences.
What problem I still have is what name to use for the language from which the bulk of 8-10th centuries borrowings were made.
  • Late Common Slavic -- basically it was still a late stage of Proto-Slavic, still acting like one language and mutually comprehensible with the other parts of the Slavic dialect continuum
  • Old Church Slavonic -- the differences between the language from which the words were borrowed and the dialect which was standardized as Old Church Slavonic were very small, basically the same language. (for this reason, Bulgarians call Old Church Slavonic "Old Bulgarian")
  • Old Bulgarian -- the surviving similar dialects ended up being Bulgarian
Bogdan (talk) 22:05, 19 February 2021 (UTC)
@Bogdan: If you have considered yourself more confused, misinformed or disappointed after reading Wiktionary:Etymology in earnest, you might vote pro on the just created motion for deletion of that page. Fay Freak (talk) 22:22, 19 February 2021 (UTC)

Taking into consideration that we have a substantial number of high-quality reconstruction pages for several proto-languages, I personally see no point in adding cognates in an etymology section, unless they provide an interesting aspect that the main etymology does not convey. If we can keep the etymology sections short and sweet, then we should definitely put in the effort. --Robbie SWE (talk) 18:50, 20 February 2021 (UTC)

Descendants: Inherited vs Borrowings

So I've been busying myself with these (separating the raisins from the flies, as the Russian saying goes) for a while now, and the best approprioach (made this word up accidentally but it works) I've discovered through practice is roughly this: to have two separate sections if the ratio of inherited to borrowed terms is comparable and they don't fit into two columns of 5. Alanus is already pushing it and I'd rather separate them. minimus absolutely needs two separate fields in my opinion. I also separate ancient/natural borrowings from learned ones, especially those from the early-modern period onwards, by listing the former together with inherited words, as I did in fabrica. I haven't felt the need to use all three sections for a single word so far. Apart from this solution seemingly not being adopted by anyone else (I may have seen it once), there's at least one other thing: it breaks {{desctree}}. So firstly, how do other editors feel about the issue, and secondly, if the feeling is mutual, I propose introducing the practice and modifying desctree with an option to choose between listing items from the Descendants, Borrowings or Learned borrowings section. This will also benefit the template and make it usable in cases which currently would result in a monstrous list of descendants, with many unwanted learned borrowings. Alternatively, make a {{bortree}}.

Also, what are your thoughts on collapsible lists? IMO the website's styling would benefit a lot from hiding long lists like that, and I would also add that I find the approach to collapsible lists that the Hungarian Wiktionary takes to be the bestest: uncollapse by default, use small caps for language names, separate the columns with background colour. And honestly, we could add links to other-language Wiktionaries to descendants as well. Brutal Russian (talk) 03:06, 20 February 2021 (UTC)

I agree with you that separating inherited terms from borrowings is a good idea if there's a whole bunch of each. I might start doing that myself - the inability to {{desctree}} is a loss I can take, especially as its flaws already often make it unusable (e.g. it can't handle terms with multiple etymologies).
As for collapsible lists, I'm kinda non-plussed by them, though I don't have any cogent objections to them other than the fact that I think the hu.wikt translation box looks ugly (though there's other styles that could be used, such as that used by Template:col4) Hazarasp (parlement · werkis) 09:05, 20 February 2021 (UTC)
@Brutal Russian: You can't create headers that aren't in WT:EL. I would support a ===Borrowed terms=== header if you make a vote for its inclusion, but until then, please use |bor= and |lbor= where appropriate. --{{victar|talk}} 21:09, 20 February 2021 (UTC)
@Brutal Russian, |1= should always be lang. Please place |bor=, |lbor=, etc. at the end of {{desc}}. Thanks. --{{victar|talk}} 22:58, 20 February 2021 (UTC)
I'm not sure that a ====Learned borrowings==== header is actually the right way to do things. A more elegant method to organise the morass of descendants that some words have would be to allow users to divide up ====Descendants==== sections using subheadings (e.g. =====Inherited=====, =====Borrowed=====, but freedom would be given to define arbitrary subheadings within reason). Hazarasp (parlement · werkis) 01:30, 23 February 2021 (UTC)
I disagree, but draft a vote. --{{victar|talk}} 07:42, 23 February 2021 (UTC)

Alternative forms - diacritics

While listing these for Latin I've discovered that in many cases, diacritics stand in the way. Most alternative forms come from periods of vowel-length breakdown, such as Late and Medieval Latin, so one might as well mark all of these with macron-breve. When these ā̆'s agglomerate, the result can be plain ugly. For the lulz, a representative example: deorsum. Outside of these, the macrons are not only rather redundant, they also encourage the editor to list variant prosody as alternative forms. While this is clearly the right way to go for languages that include diacritics in the page name (Latvian, Spanish), in Latin the difference will simply be noted on the same page. I expect the same arguments will be true for Ancient Greek etc., but especially with reference to ancient languages with multiple periodisations/pronunciation traditions, where specifying prosody in alternative forms would lead to innumerable alternatives. I propose adopting a policy of foregoing prosody in alternative forms for such languages, as I did on ieiunus. If someone feels this to be somewhat inconsistent with listing ie- and ia- forms while also giving both pronunciations in the lemma (based on the discussion just above), I also welcome your thoughts. Brutal Russian (talk) 03:48, 20 February 2021 (UTC)

Derivatives/Descendants of alternative forms vs the lemma

Example: ientō, alt form of ieientō, or zizania, collective singularisation of zizanium. Where do the descendants go in such cases? I suppose this can be resolved using {{desctree}}, but is this suggested? — And what about derivatives? — Also, what's the best/currently adopted way to specify that something comes from an alternative form instead of the lemma in the Descendants section, while avoiding making a Latin word a descendant of another Latin word (which I take it is discouraged)? For instance, some of the descendants continue one form of the word, and the rest clearly continue another, perhaps unattested? Does this call for making a new (reconstruction) page and including the desctree from that in the lemma? Brutal Russian (talk) 04:02, 20 February 2021 (UTC)

Again, multiple approaches have been used here, depending on the scenario, personal preference, etc. This is really an area where personal judgement supersedes hard-and-fast rules, but here are some broad guidelines:
If only one or two descendants come from an alternative form, I would add a qualifier next to those descendants, e.g. at bolster:
  • English: bolster
  • Scots: bowster, bouster, boster (bowstur)
If more descendants come from an alternative form, then you can put them under a subheading (e.g. at sabbatum):
From the variant *sambatum:
If most or all descendants come from an alternate form, then I believe it's best to centralise them on the main page, maybe with a note beforehand explaining the situation (compare tabula; the forms at tabla probably should be replaced with a note telling readers to go to tabula). However, other editors do disagree with me (as indicated by the existence of forms at tabula). Some might even think it's best to have forms at both the main form and the alternate form.
As for your comment about making Latin words descendants of other Latin words, yes, this is generally avoided, but some still do it (see the current situation at sabatum). Hazarasp (parlement · werkis) 08:57, 20 February 2021 (UTC)
If the alternative form has a lot of descendants, why aren't they on that form's own page, possibly with a cf. or something to draw attention to it? DCDuring (talk) 23:23, 22 February 2021 (UTC)
Generally, I think everything should be centralised on the main lemma as much as possible (within reason). This is because it's what people are more likely to look for (e.g. more people will search for tabula than tabla). Hazarasp (parlement · werkis) 01:35, 23 February 2021 (UTC)

Google Groups no longer provides message IDs.

See this early use of netiquette. The original message headers are no longer available, which means I can't include the Message-ID, which uniquely identifies a Usenet message. I also can't revert to the old-school flavor of the original Google Groups, which let me do that. Is this a regression? Is there anything we can do? grendel|khan 08:35, 20 February 2021 (UTC)

Google Groups is not the only place for Usenet messages. J3133 (talk) 08:51, 20 February 2021 (UTC)
Do you know of another searchable archive of Usenet messages?  --Lambiam 14:50, 20 February 2021 (UTC)
@J3133: I asked about that earlier this month, but didn't hear anything back. {{quote-newsgroup}}'s docs and discussion don't suggest anything else. Do you have another link to the message used there? Or its Message-ID and instructions on how you got it? grendel|khan 16:41, 20 February 2021 (UTC)
You can also no longer forward these messages, which would (presumably) have included the message-ID.  --Lambiam 14:50, 20 February 2021 (UTC)
Google was only nice when being nice helped build the brand. Now they seem to be cutting costs, limiting financial/legal liability, and reducing political risk. DCDuring (talk) 16:52, 20 February 2021 (UTC)
Archive.org has a Usenet Historical Collection. According to the description, it spans "more than 30 years", though it's not clear which years, or how their coverage compares to what's indexed by Google Groups. However, as an experiment, I went to their news subcollection and downloaded news.misc.mbox.zip (23.5 MB). After unzipping, I grepped the mbox file for the string 'CORPARASHUN' and found the netiquette message linked in the original post. The Message-Id header looks like: Message-ID: <7805@BIT.NET>. Now, if you wanted to do a search against all groups, it would require some level of technical proficiency and a fair bit of disk space (eyeballing it, all the files in the usenet-news subcollection look like they'd add up to around 8GB uncompressed, and that's just one out of 1,019 subcollections). But it's at least comforting to know that this redundancy exists.
Apparently there was also another Usenet archive on archive.org, The UTZOO Wiseman Usenet Archive, but the author took it down recently as a result of some legal threats. Unfortunate.
I also stumbled on UsenetArchives.com, another online archive of usenet posts, but it doesn't look very promising. It doesn't have a search feature, and currently doesn't seem to be indexed by Google, and coverage seems spotty. But it's a recent-ish project, and maybe it will get better as development continues.
As a final note, I would personally not be too concerned about adding quotes without message ids. I think having the group name, plus year, plus subject line, plus author, plus text excerpt should be more than enough bits of entropy to uniquely identify the message. Colin M (talk) 21:38, 20 February 2021 (UTC)
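For anyone who would rather script the "download, unzip, grep, read the header" step described above than do it by hand, here is a minimal sketch using only Python's standard-library mailbox module. It assumes you have already downloaded and unzipped one of the archive.org mbox files; the function name find_message_ids and the example file name are hypothetical, chosen for illustration.

```python
import mailbox


def find_message_ids(mbox_path, needle):
    """Scan an mbox file and return (Message-ID, Subject) pairs for every
    message whose raw bytes contain `needle` (case-sensitive, like plain grep).

    Hypothetical helper for illustration; not part of any Wiktionary tool.
    """
    needle_bytes = needle.encode("utf-8")
    results = []
    # create=False: raise mailbox.NoSuchMailboxError instead of creating an empty file
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for key in box.iterkeys():
            # get_bytes() returns the raw message (headers + body) without parsing it
            if needle_bytes in box.get_bytes(key):
                msg = box.get_message(key)
                results.append((msg["Message-ID"], msg["Subject"]))
    finally:
        box.close()
    return results
```

Usage would look like find_message_ids("news.misc.mbox", "CORPARASHUN"), assuming that is the name of the file inside news.misc.mbox.zip. Like grep, this matches the raw stored text, so a phrase wrapped across two lines or stored in an encoded body (e.g. quoted-printable) will not be found; returning every hit rather than the first also makes it easier to tell an original post from replies that quote it.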
Would it be possible to add a URL as well? That would add to verifiability. — SGconlaw (talk) 14:04, 21 February 2021 (UTC)
For a message found in the archive.org collection? No, unfortunately the only way to view a particular message from that collection is to download the corresponding zip file to your computer, unzip it, and find the message inside the mbox file. I suppose you could include the url for the zipped mbox file, but I doubt many readers would want to go through the legwork to deal with it. Colin M (talk) 20:40, 21 February 2021 (UTC)
Ah, I meant at Google Groups. Even if a message ID can no longer be provided, at least a URL can be added. — SGconlaw (talk) 21:15, 21 February 2021 (UTC)
Oh, in that case then yes, that's definitely possible. See the first URL in this thread for example. You can get a permalink for a given message by clicking the triple-dot button in the top-right of the message and clicking "Link" from the dropdown menu. Colin M (talk) 20:56, 22 February 2021 (UTC)
Thanks for those. The hoster of the Usenet Archives gives some background here; it is a mirror of the UTZOO Wiseman that has been taken down [fake news: it contains the UTZOO Wiseman archive but it also contains other messages]. The site does have a "search posts" function on my end, though. ←₰-→ Lingo Bingo Dingo (talk) 09:50, 21 February 2021 (UTC)
Whoops, you're right, I just missed the 'search posts' toggle in the UI. Colin M (talk) 20:43, 21 February 2021 (UTC)
I've edited my previous comment, because the Usenet Archives are clearly not limited to the UTZOO Wiseman archive. ←₰-→ Lingo Bingo Dingo (talk) 15:31, 23 February 2021 (UTC)
Thanks! I fetched the mbox file for news.misc, and the message ID for that particular message is 14434@goofy.megatest.UUCP; the one you listed is for a (wacky) reply from a BIFF. This is usable for a one-off, but (obviously) much more time-consuming than I'd prefer. I'll send some feedback, for whatever good that might do. Note also that you can still search by message ID in the interface. grendel|khan 19:17, 23 February 2021 (UTC)
I don't know what the total storage requirements are, but in theory something like that could also be hosted on WMF lab infrastructure, or at least a tool to extract message ids. – Jberkel 11:48, 22 February 2021 (UTC)
That would be amazing! There are a lot of annoying limitations with the Google Groups UI. The biggest one for me is the inability to sort from oldest-to-newest, which would be so useful for antedating. It would also be great to be able to search by regex. Also, it would be nice if you could, say, select some text from a message and click a button to get a pre-filled quote-newsgroup template. Colin M (talk) 21:03, 22 February 2021 (UTC)
Re-thinking this again, the content itself cannot be stored/served from WMF servers, because of the fuzzy copyright status of usenet messages. – Jberkel 22:23, 22 February 2021 (UTC)

Category:Matter and its subcategories vs. Category:Chemistry

I recently added Category:Matter to Category:Chemistry (aside from it being included in "Nature") because most if not all of its subcategories are closely related to chemistry: Acids, Chemical elements, Drugs, Dyes, Explosives, Gases, Inorganic compounds, Ions, Liquids, Metals, Minerals, Natural resources, Organic compounds, and Poisons. (The only exception seems to be Subatomic particles, which belongs more to physics.) Also, compounds may get categorized broadly under "chemistry", but if someone wanted to move them into the right subcategory, it was not so easy to find, as "(in)organic compounds" was/were not part of "chemistry". On the other hand, @Benwing suggested to me: "if there are subcategories of 'matter' that relate to chemistry, IMO you should add those subcategories directly to 'chemistry'."

I replied that in that case I'm afraid "Matter" itself would become partly redundant and somewhat pointless, and the distinction between the categories directly included in "chemistry" and those that are not may be fairly arbitrary. On the whole, treating "Matter" as a meaningful unit on its own still seems more feasible. In fact, I'd suggest that we delineate the category "Matter" more in accordance with its current content, which has a great deal of overlap with chemistry.

Benwing wrote that "Matter" on the whole sounds very vague and it seems very strange e.g. to put "Drugs" under "Matter". Maybe getting rid of it and moving those categories above to "Chemistry" and putting "Subatomic particles" directly under "Physics" is the right thing to do. – What do you all think about this? Adam78 (talk) 17:00, 21 February 2021 (UTC)

"pronominal" vs. "reflexive" in Spanish[edit]

(Notifying Ungoliant MMDCCLXIV, Metaknowledge, Ultimateria, Gibraltar Rocks): Supposedly there is a distinction between "pronominal" and "reflexive" verbs in Spanish. See https://www.spanishdict.com/answers/208148/how-do-you-distinguish-between-pronominal-and-reflexive-verbs. Correspondingly, some senses in some verbs are labeled "pronominal" and some "reflexive". I know of no other language with reflexive verbs (e.g. Russian, French, Italian) that makes such a distinction, and Wiktionary doesn't recognize the label "pronominal" or categorize it in any way. I don't understand the distinction, really, and I doubt it's necessary to make it. Anyone object if I replace "pronominal" with "reflexive"? Benwing2 (talk) 06:05, 22 February 2021 (UTC)

Reflexive forms are a special case of pronominal forms, see reflexive. I think it's worth keeping the distinction, and not just in Spanish. Marking all as "reflexive" would certainly be wrong in some cases. – Jberkel 10:59, 22 February 2021 (UTC)
A verb that indicates an action that is necessarily done by the subject unto themself is different from a verb that requires a meaningless reflexive pronoun as part of its lexeme. If you want to remove the label reflexive (which is recognised), this information should be preserved some other way. — Ungoliant (falai) 11:08, 22 February 2021 (UTC)
  • I'm inclined to merge them because I've never seen the distinction in any reference materials, and I don't feel that it's very important. It's worth noting that Spanish dictionaries call these "pronominal" verbs, and English-Spanish dictionaries and learning/teaching materials almost always call them "reflexive". There's normally little reason to define true reflexive verbs (e.g. bañarse, which should be a soft redirect) because their meaning is usually exactly what you'd guess, so most of the entries in our Spanish reflexive verb category would ideally be those Ungoliant describes as "verb that requires a meaningless reflexive pronoun", in which case it would be more correct to call them "pronominal", a term that's still correct for the true reflexive verbs. (Pinging also @Froaringus.) Ultimateria (talk) 18:16, 22 February 2021 (UTC)
Lots of languages use reflexive morphology on verbs without reflexive semantics; I've never known a dictionary to label the two types differently. Even English has a small number of verbs with reflexive morphology but no reflexive semantics, such as avail (oneself of something). I don't think we need to make the distinction here either: as far as learning inflected forms goes, there's no difference, and the glosses of the entries tell us what they mean. And the difference isn't always clear-cut, anyway; certainly in German there are some morphologically reflexive verbs whose semantics are not immediately obviously reflexive, but could be considered reflexive with a bit of imagination, like sich trauen. —Mahāgaja · talk 18:47, 22 February 2021 (UTC)

Replacing all uses of {{etyl}} with new templates {{uder}} and {{uety}}[edit]

I notice several users, e.g. User:Apisite, User:Vivaelcelta, User:Embryomystic, User:Donnanz, "helpfully" replacing {{etyl}} with {{der}} in a more or less mechanical fashion instead of making the correct distinctions between {{bor}}, {{inh}} and {{der}}. User:Mahagaja has been trying to clean up {{etyl}} for a long time now, and these mechanical replacements destroy the use of {{etyl}} as a signal that manual cleanup is needed. I understand these users may be doing this because {{der}} looks nicer than {{etyl}}, but these replacements aren't helpful. To forestall further such changes, I propose to replace *all* uses of {{etyl}} by bot with one of two new templates, both of which indicate that further cleanup is needed:

  1. {{uder}} (undefined derivation) works like {{der}} but indicates that cleanup is needed, and will place the page in a cleanup category, similarly to {{etyl}}. It will be used whenever a construction like {{etyl|FOO|BAR}} {{m|FOO|...}} currently occurs, as well as in cases like {{etyl|ML.|BAR}} {{m|la|...}}, where {{etyl}} occurs with an etymology language whose parent is used in {{m}}. It will also be used in corresponding constructions where {{l}} occurs instead of {{m}}.
  2. {{uety}} (undefined etymology) replaces all remaining occurrences of {{etyl}}, like this: {{etyl|FOO|BAR}} -> {{uety|BAR|FOO}}. The idea is to use the standard language ordering, making it easier to later replace e.g. {{uety|es|ML.}} with something like {{bor|es|ML.|-}} or {{bor|es|ML.|term}} (as the case may be).
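A minimal Python sketch of how a bot might apply the two replacement rules above to raw wikitext. It is not the bot that would actually be run: it assumes plain regular expressions are good enough, and the ETYM_PARENTS table is a hypothetical stand-in for the etymology-language data Wiktionary keeps in its modules. Only the two-argument {{etyl|FOO|BAR}} form is touched.

```python
import re

# HYPOTHETICAL stand-in for the real etymology-language data:
# maps an etymology-language code to its parent language code.
ETYM_PARENTS = {"ML.": "la", "LL.": "la"}

# {{etyl|SRC|DEST}} immediately followed (after optional whitespace)
# by {{m|LANG...}} or {{l|LANG...}}; group 4 keeps the remaining arguments.
ETYL_WITH_LINK = re.compile(
    r"\{\{etyl\|([^|{}]+)\|([^|{}]+)\}\}\s*"
    r"\{\{(?:m|l)\|([^|{}]+)((?:\|[^{}]*)?)\}\}"
)
# Any remaining two-argument {{etyl|SRC|DEST}}.
ETYL_BARE = re.compile(r"\{\{etyl\|([^|{}]+)\|([^|{}]+)\}\}")


def _to_uder(match):
    src, dest, link_lang, rest = match.groups()
    # Rule 1 applies when the mention's language is the etyl source itself,
    # or the parent of an etymology language such as ML. -> la.
    if link_lang == src or ETYM_PARENTS.get(src) == link_lang:
        return "{{uder|%s|%s%s}}" % (dest, src, rest)
    # Otherwise leave the text unchanged so rule 2 handles the etyl part.
    return match.group(0)


def convert(wikitext):
    # Rule 1: {{etyl|FOO|BAR}} {{m|FOO|...}} -> {{uder|BAR|FOO|...}}
    text = ETYL_WITH_LINK.sub(_to_uder, wikitext)
    # Rule 2: {{etyl|FOO|BAR}} -> {{uety|BAR|FOO}} (standard language order)
    return ETYL_BARE.sub(lambda m: "{{uety|%s|%s}}" % (m.group(2), m.group(1)), text)


print(convert("From {{etyl|ML.|es}} {{m|la|foo}}."))
```

Mentions containing nested templates, and single-argument {{etyl|FOO}} calls used only to print a language name, are deliberately left alone in this sketch; a real run would more likely use a proper wikitext parser such as mwparserfromhell.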

Potentially, an edit filter will flag instances of adding {{uder}} and {{uety}} by hand and maybe even prevent them from happening, and they may throw errors for languages that have already been completely cleaned up, similarly to what {{etyl}} currently does. (On the other hand, it might be useful to allow people to add them by hand in cases where it's not clear which etymology template is correct.)

Thoughts? Benwing2 (talk) 06:28, 22 February 2021 (UTC)

How is {{der}} different from the templates you propose? --Vahag (talk) 07:41, 22 February 2021 (UTC)
Trackability. {{uder}} means "no one has checked what kind of derivation this is yet", while {{der}} ideally ought to mean "someone has determined that {{der}} is correct here rather than {{inh}} or {{bor}}" (but of course up to now it doesn't necessarily actually mean that). I support this suggestion. —Mahāgaja · talk 07:53, 22 February 2021 (UTC)
All this may be rather futile, as you can't stop {{der}} being used in newly added etymology or newly created entries. The only way you can prevent that is by deleting {{der}} itself. DonnanZ (talk) 09:33, 22 February 2021 (UTC)
No, we can't completely stop {{der}} from being misused, but we can reduce the frequency of such misuse. —Mahāgaja · talk 09:53, 22 February 2021 (UTC)
I don't necessarily agree that {{der}} is being misused, but I do suspect {{etyl}} is still being added by a tiny minority of users in languages where they are still able to, so the sooner the etyl cleanup is completely finished the better. DonnanZ (talk) 11:21, 22 February 2021 (UTC)
What is definitely still happening is that {{etyl|xyz}} is being used to generate the name of a language in etymology sections, in cases where {{cog}} or {{noncog}} – or simply writing the language's name – should be used instead. I know this is still happening because every couple of weeks I find new pages in Category:Language code missing/etyl. —Mahāgaja · talk 12:16, 22 February 2021 (UTC)
Great idea. Fay Freak (talk) 12:52, 22 February 2021 (UTC)
I totally support this, but I don't think adding {{uder}} should be flagged. I add a lot of etymologies, and I'm much more interested in finding the etymon and adding descendants to it than determining the method of derivation. I'm always a little uncomfortable just adding {{der}} when it needs to be fixed later. Ultimateria (talk) 19:05, 22 February 2021 (UTC)
I've been bothered by users replacing {{etyl}} with {{der}} for a while now, but I haven't been able to convince them to stop. Now we have a new mess that's much harder to clean up. :/ So I'm in favour of doing this. But at least for {{uder}} we can just make it a redirect to {{der}}. The transclusions will act as tracking, no more would be needed I think. —Rua (mew) 20:30, 22 February 2021 (UTC)
@Rua: Now that you have spoken, maybe you'd like to deal with the four instances in User:Rua/ja. DonnanZ (talk) 22:33, 22 February 2021 (UTC)
Great idea. I too find these indiscriminate changes less than helpful. —Μετάknowledgediscuss/deeds 20:36, 22 February 2021 (UTC)

Usage of t:ja-conj-bungo[edit]

I saw someone mass-removing this template from all 一段 verb pages. Should the classical conjugation of classical 教ふ be given in the page 教える? -- Huhu9001 (talk) 22:04, 22 February 2021 (UTC)

Many other online dictionaries, such as weblio.jp, give the classical conjugations for 一段 verbs.


  • That may have been me.
Regarding what other dictionaries do, the Daijisen example from Weblio shows a common hyper-abbreviated notation that includes several pieces of information.
modern kana 〔historical kana〕 【kanji + okurigana】
[verb vowel-stem lower monograde] [literary-form] historical kana  ["H"-stem lower bigrade]
The two lines parallel each other: modern info first, then historical. The historical here also gives us some information about etymology, by showing us the earlier Classical / Middle Japanese (sometimes also Old Japanese) lemma from which the modern form derives.
We include derivational information in our ===Etymology=== sections. Fuller information about the older forms, such as full inflection tables, we provide in the relevant lemma entry for those older forms.
I am fully supportive of including this inflection information in the lemma entries for verb forms that are actually included in that paradigm.
My concern is that the modern lemma verb forms ending in -iru and -eru for so-called 一段活用 (ichidan katsuyō, monograde conjugation) verbs are entirely absent from the Classical / Middle Japanese 二段活用 (nidan katsuyō, bigrade conjugation) verb paradigm.
For those unfamiliar, here are the basic conjugation stems for modern verbs ending in -iru and their Classical Japanese counterparts, and modern verbs ending in -eru and their Classical counterparts:
Conjugation | Modern -iru verbs | Classical | Modern -eru verbs | Classical
未然形 (mizenkei, irrealis form or negative stem) | -i | -i | -e | -e
連用形 (ren'yōkei, continuative or positive stem) | -i | -i | -e | -e
終止形 (shūshikei, terminal or predicative form; also known informally as the "dictionary form": this is the lemma) | -iru | -u | -eru | -u
連体形 (rentaikei, attributive form) | -iru | -uru | -eru | -uru
已然形 (izenkei, realis or hypothetical form) | -ire | -ure | -ere | -ure
命令形 (meireikei, imperative or command form) | -iro / -iyo | -iyo | -ero / -eyo | -eyo
The key point I'd like to emphasize here is that the Classical counterparts to our modern -iru verbs have no forms ending in -iru. Likewise for our modern -eru verbs. It strikes me as problematic for a modern verb entry to include a conjugation table for Classical Japanese, where that table does not -- and cannot -- include the headword of the entry.
By way of loose analogy, our modern English do entry does not include Middle English conjugation forms like dide or dost -- rather, the Middle English conjugation tables are located at the lemma form for the Middle English verb, at don. Likewise, the Classical Japanese conjugation tables should presumably be located at the lemma forms for the Classical verbs -- the terminal or "dictionary" forms ending in just -u.
Where the lemma forms for Classical and modern align, I am perfectly happy for the shared lemma forms to include both Classical and modern inflection tables -- most notably, for the modern so-called 五段活用 (godan katsuyō, quintigrade conjugation) verbs and Classical 四段活用 (yodan katsuyō, quadrigrade conjugation) verbs. But where the Classical inflection paradigm doesn't include the modern lemma form, I do not think the modern lemma entry should include the Classical inflection table, nor should the Classical lemma entry include the modern inflection table. ‑‑ Eiríkr Útlendi │Tala við mig 01:29, 23 February 2021 (UTC)
Abstain ~ weak Support providing classical conjugation on the entries for modern verbs, as the entries for classical verbs do not presently exist. —Suzukaze-c (talk) 01:43, 23 February 2021 (UTC)

(Notifying Eirikr, TAKASUGI Shinji, Atitarev, Suzukaze-c, Poketalker, Cnilep, Marlin Setia1, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233): -- Huhu9001 (talk) 10:15, 23 February 2021 (UTC)

It's unnatural to put a classical conjugation on the modern form lemma. The classical form of a verb should have a separate lemma and should be linked from the ===Etymology=== section.--荒巻モロゾフ (talk) 11:02, 23 February 2021 (UTC)
I also support putting the classical form in the etymology section. Onionbar (talk) 21:01, 23 February 2021 (UTC)

Stock ticker symbols[edit]

Should we keep them or toss them?

We currently have Category:en:Stock symbols for companies which isn't fitted into the category tree. I think these either fail CFI as written (they are not terms that you "would run across ... and want to know what it means" - these symbols are tightly bound to a financial context, where it is obvious that you should consult a list of stock symbols rather than a generalist dictionary) or should be expunged (by vote?) so our time isn't wasted with entries for every stock symbol that ever lived.

See Wiktionary:Beer_parlour/2009/October#Stock_symbols and Talk:A#RFD_discussion:_March–June_2014 for a little bit of past discussion on this. This, that and the other (talk) 09:08, 23 February 2021 (UTC)