Wiktionary:Beer parlour/2023/August

From Wiktionary, the free dictionary

Early thoughts on Banat Bulgarian

The Banat Bulgarian dialect (Glottolog bana1308), unlike other Bulgarian dialects, has a codified written norm, and an independent literary tradition going back to the 19th century. The norm is recognized by the Institute for Bulgarian Language - our language regulator - as a literary variety of a pluricentric Bulgarian language. Banat Bulgarian is spoken in the area of Banat, which is mostly in Romania, and partly in Serbia and Hungary. It is classified as a Rup dialect of Bulgarian.

There are several differences between Banat Bulgarian and standard Bulgarian that present potential challenges for how one would properly add such entries to Wiktionary:

  • Banat Bulgarian is written in the Latin alphabet, unlike standard Bulgarian which is written in Cyrillic. I'm assuming we could draw inspiration from Serbo-Croatian, which is also written in both alphabets, and where some of its codified varieties (e.g. Croatian) only use one of the alphabets.
  • As an Eastern Bulgarian dialect, Banat Bulgarian has phonological features that set it apart from standard Bulgarian, such as more pronounced vowel reduction for unstressed vowels, consonant palatalization between front vowels, the vowel /ɨ/, etc. Those differences are reflected in spelling. This means that {{bg-IPA}} wouldn't work for Banat Bulgarian unless we effectively double the code size and complexity.
  • As a consequence of the above two factors, inflection templates like {{bg-ndecl}} and {{bg-conj}} would similarly be insufficient for handling Banat Bulgarian unless they become a lot more complex.

I'm not personally a speaker of Banat Bulgarian, so this is me trying to get an idea of the groundwork that would need to be laid before we even start adding entries for it on Wiktionary. Here's a non-exhaustive list of questions that come to mind:

  • would it need its own language code - e.g. bg-ban - or would we instead change some of the configuration of the bg language code, e.g. to allow it to recognize the Banat Bulgarian Latin alphabet in addition to Cyrillic?
    • if it were its own language code, what would its {{inh}} relationships be with Bulgarian and Old Church Slavic?
    • if it were its own language code, would that effectively double the {{topics}} space, by having e.g. bg:Cats alongside bg-ban:Cats? Or is there a way for things to all go under bg:Cats?
  • what would be the proper form-of template to relate a Banat Bulgarian form like ugništi to its corresponding standard Bulgarian form огнище (ognište)?
  • would it be preferable to modify existing {{bg-FOO}} templates to handle Banat Bulgarian, or develop new templates like {{bg-ban-FOO}}?
  • what other important considerations are there that I'm not asking about?

Thanks,

Chernorizets (talk) 03:04, 1 August 2023 (UTC)

@Chernorizets There are various considerations here:
  1. Whether to make it an etymology-only variant of Bulgarian or a full-fledged separate language.
  2. How to handle the different script. IMO Serbo-Croatian isn't so good an example because the Latin and Cyrillic scripts map almost one-to-one onto each other and the same underlying dialect is represented, modulo a few trivial differences. Maybe a better example is Tajik vs. standard Persian vs. Dari Persian. These are linguistically three dialects of the same language, but Tajik uses Cyrillic while the other two use Perso-Arabic script, so maybe for that reason Tajik is a full-fledged language. Somewhat similar is Ottoman Turkish vs. standard Turkish, where there is a script difference as well as a ton of Persian and Arabic loans in Ottoman Turkish that aren't in regular Turkish, even though the dialect base is very similar; in this case again, Ottoman Turkish is its own full-fledged language. Hindi and Urdu are another case where the dialect base is the same but the scripts are different and we've chosen to go with two separate full-fledged languages based partly on the fact that the higher registers are markedly different. Mongolian is another case where there are two scripts and the two scripts do not at all map one-to-one; they represent etymologically different approaches to spelling the language (one much "deeper" than the other). I suspect the dialect base is the same for both scripts, and here we have chosen to unify the two scripts into a single language. I'm not sure in this case how the different spellings are handled in Mongolian inflection tables; maybe User:Theknightwho or User:Atitarev can comment. So it sounds like maybe we need a separate full-fledged language code, although User:Theknightwho has recently added the ability to have things like per-etym-language and per-script transliteration, so the technical considerations forcing a separate full-fledged language are less than before.
Regardless, because of the different scripts I would recommend having different headword templates like {{bg-ban-FOO}}; whether you can reuse a single inflection module like Module:bg-verb and Module:bg-nominal depends on how well the scripts and morphology map between the two of them. If the mapping is close, maybe you can have one module using an underlying representation that's then mapped to a surface representation in either Cyrillic or Latin; but if there are lots of differences, you might want two verb modules and two nominal modules, with common code factored out into Module:bg-common or Module:bg-verb-common, Module:bg-nominal-common modules. User:Theknightwho can you comment further?
  3. As for the etymological relationships, I dunno. If Banat Bulgarian is a dialect, maybe it should derive from Middle Bulgarian, same as standard Bulgarian? Presumably terms from standard that end up in Banat are borrowings or calques, and vice-versa? Benwing2 (talk) 05:09, 1 August 2023 (UTC)
BTW if we have a full-fledged language for Banat, it gets its own topic space. That probably makes sense because of the different script IMO, but I'd accept the other way as well (which you'd get if Banat were an etymology-only language). Benwing2 (talk) 05:11, 1 August 2023 (UTC)[reply]
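The "one underlying representation, two surface representations" idea from point 2 could be sketched roughly like this. It is a toy Python illustration rather than Wiktionary's Lua: the segment list, the `CYRILLIC` table, the `REDUCTION` rule and both function names are assumptions invented for the example, and the only data taken from this thread is the pair огнище / ugništi.

```python
# Toy underlying form as (segment, is_stressed) pairs. The stress placement
# and the segmentation are assumptions made for this illustration only.
UNDERLYING = [
    ("o", False), ("g", False), ("n", False), ("i", True),
    ("š", False), ("t", False), ("e", False),
]

# Hypothetical, deliberately tiny segment-to-Cyrillic table.
CYRILLIC = {"o": "о", "g": "г", "n": "н", "i": "и", "e": "е", "t": "т", "š": "ш"}

# Hypothetical reduction rule for unstressed vowels in the Latin-script norm.
REDUCTION = {"o": "u", "e": "i"}


def render_cyrillic(segments):
    """Spell the underlying form in Cyrillic, writing the cluster š+t as щ."""
    out = []
    i = 0
    while i < len(segments):
        seg = segments[i][0]
        nxt = segments[i + 1][0] if i + 1 < len(segments) else None
        if seg == "š" and nxt == "t":
            out.append("щ")
            i += 2
        else:
            out.append(CYRILLIC[seg])
            i += 1
    return "".join(out)


def render_banat_latin(segments):
    """Spell the underlying form in Latin script, reducing unstressed o and e."""
    return "".join(
        seg if stressed else REDUCTION.get(seg, seg)
        for seg, stressed in segments
    )


print(render_cyrillic(UNDERLYING))
print(render_banat_latin(UNDERLYING))
```

If the real differences turn out to be much larger than a couple of mapping rules, that would be an argument for separate modules with shared code factored out, as suggested above.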
Thanks for the comments, @Benwing2! I've sent an email to the Institute for Bulgarian Language to see if they'd be willing to submit a change request to SIL so that Banat Bulgarian gets its own ISO 639-3 code. I'll update this thread if/when I get a response.
The Banat dialect has speakers, blogs, books and newspapers in the present day, so it doesn't make sense to me to treat it as an etymology-only variety - it's in active use. Considering that the Banat population is descended from a migration in the 17th century, the argument for having it descend from Middle Bulgarian is compelling.
As for how full-fledged it is, the delta between it and standard Bulgarian is not nearly as large as between e.g. Chakavian and Kajkavian, or between the dialects of Slovene. As a speaker of an Eastern Bulgarian dialect myself, I have quite an easy time understanding it, except for the extensive loanwords from Romanian, German and a few other languages that have replaced native terms. It is officially considered a regional norm of Bulgarian; how that maps to the concept of "language" on Wiktionary is something I'm still ramping up on.
But yes, the comparison with Serbo-Croatian is not the best, since in SCr the different scripts do encode the same orthography and pronunciation, which wouldn't be the case here. I'm looking forward to learning more about the technical details of handling languages where script variation is accompanied by variation along other dimensions as well.
Cheers,
Chernorizets (talk) 05:38, 1 August 2023 (UTC)
@Chernorizets Etymology-only languages are not at all restricted to dead languages. E.g. we have en-US (American English) and en-GB (UK English) as etymology-only variants of en (English). Benwing2 (talk) 06:11, 1 August 2023 (UTC)[reply]
@Chernorizets We don't necessarily need an ISO code to create a new langcode. Vininn126 (talk) 08:04, 1 August 2023 (UTC)
@Vininn126 good to know; I was curious about the viability of the ISO route. Maybe nothing happens - I'm a random guy who wrote to the Bulgarian Academy of Sciences :) But if we do get an ISO code, we don't have to make one up.
@Benwing2 I guess I'll have to read up more on "etymology-only" languages and how they are used. Thanks for the example. Chernorizets (talk) 08:39, 1 August 2023 (UTC)[reply]
@Chernorizets An example of an ety-only language might be Middle Polish, for which we have a label, category, special infrastructure, and a code, but it's nested under Polish and any links generated by zlw-mpl point to Polish. I'll also point out there is no ISO code zlw-mpl. Vininn126 (talk) 08:42, 1 August 2023 (UTC)
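As a rough illustration of what "nested under" means for links, here is a minimal Python sketch. The registry and the function name are hypothetical and do not reflect how Wiktionary's modules are actually written; the three codes are the ones mentioned in this thread, with `pl` and `en` as the full-language codes for Polish and English.

```python
# Hypothetical registry: etymology-only code -> full-language parent code.
ETYMOLOGY_ONLY_PARENT = {
    "en-US": "en",    # American English, nested under English
    "en-GB": "en",    # UK English, nested under English
    "zlw-mpl": "pl",  # Middle Polish, nested under Polish
}


def link_target_language(code: str) -> str:
    """Return the full-language code whose entries a link should point to.

    Full languages map to themselves; etymology-only codes map to their parent.
    """
    return ETYMOLOGY_ONLY_PARENT.get(code, code)


print(link_target_language("zlw-mpl"))
print(link_target_language("bg"))
```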
And it is in flagrant breach of BCP47, which does have provision for private codes. --194.74.130.171 08:48, 1 August 2023 (UTC)
What the hell are you talking about? This is something regularly done. Vininn126 (talk) 08:50, 1 August 2023 (UTC)
@Vininn126: Which makes the breaches even more flagrant. --RichardW57m (talk) 09:15, 1 August 2023 (UTC)
@RichardW57m There is no way that limiting ourselves to languages that have ISO codes is a good idea. You will find essentially no one that would support that. It's a sort of moot point to claim that it's wrong. Vininn126 (talk) 09:16, 1 August 2023 (UTC)
@Vininn126: Private use codes should have an 'x-' in them, denoting private use. Most of our codes would be fine with that subtag inserted in second place. --RichardW57m (talk) 09:27, 1 August 2023 (UTC)[reply]
@RichardW57m Good luck convincing people of that. Vininn126 (talk) 09:30, 1 August 2023 (UTC)
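For readers unfamiliar with the suggestion about 'x-', the transformation being described is purely mechanical. A minimal Python sketch follows; the function name is made up, and it assumes the caller only passes codes that Wiktionary invented itself, since it would also rewrite a legitimate tag such as en-US.

```python
def with_private_use_marker(code: str) -> str:
    """Insert an 'x' subtag in second place, e.g. 'zlw-mpl' -> 'zlw-x-mpl'.

    Codes with a single subtag, or that already contain 'x', are returned
    unchanged. Assumes the input is a Wiktionary-invented code.
    """
    parts = code.split("-")
    if len(parts) < 2 or "x" in parts:
        return code
    return "-".join([parts[0], "x"] + parts[1:])


print(with_private_use_marker("zlw-mpl"))
print(with_private_use_marker("bg-ban"))
```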
@Vininn126, @Chernorizets Yeah, were it not for the script issue I would definitely recommend that Banat Bulgarian be treated as an etym-only language given its apparent similarity to standard Bulgarian. Maybe even with the script issue we should do that but it definitely complicates things; I'd like to get some thoughts from others who can provide comparable situations with other languages (e.g. Malay allows either "Jawi" = Arabic or "Rumi" = Latin script, Kazakh supports Cyrillic, Arabic and Latin, Mongolian supports Mongolian script, Cyrillic and maybe Latin, etc.; but all of these have the same dialect base). Benwing2 (talk) 08:50, 1 August 2023 (UTC)
Sounds like the situation of South Azerbaijani – and some other Caucasian languages which have a major script we standardize to and an alternative one used in another country, like Laz in Turkey. Lexical peculiarities do not even give rise to the idea that there should be an etymology code, more like code separation only causes annoyance. You just add “Latin spelling of” entries and if not, if there are isolated full entries in the regiolectal script, nothing bad happens either. Of course inflection tables for Azerbaijani in Arabic script work differently. Vulgar Latin reconstruction entries have different Proto-Romance tables. Fay Freak (talk) 10:30, 1 August 2023 (UTC)[reply]
@Fay Freak is it possible to have custom relationships or labels besides {{spelling}} and the like, e.g. "Banat term for"? It's not the case that Banat Bulgarian is just a Latin-alphabet transliteration of standard Bulgarian - the Banat forms adhere much more closely to phonetic spelling, reflecting the dialect's peculiarities. So, for instance, Bulgarian огнище (ognište) doesn't correspond to Banat **ognište, but to Banat ugništi, which is how it's pronounced in Eastern Bulgarian dialects (including Banat, and coincidentally my own). Chernorizets (talk) 11:39, 1 August 2023 (UTC)[reply]
@Chernorizets {{alt form}} with |from=. Fay Freak (talk) 11:44, 1 August 2023 (UTC)
@Fay Freak Just found it too as I was looking at some Laz examples :) I guess, if we wanted to take a cue from Serbo-Croatian after all in terms of at least supporting multiple scripts under a single language code, there is Module:labels/data/lang/bg where one could add "Banat Bulgarian" with a proper Wikipedia link and categories. Chernorizets (talk) 11:51, 1 August 2023 (UTC)[reply]
@Benwing2: Where is the per-etym transliteration capability documented? It sounds like something I should align Pali to. --RichardW57m (talk) 09:07, 1 August 2023 (UTC)
@Chernorizets: Per-script variation in inflection is supported for Pali, with the simplest implementation in Module:pi-decl/noun. This treats stems and inflection separately, with provision for utter irregularity. (The price of separation is some complexity in gluing stem and inflection together.) The inflection tables are stored in per-script data modules, with fallback by transliteration from the Wiktionary-main script. There are flags to handle variations in spelling system within the scripts. Prakrit works almost similarly, but with tables of inflections by regional dialect, no provision for gross irregularity, and the use of transliteration from Roman to target script for stem and inflection together, exhibiting a touching belief in the fidelity of transliteration. (The Prakrit scheme was developed from the Pali scheme with a fair amount of tl;dr.) --RichardW57m (talk) 09:07, 1 August 2023 (UTC)
@RichardW57 It's not documented yet AFAICT; ask User:Theknightwho. Benwing2 (talk) 20:54, 1 August 2023 (UTC)
@Chernorizets Hey, sorry to be late on this discussion, but I'm not very informed about Banat Bulgarian and so I've been thinking a bit as to how to treat it on here, but, in my opinion, it should be made its own language, with its own templates, etc., given that it's spelled completely differently and the speakers of Bulgarian and Banat Bulgarian are somewhat distant from each other, don't communicate much, don't share the same media, etc. and will continue to diverge (if you ask me) even more. I think Banat Bulgarian occupies a similar position to Macedonian, in that the two are very similar, in many ways mutually intelligible, but still quite different; and Banat Bulgarian could be called a dialect of Bulgarian, but I think it would be easier for future Banat Bulgarian editors if their language were totally separate, with its own templates & tools, their own language header, and no pressure to relate their words to standard Bulgarian. It would be nice to have opinion from natives, but... I don't know of anyone that speaks the language.
Although the current templates are indeed inadequate, I don't really know what we can do about it without a detailed reference of the differences (or just in general the phonology, declensions, etc., which I guess will be similar to Bulgarian but not quite the same). Also, like you, I also don't speak the language, and I don't know that I'll be able to contribute anything to it with any confidence...
As for the classification, in my opinion it would make sense for Banat Bulgarian to be a descendant of Bulgarian, just purely given the factological argument that its speakers come from Bulgaria and spoke a variety of Modern Bulgarian. On Wikipedia, the chart is organized as
If we can create these foundations, then at least we can institute Banat Bulgarian as a formal language on here so that potential future editors will know how to edit it and expand the dictionary from there. Also, do we have any online Banat dictionaries, etymology sources, etc.? Kiril kovachev (talkcontribs) 20:29, 6 August 2023 (UTC)[reply]
@Kiril kovachev I reached out to the Institute for Bulgarian Language to see if they'd support Banat Bulgarian having its own ISO 639-3 code. So far, no response - I'll let you know if there are any developments. In the meantime, I found some Banat Bulgarian groups on Facebook, and asked about grammars and dictionaries - I've gotten some references, but I think those are good old-fashioned paper dictionaries :) Which btw is totally OK. I think I should also ask in these groups whether Banat Bulgarians would want a separate language code or not, rather than deciding that on their behalf.
As for linguistic descent, IMO it would be more correct to have Banat Bulgarian descend from Middle Bulgarian (cu-bgm) rather than literary modern Bulgarian, since both standardizations happened in the 19th century based on different Bulgarian dialects. That said, there's definitely precedent for marking an extra-territorial variety of a language a descendant of that language. I could go either way.
Thanks,
Ruslan Chernorizets (talk) 21:28, 6 August 2023 (UTC)
Ah, that's great, thanks for contacting those groups, it'd be perfect to hear them weigh in on it instead of us marshalling these changes in their place, like you say. And I apologise if I misunderstood where Banat Bulgarian descends from, but I was judging entirely off of the description on Wikipedia, in which it says modern "Bulgarian" is considered to exist starting from the 16th century, and Banat speakers apparently didn't first migrate until 1688 (w:Banat Bulgarians#Origin and migration north of the Danube), which would suggest that by the time they started moving they were already speaking "Bulgarian". However, I didn't know Bulgarian wasn't standardised till more recently, so yours and Benwing's proposals are much more sensible in light of this. The only consideration is whether descending from "Bulgarian", unstandardised as of 1688, should be equated with descending from the modern Bulgarian literary norm, which Wikipedia does not do and still considers Banat Bulgarian a descendant of Bulgarian. I also don't mind, but it would be not at all controversial to say Middle Bulgarian since that much is a fact, so why not go with that.
I don't know what the IBL's response is going to be, but why don't we contact the ISO directly? Do they listen to institutions like the IBL more closely than to individuals or? I would be happy to do so too, if there is a desire for this in the first place - if there is, then check out this here link, https://iso639-3.sil.org/sites/iso639-3/files/change_requests/2021/2021-005.pdf, where people have submitted to get Ruthenian added as a code. I guess if you need credentials this stacked to get accepted, then we may really need the IBL ;)
Thanks, Kiril kovachev (talkcontribs) 22:18, 6 August 2023 (UTC)[reply]
@Kiril kovachev I was originally going to submit the change request myself. Change requests are judged on the merits, not on factors like institutional backing. I contacted the IBL because it felt a bit wrong for me to do it as a private citizen, given that there's an institute dedicated to the Bulgarian language and its variants. The recognition in ISO of languages and language variants often becomes news in the respective countries, like it did in Croatia when Chakavian got its own code a few years ago. I wouldn't want the IBL to be caught unawares by a news article that Banat Bulgarian now has its own code, and I don't trust the way the media would spin this, so I decided to be cautious.
Cheers,
Chernorizets (talk) 23:46, 6 August 2023 (UTC)

formatting last=/first= in authors in quote templates

@Sgconlaw, -sche, DCDuring Currently if you separate the authors in quote-* templates into |first=Joe, |last=Schmoe, |first2=Jane, |last2=Doe, etc. you get the authors displayed as:

  1. 2025, Schmoe, Joe; Doe, Jane; Roe, Richard, FUBAR: A Memoir of Wiktionary, BF Egypt: Wonderfool Publishing, Inc.

but if you specify the authors more naturally as |author=Joe Schmoe, |author2=Jane Doe, etc., you get

  1. 2025, Joe Schmoe; Jane Doe; Richard Roe, FUBAR: A Memoir of Wiktionary, BF Egypt: Wonderfool Publishing, Inc.

I propose to change this to use the consistent First Last; First2 Last2; etc. order regardless of how the authors are specified in the wikicode. Any objections? (In general I think we should avoid the |first=/|last= format, as it doesn't make sense in many non-English-speaking cultures.) Benwing2 (talk) 05:31, 2 August 2023 (UTC)[reply]
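The proposed behaviour amounts to normalising both parameter styles to the same display string. A minimal Python sketch, assuming the parameters arrive as a plain dictionary; `collect_authors` and `format_authors` are hypothetical names and this is not the actual quote-* module code:

```python
def collect_authors(params: dict) -> list:
    """Gather authors from author/authorN or first/last (firstN/lastN) parameters."""
    authors = []
    n = 1
    while True:
        suffix = "" if n == 1 else str(n)
        full = params.get("author" + suffix)
        first = params.get("first" + suffix)
        last = params.get("last" + suffix)
        if full:
            authors.append(full)
        elif first or last:
            # Always display as "First Last", however the name was specified.
            authors.append(" ".join(p for p in (first, last) if p))
        else:
            break
        n += 1
    return authors


def format_authors(params: dict, sep: str = "; ") -> str:
    """Join all authors in a consistent order and with a consistent separator."""
    return sep.join(collect_authors(params))


split_style = {"first": "Joe", "last": "Schmoe", "first2": "Jane", "last2": "Doe"}
whole_style = {"author": "Joe Schmoe", "author2": "Jane Doe"}
print(format_authors(split_style))
print(format_authors(whole_style))
```

Both calls print the same string, which is the point of the proposal.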

@Benwing2: personally I don't see much point in using the "last name, first name" format in quotation templates used in entries, so I wouldn't object to your proposal. (It makes more sense to order a list by last names in a bibliography, but I don't see that happening frequently here at the Wiktionary.) — Sgconlaw (talk) 06:01, 2 August 2023 (UTC)[reply]
I agree, I'm not really sure why that display is an option for quote templates since it just creates an opportunity for formatting inconsistency for no real reason. Benwing's proposal sounds good. —Al-Muqanna المقنع (talk) 08:26, 2 August 2023 (UTC)[reply]
@Benwing2, Sgconlaw, Al-Muqanna: Seeing as an approved request to delete {{red}}, partly on the grounds of its rareness, was used as justification for deleting the commoner {{lime}}, I will address the effect on citations, as in {{cite-book}}. There is the problem that Wiktionary documentation is inadequate, but the parameter |last= appears to call for a person's surname, the meaning Wiktionary gives for last name. I use it to good effect in {{R:th:Li}}, where the author's surname is 'Li', the name on the book appears as 'Fang Kuei Li', and the Wikipedia page for the author is w:Li Fang-Kuei. In the <last>, <first> convention, this works well.
There is also the issue of Japanese names. In English, they have hitherto appeared as <personal name> <family name>, but there has recently been a Japanese ordinance that in English they should appear in the Japanese order, <family name> <personal name>. Again, the format <last>, <first> is proof against such a changeover. --RichardW57m (talk) 09:41, 2 August 2023 (UTC)[reply]
Of course, there is the problem that the ill-documented parameter 'first' might be mistaken for the first personal name, so were we to quote a work by the former PM Boris Johnson, someone might decide on |first=Alexander! There are a lot of us who use our second forename rather than our first.
Benwing2's proposal would change the interpretation of |first= and |last= to mean the first and last parts of a person's name. --RichardW57m (talk) 09:41, 2 August 2023 (UTC)
@RichardW57m: This is about quote templates, not citation templates. Fyi, though, the standard in Wikipedia's style where last/first is mandatory is that the entire name should be in the "last" parameter for Chinese names et al. —Al-Muqanna المقنع (talk) 09:49, 2 August 2023 (UTC)[reply]
That works if a Chinese name is always given in Chinese order - it's the occasional Anglicisation of the order that causes problems. The same goes for any names with inconsistent ordering. --RichardW57m (talk) 12:05, 2 August 2023 (UTC)[reply]
Indeed, Anglicisation of a Chinese-order name causes such problems. I should further elaborate that the ordering of names of people from Hong Kong is very complicated; see this Wikipedia article (though it's still pretty lacking). My practice is generally to use the ordering and spelling according to the source (or use the Chinese name directly when it is not available), and link to the Wikipedia article if possible. Note that since the surname is in between the given names, there are a number of cases where this becomes awkward to deal with - Li, David C. S. is one such example at Wiktionary:About Chinese/references#L, where the typical ordering would be David LI Chor-sing (note the use of capitalisation to denote which part is the surname), but he publishes academically as David C. S. Li (one may say C. S. is the middle name), while Google Scholar also lists LI Chor Shing David and Li Chor-Shing as aliases. (If I remember correctly, he has even written a paper on the name phenomenon itself.) But I digress. Basically the name situation is so complicated that I would say that keeping |first= and |last= only adds to the complications, and so I think they should be deprecated/deleted. – Wpi (talk) 18:18, 2 August 2023 (UTC)
I wonder whether it is worth the effort, but I certainly wouldn't object to the result. DCDuring (talk) 13:08, 2 August 2023 (UTC)
I agree with the proposal, and I might also suggest that if we eliminate "Last, First; Last2, First2", then I wonder if we could just use commas between the names; it strikes me as weird to have "Year comma First Person semicolon Second Person comma Work Title" as if there is some break in content and the year is more closely associated with only the first person, and the Work more closely associated with only the second person. - -sche (discuss) 18:57, 2 August 2023 (UTC)[reply]
@-sche Yeah it seems backwards. Usually semicolons indicate a higher-level grouping than commas, but here we have it the other way. The only issue comes when we have editors or translators. E.g. currently if you write
{{quote-book|en|author=Hayden Carruth|author2=Joe Schmoe|editor=Mary Bloggs|translator=Richard Roe|title=The Hudson Review|location=New York, N.Y.|publisher=Hudson Review, Inc.|year=2025}}
You get this:
2025, Hayden Carruth, Joe Schmoe, translated by Richard Roe, edited by Mary Bloggs, The Hudson Review, New York, N.Y.: Hudson Review, Inc.:
which is a muddle of names, and if it's all commas you get this:
2025, Hayden Carruth, Joe Schmoe, Richard Roe, transl., Mary Bloggs, editor, The Hudson Review, New York, N.Y.: Hudson Review, Inc.:
which is maybe even worse. If we use "and" like I proposed at some point above, this becomes
2025, Hayden Carruth and Joe Schmoe, Richard Roe, transl., Mary Bloggs, editor, The Hudson Review, New York, N.Y.: Hudson Review, Inc.:
which is a bit better. Maybe we should reword the translator and editor so you get this:
2025, Hayden Carruth and Joe Schmoe, translated by Richard Roe, edited by Mary Bloggs, The Hudson Review, New York, N.Y.: Hudson Review, Inc.:
With this, it's also possible to dispense with the word "and":
2025, Hayden Carruth, Joe Schmoe, translated by Richard Roe, edited by Mary Bloggs, The Hudson Review, New York, N.Y.: Hudson Review, Inc.:
Benwing2 (talk) 19:16, 2 August 2023 (UTC)
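The last two variants above differ only in how the author names are joined. A small Python sketch of that logic follows, with hypothetical function names and covering only the year and the name portion of the citation; it is an illustration, not the quote-book implementation.

```python
def join_names(names, use_and=True):
    """Join names with commas, optionally putting 'and' before the last one."""
    names = list(names)
    if use_and and len(names) > 1:
        return ", ".join(names[:-1]) + " and " + names[-1]
    return ", ".join(names)


def format_byline(year, authors, translators=(), editors=(), use_and=True):
    """Build 'YEAR, authors, translated by ..., edited by ...'."""
    parts = [str(year)]
    if authors:
        parts.append(join_names(authors, use_and))
    if translators:
        parts.append("translated by " + join_names(translators, use_and))
    if editors:
        parts.append("edited by " + join_names(editors, use_and))
    return ", ".join(parts)


print(format_byline(2025, ["Hayden Carruth", "Joe Schmoe"],
                    translators=["Richard Roe"], editors=["Mary Bloggs"]))
print(format_byline(2025, ["Hayden Carruth", "Joe Schmoe"],
                    translators=["Richard Roe"], editors=["Mary Bloggs"],
                    use_and=False))
```

Putting "translated by" and "edited by" in front of the names is what keeps the comma-only variant readable, since each role then starts with its own label.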
If it was up to me I would go with either your last option or an abbreviated version ("transl. Richard Roe, ed. Mary Bloggs"). I also find the current style quite confusing with the comma/semicolon mess and "transl." and "ed." added afterwards. —Al-Muqanna المقنع (talk) 21:23, 2 August 2023 (UTC)[reply]
Yeah, putting "edited by" in front of the editor(s) strikes me as the clearest option, your last option above. (I wouldn't object to also using semicolons to set authors vs translators vs editors apart, if anyone wants to, it's just using semicolons to set authors apart from other authors that seems weird to me.)
One other idea: for technical works that have like twenty-nine authors or editors, do we want to display only the first N and hide the rest behind an "et al." that displays their names if you hover over it, like [] ? (They were hidden in an HTML comment here.) Or is it fine to list them all? I don't really mind listing them all, but it does tend to make the bibliographic data take up as much or more space than the quote. In any case, looking through the results for searches like insource:"author17" makes me notice that in Nopeville and science-fictionish we're citing collections of stories by lots of authors and just listing all the authors whose works are in the collection, when we should only list whoever authored the quote we're quoting... - -sche (discuss) 05:29, 3 August 2023 (UTC)[reply]
There might be the occasional issue with licensing (e.g. CC-by-SA) if we truncate the list of authors and quote the text. I'm not confident that quoted text will also be supplied through |text= rather than other means. For quotations, it often isn't. --RichardW57m (talk) 09:51, 3 August 2023 (UTC)[reply]
Scrub the above - putting the less visible names in a tool tip should be good enough. --RichardW57m (talk) 10:11, 3 August 2023 (UTC)
I have always manually truncated lists of more than three authors or editors with et al. The information should be enough to identify the book, I'm not sure who it's actually benefiting to provide vast lists of authors when just the ISBN or the lead author and title (and edition number if necessary) will be sufficient to identify it anyway. Afaik et al. with a tooltip is what Wikipedia CS1 does. —Al-Muqanna المقنع (talk) 10:07, 3 August 2023 (UTC)[reply]
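The truncation being discussed could look something like the following Python sketch. The function name and the default `limit` of three are assumptions for illustration; the second return value is the text that a tooltip on "et al." would carry.

```python
def truncate_authors(authors, limit=3):
    """Return (visible_text, hidden_text) for a list of author names.

    Lists of up to `limit` names are shown in full. Longer lists show the
    first `limit` names followed by 'et al.'; the remaining names are returned
    separately so they can go into a tooltip.
    """
    authors = list(authors)
    if len(authors) <= limit:
        return ", ".join(authors), ""
    visible = ", ".join(authors[:limit]) + " et al."
    hidden = ", ".join(authors[limit:])
    return visible, hidden


visible, hidden = truncate_authors(["A. One", "B. Two", "C. Three", "D. Four", "E. Five"])
print(visible)
print(hidden)
```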

gl-IPA

I think someone who knows Lua (and who has enough time) should make a pronunciation template for Galician. Its phonology is not that different from that of Portuguese (except maybe the presence of /θ/ in the Eastern dialects, the merger of voiced fricatives with their voiceless counterparts, and the lack of deaffrication of ch, which remains an affricate like in Spanish). Rodrigo5260 (talk) 13:59, 2 August 2023 (UTC)

@Rodrigo5260 Unfortunately all of these modules take time to write. The Portuguese module, for example, runs to 2,001 lines of code, due to all sorts of complications in the Portuguese spelling-to-phonology mapping as well as there being multiple region-specific pronunciations. Galician might not be quite so bad but it also has multiple regional pronunciations. If anything you'd want to start with the Spanish rather than the Portuguese module, since Galician (at least in the standard rather than the reintegrationist spelling) largely uses Spanish spelling conventions. Benwing2 (talk) 19:24, 2 August 2023 (UTC)[reply]
Ah, yes, you're right, but the standard spelling fails to distinguish the open e and o from their closed counterparts (tho the minimal pairs are fewer than in Portuguese). Rodrigo5260 (talk) 19:37, 2 August 2023 (UTC)[reply]
The few regional pronunciations I know of are the seseo found in Western Galician and the gheada (when the /g/ sound becomes an /h/-like sound, like in Ukrainian). Rodrigo5260 (talk) 21:24, 2 August 2023 (UTC)
@Rodrigo5260 I have heard that there are lots of East-Central-West differences in whether e and o are open or closed in specific words (due to different handling of metaphony), as well as differences in the handling of original nasal vowels, etc. Benwing2 (talk) 21:35, 2 August 2023 (UTC)[reply]
Agreed. I actually started a sandbox a while ago, where I copied and condensed transcriptions from the Manual of Romance Phonetics and Phonology, which has a chapter dedicated to Galician. I never did finish the task, but perhaps what material I gathered can be of some use. Nicodene (talk) 22:22, 3 August 2023 (UTC)[reply]

Naming the "ecclesiastical" pronunciation in Latin entries

The pronunciation currently named "ecclesiastical" in Latin entries—actually what is called Italianate in the literature—is not universally used in the church either historically or, as @Andrew Sheedy recently noted on my talk page, geographically (e.g. it is not used by German Catholics). In fact its dissemination is a relatively recent phenomenon dating to the late 19th and early 20th centuries, and to some extent it has also now receded—though salient features like /t͡ʃ/ for ⟨c⟩ before certain vowels are still typical for Catholics in English-speaking countries, there is much less effort to exactly mimic Italian pronunciation in the way that some prelates enforced 100 years ago. ([z] for ⟨s⟩ used to be strictly proscribed, for example; not so much now.)

Calling the pronunciation we provide "Ecclesiastical Latin" with no qualification is misleading. Some editors have apparently inferred from the name that it applies to Medieval Latin, for instance, but in fact there's no convincing reason why either a modern Italianate pronunciation or a classical pronunciation should be invented for terms in Medieval Latin without any later ecclesiastical usage. My preference would just be for it to be renamed "Italianate", but if that term is too obscure there might be a more comprehensible alternative like "Italianizing", or simply "Italian Ecclesiastical". Tagging @Urszag, Nicodene who I think have discussed EL pronunciation before. —Al-Muqanna المقنع (talk) 18:21, 2 August 2023 (UTC)[reply]

I would suggest a label beginning with 'modern' in order to clear up any chronological ambiguity. That can be followed by 'Italian(-ate/-izing)', 'Roman', or similar. Nicodene (talk) 18:28, 2 August 2023 (UTC)[reply]
That's a good point actually, maybe "Modern Italianate" or something. —Al-Muqanna المقنع (talk) 18:51, 2 August 2023 (UTC)[reply]
"Italianate" seems fine. Several documents presenting this style (e.g. the Liber Usualis) refer to it as the "Roman" pronunciation (referring to contemporary, not ancient, Rome) but this would obviously be undesirably ambiguous without further qualifiers. Ganss 1951 ("Pronunciations of Latin in Church") refers to it simply as "the Italian pronunciation" and reserves "the Roman pronunciation" for the reconstructed ancient pronunciation. In theory there are presumably some differences between a) an idealized standard based on accepted traditional Italian usage (but this hypothetical standard may not even have a definite existence); b) the various pronunciations used by native Italian speakers today when they speak Latin; and c) the pronunciations used by non-Italian speakers who try to imitate what they think Italian pronunciation is like, but in practice I don't think it's feasible to clearly distinguish these.--Urszag (talk) 19:06, 2 August 2023 (UTC)[reply]
I like User:Nicodene's suggestion of including the word "Modern"; otherwise it wouldn't be obvious to someone like me (a semi-informed outsider) that "Italianate" refers to a modern Church pronunciation. Benwing2 (talk) 19:18, 2 August 2023 (UTC)[reply]
Yeah, I'm inclined to think "Modern Italianate" is clear enough. It's worth noting that it's spilled out of ecclesiastical contexts at this point—people routinely pronounce Latin borrowings like in excelsis in "Italianate" style now (see [1] @ 1:46:22)—so I'm not sure how important it is to flag up the church aspect explicitly. I added a sense at Italianate which summarises the history. —Al-Muqanna المقنع (talk) 20:06, 2 August 2023 (UTC)[reply]
I oppose naming it "Modern Italianate". Almost anywhere you look on the internet and in many reference books refers to two pronunciation systems: Classical and Ecclesiastical. The latter is inherently prescriptive--the Roman pronunciation was officially proclaimed the standard for the Catholic Church by Pope Pius X. All other pronunciations are not Ecclesiastical, they are simply regional/historic (e.g. Germanic, English, Baroque French). I think the labels are fine as they are, but I would support additional pronunciations, which would be helpful for some users (for instance, performers/directors of Medieval/Renaissance music often try to reflect the historical pronunciations used by the composer, which it would be useful to include for that purpose). For the Roman pronunciation, I would also be fine with "Modern Ecclesiastical" to distinguish from the historical pronunciations. Andrew Sheedy (talk) 21:05, 2 August 2023 (UTC)[reply]
Also, thanks for pinging me. I'm keenly interested in this discussion, but I would have missed it otherwise, because I'm travelling. Andrew Sheedy (talk) 21:07, 2 August 2023 (UTC)[reply]
I'm happy to accept "Modern Ecclesiastical" as a compromise, since it at least accounts for the historical aspect, which is IMO the more important and as it stands misleading. However, I'm not convinced by the prescriptive point: yes, Pius X instituted it as a standard, but John XXIII also obliged seminarians to learn Latin and nobody with the pertinent authority has been much interested in canonically enforcing either of those provisions for many years now. As it stands today Italianate pronunciation is generally (at least for clergy) something that has been learned informally rather than something enforced. —Al-Muqanna المقنع (talk) 21:14, 2 August 2023 (UTC)[reply]
While that is true, the Ecclesiastical pronunciation is what most Catholics are interested in learning when it comes to praying/liturgical usage. Unlike Pope John XXIII's injunction, Pius X's was actually widely taken up, if not universally, at least enough to be considered "the" Ecclesiastical pronunciation. I would be fine with "modern Italianate Ecclesiastical", as long as we don't drop "Ecclesiastical", which is what many people will recognize. Not everyone interested in Ecclesiastical pronunciation will realize that it is Italianate. Andrew Sheedy (talk) 20:52, 3 August 2023 (UTC)[reply]
'Modern Italianate Ecclesiastical' is quite long. I like the suggestion of 'Modern Italianate', and I propose using this precise link to clear up any confusion. (At the moment, we link to the article Ecclesiastical Latin, which is rather broad and does not focus on the modern pronunciation in question.) Nicodene (talk) 02:10, 4 August 2023 (UTC)[reply]
I object to the label 'ecclesiastical' for two reasons:
1) It isn't specific enough. The catholic church has decreed that the Italian norms are to be the standard pronunciation, certainly, but many catholics carry on using different styles; see here, for instance, for various Polish examples. It should also be pointed out that the catholic church does not 'own' Latin, even as far as christians are concerned, since there exist churches, mainly protestant ones, which both employ Latin and reject the Pope's authority (which would include, naturally, the authority to decide on the 'proper' rendition of Latin).
2) It's too specific. The described pronunciation is not specifically 'ecclesiastic'; it is simply how most Italians pronounce Latin, in any context.
Nicodene (talk) 22:05, 2 August 2023 (UTC)[reply]
As an alternative to "Ecclesiastical", I would be fine with "modern Catholic" or "Italianate Catholic" or "prescribed Catholic" as a label. It's true that other Catholics use different pronunciations, but those aren't really ecclesiastical pronunciations, they are simply national pronunciations (which I would also like to include, although collapsed by default). Andrew Sheedy (talk) 20:54, 3 August 2023 (UTC)[reply]
How are they 'not really ecclesiastical' when they are used by other christians, whether Catholic or otherwise, in ecclesiastical contexts? Nicodene (talk) 01:54, 4 August 2023 (UTC)[reply]
Because those pronunciations are not limited to ecclesiastical use and hence are not specifically ecclesiastical. But maybe it's splitting hairs, because ecclesiastical use would probably be the main use nowadays. At least historically, though, the local pronunciations would have simply been the local pronunciation of Latin, not specifically a "church" pronunciation. Andrew Sheedy (talk) 00:26, 5 August 2023 (UTC)[reply]
The Italian pronunciation in question is not limited to ecclesiastical use either, as mentioned. Nicodene (talk) 01:13, 5 August 2023 (UTC)[reply]
Having looked up the specific statement by Pius X in question btw it was not a formal regulation, just an informal recommendation in a letter pertaining specifically to France (p. 169 in these acts), so it was never prescriptive in more than a loose sense. I don't think there needs to be specific emphasis on its Catholicity. —Al-Muqanna المقنع (talk)
I was going to suggest this even before Andrew's comment, but what about "Italianate Ecclesiastical" or "modern Italianate Ecclesiastical" (or lowercase "e", or uppercase "m", whatever)? I assume whatever we do we'll wikilink it to somewhere that explains the nuances. It does strike me as potentially slightly confusing that we currently seem to mean two distinct things when we say "Ecclesiastical" in {{a}} vs in {{lb}}; as far as I can tell, the {{label}} would also be used in the case of peculiarities of Latin as used by German/French/British/etc Catholics or Protestants, while the {{accent}}-label wouldn't, which makes it a little weird to use the same name for both. - -sche (discuss) 01:18, 3 August 2023 (UTC)[reply]
That's a good point about the Ecclesiastical sense label vs. pronunciation qualifier. I think that ultimately goes back to my main point above, that labelling the pronunciation "Ecclesiastical" (not to mention linking it to the Ecclesiastical Latin wp page) implies it pertains to "Ecclesiastical Latin" as a whole, i.e. the entire body of Christian usage stretching back to late antiquity referred to by the gloss label and the ety language, when it doesn't by a long shot. —Al-Muqanna المقنع (talk) 21:00, 3 August 2023 (UTC)[reply]
I was going to suggest "Ecclesiastical (modern Italianate)" as a kind of compromise, but it's most unfortunate that the accents are already in brackets... actually, why do we italicise and parenthesise accent labels? But that's a different question. How does this look:
In an ideal world we would do away with the word "Ecclesiastical" altogether, but Andrew's point about making the label meaningful to as many people as possible is a valid one, I feel. @Al-Muqanna, -sche, Andrew Sheedy, Nicodene, Urszag This, that and the other (talk) 11:46, 6 August 2023 (UTC)[reply]
I agree that it's a bit unwieldy, but I'm in favour of the longer label, for the sake of balancing user-friendliness and accuracy. We're an English-language dictionary and most Latin learners who want to learn the Italianate pronunciation in the English-speaking world will look for the "Ecclesiastical" pronunciation. Andrew Sheedy (talk) 15:57, 6 August 2023 (UTC)[reply]
Is this supposed to be meaningful to relatively ordinary folk, say, secondary-school students of Latin? Does such a person's understanding of the word modern correspond to our intended meaning? Or is this just for scholars? If it is for scholars, why is it so prominently displayed by default at the top of the Latin L2 section? DCDuring (talk) 16:43, 6 August 2023 (UTC)[reply]
@This, that and the other, Andrew Sheedy: I'm happy to go with a longer one. I do accept Andrew's point that this pronunciation is frequently labelled ecclesiastical and "Italianate" is not likely to be widely understood. My concern's mostly with the unqualified "ecclesiastical" giving a misleading impression of the historical and the present-day scope of its use to laymen, so qualifying it solves the big issue. @DCDuring: Of the two pronunciations, the classical is probably the scholars-only one given that the Italianate/ecclesiastical is widely used in prayers, chants, etc. whereas the classical is not. "Modern" is used in a conventional sense, since it pertains to the 19th century to the present. —Al-Muqanna المقنع (talk) 16:53, 6 August 2023 (UTC)[reply]
I am not interested in whether the pronunciation is scholars-only, but in whether the label proposed is not misleading or confusing to an identifiable, possibly sizable, portion of the users of these entries.
Learner's dictionaries have but one definition for modern: "Of the present time or recent times", with the synonym contemporary. I don't see how this definition applies. DCDuring (talk) 17:49, 6 August 2023 (UTC)[reply]
@DCDuring: Not sure what you're getting at, that is precisely what "modern" means here. —Al-Muqanna المقنع (talk) 18:09, 6 August 2023 (UTC)[reply]
I thought, apparently mistakenly, that this was more of a late 19th century thing that had petered out. It may be petering out, but the process is definitely not complete. DCDuring (talk) 18:35, 6 August 2023 (UTC)[reply]
Quite the opposite. It is the dominant Latin pronunciation across the 'Catholic world' and is still making gains vis-à-vis traditional regional pronunciations. Nicodene (talk) 21:10, 6 August 2023 (UTC)[reply]
Indeed. If it is petering out, it is due to more infrequent use of Latin, not proportionately less use of this pronunciation. As Nicodene says, it is the more regional pronunciations that are petering out, whereas the internet is facilitating the spread of a standard ecclesiastical pronunciation. Andrew Sheedy (talk) 03:40, 7 August 2023 (UTC)[reply]
It has occurred to me that 'modern catholic standard' would be an objectively correct label that more or less addresses everyone's concerns. Nicodene (talk) 18:40, 6 August 2023 (UTC)[reply]
I would also accept that label (though I have a slight preference for the "Ecclesiastical" label that is, in my experience, so widespread). Andrew Sheedy (talk) 03:39, 7 August 2023 (UTC)[reply]
As an addendum to this thought: when I look up a pronunciation of a Latin word (or when I did, anyway), I Google "[word] Ecclesiastical pronunciation". If we remove the word "Ecclesiastical" from our label, our entries may no longer show up in the search results for people looking for this pronunciation. Andrew Sheedy (talk) 03:42, 7 August 2023 (UTC)[reply]
@Andrew Sheedy, @Al-Muqanna:
In the interest of coming to a consensus, I vote for 'modern Italian Ecclesiastical' (slightly shortened from @This, that and the other's proposal). Does this more or less suit everyone? I myself had objected to 'ecclesiastical' on the grounds that the pronunciation is not exclusively so, but the point about that being the most widespread label 'in the wild' is a fair one. Nicodene (talk) 20:53, 9 August 2023 (UTC)[reply]
On second thought, perhaps 'Italianate' is indeed better, as 'Italian' may suggest that the pronunciation is specific to that country, which it certainly isn't. Nicodene (talk) 20:57, 9 August 2023 (UTC)[reply]
I agree "Italian" is probably misleading to laymen. There are downsides to every plausible option, though IMO being long is probably the least relevant. Overall I'd say "Modern Italianate Ecclesiastical", "Modern Italianate", "Italianate Ecclesiastical", "Italianate", "Modern Ecclesiastical", "Modern Catholic" are all fine by me, in roughly that order. —Al-Muqanna المقنع (talk) 21:13, 9 August 2023 (UTC)[reply]
Yeah, I like shortening it to "Italian," but that might be misleading. I think "m/Modern Italianate Ecclesiastical" is the best we're going to get while still being mostly in agreement. "Italianate Ecclesiastical" is my second favourite, followed by "modern Ecclesiastical". Andrew Sheedy (talk) 21:57, 9 August 2023 (UTC)[reply]
I've made the change, with a lowercase "m" to emphasise that "modern Italianate Ecclesiastical" is a descriptive phrase, not a set term. Happy to adjust if needed. This, that and the other (talk) 23:11, 9 August 2023 (UTC)[reply]
I'm happy with that. Nicodene (talk) 00:14, 10 August 2023 (UTC)[reply]
I'm happy with that wording—although it's a little lengthy, that seems preferable to using a shorter but less clear term.--Urszag (talk) 01:15, 10 August 2023 (UTC)[reply]

Issues regarding the Inuit languages[edit]

(moved from Wiktionary:Grease pit/2023/August)

Recently I managed to get a copy of the Comparative Eskimo Dictionary so I wanted to start adding some entries and checking entries that have already been made. And I noticed a few (potential/possible) issues.

  1. Currently we have Inupiaq (ik), Northwest Alaska Inupiatun (esk), and North Alaskan Inupiatun (esi) all listed as languages on the same level. This feels like a strange choice considering esk and esi are the two sub-groupings of ik. So would it not be better if these were something like etymology-only languages under ik? Or some other solution, I do not remember if there is a precedent for this on here that we could stick to.
  2. As of now only Inuktitut iu has Canadian syllabics (Cans) as a script code, while the other varieties used in Canada do not have it as a listed script (even though it should be). These languages are Inuvialuktun (ikt) and Inuinnaqtun (esx-inq).
  3. Same as with the Alaskan Inuit varieties I mentioned before, currently, we also list Inuinnaqtun (esx-inq) as being on the same level as Inuvialuktun (ikt). This would be incorrect as Inuinnaqtun is one of various varieties of Western Canadian Inuit (Inuvialuktun). So a) it feels strange that we only have a special lang-code for that variety and not for other varieties like Siglitun, Netsilik, or Kivallirmiutut and b) that it is listed on the same level and not one level down.

BartGerardsSodermans (talk) 06:31, 2 August 2023 (UTC)[reply]

I can answer point 1: WT:LT says, for Inupiaq, "only the macrolanguage is treated as a language". We have no entries under the Cat:Northwest Alaska Inupiatun language and Cat:North Alaskan Inupiatun language trees. All words should be created under Inupiaq. Where do you see that they are "listed as languages on the same level"?
(As a side note, I wonder why the lects are named "Northwest Alaska Inupiatun" and "North Alaskan Inupiatun" - it seems we could be more consistent.)
In general this discussion might be better suited to the beer parlour, as it does not concern purely technical issues. This, that and the other (talk) 07:22, 2 August 2023 (UTC)[reply]
@This, that and the other Yea I agree this is a better fit for the beer parlour too, don't know why I posted this here. I made a mistake with the languages being on the same level too. But what exactly is the purpose of having categories for Northwest and North Alaska(n) Inupiatun languages if they don't include any entries and all words should be created under Inupiaq? I also noticed the strange naming of the lects; though it isn't a big issue, it is inconsistent. Apart from that I'll add the other questions I had to the beer parlour. BartGerardsSodermans (talk) 09:09, 2 August 2023 (UTC)[reply]
I was wondering that too. If these languages aren't languages according to WT:LT, why are they listed in WT:LOL at all? Can anyone else help? This, that and the other (talk) 11:05, 2 August 2023 (UTC)[reply]
@This, that and the other, BartGerardsSodermans I moved this discussion to the Beer Parlour. User:-sche can you answer some of these questions? Also presumably we should rename 'Northwest Alaska Inupiatun' to 'Northwest Alaskan Inupiatun' (rather than the other way around), do you agree? Benwing2 (talk) 23:21, 2 August 2023 (UTC)[reply]
If we're treating only the macrolanguage(s) as a language, and not the dialects, then yeah, let's either move the dialect codes to the etymology-languages module (ike is currently invoked in several etymologies), or comment them out (like is done for e.g. "tw" in Module:languages/data/2 — I started doing that after seeing various people readd codes, unaware of earlier discussions and assuming they were just missing; perhaps that's even what happened here). BartGerardsSodermans, how many languages do you think it's sensible to have language headers for? Just ik and iu, or are you proposing ik, iu, and ikt as co-level languages? (I'm not sure, from your division into three points above.) For "Northwest Alaska(n) Inupiatun" both forms seem about equally rare; no objection to renaming if it's kept in some capacity. - -sche (discuss) 07:10, 3 August 2023 (UTC)[reply]
I am personally in favor of treating the two Inupiaq lects as etymology-languages. Especially because words often show variation between the lects and neither of the two can really be seen as the standard Inupiaq, they also use slightly different orthographies, and this way you can easily show both languages in the descendant section of reconstructions while still having all words be in the Inupiaq language category (where they can then be sorted into the correct regional forms).
I also think it might be a good idea to use ikt for headers as well, simply because there are some differences between it and iu. There is also the issue of script, in that ikt is rarely written using Canadian syllabics and nearly exclusively uses the Latin script, while for iu the situation is different in that both the syllabic and Latin scripts are used (though the syllabic script does seem more common to me). I am not fully aware of how different varieties have to be to qualify for language header status and if something like orthography should be taken into account. I do see one issue possibly arising in having iu and ikt be different co-level languages in that iu is also used by some people for the Western Canadian lects and not exclusively for the Eastern Canadian lects, which might confuse people not aware of how we decided to treat the different language codes. So if we do split them it should be made clear that iu is only used for the Eastern Canadian lects. BartGerardsSodermans (talk) 08:05, 3 August 2023 (UTC)[reply]
If we want to have ikt for the Western lects and then make clear that the other language is only intended to encompass the Eastern lects, we should just keep using ike for that (for those Eastern lects), since ike is the code that denotes "the Eastern lects specifically", and iu is the code that encompasses both Western and Eastern together, AFAIK. (Wikipedia does not make this as clear as it could, since it redirects w:ISO 639:ike to an article it titles bare "Inuktitut [...] also known as Eastern Canadian Inuktitut", and gives iu as the code for both that lect and "Inuvialuktun (part of Western Canadian Inuktitut)", in each one's infobox.) - -sche (discuss) 09:36, 4 August 2023 (UTC)[reply]
Yea I was also a bit confused by the way Wikipedia just seems to equate Inuktitut and Eastern Canadian Inuktitut. And I think it is fine if we keep using ike to refer to the Eastern Canadian lects (I quite honestly didn't notice we had ike as a language code and category). I did notice that ike currently doesn't use the transliteration module that is used for iu, which it probably should so that we don't need to add transliterations manually each time.
So with that cleared up for me, I believe the only thing we'd need to do, is make esk and esi etymology-languages (unless others disagree of course). And make sure both names use either Alaska or Alaskan instead of different forms. BartGerardsSodermans (talk) 20:33, 4 August 2023 (UTC)[reply]
 Done. This, that and the other (talk) 05:16, 6 August 2023 (UTC)[reply]
In translations, should dialects like Inuinnaqtun be entered as separate languages, or should they be under the umbrella language? e.g. for arvaq, linked from the English page hypothenar eminence, should it be listed under Inuinnaqtun (esx-inq) or Inuvialuktun (ikt)? Or neither? Thanks, Soap 20:14, 3 August 2023 (UTC)[reply]
Considering that the entry for arvaq is under esx-inq the link from the English translation page should probably also use that language code. BartGerardsSodermans (talk) 20:37, 4 August 2023 (UTC)[reply]
Hi, I am responsible for creating the first entries in Inuinnaqtun, however so far they are pretty bare bones. Nunavut, the Canadian territory in which most Inuit languages/dialects are spoken recognizes Inuktitut and Inuinnaqtun as distinct, official languages. Although I do not reside in Nunavut, I can say with much certainty that these languages are in fact distinct and should be considered separate languages, but under the Inuit classification thing or whatever. Also, Inuinnaqtun does not use Canadian syllabics: https://www.gov.nu.ca/sites/default/files/_thumbs/standards_of_communication_-_eng_large_size.pdf. I am very interested in this discussion and happy it came to light! GKON (talk) 04:04, 7 August 2023 (UTC)[reply]

English anagram run[edit]

@Ioaxxere @CitationsFreak Hello, regarding the English anagram run you asked for in the Bulgarian thread: I basically got everything running, i.e. I have the full Wiktionary wordlist for English and can run the bot with the same capabilities as with the Bulgarian version. I'd just like your advice on a few parameters of the run, mainly being what to do about duplication on the page, i.e. do we skip adding an anagram if:

  • It's already in the see-also section at the top of the page?
  • It's already in the alternative forms section:
    • As an L3
    • As any header level, so long as it's under the English header?
  • It's linked to whatsoever in the English entry?

I see the previous anagram bot author appears to have skipped all the ones in the see-also. Check out MacBees: my bot tried to add Macbees as an anagram, although it's already linked in the see-also. I later reverted it, because evidently both terms existed well before the previous anagram run, yet the author decided not to duplicate them regardless. Similarly, on 1000-metre, 1000 metre, 1,000-metre were not linked before, but 1,000 meter, 1000 meter, 1000-meter all were. This time, they do appear in the alternative forms, but they're also duplicated as anagrams.

Do we want anagrams to overlap with other obvious links like these? Specifically, in what configuration?

Also, here is the process by which I normalize words for the anagram calculation.

  • Remove all whitespace at the start and end.
  • Convert to lowercase.
  • Convert æ to ae, œ to oe, and ı to i. These are but a few of the possible equivalences we can have, I don't know what mappings we'd like, nor what the previous author had, but even without this we have a lot of anagrams anyway.
  • Decompose all characters to their simplest, e.g. é becomes e + ACUTE. Specifically, this is Unicode NFKD normalization, which may convert e.g. № to No, which is why we need to convert to lowercase again after this.
  • Remove all irrelevant elements (non-alphanumeric characters: the alphanumeric characters I recognize as unique are abcdefghijklmnopqrstuvwxyz0123456789βðπø).

If you have any concerns about this, please let me know; and also I'd love your input on the affair with deciding what to duplicate. Kiril kovachev (talkcontribs) 18:56, 3 August 2023 (UTC)[reply]
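For readers following along, the normalization steps listed above could be sketched roughly as follows. This is only an illustration of the described procedure, not the bot's actual code: the names `CONVERSIONS`, `KEPT`, `normalise_v1` and `anagram_key` are made up here, and the mapping contains only the three equivalences mentioned in the post.

```python
import unicodedata

# Assumed mapping: only the three equivalences named in the post.
CONVERSIONS = {"æ": "ae", "œ": "oe", "ı": "i"}

# The characters the post says are treated as significant.
KEPT = set("abcdefghijklmnopqrstuvwxyz0123456789βðπø")


def normalise_v1(word: str) -> str:
    """Apply the listed steps in order (illustrative sketch)."""
    word = word.strip().lower()
    for source_char, replacement in CONVERSIONS.items():
        word = word.replace(source_char, replacement)
    # NFKD can introduce capitals (e.g. the numero sign becomes "No"),
    # hence the second lowercasing.
    word = unicodedata.normalize("NFKD", word).lower()
    # Drop everything outside the kept set, including combining marks.
    return "".join(ch for ch in word if ch in KEPT)


def anagram_key(word: str) -> str:
    """Two words are anagram candidates when their keys are equal."""
    return "".join(sorted(normalise_v1(word)))
```

With this sketch, `anagram_key("listen")` and `anagram_key("Silent")` compare equal, while characters outside `KEPT` simply vanish from the key.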

Why "βðπø" in particular? —Justin (koavf)TCM 18:59, 3 August 2023 (UTC)[reply]
@Koavf It was originally a rather long list of characters. I checked all English entries, and first got rid of all alphabetic characters, which left a number of symbols and diacritics. In the end, the only alphabetic characters that may make any difference (i.e. have a lot of terms where they're used) are those 4. In truth they might not matter, maybe they can be excluded. Kiril kovachev (talkcontribs) 21:32, 3 August 2023 (UTC)[reply]
Actually, this makes me realise that what I did was inadequate, because when I remove those four, we suddenly get around 60 more anagrams than before. Which makes sense if you think about it—removing those characters makes those letters disappear in any word which features them, which can lead to collisions with words that don't have them at all. I think I need to re-introduce those other characters as well, but are there any other ideas as to how to deal with this to make sure there are no mistakes, e.g. β-carotene becoming an anagram of carotene? Kiril kovachev (talkcontribs) 22:00, 3 August 2023 (UTC)[reply]
UPDATE: I changed it to use the regular expressions \w and \d for alphabetic and numerical characters respectively, which I'm hoping means that any alphabetic characters that are used will be respected. However, there may still need to be exceptions made for e.g. 🧢 (cap), because "🧢ing" would be an anagram of "-ing", which is definitely not right. Kiril kovachev (talkcontribs) 22:16, 3 August 2023 (UTC)[reply]
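A small sketch of the behaviour described here, assuming Python's `re` module with ordinary `str` patterns (the helper name `keep_word_chars` is hypothetical). `\w` treats letters such as β as word characters, but an emoji is not a word character, so it is dropped just like a hyphen:

```python
import re


def keep_word_chars(word: str) -> str:
    """Keep only characters matched by \\w (letters, digits, underscore)."""
    # \W is the complement of \w, so this deletes everything else.
    return re.sub(r"\W", "", word.casefold())


# β is a letter, so it survives and the two terms no longer collide.
beta_form = keep_word_chars("β-carotene")
plain_form = keep_word_chars("carotene")

# The emoji is not matched by \w, so both of these reduce to "ing".
cap_form = keep_word_chars("\U0001F9E2ing")
suffix_form = keep_word_chars("-ing")
```

This illustrates why the emoji case still needs separate handling if symbols are meant to count as significant characters.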
@Kiril kovachev: At minimal analysis, all symbols should be taken into account. There may be exceptions, but I can't think of any. --RichardW57 (talk) 05:44, 4 August 2023 (UTC)[reply]
I think you want to skip anything where normalized(a) == normalized(b). JeffDoozan (talk) 21:19, 3 August 2023 (UTC)[reply]
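One way this rule might look in practice is sketched below. It is an assumption-laden illustration: `simple_normalise` is a stand-in normaliser (casefold, keep alphanumerics), and `anagrams_for` is a hypothetical helper, not part of the bot.

```python
from collections import defaultdict


def simple_normalise(word: str) -> str:
    """Placeholder normaliser (an assumption): casefold, keep alphanumerics."""
    return "".join(ch for ch in word.casefold() if ch.isalnum())


def anagrams_for(words):
    """Map each word to its anagrams, skipping equal normalisations."""
    groups = defaultdict(list)
    for word in words:
        key = "".join(sorted(simple_normalise(word)))
        groups[key].append(word)

    result = {}
    for group in groups.values():
        for word in group:
            # Skip candidates where normalised(a) == normalised(b).
            others = [
                other for other in group
                if simple_normalise(other) != simple_normalise(word)
            ]
            if others:
                result[word] = others
    return result
```

Under this sketch MacBees and Macbees are not listed as anagrams of each other, while theater of war and theatre of war still are, because their normalised spellings differ.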
I see, this is a good idea. I believe I've implemented this now. I forgot to mention that the code is here, as I've usually been sharing it for other scripts.
Given that, is there anything else you think needs doing? Kiril kovachev (talkcontribs) 21:37, 3 August 2023 (UTC)[reply]
I think we do want anagrams that are alt forms with different spelling (eg not cafè and cafe but theater of war and theatre of war). For, say, Scrabble players, since (as Eq says) maybe you want an R in the Double Letter Score section. cf (talk) 21:24, 3 August 2023 (UTC)[reply]
@Kiril kovachev Just for reference, User:OrphicBot used to update the {{also}} sections at the top of each page and worked off of an equivalence list that's documented on the bot's page. Also I agree with User:CitationsFreak about including alt forms with a different normalization (which is also essentially the converse of what User:JeffDoozan is saying). Benwing2 (talk) 23:43, 3 August 2023 (UTC)[reply]
Also when computing equivalences, it sounds like you want to convert to NFD form and remove diacritics (in the range U+0300 through U+036F), as well as maybe anything identified by Unicode as punctuation, but not all symbols; that will avoid the issue you mentioned above with the 🧢 symbol. Benwing2 (talk) 23:46, 3 August 2023 (UTC)[reply]
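A minimal sketch of this suggestion, using `unicodedata.category` to identify punctuation rather than a hand-written list. The function name is hypothetical, and dropping separator characters (category `Z*`, which includes spaces) is an extra assumption beyond what the comment proposes.

```python
import unicodedata


def normalise_by_category(word: str) -> str:
    """NFD-decompose, then drop diacritics and punctuation but keep symbols."""
    decomposed = unicodedata.normalize("NFD", word.casefold())
    kept = []
    for ch in decomposed:
        if "\u0300" <= ch <= "\u036f":
            continue  # combining diacritical mark
        category = unicodedata.category(ch)
        if category[0] in ("P", "Z"):
            continue  # punctuation, or (assumed) separators such as spaces
        kept.append(ch)  # letters, digits and symbols such as emoji remain
    return "".join(kept)
```

Because the emoji has a symbol category rather than a punctuation one, "🧢ing" keeps its first character here and no longer collides with "-ing".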
@Benwing2 Thanks, this is a good idea. I'll try it out tomorrow when I can. As for alt forms: I'm not sure I understand what we're after. Should we put words with the same normalisation (e.g. on the page cafe, café) under an Alternative forms header by default? Or in see-also? Kiril kovachev (talkcontribs) 23:52, 3 August 2023 (UTC)[reply]
By "see also" are you referring to the "See also" hatnote at the top of the page? That is supposed to be for orthographically-similar terms in a language-independent fashion, whereas "Alternative forms" are language-dependent forms that are related by both form and meaning. So they are logically independent/orthogonal. Benwing2 (talk) 23:58, 3 August 2023 (UTC)[reply]
@Benwing2 Yes, that's what I thought, e.g. on 1000 meter, placing 1000-meter in that hatnote. Although it's language independent, I believe it still works fine to put such variants in there because they're just small variations in representation of the same letters.
Anagrams that are actually rearrangements should clearly go under ==Anagrams==; and so, would it be fitting to put forms with the same normalization in the {{also}}, or somewhere else? (Or not anywhere at all?) Kiril kovachev (talkcontribs) 12:02, 4 August 2023 (UTC)[reply]
Also, I did as you said and now implement the following:
DIACRITICS = f"{chr(0x0300)}-{chr(0x036F)}"
PUNCTUATION = r"’'\(\)\[\]\{\}<>:,‒–—―…!.«»-‐?‘’“”;/⁄␠·&@\*\•^¤¢$€£¥₩₪†‡°¡¿¬#№%‰‱¶′§~¨_|¦⁂☞∴‽※" + f"{chr(0x2000)}-{chr(0x206F)}"
REDUNDANT_CHARS = f"[{DIACRITICS}{PUNCTUATION}]"
def normalise(word: str) -> str:
    """Normalises the word.
    Using the following method:
    - Remove all whitespace at the start and end.
    - Decompose all characters to their simplest, e.g. é becomes e + ACUTE
    - Convert to lowercase (casefold)
    - Remove all irrelevant elements (punctuation, diacritics).
    """
    word = word.strip().casefold()
    for source_char, replacement in CONVERSIONS.items():
        word = word.replace(source_char, replacement)
    word = re.sub(REDUNDANT_CHARS, "", unicodedata.normalize("NFKD", word.strip()).casefold())
    return word
The new number of anagrams is 276373, much lower than before, which signals that the number of erroneous anagrams last time was immense. So thanks for your help on this, I believe it's now much better, although I will check out the anagram sets it generates to spot any further errors.
Please let me know any feedback on potential mistakes in this implementation, Kiril kovachev (talkcontribs) 18:51, 4 August 2023 (UTC)[reply]
@Kiril kovachev You are stripping whitespace at the beginning and end; shouldn't you just ignore whitespace entirely? Also I see -‐? in the middle of the punctuation regex, what does that mean? Benwing2 (talk) 19:00, 4 August 2023 (UTC)[reply]
@Kiril kovachev Also, as for the see-also hatnotes, ===Alternative forms=== and anagrams, I think they should operate independently of each other; each has its appropriate definition and I wouldn't worry if there happens to be some duplication. Benwing2 (talk) 19:02, 4 August 2023 (UTC)[reply]
@Benwing2 -‐? is actually [U+002D (hyphen-minus)] + [U+2010 (hyphen)] + [U+003F (question mark)]. Well-spotted, the hyphen-minus was meant to be escaped (i.e. it was meant to be \-‐?), because otherwise it was doing some strange character range from » (U+00BB) to ‐, which was a mistake. It was giving normalise("Shangri-La") == "shangri-la" as True, but after the change it's fixed. I had to escape several other characters that I missed the first time around as well, and the new set looks like PUNCTUATION = r"’'\(\)\[\]\{\}<>:,‒–—―…!\.«»\-‐?‘’“”;/⁄␠·&@\*\\•\^¤¢\$€£¥₩₪†‡°¡¿¬#№%‰‱¶′§~¨_\|¦⁂☞∴‽※" + f"{chr(0x2000)}-{chr(0x206F)}".
This made the anagram count go back to 311421.
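For anyone following along, here is a minimal sketch of the character-class problem described above. It uses a cut-down class containing only «, », the two hyphens and the question mark, not the full PUNCTUATION string, and the variable names are made up for illustration.

```python
import re

# Unescaped: inside [...] the run "»-‐" is parsed as a range from
# U+00BB to U+2010, so the ASCII hyphen-minus (U+002D) is NOT a member.
unescaped = re.compile("[«»-‐?]")

# Escaped: "\-" is a literal hyphen-minus; "‐" (U+2010) and "?" are
# separate members of the class.
escaped = re.compile(r"[«»\-‐?]")

print(unescaped.sub("", "shangri-la"))  # the hyphen survives
print(escaped.sub("", "shangri-la"))    # the hyphen is removed
print(unescaped.sub("", "é"))           # é (U+00E9) falls inside the accidental range
```

The accidental range also swallows most accented Latin letters and combining marks, which is one way an unescaped hyphen can silently change which words collapse together.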
Also, I agree we should keep those headings separate. It occurred to me that updating see-also would be rather difficult, considering some of them have links to the Appendix to see other forms, meaning adding forms indiscriminately to the see-also bit is not so easy because a different page altogether would need to be considered. And adding to alternative forms is impossible without knowing the semantics of the word. So I guess we shouldn't do any of that. Kiril kovachev (talkcontribs) 19:13, 4 August 2023 (UTC)[reply]
And, yes, I don't know why I was explicitly stripping it, it should just ignore it everywhere. Kiril kovachev (talkcontribs) 19:17, 4 August 2023 (UTC)[reply]
@Kiril kovachev: Sounds good. Note that you don't need to escape characters inside of bracketed character classes except for [, ], - and \. Definitely we should worry about the see-also hatnote later; thankfully there is some existing code available for this at User:OrphicBot so if we wanted to redo it, it would be easier than starting from scratch. Benwing2 (talk) 19:19, 4 August 2023 (UTC)[reply]
Thanks for the heads-up, I got rid of the unnecessary \s.
That bot's source code looks incredibly sophisticated, I can't make out anything about it, but if we can just run the code and set it up to generate the see-alsoes again, that would be nice. (I don't even know how it works; do we need to feed it something for it to know what entries to update/include?) Kiril kovachev (talkcontribs) 19:27, 4 August 2023 (UTC)[reply]
Your guess is as good as mine :) ... I haven't even looked at the source code yet. Benwing2 (talk) 20:40, 4 August 2023 (UTC)[reply]
We give Banda as an anagram of English A band, so the responsible anagrammatizer ignored internal spaces.  --Lambiam 23:08, 4 August 2023 (UTC)[reply]
Quite right, this is fixed as of the latest changes. Kiril kovachev (talkcontribs) 01:16, 6 August 2023 (UTC)[reply]
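A rough sketch of what ignoring whitespace everywhere (rather than only at the ends) could look like. This is not the code actually used for the anagram run; `normalise_sketch` and `anagram_key` are hypothetical names, and punctuation handling is left out to keep it short.

```python
import unicodedata


def normalise_sketch(word: str) -> str:
    """Casefold, decompose, and drop whitespace and combining marks anywhere."""
    decomposed = unicodedata.normalize("NFKD", word).casefold()
    kept = [
        ch
        for ch in decomposed
        if not ch.isspace() and not unicodedata.combining(ch)
    ]
    return "".join(kept)


def anagram_key(word: str) -> str:
    """Two terms are anagrams when their sorted normalised letters match."""
    return "".join(sorted(normalise_sketch(word)))


print(anagram_key("A band") == anagram_key("Banda"))
```

With internal spaces dropped, "A band" and "Banda" end up with the same key, matching the existing anagram listing mentioned above.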
@Kiril kovachev Why are you folding ı and i? --RichardW57 (talk) 05:49, 4 August 2023 (UTC)[reply]
Because they are considered to be the same kinda letter in English. cf (talk) 05:50, 4 August 2023 (UTC)[reply]
@CitationsFreak What? 'ı' is a blatantly foreign letter. Or are you claiming a subtractive diacritic? --RichardW57m (talk) 08:43, 4 August 2023 (UTC)[reply]
Yes it is treated as a subtractive diacritic in English. Among the handful of place names in Turkey for which we have English entries using it it is freely replaced with 'i' in actual English writing (e.g.) and has already been treated as equivalent to 'i' in anagrams as at Ağrı. —Al-Muqanna المقنع (talk) 09:36, 4 August 2023 (UTC)[reply]
Thank you, I was going to point to the exact same example. That's where I originally got that mapping from. I don't know how to find others like it, but it's one of the few I could find. Kiril kovachev (talkcontribs) 11:56, 4 August 2023 (UTC)[reply]
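As a side note on why ı needs an explicit mapping at all: U+0131 has no Unicode decomposition, so NFKD plus diacritic stripping leaves it alone. A small sketch, assuming a one-entry table (`SKETCH_CONVERSIONS` is hypothetical; the real CONVERSIONS mapping is not shown in this thread):

```python
import unicodedata

# Hypothetical one-entry table standing in for the real CONVERSIONS dict.
SKETCH_CONVERSIONS = {"ı": "i"}  # U+0131 LATIN SMALL LETTER DOTLESS I -> i


def fold_sketch(word: str) -> str:
    """Apply explicit conversions, then decompose, casefold and strip marks."""
    for source_char, replacement in SKETCH_CONVERSIONS.items():
        word = word.replace(source_char, replacement)
    decomposed = unicodedata.normalize("NFKD", word).casefold()
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# NFKD leaves ı untouched, because it is not a composed character.
print(unicodedata.normalize("NFKD", "ı") == "ı")
print(fold_sketch("Ağrı"))
```

The breve on ğ is handled by the generic decomposition step, whereas the dotless i only folds because of the table entry.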

Family tree of the Slavic languages

Hello. Please update the Slavic language tree. It is necessary to eliminate errors and inaccuracies.

Slavic language tree
  1. Old Novgorodian (zle-ono) is not a descendant of Old East Slavic (orv). This is a "sister language", which has very archaic features that were not in the OES. Therefore, it is necessary to make the Old Novgorodian a descendant of East Slavic (zle) family;
  2. Carpathian Rusyn (rue) should be made a descendant of the etymological Old Ukrainian (zle-ouk), together with the Ukrainian (uk).
  3. Etymological Old Ukrainian (zle-ouk) and Old Belarusian (zle-obe) should be renamed to "Middle Ukrainian" & "Middle Belarusian". Because the forms of these languages with "Old" belong to the period of Old East Slavic, like dialects.
  4. Bulgarian (bg) and Macedonian (mk) are very related. Why then is only one of them listed as a descendant of Old Church Slavonic (cu)? Either make Macedonian & Bulgarian descendants of OCS, or reconsider the position of Bulgarian as a descendant of OCS.

ZomBear (talk) 12:46, 4 August 2023 (UTC)[reply]

@ZomBear (1) seems non-controversial. I don't have enough knowledge to speak to (2) and (3). (4) is definitely controversial; some people think that Bulgarian and Macedonian shouldn't be descendants of OCS, some think both should, some think only Bulgarian should. I think the reason that only Bulgarian is currently a descendant of OCS is that most OCS material (at least here in Wiktionary?) is in the Bulgarian recension (I think?), and has specifically Bulgarian features (such as confusion of шт and щ and of the two yers?). Sorry, I put a lot of question marks due to my knowledge here being a bit shaky. Pinging the Proto-Slavic group: (Notifying Rua, Atitarev, Bezimenen, Jurischroeer, Useigor, Greenismean2016, PUC, Fay Freak, Vorziblix, ZomBear): Benwing2 (talk) 19:10, 4 August 2023 (UTC)[reply]
1 is logical, 3 is correct and probably dispels confusion, so I support after a quick look upon the usage of both variants. 2 like Benwing I lack knowledge, in many cases to this day Rusyn is even treated as a dialect of Ukrainian, and it likely sneaks in as such. 4 I don’t think it controversial much, if only one is made a descendant more like inertia. I have tended to assume for years that actually Macedonian should be made a descendant of Modern Bulgarian even, since it is an Abstandssprache created from what was from a balanced view reckoned dialects of Bulgarian. There is dialect literature from the 19th century where it is difficult to say whether it is Bg. or Mk. and in the end modern editions normalize it to one of the two—so I could quote it in both Macedonian and Bulgarian entries, lol: The book at чу́тура (čútura). Fay Freak (talk) 21:14, 4 August 2023 (UTC)[reply]
I'm pretty sure we have a majority consensus on 4 with only Sławobóg still opposing it. We should get on with it. Thadh (talk) 19:27, 5 August 2023 (UTC)[reply]
@Thadh In what direction do you mean? It's just if we intend to remove Bulgarian as a descendant of OCS, there may be some problems because there are a lot of entries that use {{inh}} for its relationship with it, and that's how I've been entering any terms from OCS personally. Contrarily, I see zero Macedonian etymologies quoting OCS and all referring directly to Proto-Slavic. So what's the plan? Did we decide this in a different discussion? Kiril kovachev (talkcontribs) 01:12, 6 August 2023 (UTC)[reply]
Here's the link. I believe the overall outcome of this discussion was that we should make both languages descendants of OCS and handle them as such. Thadh (talk) 07:19, 6 August 2023 (UTC)[reply]
@Benwing2: I agree with you on #4. @ZomBear: Overall, it's a good tree as is. I don't have a strong opinion or knowledge about the other points. Weak support on #3. Anatoli T. (обсудить/вклад) 02:28, 7 August 2023 (UTC)[reply]
@ZomBear I lack the background to comment on the first three points, so I'm only going to comment on 4) about Bulgarian and Macedonian. The topic of their descent from OCS has come up several times in Beer Parlour, and the discussions IMO usually end up sounding like on-the-fly original research. Old Church Slavic was based on dialects that are ancestral to both Bulgarian and Macedonian contemporary dialects. The fact that both languages have standard varieties selected in the 19th (bg) and 20th (mk) centuries is a language policy question, not a linguistic descent question, and non-standard varieties continue to be alive and well in both countries. The idea that Old Church Slavic, Old Bulgarian and Old Macedonian refer to lects that are materially different from one another strikes me as a curious one, and I'm very eager to see references to research that portrays that idea as a consensus opinion in the field. The Institute for Bulgarian Language at the Bulgarian Academy of Sciences would have to disagree.

@Fay Freak despite the official position of my own country, I'm against marking Macedonian as a descendant of Bulgarian. In the context of past and ongoing dynamics between our two countries, this would be a hugely political statement, and injurious to Macedonians. It's long been recognized that dialect/language distinctions in a dialect continuum can't always be made on purely linguistic grounds. Macedonians have a separate national identity and a separate ethnolinguistic identity, so the fair approach would be to treat Macedonian and Bulgarian as sister Southeast Slavic languages with a common origin - i.e. OCS. This isn't very dissimilar IMO to the situation with Scandinavian North Germanic languages which have a high degree of mutual intelligibility and a shared origin in Old Norse. One could probably make the case that Middle Bulgarian is a shared stage of the development of both Bulgarian and Macedonian, since that predates 19th-century nationalist movements and the historical record of distinct ethnolinguistic identities. I'm not sure it would be helpful to anyone in practical terms, though.
Thanks,
Chernorizets (talk) 07:17, 6 August 2023 (UTC)[reply]
@Chernorizets: POG, you mentioned the Bulgarian Academy of Sciences, which forbids Macedonians from calling their literary language the Macedonian language...
I hasten to inform the Bulgarian Academy of Sciences that the first lexicographic description of the Macedonian language, which covered the initial period of its formation (1945‒1960), intellectually belongs to two Soviet linguists, Vladislav Illich-Svitych and Dime Tolovski.
I added this material to Wiktionary two years ago Template:R:mk:Tolovski–Illich-Svitych 1963. ɶLerman (talk) 11:37, 6 August 2023 (UTC)[reply]
As for Middle Bulgarian (in fact, there are many monuments from that period), this is very good information. However, accentologically, we cannot directly trace the development of the accent curves of the Bulgarian and Macedonian systems from these monuments. I think you can guess why. ɶLerman (talk) 12:08, 6 August 2023 (UTC)[reply]
@ZomBear: You say that you need to remove inaccuracies, although at the same time you offer new inaccuracies. ɶLerman (talk) 10:37, 6 August 2023 (UTC)[reply]
@ɶLerman What inaccuracies do you see in points 1, 2 and 3? ZomBear (talk) 10:42, 6 August 2023 (UTC)[reply]
I have set Old Church Slavonic as the ancestor of Macedonian. Vininn126 (talk) 12:36, 15 August 2023 (UTC)[reply]
Honestly, I've never understood why Macedonian is considered to be so close to Bulgarian as opposed to Serbo-Croatian. At least in terms of its historical phonology, it seems to have as many sound changes in common with SC as opposed to Bulgarian as vice versa (considering only the standard varieties of all three languages at least). Like SC, PSl. *ť ď become /c ɟ/, unlike Bulgarian where they become /ʃt ʒd/. Like Ekavian, PSl. *ě is always /e/, unlike Bulgarian where it's /ʲa/ sometimes (бел (bel) vs. бял (bjal), снег (sneg) vs. сняг (snjag)). Macedonian and SC both have a syllabic /r̩/, unlike Bulgarian. Macedonian and SC both change *čr to cr, unlike Bulgarian. And the number of sound changes Macedonian and Bulgarian have in common as opposed to SC doesn't seem to be any higher. Is Macedonian closer to Bulgarian in other aspects of the language (morphology, syntax, lexicon)? (I know it shares with Bulgarian the loss of the nominal cases.) —Mahāgaja · talk 13:24, 15 August 2023 (UTC)[reply]
@Mahagaja this sums it up. TL; DR - grammar. The phonological differences you've listed don't take into account the dialects of Bulgarian and Macedonian, only the varieties that were chosen to be standard by deliberate language policy on both sides of the border. Chernorizets (talk) 18:39, 15 August 2023 (UTC)[reply]
@Vininn126 Thank you! Long overdue. I'm glad it's finally done. Chernorizets (talk) 18:40, 15 August 2023 (UTC)[reply]

Interslavic language

How about adding a language code for a constructed language, Interslavic (isl)? It supports both Latin & Cyrillic script (just like Serbo-Croatian). Dictionary (17,970 words). After all, Wiktionary has Esperanto, Lingua Franca Nova etc. ZomBear (talk) 13:03, 4 August 2023 (UTC)[reply]

@ZomBear, @Sławobóg isl can't be used for Interslavic, it's already assigned to Icelandic in the ISO 639-3 standard. Also per WT:Languages, any language that does not have an ISO code must have a Wiktionary-specific code made per the guidelines listed there. Also, per WT:CFI#Constructed languages, it'd have to be in the Appendix unless there's a full vote for it. AG202 (talk) 17:56, 4 August 2023 (UTC)[reply]
@AG202 I do not insist on using the code isl. You can use some other, for example sla-isl, sla-int or think of something else. However, an application has already been submitted to obtain an ISO code for Interslavic. ZomBear (talk) 18:13, 4 August 2023 (UTC)[reply]
I was only giving information as to what the current policy is. I’m not currently going to take a stance about whether or not it should be included even in the Appendix. And with the ISO code application, until it’s approved, we cannot assign the code to it. (See: the case with Toki Pona) AG202 (talk) 20:32, 4 August 2023 (UTC)[reply]
What would this language code be used for? If the goal is to add Interslavic entries, I oppose this. In fact, I oppose spreading any Interslavic material on the website except perhaps in appendices. And how can a constructed language be said to be a descendant of anything? PUC 18:18, 4 August 2023 (UTC)[reply]
Oppose like PUC probably also for appendices. There is no clear nature or corpus of this. It is an occasional pidgin from existing material and Wiktionary already serves the language more than anyone else by its exhaustive documentation of the Slavic languages and in particular Proto-Slavic, of which it may also be reckoned an orthography, due to its finding common grounds by means of archaicization, with imaginate synsemantics from the individual languages. Fay Freak (talk) 18:29, 4 August 2023 (UTC)[reply]
@PUC Well, for example, Old Church Slavonic (cu) was an artificial language. On Wiktionary, it is listed as a descendant of Proto-Slavic (sla-pro). The Interslavic language is just as artificial, constructed on the basis of all the Slavic languages, which are all descendants of Proto-Slavic. All the original Slavic words on which the Interslavic language is built come from a common ancestor language, Proto-Slavic. ZomBear (talk) 18:30, 4 August 2023 (UTC)[reply]
No. Don’t equate literary standards with conlangs. Distanzsprache is always different from Nähesprache. Fay Freak (talk) 21:18, 4 August 2023 (UTC)[reply]
Lingua Franca Nova is analogous: like Interslavic should be, it's an appendix-only constructed language made up of words from languages in one language family. We don't consider it a member of that family, nor do we consider it a descendant of any language in that family (reconstructed or not). Its words are derived from words in those languages (I would say borrowed), but definitely not inherited. Chuck Entz (talk) 18:47, 4 August 2023 (UTC)[reply]
  • Support, but put in appendix. User:CitationsFreak (talk) 23:08, 4 August 2023 (UTC)[reply]
  • Oppose. Before we list all the artificial languages, I think it is better to first be done with the natural ones (dialects as well). I'm not sure why Esperanto is not in the appendix.
Tollef Salemann (talk) 10:13, 5 August 2023 (UTC)[reply]
Because it's the famousest conlang, with a lot of durably archived material. cf (talk) 17:06, 5 August 2023 (UTC)[reply]
It also has a significant number of native speakers. AG202 (talk) 19:22, 5 August 2023 (UTC)[reply]
I think this is a more convincing reason, to be honest. Theknightwho (talk) 21:06, 5 August 2023 (UTC)[reply]
Oh yeah, I forgot about the native speakers thing. cf (talk) 21:17, 5 August 2023 (UTC)[reply]
Oppose. Conlangs should be in an appendix. Vininn126 (talk) 10:14, 5 August 2023 (UTC)[reply]
Oppose. — Fenakhay (حيطي · مساهماتي) 16:29, 5 August 2023 (UTC)[reply]
Oppose Thadh (talk) 19:26, 5 August 2023 (UTC)[reply]
I Support this but like others have said, in the appendix, and only if there's a reference lexicon to draw from—we don't want to make up words and so on just for this site. Kiril kovachev (talkcontribs) 01:15, 6 August 2023 (UTC)[reply]
@Kiril kovachev https://interslavic-dictionary.com/ - dictionary where 17,970 words. ZomBear (talk) 06:21, 6 August 2023 (UTC)[reply]
@ZomBear Looks good to me. I support this idea. I don't see why it shouldn't be considered a Proto-Slavic descendant, personally, I think it would be hard to explain the etymology of entries without being able to refer to the literal origin of the term precisely like this; unless we consider the terms to be borrowings, either from Proto-Slavic (?) or from numerous modern languages, which I think is just much uglier than listing a single Proto-Slavic derivation (which I assume applies for the majority of words). But whatever we decide, I guess. Kiril kovachev (talkcontribs) 18:10, 6 August 2023 (UTC)[reply]
That's not how inheritance works. Vininn126 (talk) 18:13, 6 August 2023 (UTC)[reply]
@Kiril kovachev It will be possible to write that the words are ultimately borrowed from Proto-Slavic. For example, for Esperanto it is often indicated that the words are borrowed from Latin or another language. ZomBear (talk) 18:50, 6 August 2023 (UTC)[reply]
Fair enough, I don't know if that dictionary explains any of its etymologies, but I guess these can be inferred from the obvious cognates? Kiril kovachev (talkcontribs) 20:02, 6 August 2023 (UTC)[reply]
@Kiril kovachev FWIW, here are the design criteria for Interslavic vocabulary, which explain where the creators of the language borrow words from. A lot of the vocabulary is common Slavic, so you'd represent that as {{der}}ived from sla-pro, but then there are words specific to each Slavic language they used as a vocabulary source, and there you'd want to {{der}}ive from that language. Chernorizets (talk) 21:40, 6 August 2023 (UTC)[reply]
I see, that way it doesn't need to inherit nor borrow, exactly, from any language. Seems good, thanks for sharing that link. Kiril kovachev (talkcontribs) 21:52, 6 August 2023 (UTC)[reply]
Hm, sorry folks, I'm a bit curious what the position is on the Wellentheorie here at Wiktionary. Have there been any discussions about it here previously? It bothers me a lot when I'm thinking about Scandinavian and Eastern Slavic. How are we supposed to deal with stuff like Interslavic then? Sorry for going kind of off-topic. Tollef Salemann (talk) 22:05, 6 August 2023 (UTC)[reply]
Abstain FWIW, the third and current change request to add a code for Interslavic to ISO 639-3 is asking for isv. The previous two change requests were denied, and the current one is Pending.
Conlangs aren't really "descendants" of anything. The creators of Interslavic market it as a modernized and simplified variant of Old Church Slavic, but that doesn't change the fact that it didn't "evolve" from it in the linguistic sense of the term - it was created by people on purpose in the 20th century. It will be interesting if it one day becomes someone's first language, the way I've heard Esperanto has.
Speaking of, I don't understand why the answer would be different from the one for Esperanto, which has more entries on EN Wiktionary than Bulgarian - a natural living language and one of the official languages of the EU. Interslavic is the so far most successful attempt at creating a Slavic auxiliary language, so even if people around here are unfamiliar with it, it's relevant to people interested in facilitating inter-Slavic (no pun intended) communication. Chernorizets (talk) 07:36, 6 August 2023 (UTC)[reply]
I'd say that the size of a language's lexicon shouldn't be a factor in whether it's included in mainspace or not, whether the language is natural or artificial. We may have another vote coming, though, at which things like this could be talked about. See the discussion at the very bottom of the April 2023 Beer Parlour, though I don't wish to create the impression of imminency, or to rush the community towards a vote that might need to be very carefully written before it comes to table. Soap 12:20, 6 August 2023 (UTC)[reply]
Oppose. It's best to shoot conlangs on sight, before they can really begin to multiply. Nicodene (talk) 22:15, 6 August 2023 (UTC)[reply]
Support. To be honest, I don't get why so many people here hate conlangs. As long as they're actually being used, I don't see why we shouldn't cover them. Binarystep (talk) 23:22, 9 August 2023 (UTC)[reply]
Support. I think "all words in all languages" should include any conlangs that have any kind of use. Andrew Sheedy (talk) 09:50, 12 August 2023 (UTC)[reply]
Much too lenient of a criterion imo. PUC 10:00, 13 August 2023 (UTC)[reply]
I agree with @PUC on this one: "any conlangs that have any kind of use" opens the door to anything, really. Someone using their own made-up gibberish for a journaling exercise would clear that low bar. I'm not sure where the threshold should be, but "anything" is too loose, and would invite a crapflood. ‑‑ Eiríkr Útlendi │Tala við mig 17:54, 14 August 2023 (UTC)[reply]
I'm not sure they meant any kind of use quite the way we're interpreting it here .... Interslavic has an active userbase of 7000 speakers according to Wikipedia, and is a well-developed language unlike some artistic languages where "learning the language" means memorizing a list of a few dozen words used in a movie or fantasy novel because that's all that exists of the language. Soap 11:37, 22 August 2023 (UTC)[reply]
Oppose --Anatoli T. (обсудить/вклад) 03:31, 15 August 2023 (UTC)[reply]

Use of {{rfdef}}

Is it in order to request the deletion of entries whose sole definition is an invocation of {{rfdef}} on the grounds that there is no content? The documentation of the template rather suggests that this is a request for content, and that we should not simply implement a policy of refusing requests point blank. I appreciate that some linger because no-one chooses to make the effort - which for a good definition can be considerable. --RichardW57m (talk) 14:30, 4 August 2023 (UTC)[reply]

Some such entries have been speedied, which suggests to me that the template's documentation needs to be changed, perhaps to suggest a time after which such requests may simply be rejected on the basis that the editors can't be bothered to create a definition. --RichardW57m (talk) 14:30, 4 August 2023 (UTC)[reply]

IMO no it is not in order. To me, {{rfdef}} means that a term (or glyph as I suspect you're talking about) is verifiable, but the author of the entry isn't sure what it means or how to phrase a sense. If someone thinks it isn't verifiable or that there's some convincing reason why it's unsuitable for inclusion then it should go through the normal RFV and RFD process. "No content" is not a good reason for speedying in that case. —Al-Muqanna المقنع (talk) 14:44, 4 August 2023 (UTC)[reply]
Sorry, I forgot to confess that the question was prompted by the deletions of Unicode characters with minimal research and their imminent deletions. I am actually bothered by some of the recently invented characters that have been encoded, but proper research can be hard work - and even harder if one doesn't have easy access to a good specialist library. It occurred to me that I found quite a few alleged Pali words without definition, and no referenced evidence of their existence. Some of these I've condemned to RfV because I can find no evidence and I hope they were just typos or misanalyses; fortunately, most I was able to find at least in dictionaries. I was wondering how many simply got speedied because they irritated whoever tried to expand them, a waste of everyone's effort. --RichardW57m (talk) 16:53, 4 August 2023 (UTC)[reply]
What would be in order in most cases would be an RfV. Defining an obscure or otherwise hard-to-define term without some real evidence is impossible. — This unsigned comment was added by DCDuring (talkcontribs) at 17:19, 4 August 2023 (UTC).[reply]
I would speedy delete an English entry that consisted solely of a {{rfdef}} sense line with no further info; that is what WT:REE is for. It seems to me that {{rfdef}} is meant to be used where you have found a word but are not sure how to define it. This means it needs to be accompanied by quotations, usage examples, or some other kind of starting info.
The situation with letters/characters is a little different, as quotations are not really relevant for these types of entries. In the long run I believe we should have an entry (or hard redirect) for every available Unicode character, irrespective of attestation considerations (saying that the character is not used is valuable info in itself), but I suspect my view is not shared by the majority of the community. This, that and the other (talk) 02:44, 6 August 2023 (UTC)[reply]
@This, that and the other: Indeed, that argument was lost for the automatically generated Hangul syllables, and I suspect would be lost for the unified ideographs. I think a lot of the latter have been trawled from dictionaries. I do think there is a case for recording persistent lexicographic ghosts, though.
The sort of sketchy entry I have in mind is Special:diff/48296543#Pali, where we at least have a part of speech. I did a clear out of such back in June, and it would seem that any more recent Pali ones, if created, have been dealt with in some fashion. It's possible that one thing preventing a quotation was the lack of a citable source; there's a lot of material on the internet that appears to be not durably archived or in copyright and unlicensed. --RichardW57m (talk) 10:11, 7 August 2023 (UTC)[reply]
I agree we should not have entries that are just part of speech + "rfdef" and no other content (use WT:REE instead). If the entry has substantial content, e.g. etymology, citations, references, but no definition, that is non-ideal but okay. I occasionally do this for very technical terms where I can't work out the meaning (chemicals, advanced mathematics, etc.). There are users who look for these and try to fill them in. Equinox 16:35, 7 August 2023 (UTC)[reply]
@Equinox: Should one really use WT:REE for Pali words? --RichardW57 (talk) 07:12, 8 August 2023 (UTC)[reply]
@RichardW57m there is Wiktionary:Requested entries (Pali) - currently empty. This, that and the other (talk) 01:01, 19 August 2023 (UTC)[reply]
@This, that and the other: Thank you. As it's a page as opposed to a category, I must have missed it when looking for requests for entries, perhaps because it's the sole page in a category in a category that otherwise consists only of categories. I vaguely remember working from it, and I wondered why I couldn't find it. --RichardW57 (talk) 07:13, 19 August 2023 (UTC)[reply]

non-standard Dari

notifying @Ariamihr, @Atitarev, @Benwing2, @Dijan, @Mazsch, @Rodrigo5260, @ZxxZxxZ, @Saranamd and all else involved in Persian


I am proposing non-standard varieties of Dari NOT be categorized as "well documented" in Wiktionary:Criteria for inclusion/Well documented languages

for multiple reasons:

1) While standard Dari is well documented due to news agencies, regional vocabulary is not well recorded AT ALL. Largely because, unlike Iran, most people in Afghanistan do not have any internet access. While Iranian colloquialisms and regionalisms are well recorded, documentation for colloquial Dari is extremely difficult to find. If you do find any, it'll most likely be Kabuli; the chances of finding material for other regional varieties are even slimmer. On top of that, the majority of Afghanistan is illiterate, as sad as it is, and people who are literate tend to write in the standard dialect, so regional vocabulary being attested in writing is rare. It's hard to look at that and say regional varieties of Dari are "well documented".

2) Due to overlap between Hazaragi and standard Dari, Hazaragi was merged into Persian. This was probably a good thing and avoids situations like آب where the "Hazaragi" entry was just about standard Dari and didn't include any information on Hazaragi (Hazaragi pronunciation would be aw or ew). However prior to the merger Hazaragi was categorized as "not well documented", and the merger did not take that into account or put an exclusion for Hazaragi in the well documented list. While Hazaragi has much overlap with Kabuli and standard Dari it is also well known for having A LARGE array of regional vocabulary rarely or sparingly found elsewhere.

3) With political instability in Afghanistan, who knows when Afghanistan will become stable enough for linguists to be comfortable traveling there again. And with the already poor material on varieties of Dari, it's likely that varieties of Dari other than Standard Kabuli will not be properly documented for the foreseeable future. سَمِیر | sameer (مشارکت‌هابا مرا گپ بزن) 20:57, 5 August 2023 (UTC)[reply]

Support Rodrigo5260 (talk) 21:13, 5 August 2023 (UTC)[reply]
Support, obviously. سَمِیر | sameer (مشارکت‌هابا مرا گپ بزن) 23:40, 5 August 2023 (UTC)[reply]
Support Sounds very useful. Tollef Salemann (talk) 00:09, 6 August 2023 (UTC)[reply]
Support As you already noted for colloquialisms, even the well-documentedness of standard Dari is doubtful, as news agencies typically cover certain registers and topics, and how is press freedom in Afghanistan now anyway. I have always had the impression that little is coming out of Afghanistan on the internet. I don’t understand though that this would formally change the text of Wiktionary:Criteria for inclusion/Well documented languages. Not even the recent discussions about Middle Polish did. But its society was never exposed to the Internet to be well documented on the Internet. Beside obvious chronolectal exceptions to the WDL list catering to the actual spread of printing within societies, it was always clear to me that terms needed only to be attested as would be expected from the usage context claimed for an entry, providing that the circumstances are appropriately reliable, rather than just schematically counting three quotes. Fay Freak (talk) 00:34, 6 August 2023 (UTC)[reply]
Support @Sameerhameedy: It seems reasonable but are we already separating Dari from (other) Persian (using 'prs' vs 'fa')? And what does it achieve? Do we have separate treatments for non-standard varieties? We record what's available. If you know a Dari form, you can add it, otherwise it defaults to modern Iranian. Are you referring to a specific Dari variety, which should be downgraded? The problem I see with the current pronunciation module is that it automatically, by default, generates four varieties, as you know: "Classical Persian", "Dari Persian", "Iranian Persian" and "Tajik" pronunciations, which is mostly predictable and mostly correct but not always, especially if a word in a given variety doesn't even exist. --Anatoli T. (обсудить/вклад) 04:23, 6 August 2023 (UTC)[reply]
@Atitarev From my understanding, languages categorized as "Well documented" have stricter requirements for inclusion. I just wanted to make sure those strict requirements didn't apply to regional varieties of Dari since finding resources for nonstandard terms is extremely difficult.
Part of it is also that I found a Hazaragi dictionary from SIL International; it is one of the only comprehensive resources for Hazaragi I've ever found on the internet, with approximately 10,000 entries. Since Hazaragi's merger there isn't clarity on whether Hazaragi is still considered "less documented". Not all entries are worthy of inclusion and most overlap with Kabuli, but I wanted to make sure including some of the important ones was allowed. (I also definitely will not include all those terms since I mentally cannot sort through 10,000 definitions in a dictionary). سَمِیر | sameer (مشارکت‌هابا مرا گپ بزن) 04:37, 6 August 2023 (UTC)[reply]
See Wiktionary:Votes/2022-08/Regional and Obsolete variations as LDL's. Most people would agree that such varieties should be treated differently, but we have yet to figure out how to do it without trashing the rest of CFI. In the meanwhile, this kind of discussion probably won't accomplish much: if anyone were to rfv any of these terms, they would fail without the required 3 cites. Chuck Entz (talk) 05:14, 6 August 2023 (UTC)[reply]
So if terms from small 'lects like Hazaragi were rfv'd they'd probably be deleted?? I really hope that's not what you're saying because that basically means Hazaragi's merger with Persian was a death sentence.
I know the site-wide vote failed, but can't language communities adopt language specific policies?? AKA can't Persian Wiktionary vote to adopt that policy for Persian entries? سَمِیر | sameer (مشارکت‌هابا مرا گپ بزن) 06:36, 6 August 2023 (UTC)[reply]

Slavic palatalization template

Hello, do we have a template to specify a sound change due to the Slavic palatalizations, kind of like {{ja-rendaku2}}? I mean to use it/them for formulaic sentences like this one on Bulgarian шикалчен (šikalčen)'s Etymology: "The к changes to ч in an instance of the Slavic first palatalization." I haven't been doing this until just recently, but it's not just in this case that it happens, so I figure it might be nice to have. I couldn't find any existing template for it — is that because it doesn't exist? Also, if creating it from scratch, should it be translingual or should each language have its own, for categorization purposes? Kiril kovachev (talkcontribs) 23:01, 5 August 2023 (UTC)[reply]

@Kiril kovachev: Usage notes templates should start with "U:" followed by the language code, if there is one (a Slavic-wide template could start with "U:sla:"). See Special:PrefixIndex/Template:U:. If the wording is the same across languages, it would be very simple to include an optional parameter for a language code. You probably would want to have a parameter for the consonant in question, and either one for what it transforms into, or a lookup table if the correspondence is one-to-one. This is a simple enough application that just about everything but expanding the language code can be done just with template logic (and even that could be handled by a large enough lookup table if you really wanted). Chuck Entz (talk) 23:27, 5 August 2023 (UTC)[reply]
Oops! I see you meant this for the etymology section. Scratch the part about the naming conventions.Chuck Entz (talk) 23:30, 5 August 2023 (UTC)[reply]
See Category:Etymology templates by language. Chuck Entz (talk) 23:34, 5 August 2023 (UTC)[reply]
@Chuck Entz Yeah, that's the thing, I know it maps 1-to-1 for my uses, but I don't know how it works in other languages, but in principle it should be just about the same. {{ja-rendaku2}} just has the user input both the original form and the sound-changed form, so we could have that be a maybe-optional second parameter for if there's a problem with the automatic generation. I don't know if I'm competent enough to write a template like that though, only through a module...
Also there should be two templates, one for each of the two Slavic palatalizations. For the record, I could only see two Slavic languages in that category, so I take it there's no template like this, because the Polish/Silesian ones are about the alphabet. I'll make an effort to make these, potentially tomorrow. Kiril kovachev (talkcontribs) 23:38, 5 August 2023 (UTC)[reply]
@Kiril kovachev would we want a template for this? Palatalization is pervasive in Slavic languages, so that would end up being repeated/repetitive detail on a lot of entries. I can think of a couple of alternatives:
  • a dedicated appendix which explains the Slavic palatalizations and provides examples from several languages. It should be linked to from things like Category:Bulgarian language.
  • a simpler template that produces the text "with palatalization", where the hyperlink points to the correct palatalization in the Glossary. E.g. something like {{sla-pal|ru|first}} (maybe the language code is redundant).
Cheers,
Chernorizets (talk) 08:02, 6 August 2023 (UTC)[reply]
@Chernorizets I like the idea of having a template, and I invoked {{ja-rendaku2}} because I think it's a good model for how to structure the information, i.e. to fully explain (1) what sound changes; and (2) into what corresponding sound; (3) and as part of what palatalization: so, effectively I was thinking of having one template for each palatalization, since they apply differently, and, also, although I'm not that experienced with them, I think the first is much more common to me than the others (if Appendix:Glossary is any indicator, then it reinforces this take because only the first and progressive are documented there); if we do it this way, explaining the first palatalization won't require a separate argument, but rather will be built into the template itself, so the template won't have as much to do. But, besides the first, there're the second and progressive palatalizations to cover, which may be an argument for having just one template. If it were me, though, I'd do:
  • have the template output text like what I cited from шикалчен (šikalčen) above, with a syntax such as {{sla-pal-1|bg|к}};
  • the result of the palatalization is calculated as a function of the letter that's passed in, but could optionally be passed as a second argument, e.g. {{sla-pal-1|bg|-ка|-чен}} or similar;
  • it links to the glossary to the relevant palatalization.
The main glossary already has the first and progressive palatalizations documented, but like you said it might be good to have a page dedicated just to explaining it clearly, by example from multiple languages. I don't really know what's best, but I've logged on today to try to write a basic template like what I just described; I'll share if I manage to get it to work ^^ What do you think we ought to do? Kiril kovachev (talkcontribs) 18:06, 6 August 2023 (UTC)[reply]
Also, I forgot to mention, I meant to include the language code so we could do categorizations such as Category:Bulgarian terms affected by the Slavic first palatalization (kind of verbose name, but you get the picture); would this be a desirable feature? Kiril kovachev (talkcontribs) 18:14, 6 August 2023 (UTC)[reply]
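For illustration only, the behaviour described above (a lookup for the palatalized consonant, an optional explicit second argument, and a per-language category) can be sketched in Python. This is a rough sketch under stated assumptions, not the actual template: the function name `sla_pal_1`, the `LANGUAGE_NAMES` table and the category wording are hypothetical, and the lookup covers only the Cyrillic velars к, г, х (first palatalization: к→ч, г→ж, х→ш).

```python
# Hypothetical sketch of the proposed template logic; none of these names
# exist on Wiktionary. Only Cyrillic velars are covered (an assumption).
FIRST_PALATALIZATION = {"к": "ч", "г": "ж", "х": "ш"}

# Illustrative subset of language codes; a real template would expand codes itself.
LANGUAGE_NAMES = {"bg": "Bulgarian", "ru": "Russian"}


def sla_pal_1(lang_code, source, result=None):
    """Return (etymology text, category name) for the first palatalization."""
    if result is None:
        # No explicit outcome given: fall back to the one-to-one lookup table.
        if source not in FIRST_PALATALIZATION:
            raise ValueError(f"no automatic outcome known for {source!r}")
        result = FIRST_PALATALIZATION[source]
    text = (
        f"The {source} changes to {result} in an instance of "
        "the Slavic first palatalization."
    )
    category = (
        f"{LANGUAGE_NAMES[lang_code]} terms affected by "
        "the Slavic first palatalization"
    )
    return text, category


text, category = sla_pal_1("bg", "к")
print(text)
print(category)
```

Passing both forms, as in `sla_pal_1("bg", "-ка", "-чен")`, skips the lookup entirely, which mirrors the optional second parameter suggested for cases where automatic generation would go wrong.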
@Chernorizets Please check out Template:User:Kiril kovachev/sla-pal-1, I managed to get something together. Kiril kovachev (talkcontribs) 19:57, 6 August 2023 (UTC)[reply]
However upon thinking about it, your "with palatalization" idea is quite smart, it's a lot less bloated and may be more suitable for modern languages, and it may be just as good for explaining what sounds go to what, as long as the appendix link is detailed. It's slightly less explicit, though: I do like the notion of naming the pre-palatalization letter and what it changes into, but that's just me. Kiril kovachev (talkcontribs) 20:34, 6 August 2023 (UTC)[reply]
We don't need this at all. Sławobóg (talk) 19:31, 6 August 2023 (UTC)[reply]
@Sławobóg You don't need it, perhaps. I think it's a useful tool, partly didactically, to teach readers why the consonant changes. Is there a reason why you think we should not reflect such information in etymologies? Kiril kovachev (talkcontribs) 19:59, 6 August 2023 (UTC)[reply]
@Kiril kovachev I'm a bit leery of this because it feels like it will result in excessively verbose etymologies. My solution to this for Russian was to put the appropriate information in the page for the suffix, rather than on each page that uses the suffix; you can see an example of this in the usage notes for -ный (-nyj). A bunch of changes happen before this suffix (including the Slavic first palatalization, nouns assuming their unreduced form, etc.) and I didn't want to have to put the info on each page using the suffix (templatized or not). It's true that this requires the reader to click through to the suffix in question, but it feels to me like an interested reader who notices an unexplained sound change in the etymology will do this naturally. Benwing2 (talk) 00:59, 7 August 2023 (UTC)[reply]
@Benwing2 That sounds fine to me, too. I don't really mind, clicking the suffix is also completely valid and as you say explains the same information. What do you think of Chernorizets' solution of just citing the presence of palatalization? Or just skip that altogether and just mention that only on the linked suffix? Kiril kovachev (talkcontribs) 01:28, 7 August 2023 (UTC)[reply]
@Kiril kovachev I actually like the idea of putting that info in the suffix page. Proto-Slavic suffix pages are good at elucidating those kinds of phonological processes with the help of examples. Our current coverage of Bulgarian affixes in general is low, so this could be a nice forcing function to fix that. Chernorizets (talk) 01:45, 7 August 2023 (UTC)[reply]
@Chernorizets I was thinking the same thing, I didn't want to say it because I'm not a Proto-Slavic editor really and I wasn't sure if it would be acceptable or not, but I agree, having this information in those central locations could be quite informative, and there'd be no need to duplicate it for every entry that uses the suffix like that as I had spun myself into thinking. Do we still want to keep the template idea or just input this information manually? (If keeping it, maybe ought to change the category to (Language name) first palatalization or something more generic?) Kiril kovachev (talkcontribs) 01:59, 7 August 2023 (UTC)[reply]
Another example of such a suffix is -ać. Vininn126 (talk) 07:01, 7 August 2023 (UTC)[reply]
We are a dictionary, not a grammar/phonetics book. If we accept that idea we need to do it for other Slavic languages and actually all other languages that we support. You literally want all other editors to do additional work because you want that template. Should we have a template for liquid metathesis too? Sławobóg (talk) 11:45, 12 August 2023 (UTC)[reply]

I don't know if anyone is interested in this, but there may be room for a category of antiphony-type words (a specialized type of Category:English reduplications), including ding-dong, king-kong, ping-pong, sing-song, flip-flop, hip-hop, tip-top, tick-tock, ching-chong. Source: [2] --Geographyinitiative (talk) 14:49, 6 August 2023 (UTC)[reply]

It would be a type of term rather than a topic, but also usually it's called apophony, and refers to a broader class of vowel modification rather than specifically reduplication (cf.). I wouldn't object to something like "English apophonic reduplications". —Al-Muqanna المقنع (talk) 17:16, 6 August 2023 (UTC)[reply]

Relatedly, we have Category:English alliterative compounds. PUC 17:18, 6 August 2023 (UTC)[reply]

I am interested in it. (Although I do have to ask, what does "king-kong" mean?) cf (talk) 20:06, 6 August 2023 (UTC)[reply]

I've added "apophonic reduplications" to the list of etymology categories and created Category:English apophonic reduplications @Geographyinitiative. —Al-Muqanna المقنع (talk) 15:24, 7 August 2023 (UTC)[reply]
I like this! Now, what about other languages? Off hand I remember Mandarin has 玲瓏玲珑 (línglóng) and a bunch of other ones. Geographyinitiative (talk) 16:05, 7 August 2023 (UTC)[reply]
Isn’t like more than 90% of the 325 entries I see in Category:English reduplications apophonic in the sense assumed by this category? Almost nothing to be left. But maybe this is intended. Fay Freak (talk) 16:18, 7 August 2023 (UTC)[reply]
@Fay Freak: No, the majority of English reduplications don't specifically involve a vowel change. Consonant changes are also very common (easy peasy, hodgepodge etc.). I've already categorised most of the terms to which this applies. —Al-Muqanna المقنع (talk) 16:32, 7 August 2023 (UTC)[reply]
@Al-Muqanna: I see it now, I didn’t read your definition in the category as strict as you assumed it. I cannot derive this meaning from the definition of apophony in the mainspace. “alternation of sounds within a word that indicates grammatical information” is a vacuous definition altogether, since indicating grammatical information is not possible other than by alternation or meaningful non-alternation. If it means vowel mutation then stem mutation being declared a synonym is doubtful, since in accordance with the literal meaning I think “stem mutation” can also be consonant changes.
So I assume your distinction is that this subcategory should contain formations created by exclusively ablaut other than the reduplication itself; you define in the category “reduplication with a change in vowel sound”; jiggery-pokery does not belong into it, since it contains other changes? What about the insertion in chockablock? While the supercategory requires in its definition that the origin “involved a repetition of roots or stems”, I am not even sure if this is not misleading inasmuch as the stem or root is altered. We have to make sure the categories are not misapplied by sleepy editors. Fay Freak (talk) 16:52, 7 August 2023 (UTC)[reply]
Largely it is equivalent to ablaut, the current entry is wrong. —Al-Muqanna المقنع (talk) 17:02, 7 August 2023 (UTC)[reply]
Fixed now. —Al-Muqanna المقنع (talk) 20:29, 7 August 2023 (UTC)[reply]
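As a rough aid for checking candidates against the strict reading discussed above (the two halves differ only in a vowel), here is a small Python sketch. It is an assumption-laden approximation: it compares spelling rather than pronunciation, treats only a, e, i, o, u as vowels, and the function names are made up for this example.

```python
import re

# Spelling-based approximation: only these letters count as vowels (assumption).
VOWELS = "aeiou"


def consonant_skeleton(word):
    """Collapse every run of vowels to 'V' so only the consonant frame remains."""
    return re.sub(f"[{VOWELS}]+", "V", word.lower())


def is_apophonic_reduplication(first, second):
    """True if the halves differ, but only in their vowels."""
    first, second = first.lower(), second.lower()
    return first != second and consonant_skeleton(first) == consonant_skeleton(second)


for halves in [("ding", "dong"), ("tick", "tock"), ("easy", "peasy"), ("hodge", "podge")]:
    print(halves, is_apophonic_reduplication(*halves))
```

Under these assumptions ding-dong and tick-tock qualify, while easy peasy, hodgepodge and jiggery-pokery do not, because their consonant frames differ. Any real categorisation would still need to be checked by ear, since English spelling is an unreliable guide to vowels.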
If this is the correct term then antiphony apparently lacks that sense. Equinox 16:37, 7 August 2023 (UTC)[reply]
And apophony does not even have a sense. Fay Freak (talk) 16:52, 7 August 2023 (UTC)[reply]
@Equinox: Added with cites. —Al-Muqanna المقنع (talk) 20:34, 7 August 2023 (UTC)[reply]

I've been wondering if there'd be any interest for such a thing in contrast to {{homophone}}. One problem I foresee with this are pages with tons and tons of minimal pairs, in which case we might want to create some sort of subpage with everything sorted. I feel like most words are only gonna have a handful. Thoughts? Vininn126 (talk) 15:28, 7 August 2023 (UTC)[reply]

I don't think the benefits of such a template would outweigh the disadvantages. Thadh (talk) 15:40, 7 August 2023 (UTC)[reply]
Yeah... it's interesting information for language nerds like us who edit this dictionary, but maybe not interesting enough to readers to be worth the space it'd take up, the potential confusion for those who aren't familiar with "minimal pair" and wonder what the connection is that "pairs" bat to butt and is causing us to link them, and the overlap it'll have with the Rhymes in a lot of cases (bat vs cat, mat, etc)...? It might make more sense at a 'higher' level, e.g. if at Appendix:English pronunciation (etc) we added, for each phoneme, examples of minimal pairs where it contrasts with another phoneme. - -sche (discuss) 18:43, 7 August 2023 (UTC)[reply]
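To gauge how many pairs a given word would attract, one could list minimal pairs mechanically from phonemic transcriptions. The sketch below is only an illustration: the tiny `LEXICON` and its transcriptions are sample data chosen for this example, the function names are hypothetical, and a "minimal pair" is taken to mean two phoneme sequences of equal length that differ in exactly one position.

```python
def is_minimal_pair(a, b):
    """True if two phoneme sequences have equal length and differ in one slot."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


def minimal_pairs(lexicon):
    """Return all minimal pairs in a {word: phoneme tuple} mapping, sorted by word."""
    words = sorted(lexicon)
    pairs = []
    for i, first in enumerate(words):
        for second in words[i + 1:]:
            if is_minimal_pair(lexicon[first], lexicon[second]):
                pairs.append((first, second))
    return pairs


# Sample data only; transcriptions are illustrative.
LEXICON = {
    "bat": ("b", "æ", "t"),
    "butt": ("b", "ʌ", "t"),
    "cat": ("k", "æ", "t"),
    "mat": ("m", "æ", "t"),
}

print(minimal_pairs(LEXICON))
```

Even with four words, "bat" already pairs with the other three, and two of those pairs are also rhymes, which illustrates the overlap concern raised above.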

Gallo-Italic of Sicily

We have a few Latin entries whose Descendants sections include a listing for Gallo-Italic of Sicily, a language for which we have no code. Ethnologue apparently considers it a variety of Lombard (lmo). Do we want to follow suit, or do we want to create our own code for it, e.g. roa-gis? —Mahāgaja · talk 10:10, 8 August 2023 (UTC)[reply]

Yes. A Gallo-Italic of Basilicata would also be quite welcome. I wouldn't categorize either as a variety of Lombard. Nicodene (talk) 14:03, 9 August 2023 (UTC)[reply]
OK, I've created roa-gis and roa-gib. —Mahāgaja · talk 22:09, 16 August 2023 (UTC)[reply]

Rename Northern Kurdish to Kurmanji

I want to request that "kmr" be renamed from "Northern Kurdish" to "Kurmanji". This is because the term Kurmanji (which is "kurmancî" in Kurmanji) is much more often used, and is more correct because Northern Kurdish isn't actually spoken only in the north of Kurdistan. I've seen first-hand the confusion among Kurds who speak Kurmanji about the meaning of "Northern" vs "Central" Kurdish when it is not accompanied by some sort of word in the dialect or an explanation. The Wikipedia article is also called "Kurmanji", and the list of languages in Google Translate calls it Kurmanji as well.

If it should be made clear that Kurmanji is a Kurdish dialect (which I think is pretty useful as most foreigners just call it "Kurdish"), the name "Kurmanji Kurdish" could be used as well. In fact, I think I would prefer that. Guherto (talk) 10:39, 8 August 2023 (UTC)[reply]

See Wiktionary:Beer_parlour/2018/August#"Kurmanji"_and_"Sorani"_to_"Northern_Kurdish"_and_"Central_Kurdish" for the previous discussion. PS I have no opinion. Vahag (talk) 11:04, 8 August 2023 (UTC)[reply]
“Much more often used” may not be so in formal, linguistic, academic usage. As the term Ebonics, which pervades English more than the appropriate names. Besides there is the issue of citogenesis: Wikipedia copied some database and then unwitting website coders like those of Google copied it. Or relatedly, Farsi, when the internet was just spreading. With its spread over multiple countries, we also have Serbo-Croatian, which probably is not any more used as much as multiple other names in the language. Of course native speakers can be confused by linguistic discourse. Fay Freak (talk) 16:04, 8 August 2023 (UTC)[reply]
@Fay Freak I find it kind of disrespectful that you call the name of the language/dialect a "citogenesis". It's not like Wikipedia made up this fact, which is what citogenesis means. This is a name that has existed for centuries, and is used among most Kurmanji speakers, and those who do not use the name Kurmanji don't use "Northern Kurdish" either (they're literally in the south). Also, not sure why academic usage would matter more than what it is commonly called and known as. And, I'm not even sure "Northern Kurdish" is more common in academic usage. Guherto (talk) 16:41, 8 August 2023 (UTC)[reply]
I don’t call it. I’ve pointed out that there is an issue, or danger of citogenesis. It suffices for it that the distribution of a term or idea is slanted by a choice made by Wikipedia; a term being completely made up by their English edition is indeed rare due to their rules, but as long as someone finds something it is included as “cited”, without quality assessment. We have deleted some of Wikipedia’s language names because of being unattested in use.
Commonness is a construction by means of a (un)certain crosssection. It is clear, too, even on Wikipedia, that a more rigid use can be pursued in opposition to popular science, and a certain register may be more relevant than the other. So I am not sure why common calling would matter more than what certain experts call. Fay Freak (talk) 19:01, 8 August 2023 (UTC)[reply]
  • Support the rename to Kurmanji, which certainly seems to be more common than "Northern Kurdish" in English, quite apart from Wikipedia. The most common name in English appears to be simply "Kurdish" unmodified, but obviously that won't do for our purposes. —Mahāgaja · talk 19:20, 8 August 2023 (UTC)[reply]
    • On degruyter.com, the hit count of Northern Kurdish vs. Kurmanji is about 1:3, but in the latter case a lot of hits are lists of languages (though the former case is not devoid of it, as references need soft redirects of synonyms), the same database artefacts suspected, and else more liberal arts books than linguistic books, i.e. studies about the cultures and politics of Middle Eastern societies: a bubble? Unlike with Guherto’s failure to distinguish, I note that I look here for English distribution as opposed to local Kurdish usage which of course is not inherited from Wikipedia—a disingenuous imputation. There is of course also a chance that a Kurd who used the word in Kurdish continued to do so in English, which makes people like Guherto biased against the term “Northern Kurdish”. Fay Freak (talk) 19:38, 8 August 2023 (UTC)[reply]
Just spit balling here, what about "Kurmanji Kurdish"?? I think it's important for people googling terms to know there's multiple languages called "Kurdish" so people know which one they need. سَمِیر | sameer (مشارکت‌هابا مرا گپ بزن) 23:00, 8 August 2023 (UTC)[reply]
In terms of the arguments given above, I see nothing wrong with "Northern Kurdish" as a descriptive term. Looking at the map it's indeed spoken to the north of Central Kurdish. Whether we should rename should go only based on usage in academic sources. That said, if the sources favor "Kurmanji", I would not be opposed to the rename but *ONLY* if we also rename Central Kurdish -> Sorani. Kurmanji Kurdish strikes me as a weird compromise that might not be best. Benwing2 (talk) 23:22, 9 August 2023 (UTC)[reply]
This too. But because the word Kurdish does not occur in the articles anymore then, which is also rejectable SEO as certainly more people stumble upon us by vague Kurdish than any of the particular names, we might even have more complicated language section titles like Northern Kurdish (Kurmanji) but only if we do not need to write it but the L2 names are generated by templates, while the categories names, more internal to the dictionary anyway, would stay simpler. The Wiktionary community at some point considered such templates but for its eventual server strain and reprogramming and maintenance work-loads it is low priority, and not of a greater priority because of the present pedantic argument, where even in the worst case we are only 25 % against 75 %, of disparate qualities. Fay Freak (talk) 01:56, 10 August 2023 (UTC)[reply]
My (probably fairly useless) two cents: some years ago, there was an active community of ckb speakers who were involved in setting up their own-language edition of Wikipedia. I distinctly recall the language was referred to by all involved as "Sorani Kurdish". I'm not sure if the equivalent double-barrelled name is ever used for Kurmanji though. This, that and the other (talk) 11:41, 12 August 2023 (UTC)[reply]
Yeah I think we should follow the example of Sorani Kurdish wikipedia. The names "Kurmanji Kurdish", "Sorani Kurdish", and "Gorani Kurdish" make it abundantly clear that there are multiple languages that refer to themselves as Kurdish. Which I think is an important thing to note since speakers tend to refer to themselves simply as "Kurdish" and as "Kurdish speakers". That, on top of the fact that speakers often refer to Kurdish varieties as dialects even though they are not mutually intelligible... makes me think there's some value in maintaining the distinctions between varieties of Kurdish in headers. سَمِیر | sameer (مشارکت‌هابا مرا گپ بزن) 22:25, 13 August 2023 (UTC)[reply]
What do I need to do for something to happen? I am not an active contributor to this website (I am a frequent reader though), so I have no idea. Guherto (talk) 22:19, 19 August 2023 (UTC)[reply]

Are surname variants altforms?

Huang currently lists various English surnames from different languages as altforms. This seems odd to me since even closely-related surnames can't be swapped around at will: see the list of variants I added at Keble for example. Wong instead lists them as doublets, which seems preferable to me. —Al-Muqanna المقنع (talk) 17:14, 8 August 2023 (UTC)[reply]

Doublet seems like the better choice. Each transcription has legal status when it is part of a proper name. I suspect that someone named 黃 in China could be accused of fraud in the US if applying for loans under both Wang and Huang. DCDuring (talk) 18:57, 8 August 2023 (UTC)[reply]
I usually list variants as altforms, but define each altform with a {{surname}}, referring to the main form in the etymology. Compare Դևեջյան (Deweǰyan) and Տևեճյան (Tewečyan). Vahag (talk) 19:08, 8 August 2023 (UTC)[reply]
It wouldn't surprise me if our practice needs to be different for different languages. For Middle English (for example) finding multiple different spellings of the same person's surname or even given name, even within the same text, is normal. Now, many people would probably rightly object to their own or others' names being defined as mere "alternative forms of" other names. - -sche (discuss) 02:39, 9 August 2023 (UTC)[reply]
I prefer treating such surnames as doublets (and not altforms or synonyms), with separate etymology headers for each language (e.g. Chan). The situation at Wang feels a bit overwhelming to me (but Wong is fine). Meanwhile for some surnames like Choi it actually makes sense to have both doublets and altforms, as those are referring to different surnames (though we should separate the Cantonese and Korean surnames at Choi, since not all altforms apply to both). – Wpi (talk) 04:45, 9 August 2023 (UTC)[reply]
This might be a bit of a different case, but one thing to keep in mind is that a widespread phenomenon exists in North America whereby illiterate European (or other) immigrants had no standard way of spelling their last name, and so members of the same family who immigrated separately could be registered under different spellings by immigration officers, in censuses, etc. For that reason, I listed "Sheady" as an alternative spelling of Sheedy (rather than something like a related term), because those with the different spellings have common ancestors in many cases. The same is true of the many variant spellings of my mother's maiden name: same ancestor, but about 5 different spellings of the name depending on the branch of the family. Andrew Sheedy (talk) 22:03, 9 August 2023 (UTC)[reply]
A parallel case of what we call alternative forms is color and colour - there is rarely freedom as to which spelling is allowed. --RichardW57m (talk) 12:49, 11 August 2023 (UTC)[reply]
The derivational aspects of the name would seem to belong in Etymology, not in definition. DCDuring (talk) 17:22, 11 August 2023 (UTC)[reply]
This came up at Cheung recently. Also check out Xu and the alt forms there: how would these ideas apply there? I do not have any particular view on the issue of alt form versus doublet versus see also, but of course I am interested in this discussion. --Geographyinitiative (talk) 17:39, 11 August 2023 (UTC)[reply]

FYI: Unicode August update

https://mailchi.mp/5cdb42752951/unicode-in-6246222 Justin (koavf)TCM 00:39, 9 August 2023 (UTC)[reply]

Ainu numerals, Sakhalin vs. Hokkaido Ainu

I am attempting to improve the overall usability of wiktionary for the Ainu language. While looking through entries I noted that the user who originally created the pages for numerals used strange spellings. I am unsure if I should continue to move these over time (I have been throttled and cannot do that now) or to create new pages for the proper Hokkaido spellings of these words. I do not speak specifically Sakhalin Ainu, so I can not attest to the use or disuse of these spellings on that island.

More specifically, the words are spelled with an extra vowel that does not belong in them: 'arawan' ('seven') is meant to be 'arwan', and 'ikasma', which is used in 11-19, 21-29 and so forth, is also spelled incorrectly as 'ikasima' (sine ikasima wan). This is not standard practice, at least not anymore. I would like to see this corrected and am willing to spend the time doing that if I am given permission. Thank ye. ACertainNumberFive (talk) 10:36, 9 August 2023 (UTC)[reply]

@ACertainNumberFive: I don't know a lot about Ainu, but the basic policy is: Anything that is attested as being used by a speaker is allowed in our dictionary. If you're not sure about whether or not spellings like arawan are attestable, you should request their verification by adding {{rfv}} and starting a discussion at WT:RFVN.
As for lemmatisation - imho you should use whatever is currently the most used by the speech community. If that means prioritising spellings like arwan over arawan - go for it, and you can label arawan with {{alternative spelling of}} or {{alternative form of}} and add it under an "Alternative forms" header at the page for arwan. Thadh (talk) 10:58, 9 August 2023 (UTC)[reply]
Great thanks, I consulted with a publication and a person more familiar with sakhalin ainu, 'ikasima' is unattested and likely to be a misreading from the kana spelling of the same word. Those still need to be changed (likely moved?), however, 'arawan' is attested along-side Hokkaido 'arwan'. Anyhow, the {alternative form} markers will absolutely help. ACertainNumberFive (talk) 11:26, 9 August 2023 (UTC)[reply]
@ACertainNumberFive Isn't it also the case that "arawan" & "ikasima" are older pronunciations? I agree for Hokkaido Ainu putting the lemma at arwan & ikasma, but arawan & ikasima (and aruwan) are cited in different texts.
  • Batchelor, John (1905) “Arawa(n), Aruwa(n); Ikashima; Ikashma”, in An Ainu-English-Japanese dictionary (including a grammar of the Ainu language), pages 44; 165-166, based on Hokkaido Ainu, lists arawan & aruwan, along with ikashima as a lemma, with ikashma linking back to it.
  • Berthold Laufer (1917) “The Vigesimal and Decimal Systems in the Ainu Numerals: With Some Remarks on Ainu Phonology”, in Journal of the American Oriental Society[3], volume 37, pages 193-194 states:

    On Saghalin only a-ru-wan. Batchelor (Dictionary, p. 44) gives for Yezo [Hokkaido] both arawan and aruwan on equal footing; the Moshiogusa, according to Pfizmaier, only aruwan. Kuril Ainu (Radliński) arwa (from *aruwa) […] The numbers eleven to nineteen are formed on the scheme 1 + 10, 2 + 10, šine ikasima wan; on Saghalin simply šinä ikasima = 1+.

To me, it seems that some type of vowel condensing happened here, and at worst, the spellings should be marked as {{obsolete form of}} or something similar. Both terms in question here have been cited elsewhere. I was actually planning on making the number module for Ainu and was going through these sources but ended up getting distracted at some point. AG202 (talk) 13:17, 9 August 2023 (UTC)[reply]
I see you'd already moved entries before this discussion. Next time, please make sure to get consensus beforehand if you're not 100% sure. AG202 (talk) 13:19, 9 August 2023 (UTC)[reply]
@AG202: to complicate things further, I believe Japanese has devoicing of vowels in certain environments, which sounds like deletion to most non-natives. This may have influenced the Ainu orthographies.
@ACertainNumberFive: As for the pagemove throttle, there are two behaviors targeted. First, vandals tend to do as much damage as possible right away before an admin finds out and stops them. Slowing them down isn't meant to be an air-tight barrier, but reduces the damage they can do before they're stopped and takes some of the fun out of it. The other behavior is new contributors making wholesale changes before they have a chance to learn about Wiktionary policies and practices. That would seem to be the case here. It prompted you to ask here so we could explain things to you, which was the whole idea. Either way, the filter is only concerned with that initial burst of activity, so the throttle goes away by itself after a while. Chuck Entz (talk) 14:33, 9 August 2023 (UTC)[reply]
Yeah sorry about that, I still believe 100% that 'ikasima' is a mistake, being a false translit. of イカシマ. as for アラワン, while it does exist in sakhalin, this is not what the original author intended, as they did not include the sakhalin dialect numeral 5 'asne'. I think the consistency of this is just the Japanesic lens through which the language has been described.
About the spelling ikasima being older, this is not consistent with the etymology of the word, i-kasu-oma, it is just the phonotactics of the ainu language. ACertainNumberFive (talk) 20:57, 9 August 2023 (UTC)[reply]
Minor orthographic note -- I haven't seen the skinny single-byte kana (Unicode FF7C) used for Ainu. In general, the preference seems to be to use the small kana (Unicode 31F1). See also w:Ainu_language#Special_katakana_for_the_Ainu_language. Cheers! ‑‑ Eiríkr Útlendi │Tala við mig 21:21, 9 August 2023 (UTC)[reply]
Yup, IME had them right next to each other. ACertainNumberFive (talk) 21:29, 9 August 2023 (UTC)[reply]
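For anyone cleaning up existing entries, the two look-alike characters mentioned above can be told apart programmatically. The following is a minimal Python sketch using the standard `unicodedata` module; the helper name `find_halfwidth_katakana` is made up for this example, and it simply reports any character whose Unicode name begins with "HALFWIDTH KATAKANA".

```python
import unicodedata


def find_halfwidth_katakana(text):
    """Return (index, Unicode name) for each half-width katakana character."""
    hits = []
    for index, char in enumerate(text):
        name = unicodedata.name(char, "")
        if name.startswith("HALFWIDTH KATAKANA"):
            hits.append((index, name))
    return hits


# U+FF7C is the half-width form; U+31F1 is the small kana preferred for Ainu.
print(unicodedata.name("\uFF7C"))
print(unicodedata.name("\u31F1"))
print(find_halfwidth_katakana("イカ\uFF7Cマ"))
print(find_halfwidth_katakana("イカ\u31F1マ"))
```

The check is deliberately name-based rather than range-based, so it also catches the half-width sound marks and middle dot without hard-coding block boundaries.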
Putting aside Laufer, Batchelor still lists both spellings with ikashma (ikasma) linking to ikashima (ikasima); if it were purely a mistake or false translit, then I doubt that they'd list both spellings. In general, I think it'd be helpful to send some sources disputing this, preferably with etymological information, because without them, folks are more likely to trust established literature like this. Even if it's not vowel condensing or even an older form, the forms are still cited, so we'd need information disputing their existence entirely. Even Vovin (1993) accepts Batchelor's entries in his Proto-Ainuic reconstructions at the very least as alternative forms.
  • "as for アラワン, while it does exist in sakhalin, this is not what the original author intended, as they did not include the sakhalin dialect numeral 5 'asne'"
    I'm a bit confused by this, which author are you referencing? Are you saying that Batchelor didn't intend to list arawa(n), or are you talking about Laufer?
As for the etymology of ikasma, this seems to not be in line with Vovin 1993. A Reconstruction of Proto-Ainu, now that I take a look at it (not saying that ikasima follows it either): He reconstructs it as Proto-Ainuic *ì=kásmà, with only two morphemes. (See more here: Appendix:Proto-Ainu_reconstructions) AG202 (talk) 22:12, 9 August 2023 (UTC)[reply]
'Author' referring to the original creator of the Wiktionary entries for the Ainu numerals. Etymological evidence points toward ikasima never existing; in my view, at least, these spellings should not be listed. ACertainNumberFive (talk) 22:50, 9 August 2023 (UTC)[reply]
Aha, yeah, etymological evidence specifically making ikashima solely a misspelling that cannot be included will need to be cited. Per WT:CFI, we include any and all spellings that are cited, unless they can be proven to be uncommon misspellings per the misspellings section, though it's still not certain that this is a misspelling. This would usually happen through an RFD discussion, since the entries in question have already been cited. Currently, Batchelor's Hokkaido entries, which have been continuously cited and so point towards at least an alternative form (who knows how it came about), fall under CFI until definitively proven otherwise. He also explicitly mentions that his Latin spellings are not based off of Kana. Etymological evidence does not automatically preclude random phonological changes that can happen (I don't know if this is what happened in this case). After all, we've seen weird things happen all over different languages' words contrary to their etymologies like with English apron, cockroach & island, or German Hängematte. AG202 (talk) 23:12, 9 August 2023 (UTC)[reply]
FWIW, the Foundation for Ainu Culture has a good bit of material available about the Ainu language, albeit provided in Japanese, over here on their website: https://www.ff-ainu.or.jp/web/potal_site/details/post.html
(They do have an English section, but it is not a full localization of the website, and the content there is limited.)
The language materials section includes the following glossary of terms in Karafuto (a.k.a. Sakhalin) Ainu: https://www.ff-ainu.or.jp/web/potal_site/files/karahuto_tango.pdf That reference lists アラワン arawan as the numeral 7, and アラワンペ arawanpe as the noun 7 (as in, seven things, a quantity of seven of something).
HTH! ‑‑ Eiríkr Útlendi │Tala við mig 17:59, 9 August 2023 (UTC)[reply]
As hinted at above, some of the attested forms may be corrupted spellings. Either small kana are transliterated as full kana, or were not used in the first place (esp. in older material, or per the comment above about devoiced vowels being ignored). For Ainu in Latin script, I think we need to distinguish (a) Latin script that transcribes Ainu directly -- i.e. vowel letters reflect vowels in the language -- and (b) Latin script that transliterates kana -- i.e. vowel letters that are due to kana orthography and do not reflect anything in the language. It may not always be clear which is which, but we should be careful to distinguish alt forms that are different forms of the word, like English ask and aks, and alt spellings that are due to transmission through kana. Imagine if the English word ask were transmitted through kana. We would then have an alt spelling *asuku that doesn't correspond directly to the language. We would want to be clear to the reader that English asuku is a spelling variant and not actually a different form of the word. kwami (talk) 18:33, 9 August 2023 (UTC)[reply]
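The difference between (a) and (b) can be sketched in a few lines of Python. This is only an illustration: the mapping tables are a deliberately tiny, hypothetical subset rather than a real Ainu romanization scheme, and the sketch assumes that small kana mark a bare coda consonant while full-size kana (other than ン) carry a vowel.

```python
# Hypothetical, minimal tables for illustration only.
FULL_KANA = {
    "\u30a2": "a",   # ア
    "\u30a4": "i",   # イ
    "\u30ab": "ka",  # カ
    "\u30b7": "si",  # シ
    "\u30de": "ma",  # マ
    "\u30e9": "ra",  # ラ
    "\u30ef": "wa",  # ワ
    "\u30f3": "n",   # ン
}
SMALL_KANA = {
    "\u31f1": "s",   # small シ, read as a bare consonant
    "\u31fb": "r",   # small ラ, read as a bare consonant
}
SMALL_TO_FULL = {"\u31f1": "\u30b7", "\u31fb": "\u30e9"}


def romanize(kana):
    """Romanize kana one character at a time using the tables above."""
    table = {**FULL_KANA, **SMALL_KANA}
    return "".join(table[ch] for ch in kana)


def enlarge(kana):
    """Replace small kana with full-size kana, as a lossy source might."""
    return "".join(SMALL_TO_FULL.get(ch, ch) for ch in kana)


arwan = "\u30a2\u31fb\u30ef\u30f3"    # アㇻワン
ikasma = "\u30a4\u30ab\u31f1\u30de"   # イカㇱマ

print(romanize(arwan), romanize(enlarge(arwan)))
print(romanize(ikasma), romanize(enlarge(ikasma)))
```

With these tables, losing the small kana turns arwan into arawan and ikasma into ikasima. The sketch shows how such a spelling could arise through kana alone; it says nothing about whether any particular attested form actually arose that way.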
I hear you. I also note that the Foundation for Ainu Culture appears to be using large and small kana consistently, so if they write アラワン, I am reasonably sure that they mean arawan and not the アㇻワン arwan they record for the Chitose dialect, for instance.
That said, for Karafuto / Sakhalin Ainu in specific, my understanding is that this went extinct in the mid 1990s, as per w:Ainu_languages#Sakhalin_Ainu, so as you point out, the possibility is there that the transcribers at that time might not have spelled things using the same standards as today. Along similar lines, I often wonder if the differences between Batchelor's 1903 kana spellings and modern spellings have to do with actual differences in pronunciation, or simply changes in orthography.
Barring other textual evidence, I defer to an organization that is located in Hokkaido and dedicated to Ainu culture and language. 😄 ‑‑ Eiríkr Útlendi │Tala við mig 19:48, 9 August 2023 (UTC)[reply]
Probably best.
For Sakhalin we might also have attestation in Cyrillic, which wouldn't suffer from the same problems. I would hope that if a vowel is reconstructed in proto-Ainu, it is not just a kana spelling. kwami (talk) 20:00, 9 August 2023 (UTC)[reply]
Re: reconstructions, agreed! Sifting through historical data to ensure that reconstructions are not based on orthographic phantoms or later borrowings is a challenge.
Re: Cyrillic, that would be lovely to have additional external corroboration. I've found the Portuguese references of the late 1500s, early 1600s to be invaluable in sorting out Middle Japanese for precisely this reason. Sadly, my Russian is close to non-existent, so while I can sound out Cyrillic, my ability to search in the Russosphere is currently basically nil. I must rely on others for that. ‑‑ Eiríkr Útlendi │Tala við mig 17:06, 10 August 2023 (UTC)[reply]
I was just about to say this. With Batchelor's 1903 dictionary, though, he explicitly mentions how kana doesn't properly show the pronunciation at that time and how the Latin spellings are the main lemma to follow. With that, he does show the difference between for example "ikashima" & "ikashma", so I trust that his dictionary transcribes coda consonants properly as well. AG202 (talk) 20:01, 9 August 2023 (UTC)[reply]
The glossaries of the three textbooks all use アㇻワン, I'm not sure where to find references to アラワン in titose dialect..., do you have a p.# reference? ACertainNumberFive (talk) 21:08, 9 August 2023 (UTC)[reply]
@ACertainNumberFive, I'm not aware of アラワン arawan in Chitose, only アㇻワン arwan, as listed in the glossary at the bottom of this PDF: https://www.ff-ainu.or.jp/web/potal_site/files/chitose_tyukyu.pdf ‑‑ Eiríkr Útlendi │Tala við mig 21:23, 9 August 2023 (UTC)[reply]
Okay, sorry, must've misread what you were saying. There isn't; アラワン is specific to Sakhalin. ACertainNumberFive (talk) 21:27, 9 August 2023 (UTC)[reply]
Cheers, ya, no worries -- we've got アラワン arawan attested for Karafuto / Sakhalin, and アㇻワン arwan (no medial /a/) attested for Chitose, Saru, and Horobetsu.
Digging a bit more, apparently Ishikari has アルワㇺペ aruwampe as the noun form, suggesting a numeral form of アルワㇺ aruwam, although such a term is missing from this short glossary. ‑‑ Eiríkr Útlendi │Tala við mig 17:14, 10 August 2023 (UTC)[reply]
Sorry, just realized I hadn't been thorough enough with the Ishikari word list -- that also includes noun form アㇻワンペ arwanpe and numeral form アㇻワン arwan. Curious now about that アルワㇺペ aruwampe -- is this a real alternative pronunciation, or just an orthographic variant? Hmm... ‑‑ Eiríkr Útlendi │Tala við mig 17:17, 10 August 2023 (UTC)[reply]
This is what I used to confirm the existence of arawan in Sakhalin dialect prior. I trust their sources and they align fairly well with other scientific papers. ACertainNumberFive (talk) 21:02, 9 August 2023 (UTC)[reply]

Template editor request

Hello, is it okay if I could please request the template editor permission, so that I can edit modules that are protected, etc.? I've recently been doing a number of edits to modules, e.g. Module:bg-nominal, Module:bg-pronunciation, and previously Module:ja-pron, but sometimes I can't make changes without consulting with a more senior editor, even for small things that may not be worthy of their time. Of course, for any big changes, I'll continue to ask others whether a change is indeed necessary and desirable before editing anything that's widely-deployed. Thanks very much for any feedback, Kiril kovachev (talkcontribs) 22:24, 9 August 2023 (UTC)[reply]

I support this. Benwing2 (talk) 23:10, 9 August 2023 (UTC)[reply]
@Kiril kovachev: You don't need template editor rights to edit those modules. I've nominated you at WT:WL for autopatroller (Your edits seem fine and you know what you are doing). — Fenakhay (حيطي · مساهماتي) 13:30, 10 August 2023 (UTC)[reply]
@Fenakhay, @Benwing2, @Vininn126: thanks for your help! Kiril kovachev (talkcontribs) 14:15, 10 August 2023 (UTC)[reply]

Error with Word of the Day?

Wiktionary:Word of the day/2023/August 11 should be 11 August but is showing 13 August. Equinox 01:50, 11 August 2023 (UTC)[reply]

Fixed. Fay Freak (talk) 02:12, 11 August 2023 (UTC)[reply]
Thanks, @Fay Freak. — Sgconlaw (talk) 05:37, 11 August 2023 (UTC)[reply]

The entire supposed 'Tourangeau language'

I will begin by pointing out that there does not appear to be a single linguistic source in existence that speaks of a 'Tourangeau language', nor a 'langue tourangelle', nor a 'langue de Touraine'.

English Wikipedia does not have any such article, and understandably so. French Wikipedia does have one, where the title curiously labels Tourangeau an idiome rather than langue, although the article that follows uses the term langue in reference to Tourangeau no less than four times, and idiome precisely zero times.

The introduction states that:

'The vernacular [sic: not "language"] of the region of Touraine currently has no official spelling rules...' [Le parler de Touraine ne présente pas de règles orthographiques officielles à ce jour...]

One might ask, then, what spelling system is used for the (language? idiom? vernacular?) of Touraine in this article. A hint is found in the section labelled prononciation, where it is stated that:

'In the spelling system proposed here [emphasis mine], Tourangeau is, in most cases, read the way that French is...' [Dans l'orthographe proposée ici, dans la majorité des cas, le tourangeau se lit comme le français...]

It is certainly not the business of a Wikipedia article to propose orthographies, needless to say.

Understandably, the matter of where this spelling system came from was raised on the talk page, where, in the discussion that followed, the only source that could be found was a local 22-year-old student who had apparently dedicated an online blog (no longer viewable) to the matter and also produced a YouTube video in which, his amiable demeanour aside, the amateur nature of the entire endeavour quickly becomes apparent.

In the same talk page discussion, a user calling himself 'TourangeauNatif' stated, apparently off-hand and without further reflection, that:

'As far as I know, there has never been an attempt at standardization, given that Tourangeau is often not even considered a language...' [À ma connaissance, il n'y a jamais eu de tentative de normalisation étant donné que bien souvent, le tourangeau n'est même pas considéré comme une langue...]

In truth, he is understating the point; the only individuals who do consider Tourangeau a language appear to be a handful of local enthusiasts at best.

In any case, all of this goes a long way towards explaining why fantastical spellings like ⟨ouighlĕ⟩ or ⟨poeirĕ⟩ are to be found nowhere at all outside of the online Wiki-space.

The recommended course of action is to delete the nine entries in Category:Tourangeau_lemmas, delete the associated language code, and wash ourselves of this sorry mess. Nicodene (talk) 14:42, 11 August 2023 (UTC)[reply]

@Nicodene: One of the lemmas, iau, is much older than the other eight and was added by @-sche. It also (uniquely) provides two references. Could anything be salvageable from those? —Al-Muqanna المقنع (talk) 15:03, 11 August 2023 (UTC)[reply]
That entry is reasonable, yes. Meanwhile the French Wikipedia page spells the very same word as ⟨aiguĕ⟩ (I wish I were making that up), whilst admitting that it is 'pronounced [o] or [jo]'.
The Oxford guide to the Romance languages (page 293) places most of Touraine in the Angevin zone, and the eastern fringes in 'Francien', that is Standard French. This is my best attempt to superimpose the region on their map.
I would recommend, therefore, merging this Tourangeau lemma into Angevin iau, which already exists in that precise spelling. Nicodene (talk) 15:44, 11 August 2023 (UTC)[reply]
What Nicodene proposed. Fay Freak (talk) 15:51, 11 August 2023 (UTC)[reply]
@Nicodene Thanks for the sleuthing. Your proposed course of action sounds good to me. Benwing2 (talk) 01:04, 12 August 2023 (UTC)[reply]
Shall we, then?
@-sche, LanguageLovingNerd, Sigehelmus, Areitoyaya
(everyone who has added a Tourangeau lemma)
Nicodene (talk) 22:05, 12 August 2023 (UTC)[reply]
Uhh I support your proposal then, @Nicodene LanguageLovingNerd (talk) 22:43, 12 August 2023 (UTC)[reply]
@Nicodene: See Category talk:Westrobothnian language for how we handled a much more extreme equivalent recently. Chuck Entz (talk) 23:24, 12 August 2023 (UTC)[reply]
@Nicodene I think you should go ahead and do something similar here. There are only 9 entries so just create a page Wiktionary:Todo/Tourangeau cleanup and put the content of the pages there, and then delete the existing entries. No need for a bot to do this. Benwing2 (talk) 21:34, 14 August 2023 (UTC)[reply]
All the Tourangeau content that I could find is now concentrated on that page, with the exception of the entry iau, now merged into Angevin iau. Nicodene (talk) 23:53, 14 August 2023 (UTC)[reply]
@Nicodene Great, thanks! Benwing2 (talk) 00:11, 15 August 2023 (UTC)[reply]
I'm late to the party/ping but glad to see a good approach has been found, to subsume any attested Tourangeau content into a broader lect. (Remember to remove the code, or relocate it to be an etymology-only code.) As far as I recall, the lect was added as part of the request at Wiktionary:Beer parlour/2017/May#Language_codes_for_Bourbonnais_and_Poitevin (probably fr.Wikt also included the lect at the time?), where I did worry we were being too splittist with regard to European lects (even as we sometimes lump African languages). - -sche (discuss) 06:07, 15 August 2023 (UTC)[reply]
I've removed Tourangeau from WT:LOL and deleted all the categories. I didn't put it as an etymology-only code for now, noting that Cat:Terms derived from Tourangeau is empty, but this can be easily done if required. This, that and the other (talk) 00:31, 19 August 2023 (UTC)[reply]

"Further reading" vs. "References" for Belarusian[edit]

@Atitarev, @Benwing2 The existing dictionary headword entries for the Belarusian language are inconsistent. Some of them are using the "Further reading" section for the links to other online dictionaries (ex. абрус, порт, інфляцыя, хутар) and the others are using the "References" section for exactly the same purpose (ex. бачыць, аловак, снег). I'm also guilty of contributing to this inconsistency, because I'm using the existing Wiktionary articles as copy/paste templates when adding new articles, so I end up copying one style or another. I tried to search for the guidelines and the old beer parlour discussions, but only got even more confused. So is "Further Reading" or "References" actually preferred? And is this policy global or language specific? Ssvb (talk) 11:03, 15 August 2023 (UTC)[reply]

My personal preference is to put nothing but <references/> or {{reflist}} under the ===References=== header (across all languages) so that that section is reserved for listing references backing up specific claims marked with <ref>...</ref> in the body of the entry. More general stuff like links to other dictionaries then go under ===Further reading===. However, this is just my personal preference and I don't think it's codified anywhere that this is what we must or should do. —Mahāgaja · talk 11:51, 15 August 2023 (UTC)[reply]
@Mahagaja I prefer this, too, however I run into a problem when I generate collocations using {{R:pl:NKJP}}, as I cannot wrap collocations in ref. Vininn126 (talk) 11:56, 15 August 2023 (UTC)[reply]
I am somewhat looser and use "References" if I'm claiming that the information in the entry is sourced from a text, regardless of whether it's cited inline. I think in previous discussions some people have resisted restricting "References" to inline citations since the entire entry might be largely or wholly taken from a particular source and it might still be useful in that case. —Al-Muqanna المقنع (talk) 11:58, 15 August 2023 (UTC)[reply]
I'm the other way around: I only put a "Further reading" header if it hasn't contributed to the entry, but might be useful - e.g. information about a part of the definition, wikipedia links, etc. Since I work with LDLs, the references I give under the "reference" tab are often the only attestations of the term. Thadh (talk) 12:16, 15 August 2023 (UTC)[reply]
Everyone is guilty of this. I recommend that the only thing that appears below "References" is <references/>. If you actually add something new below the header, it's "Further reading". --Geographyinitiative (talk) 12:29, 15 August 2023 (UTC)[reply]
I am not "guilty" of it, I believe it is the right way to source entries. Thadh (talk) 18:17, 15 August 2023 (UTC)[reply]
FWIW I believe the correct thing to do is put only <references /> or {{reflist}} under ===References===. One advantage of this is it allows a bot to correct mistakes involving ==Further reading== and ==References==. In practice, I find that any other method (in particular, anything requiring a human judgment call to be made between ==Further reading== and ==References==) is unsustainable and just leads to a big mess. Possibly this is different in LDL's, as User:Thadh works with, but for HDL's, I think the approach that the majority is advocating is by far the most sensible one. Benwing2 (talk) 06:16, 16 August 2023 (UTC)[reply]
I do it this way even for LDLs. Some people will put a <ref>...</ref> tag next to the term in the headword line to indicate which dictionary it was taken from. Doing that would allow citing such dictionaries as references rather than Further reading while still using the ===References=== section just to list footnotes. —Mahāgaja · talk 06:24, 16 August 2023 (UTC)[reply]
So what about my example using the reference for collocations? Should I just not, since I can't add references to collocations? Vininn126 (talk) 09:11, 16 August 2023 (UTC)[reply]
IMO the collocations are a good use case for References without reflist. "Further reading" and "References" mean, or ought to mean, different things: "Further reading" is directing the reader to more info, "References" is stating the actual sources used by the entry. —Al-Muqanna المقنع (talk) 10:55, 16 August 2023 (UTC)[reply]
This is essentially what Thadh is saying - that if a given work was what was used to make an entry (very frequently done for LDL's), then "references" should be used. Vininn126 (talk) 10:56, 16 August 2023 (UTC)[reply]
Moreover, I believe in making an entry 'pass' RFV during creation, not letting other people figure out if the term is attested or not afterward. When using dictionaries, which are often the only mentions of the quoted word, putting them under a Further Reading header does not automatically tell the reader where you got the information from, nor the editor that this entry is already verifiable. Thadh (talk) 11:50, 16 August 2023 (UTC)[reply]
I think this is something that makes more sense for LDL's as opposed to WDL's. I'm not going to add three quotes for every Polish definition with these most used words that I'm focusing on - by definition they have some of the most quotes and finding them would be a waste of time, however, when doing it for a different language, one with fewer resources, or even rarer Polish words, I would follow a similar practice. I think there should be some middle ground. Vininn126 (talk) 12:00, 16 August 2023 (UTC)[reply]
@Thadh: I think what needs to be done is that the |ref= parameter of {{co}} and {{coi}} (and {{ux}} and {{uxi}} for that matter) needs to be improved so that it can contain a template rather than a bare URL (which we really shouldn't be using anyway). —Mahāgaja · talk 14:32, 16 August 2023 (UTC)[reply]
And what about a giant list of collocations in {{co-top}}? Vininn126 (talk) 14:34, 16 August 2023 (UTC)[reply]
@Thadh: You can add refs to collocations. Just add |ref=<ref>{{R:whatever}}</ref> to {{co}} or {{coi}}. —Mahāgaja · talk 14:39, 16 August 2023 (UTC)[reply]
To reiterate, collocations are the least of my worries - distinguishing between a link to the dictionary definition of kalanikka and a description of Izhorian fishermen in the 18th century is a much bigger issue. Thadh (talk) 22:03, 16 August 2023 (UTC)[reply]
@Vininn126: If they're all from the same source, you can add the reference to the description in |1= of {{co-top}}. If they're from different sources you can add the reference after the individual term. —Mahāgaja · talk 14:41, 16 August 2023 (UTC)[reply]
This seems awkward when you have multiple inline collocations, like up to 3 per definition or something. Should we add ref to each, even if it's the same source? Vininn126 (talk) 14:54, 16 August 2023 (UTC)[reply]
I don't know, list them all in the description of {{co-top}}? That won't tell you which collocation is from which source, of course, but neither does lumping all the sources together at the bottom of the page under ===References===. —Mahāgaja · talk 15:57, 16 August 2023 (UTC)[reply]
@Mahagaja Putting collocations in co-top just because there's more than one is a crazy idea. Vininn126 (talk) 16:11, 16 August 2023 (UTC)[reply]
I'm sure people can use their common sense in that situation. This is not an insurmountable hurdle. —Mahāgaja · talk 16:40, 16 August 2023 (UTC)[reply]
My solution is to just add the reference as a bare template in the references section. Vininn126 (talk) 16:44, 16 August 2023 (UTC)[reply]
@Mahagaja: For exactly repeated references, one can use <ref name=ID>..</ref> for one and <ref name=ID/> for the others. It gets trickier if just the page numbers differ, but there is a mechanism in some Wikimedia projects for listing page numbers next to references, e.g. w:Template:rp. --RichardW57 (talk) 20:14, 16 August 2023 (UTC)[reply]
This would be a bad idea for collocations (box aside), because the page would be spammed with [N]. Vininn126 (talk) 20:17, 16 August 2023 (UTC)[reply]
@Vininn126: Sod your aesthetics. Can't most readers filter out references? Even the likes of [8]:143 are not too bad. --RichardW57m (talk) 11:39, 18 August 2023 (UTC)[reply]
@RichardW57 Disrespect aside, I don't think most people are gonna agree with you; a big part of the thread is aesthetics. Vininn126 (talk) 11:45, 18 August 2023 (UTC)[reply]

──────────────────────────────────────────────────────────────────────────────────────────────────── If this is a topic people feel strongly about then maybe there needs to be a vote on it? WT:EL does not say that references need to be inline, it basically follows my and Thadh's understanding. —Al-Muqanna المقنع (talk) 16:46, 16 August 2023 (UTC)[reply]

Personally I'm on board with that understanding more or less, except in the case I mentioned earlier with very common words in WDL's. If people feel it should be inline, I'd prefer there to be a vote, but if people can live with having non-inline references, then perhaps we can avoid it. Vininn126 (talk) 16:56, 16 August 2023 (UTC)[reply]
Honestly I don't like a situation where the user has to decide whether to put things into Further Reading or References based on nebulous criteria; regardless of what WT:EL says. This is not practically maintainable as has been shown ample times. Benwing2 (talk) 19:55, 16 August 2023 (UTC)[reply]
Fine, but how would I reference collocations as mentioned above? Vininn126 (talk) 19:57, 16 August 2023 (UTC)[reply]
@Vininn126 See User:Mahagaja's comment. Given the way |ref= works now in {{co}}, you can use |ref=<ref>...</ref>. Probably it should be changed to not require the surrounding ref tags, similar to how the |ref= param in {{IPA}} works. Benwing2 (talk) 20:22, 16 August 2023 (UTC)[reply]
Hmm, it looks like you don't like that but I don't quite understand why, can you clarify? Benwing2 (talk) 20:25, 16 August 2023 (UTC)[reply]
@Benwing2 I have no problem using a ref= for an individual inline collocation or even for a box, but the issue is spamming it for multiple inline ones. Vininn126 (talk) 20:29, 16 August 2023 (UTC)[reply]
@Benwing2 This ignores my concerns where a given definition has a few inline collocations - this does not warrant a box and it seems ridiculous to use name= for all of them. Vininn126 (talk) 20:28, 16 August 2023 (UTC)[reply]
@Vininn126 Still a bit confused; can you point to a specific page that has several inline collocations where this issue might come up? Can you not just put a reference by the first one? Benwing2 (talk) 20:53, 16 August 2023 (UTC)[reply]
@Benwing2 adoracja Vininn126 (talk) 20:57, 16 August 2023 (UTC)[reply]
@Vininn126 Are you referring specifically to this link? Pęzik, Piotr; Przepiórkowski, A.; Bańko, M.; Górski, R.; Lewandowska-Tomaszczyk, B (2012) [National Polish Language Corpus, PELCRA search engine][1], Wydawnictwo PWN. For one thing, mixing footnotes and added lines like this looks terrible, but more to the point, if you need to put a reference by the collocations, why can't you just put it by the first one? Benwing2 (talk) 21:09, 16 August 2023 (UTC)[reply]
@Benwing2 I suppose I could, it seems a bit odd to me. Vininn126 (talk) 21:11, 16 August 2023 (UTC)[reply]
@Vininn126 I guess, more to the point, why do you need this reference at all? This term isn't rare or anything, and so collocations will be a dime a dozen and it seems to me you don't need to cite where you got them from. Benwing2 (talk) 21:17, 16 August 2023 (UTC)[reply]
@Benwing2 I figured it's better to be clear about it and also reference the fact I am using a special online tool built by others and credit their work. The template is still useful for other things, such as finding the first attestation of certain words, but if people feel there's no need for it, I suppose it could be removed. Just trying to be thorough and honest. Vininn126 (talk) 21:22, 16 August 2023 (UTC)[reply]
@Vininn126 In that case maybe it should just go in ==Further reading==. Benwing2 (talk) 21:28, 16 August 2023 (UTC)[reply]
@Benwing2 I think in that case it might be better removed (perhaps by a bot?), as you said. It might be better to save refs for old collocations or something. Vininn126 (talk) 21:31, 16 August 2023 (UTC)[reply]
@Vininn126 Makes sense. Benwing2 (talk) 21:39, 16 August 2023 (UTC)[reply]
@Benwing2 If we do that (I suppose we should), we should also convert any bare links in Old Polish entries from ===References=== to ===Further reading=== while leaving any wrapped refs. Vininn126 (talk) 22:09, 16 August 2023 (UTC)[reply]
@Benwing2, Vininn126: What's the specific maintainability issue? I'm a bit confused about this discussion, the criterion is not nebulous at all—either material has been used from a source or it hasn't. —Al-Muqanna المقنع (talk) 09:02, 17 August 2023 (UTC)[reply]
@Al-Muqanna The collocations are pulled from the Corpus link (which has a collocator tool) in the references section and there's no easy way to inline reference it with multiple inline collocations. Vininn126 (talk) 09:07, 17 August 2023 (UTC)[reply]
@Vininn126: I know, I'm asking what the maintainability issue is that Benwing mentioned with having non-inline references. I'll say again if we want to set the existing policy aside then it should be put to a vote, otherwise it's odd for people to be pressured into following a different policy IMO. —Al-Muqanna المقنع (talk) 09:10, 17 August 2023 (UTC)[reply]
I'll give a specific example of where I would use non-inline references as well. For Medieval Latin entries I add or expand, I'm generally synthesising material from at least two, often four different dictionaries (Medieval Latin dictionaries tend to have a regional focus). In that case I think it's better to simply list those dictionaries under "References", rather than adding loads of inline references after each gloss item. Shunting them to "Further reading" is possible, but misleading and arguably even plagiaristic. —Al-Muqanna المقنع (talk) 09:17, 17 August 2023 (UTC)[reply]
This is essentially what I do for Old Polish. I think this is a more common practice for LDL's, but I suppose it doesn't have to be limited to them. Vininn126 (talk) 09:19, 17 August 2023 (UTC)[reply]
@Al-Muqanna The maintainability issue is that the criteria for distinguishing Further reading from References is not something that most people seem able to follow, so in practice we end up with a mishmash of things randomly stuck in one or the other. Perhaps you and Thadh can be coherent about this, but I can speak from experience that it's not something generally enforceable. For Italian, for example, it was a total mishmash before I did a bot run and moved everything that wasn't <references> or {{reflist}} to Further reading. Maybe some other languages are cleaner, but many aren't, and if I follow your and Thadh's ideas, I can't do anything about the mishmash because maybe someone really intended some specific reference to be in References. There's no way for a bot to know (or even a human; it requires reading someone else's mind to divine their intention). Benwing2 (talk) 19:06, 17 August 2023 (UTC)[reply]
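To make the mechanical rule concrete, here is a minimal Python sketch of the kind of split such a bot run could apply. It is not the actual bot code, which is not shown in this thread; the function name `split_references` and the sample template `{{R:example}}` are made up for illustration. The sketch assumes the body of a References section is passed in as a list of lines: only footnote placeholders (`<references/>` or `{{reflist}}`) stay, and every other non-blank line is set aside for Further reading.

```python
import re

# Lines that only render footnotes: <references/>, <references />,
# <references>, or {{reflist}} with optional parameters.
FOOTNOTE_LINE = re.compile(
    r"^\s*(<references\s*/?>|\{\{reflist(\|[^}]*)?\}\})\s*$",
    re.IGNORECASE,
)


def split_references(section_lines):
    """Split the body of a References section into (kept, moved).

    kept  -- footnote placeholder lines that stay under References
    moved -- all other non-blank lines, destined for Further reading
    """
    kept, moved = [], []
    for line in section_lines:
        if not line.strip():
            continue  # drop blank lines
        if FOOTNOTE_LINE.match(line):
            kept.append(line)
        else:
            moved.append(line)
    return kept, moved


body = ["<references/>", "", "* {{R:example}}"]
kept, moved = split_references(body)
print(kept)   # lines left under References
print(moved)  # lines to move to Further reading
```

The rule needs no judgment about the editor's intent, which is what makes it automatable, and also why it cannot preserve a deliberate non-inline reference.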
@Benwing2: I guess my question is what specifically are the bad consequences resulting from the line being blurry in the first place. I understand the current application is incoherent. IMO it is better to have false References than false Further readings, firstly because "Reference" vs. "Further reading" have widely accepted meanings in academic and technical contexts, and wrongly categorising actual reference material as further reading therefore comes off as plagiaristic; secondly for ease of use for lay readers because people will naturally expect further reading to contain actual further material, and references to be the same material. As I said, though, I think it's better to have a formal vote if the existing policy seems problematic—given the pushback here and in previous discussions, and that I received thanks for my comment above, I'm not sure there is consensus either way at the moment. —Al-Muqanna المقنع (talk) 19:47, 17 August 2023 (UTC)[reply]
@Al-Muqanna The line is so blurry that IMO we may as well not have a distinction between Further reading and References if most people aren't able to keep them straight. I also don't understand your comment about plagiarism. Most entries don't contain any References or Further reading and that doesn't automatically make them plagiaristic, so I don't see how adding an additional citation can possibly make things worse. Benwing2 (talk) 19:51, 17 August 2023 (UTC)[reply]
As for a vote, I don't know what the vote would contain and it would need careful thinking and crafting, otherwise it's likely to just fail not because the idea itself is rejected but just because of the wording (I've seen this time and time again). Benwing2 (talk) 19:52, 17 August 2023 (UTC)[reply]
@Benwing2: I don't actually disagree with merging them under an appropriately titled header. That would probably resolve most people's concerns. For what it's worth, though, many entries are not specifically put together by synthesising dictionary material so that's not an issue by itself. We do have a general understanding that plagiarism is bad, e.g. previous discussions have agreed that quotations should not be imported in bulk from the OED etc. However, for many languages (especially LDLs as noted above) the entries are mainly built off the work of other specialist lexicographers, and in that case they should generally be credited as references. Facts are of course not copyrightable, so this is mostly a matter of crediting scholarly work properly, which I do feel somewhat strongly about, rather than a legal thing (which I largely don't personally care about). I think, as Thadh has suggested, "References" also points people towards citations more efficiently where needed. —Al-Muqanna المقنع (talk) 19:57, 17 August 2023 (UTC)[reply]
The point about LDLs is rather on point - for things like Old Polish the corpus is so small that trying to build anything off of anything but one other dictionary is frankly impossible. There are only two dictionaries for Old Polish, and the second one is in large part a digitized copy of the first, quotes and all. On occasion one can find a word in their corpus not listed as an entry, but the number of such words is vanishingly small, making anything other than bulk import rather unthinkable. Vininn126 (talk) 20:04, 17 August 2023 (UTC)[reply]
You could also have e.g. something like "References and further reading", and a specific "Footnotes" header for inline citations. But as you say it needs some further consideration. —Al-Muqanna المقنع (talk) 19:59, 17 August 2023 (UTC)[reply]
@Al-Muqanna I would be fine with merging the two and creating a separate Footnotes header. I think that would also deal with the aesthetic issues: Mixing footnotes and inline citations (is that the right term?) looks terrible. As for plagiarism, I completely agree we should give credit where it's due; I just don't see why it matters which header gets the citation for this purpose. Benwing2 (talk) 20:06, 17 August 2023 (UTC)[reply]
@Benwing2: You just prettify the CSS to separate footnotes and inline citations and unspecific references visually, then all can share the same logical section.
My distinction between Further reading and References so far has been driven by the consideration whether the linked resource supports or relates to the page content or rather goes beside or beyond the present claims, which is indeed blurry.
Apparently the priority of a fix is chill since everyone finds a peculiar personal sense in the section names. Fay Freak (talk) 21:08, 17 August 2023 (UTC)[reply]
@Benwing2: For the naming: it matters because they mean different things. "Further reading" is supplementary material. "References" is source material. I would personally be pretty insulted if someone lifted the bulk of a paper I wrote and then listed it under a "further reading" pointer instead of citing it as a reference, and I imagine many academics would feel the same way. If you don't mind yourself that's up to you, of course, but it's not a rare thought I think. I agree with Fay Freak about chill though. —Al-Muqanna المقنع (talk) 22:06, 17 August 2023 (UTC)[reply]
I think this is pretty easy to solve: If the term in question has been mentioned and/or implied (e.g. by using an inflection) in the reference work, it's a reference, if not, it's a further reading. Done - I see no difficulty in such a distinction and it saves a lot of trouble to readers and editors alike who want to figure out where we got our information from. Thadh (talk) 22:16, 17 August 2023 (UTC)[reply]
If the term in question has not been mentioned and/or implied in the reference work, why would you put the reference work under Further reading at all? Your proposal amounts to eliminating the Further reading section completely. —Mahāgaja · talk 07:13, 18 August 2023 (UTC)[reply]
Further reading may be used for providing information on the culture, background, and other similar things. For instance, a wikispecies link to Ursus arctos under the entry for "brown bear" would be a matter for further reading.
I would also say that wiki entries may be automatically put in further reading regardless of whether the term in question is mentioned or not, because we cannot use them as references. Thadh (talk) 11:50, 18 August 2023 (UTC)[reply]
Currently, however, the links to other Wikimedia projects at brown bear are under ===References===. I'm struggling to think of anything other than links to other Wikimedia projects that you would put under ===Further reading===, since information on culture, background etc. that doesn't even mention the term in question would be pretty useless in a dictionary entry. What if we renamed the section that contains <references/> ===Notes=== and renamed the section that contains a list of reference works ===References===? —Mahāgaja · talk 12:55, 18 August 2023 (UTC)[reply]
This is an interesting idea. I wonder if "notes" is the best nomenclature, but I think something in this general direction could be a good compromise. Vininn126 (talk) 12:58, 18 August 2023 (UTC)[reply]
That would work for me. Thadh (talk) 14:26, 18 August 2023 (UTC)[reply]
@Mahagaja This would work for me too, and AFAIK it's exactly what Wikipedia does as well. Benwing2 (talk) 19:19, 18 August 2023 (UTC)[reply]
@Benwing2 Perhaps that's yet another argument for using such a system; it creates uniformity across projects as well, allowing those coming from other projects to more easily navigate our pages. Vininn126 (talk) 19:20, 18 August 2023 (UTC)[reply]
Wasn't there a previous discussion that led to the adoption of the "Further reading" section? We should refer to that discussion and see what the rationale was and whether it still makes sense before introducing any changes. Like @Thadh (I believe) I use "References" for sources that are tagged with <ref>, and "Further reading" for links to other Wikimedia projects and sources not tagged with <ref> (for example, a journal article about an entry that is not actually cited in the entry—sometimes I find such sources in entries, and it doesn't seem appropriate to delete them just because they haven't been cited). I have also used "Notes" in addition to "References" for sources referred to in image captions, since such sources don't relate directly to the entry. — Sgconlaw (talk) 21:12, 18 August 2023 (UTC)[reply]
(this was exactly the opposite of how I use the headers btw) Thadh (talk) 23:43, 18 August 2023 (UTC)[reply]
@Sgconlaw: There were two votes: first Wiktionary:Votes/2016-12/"References" and "External sources" and then Wiktionary:Votes/2017-03/"External sources", "External links", "Further information" or "Further reading". In the first, it was agreed that the References section is for references verifying specific claims made, especially in etymologies and usage notes, but the point that ===References=== sections had to contain only <references/> did not pass; ===External sources=== sections were for things like external links to other dictionaries. Then in the second vote, ===External sources=== was renamed ===Further reading===. From the examples, it's pretty clear that ===Further reading=== is for links to other dictionaries' entries, at least for larger languages, where we didn't necessarily use those other dictionaries in writing our entry. It's less clear which section should be used for entries in small languages, where the external dictionary might be the only source consulted for writing the entry. So the vote doesn't completely come down on either my side or Thadh's side; there's room for interpretation. It is clear that my custom of putting nothing but <references/> under ===References=== is not a requirement (but I never said it was, I only said it's how I write entries, and I'm OK with the fact that other people write their entries differently). —Mahāgaja · talk 08:15, 19 August 2023 (UTC)[reply]
@Mahagaja I think this is an excellent idea and would resolve some of my reservations about the current set up. Helrasincke (talk) 09:36, 19 August 2023 (UTC)[reply]
@Mahagaja I'm also warm to this idea. Megathonic (talk) 19:14, 20 August 2023 (UTC)[reply]
Shall we bring this to a vote then? It seems to have general support. Vininn126 (talk) 16:30, 21 August 2023 (UTC)[reply]
@Vininn126 Go ahead and create it. Benwing2 (talk) 18:39, 21 August 2023 (UTC)[reply]
Wiktionary:Votes/2023-08/Renaming the sections References and Further reading Vininn126 (talk) 19:26, 21 August 2023 (UTC)[reply]
I only use the reference section for inline citations; everything else gets thrown under the further reading section. This is how virtually every entry in German is formatted. It would be an ugly mess to put an inline citation next to every single definition taken out of a dictionary or corpus source --- and unworkable, since sometimes it is used as the source, but often not: just as supplementary material in which definitions/usage can be verified. If these should be listed under references (as general reference material), even when no inline citations are used, I'm not opposed to that and it would be easy enough for a bot to move them, but in that case I'd just as soon eliminate the further reading section entirely. Megathonic (talk) 19:07, 20 August 2023 (UTC)[reply]
Megathonic You're right and I've done that a lot (because the German articles have been some of my main model entries whilst learning the ropes) but I think that it is problematic; I agree with Al-Muqanna's point about placing sources used in entry creation under ===Further reading=== as being potentially plagiaristic and thus something we really should address (though I note Benwing2 that in the vast majority of cases this is actually an improvement on the status quo), but on the other hand there are some weird formatting inconsistencies which occur when using both reference templates and inline citations under the ===References=== section. In the interests of resolving that, I would propose making a clear (or at least workable) distinction on the one hand between footnotes (similar to those used on WP for inline citations, of the format: author last, year, page; I guess other kinds of notes such as usage notes could potentially go here, as is done on WP, though I don't have a strong opinion) and references (which is where any citation templates appear, i.e. the full name & specifics of any work actually being cited; I note this as being primarily relevant for etymologies and LDLs) as well as on the other hand between references (works used as an aid in article creation) and general reading (general works, mainly encyclopedias & WP; probably not dictionaries, since IMO, if it's used in article creation it ought to go in references, and if it hasn't, well we don't really have any business citing it. The article space is not a repository or a bibliography; though we could conceivably use the appendices for this in the small number of cases where this might be necessary or useful - I have been doing something similar specifically for lexical resources at WT:FREQ). Helrasincke (talk) 16:03, 21 August 2023 (UTC)[reply]

Request to use AutoWikiBrowser

Hi. I would like to request access to AWB in Wiktionary, mainly to work on Fala entries. For the moment, I intend to add references and page numbers easily. I've made great use of AWB in a non-Wikimedia wiki. Thanks, cheers. sware🗣🏲 16:38, 15 August 2023 (UTC)[reply]

@Swaare added. Please remember to turn off all the Wikipedia-specific fixes etc. This, that and the other (talk) 01:00, 19 August 2023 (UTC)[reply]

Definition dates and labels added by Speednat

I've been looking through Category:English terms with obsolete senses and just on the first page I've now come across about half a dozen terms with spurious definition dates and sometimes usage labels added 10 years or so ago by User:Speednat. These follow the same, somewhat janky format of having overlong defdates, each with an inline reference to the Shorter OED. I don't have access to the Shorter OED, but the full OED tends to show these defdates to be spurious and I suspect that Speednat inferred them from whatever citations happened to be in the Shorter OED. The last example I found was abactor (this revision before my edit), marked "obsolete" with the long defdate "Attested from the mid 17th century until the early 19th century" and an inline Shorter OED citation. Meanwhile, in the full OED it is not marked obsolete and the citations go up to 1996. Not sure if this might need a concerted effort to clean up. —Al-Muqanna المقنع (talk) 10:43, 16 August 2023 (UTC)[reply]

Blah. It's always the people who think they know what they're doing and don't that do the most damage. Benwing2 (talk) 20:02, 16 August 2023 (UTC)[reply]
A lesson I learn afresh every time I clean out Category:IPA pronunciations with invalid IPA characters. —Mahāgaja · talk 13:02, 18 August 2023 (UTC)[reply]

Unencoded scripts and characters

Do we have policies on these? I thought I'd seen a claim that we would not record words using unencoded characters, but I also remember seeing a statement for Old Khmer that the so-far unencoded Pallava script would be represented by the Khmer script.

As the belief is that locally the Pallava script morphed into the Khmer script, that seems a sustainable point of view. There's also reported to be a font that renders the Khmer script as Pallava script and is suitable for rendering inscriptions, but I've not been able to test that claim. (I have some doubts, as the current Khmer encoding has problems with Southern Thai and I think some fairly recent Khmer.) --RichardW57m (talk) 16:01, 16 August 2023 (UTC)[reply]

Around 2005, there was a project to improve and clean up basic words in English. Since then, our quality standards have gotten much higher and some people have pointed out that many of our entries are disorganized or unclear compared to competing dictionaries.

Would anyone be interested in reviving this? We would first generate a list of "basic" or high-priority entries (equivalent to w:Wikipedia:Vital articles). We would then develop objective quality standards for evaluating entries, and entries which meet these standards will be given some kind of tag (equivalent to w:Wikipedia:Good articles). Note that this is a huge project so I don't want to start until we can gather a dozen or so active editors. Ioaxxere (talk) 19:11, 18 August 2023 (UTC)[reply]

I support, however I would be unable to help directly, as I am currently involved in a self-assigned project of the same kind for West Slavic. Vininn126 (talk) 19:13, 18 August 2023 (UTC)[reply]
I fully support a project to improve important English entries. Are we going to get some information on what non-contributing users look for here or at other dictionaries? We already know that our most popular entries are sex-, drug-, internet-, and computer-related, derogatory terms, and items in the news or in current entertainment. What should we do with them? If we can't or won't characterize our priority entries directly, by frequency of use or otherwise, can we do so indirectly, by learning about the needs of important target users? Are we aiming at English language learners (What levels?)? Are we aiming at potential contributors? DCDuring (talk) 19:40, 18 August 2023 (UTC)[reply]
(I also support focusing on entries that get more views.) Vininn126 (talk) 19:42, 18 August 2023 (UTC)[reply]
Support. Unlike a couple other people here, I would personally prefer to start with the most frequent English words, rather than most-viewed entries, for the following reason: focusing on words related to sex and drugs, especially profanities, makes us look less professional, not more, to make profanities look really good when more basic words are neglected. For instance, there are only 4 translation tables at car for 14 senses. Senses 19 through 32 of the verb run are not Wikified, making it difficult for anyone who wants to look up the words in the definitions. We're missing basic synonym information for any noun senses of call. These aren't huge issues, but they're just the ones I found with a few seconds of searching.
I really like the idea of having ways to mark quality articles, like Wikipedia does. I suggest starting by ensuring that all entries in the English Swadesh list are top-notch, then moving on to either our list of 850 or 1000 basic words as something concrete to work with. It shouldn't be too hard to develop standards, since our entries are a bit more straightforward than Wikipedia articles. Andrew Sheedy (talk) 20:55, 18 August 2023 (UTC)[reply]
This is all assuming that sex, drugs, and profanities are the ONLY top viewed entries. This is likely FAR from the truth. Vininn126 (talk) 20:59, 18 August 2023 (UTC)[reply]
Certainly, but I still don't think that should be our main criterion, as it's not what will give us credibility. Another example of basic entries needing attention is that there were only three verb senses at number two months ago. It was trivially easy to add five more, many of which are commonly used. We tend to have an issue with improving coverage of rare terms, while failing to ensure we have common, but boring senses. I think it will be easy to find editors who will keep on top of the most-visited pages. What's lacking is a systematic attempt to improve the quality of our most basic entries. Andrew Sheedy (talk) 21:03, 18 August 2023 (UTC)[reply]
I suppose there's a certain logic to that - at least for English. Typically with non-English L2s the most viewed pages mostly correspond with the core vocabulary. Vininn126 (talk) 21:05, 18 August 2023 (UTC)[reply]
I do think most frequently visited pages would be an obvious next step, though, once we've made sure our basic words are up to par. Andrew Sheedy (talk) 21:10, 18 August 2023 (UTC)[reply]
I would also suggest taking up something like an SAT vocabulary list of words that aren't exactly the core vocabulary/most common words, but still are of interest to not only learners, but also to native English speakers who could be interested in the specifics regarding them. lattermint (talk) 21:18, 18 August 2023 (UTC)[reply]
I agree using a standard vocab list to expand on the work would be a good idea. —Al-Muqanna المقنع (talk) 21:21, 18 August 2023 (UTC)[reply]
I think that our coverage of "sex-, drug-, internet-, and computer-related, derogatory terms, and items in the news or in current entertainment" is probably already 'better' than the coverage by other dictionaries. I know that some fairly basic words are not well-covered in the sense that we are relying on the definitions that Webster 1913 had and have not brought them up to date or added newly evolved senses. Are there not other categories and indicators of problem entries? Are error-prone words (not just homophones, false friends, and English spelling horrors) worth special attention? Another source of 'basic' words would be the defining vocabulary of any dictionary that makes it available. (Longman's DCE is one.) Routledge has a frequency dictionary for contemporary American English (and other languages). The American English one has 5,000 words, from a 385 million word corpus, and 20-30 collocates for each. DCDuring (talk) 22:22, 18 August 2023 (UTC)[reply]
@Andrew Sheedy Starting with the much more manageable Swadesh List seems like a great idea. I'm going to go through them and make some comments. Ioaxxere (talk) 22:25, 18 August 2023 (UTC)[reply]
@Andrew Sheedy I'm also in favor of picking a set of words that have a high frequency in the language. I just wanted to throw in the idea of using a frequency-ordered wordlist from any modern, representative corpus of English. I would probably steer away from corpora based mostly or entirely on text scraped from the Internet, because those end up encoding certain biases of Internet culture. I'd also give some thought to the balance between different varieties of English. Chernorizets (talk) 05:28, 20 August 2023 (UTC)[reply]
This is more or less what I have been doing for Polish. Definitely worthwhile, in my opinion. Vininn126 (talk) 09:32, 20 August 2023 (UTC)[reply]
What high-frequency word lists, beyond the Swadesh list, preferably reflecting contemporary usage and usage in informal speech, are available for our use? Is the 2023 General Service List good for our purposes?
Is there any such list that looks at frequency of each word as a PoS, eg, will, have, see as noun vs. verb?
Is there any such list that orders definitions by frequency? DCDuring (talk) 14:43, 20 August 2023 (UTC)[reply]
Swadesh lists don't look well suited for our purposes. They seem like lists for linguistic anthropologists. The English is very light on function words. Fewer than half of the first 50 words are function words. On the New General Service List fewer than 10 of the first 50 are NOT function words. DCDuring (talk) 02:08, 21 August 2023 (UTC)[reply]
Yes, in frequency lists you get a lot of function words, I had to go through a ton in the beginning. Vininn126 (talk) 09:06, 21 August 2023 (UTC)[reply]
You are not saying we should skip them, are you? DCDuring (talk) 16:56, 21 August 2023 (UTC)[reply]
Not at all! They should honestly maybe even get more attention - most of the time it's a closed group, but they have a lot more uses. I think a frequency list would be the best choice, similar to what I have been using for Polish (see {{pl-freq 1990}}). Vininn126 (talk) 17:00, 21 August 2023 (UTC)[reply]
Often need to resort to non-gloss definitions for them in order to avoid definitions that can only be understood by philosophers of language. DCDuring (talk) 19:06, 21 August 2023 (UTC)[reply]
I do intend to work on definition coverage, definition wording, and sense/subsense structure for English words that are both on the GSL list and on any lists of words most commonly viewed by users, either at Wiktionary or at other dictionaries, print or online (probably learners' dictionaries). [Can anyone link me to articles or databases on what entries users look up most and what their learning process is?] Please let me know if I am doing it wrong. I dipped my toe in the water on [[the]] and [[be]]. DCDuring (talk) 03:14, 4 November 2023 (UTC)[reply]

Problems with Th*who

I'm posting this in regard to problems I'm seeing with Th*who regarding edit requests involving language data modules.

I pinged Th*who to complete outstanding requests for Kapampangan and Kinaray-a that have been left unanswered, and I got these replies falsely accusing me of using them as a PA and telling me the module talk pages are not the right venue for edit requests. They asked me to post them to the grease pit, but I refused. Then they resorted to claiming I ignored answered threads; I asked them to assume good faith but they have been argumentative. I already asked uninvolved admins to complete the requests, but he responded with accusations that I don't sign my posts (I edit from mobile and the threads gadget automatically adds a signature to each post even if I don't add 4 tildes).

Th*who has already been partially blocked for similar behavior at Koavf's talk page, but the latest incident leaning toward assuming bad faith and abuse of administrator privileges may warrant a desysop and a longer (and possibly indefinite) block. TagaSanPedroAko (talk) 22:15, 18 August 2023 (UTC)[reply]

It would be helpful if you linked to talk pages, threads, or relevant diffs. Thanks. —Justin (koavf)TCM 22:19, 18 August 2023 (UTC)[reply]
@Koavf See module talk:languages/data/3/k#Kinaray-a and module talk:languages/data/3/p#Kapampangan. This has been just happening. Th*who is directing me to the wrong channel for edit request, and has been argumentative. TagaSanPedroAko (talk) 22:23, 18 August 2023 (UTC)[reply]
@Theknightwho as they deserve to know, instead of passive-aggressive censoring. Very immature approach to censor the name. Vininn126 (talk) 22:26, 18 August 2023 (UTC)[reply]
I specifically told @TagaSanPedroAko that they should ask for these requests at the WT:Grease pit instead of pinging random admins to do things for them. Instead, they repeatedly argued with me, and have decided to make a complaint. It would have been far easier to simply start the Grease Pit thread. Theknightwho (talk) 22:29, 18 August 2023 (UTC)[reply]
It could be my fault for causing the issue by pinging them instead of someone else. I didn't know of their block.
@Vininn126 I intentionally just did that to avoid calling attention. Adding the name without a link to their user page can have the same effect. By the way, I would replace those instances with the complete one, but with no link. TagaSanPedroAko (talk) 22:28, 18 August 2023 (UTC)[reply]
I see no reason why these should have to be posted to the Grease Pit as such, and looking at both of these talk pages, there are several edit requests. —Justin (koavf)TCM 22:29, 18 August 2023 (UTC)[reply]
@Koavf The reason is obvious: I told them not to bother me or other admins with pings. They are free to ask on that talk page, but it is obvious that it is unlikely things will get done if no-one is checking. Theknightwho (talk) 22:31, 18 August 2023 (UTC)[reply]
The reason you wrote on that page was just "that's how things are done here" and yet, there are many requests on that page that are addressed on that page, so that is at the very least confusing and not accurate. You also wrote that "what the module editor says is not something that we have control over, unfortunately" and yet, there is MediaWiki:Protectedpagetext, which we can, in fact, edit. If all of these edit requests are supposed to be posted to the Grease Pit, where was discussion of this and why are you writing that we can't modify the appropriate interface to fix this confusion when we can? —Justin (koavf)TCM 22:35, 18 August 2023 (UTC)[reply]
@Koavf Are you trying to frame this as me being intentionally dishonest, instead of being unaware of MediaWiki:Protectedpagetext? What reason would I have for lying about this? It's standard Wiktionary practice for these kinds of requests to be posted at the WT:Grease pit, instead of languishing unanswered for months on talk pages that almost no-one looks at. That's simply a fact. Theknightwho (talk) 22:38, 18 August 2023 (UTC)[reply]
You answered my questions with questions. I'll ask them again in the hopes that you'll act in good faith and answer my simple questions: "If all of these edit requests are supposed to be posted to the Grease Pit, where was discussion of this and why are you writing that we can't modify the appropriate interface to fix this confusion when we can?" —Justin (koavf)TCM 22:41, 18 August 2023 (UTC)[reply]
@Koavf The answers to both questions can be clearly inferred from my previous comment. Thanks. Theknightwho (talk) 22:46, 18 August 2023 (UTC)[reply]
Neither can be inferred from your previous answer, so it would be helpful if you would just go ahead and actually answer the questions instead of wasting time. E.g. re: the latter question, you could be incompetent and ignorant, you could have an English language competency issue, you could have been dissimulating, etc. Rather than assume these things, I'm giving you an opportunity to explain yourself: why are you writing that we can't modify the appropriate interface to fix this confusion when we can? —Justin (koavf)TCM 22:55, 18 August 2023 (UTC)[reply]
Abuse of administrator privileges is a serious allegation. What abuse was there here? Are you framing refusing to perform the requested edits as abuse of admin privileges? —Justin (koavf)TCM 22:30, 18 August 2023 (UTC)[reply]
@Koavf I could have dropped that charge but the issue is with Theknightwho's behavior: it's some sort of "I'm right, you're wrong" issue. Does look like a bad example for an admin. But I see now. Maybe I'm not aware of some new edit request policy. TagaSanPedroAko (talk) 22:37, 18 August 2023 (UTC)[reply]
I won't comment on if his behavior is optimal or considerate or helpful, but I don't see any abuse of admin privileges: any other user could have been just as helpful/unhelpful, rude/considerate, competent/incompetent as him in these cases. Were he using admin tools like blocking or deleting in an inappropriate manner, that would be a lot more serious. If he was just rude/ignorant/time-wasting, that's a problem, but not as serious. —Justin (koavf)TCM 22:40, 18 August 2023 (UTC)[reply]
@Koavf Well, I see some of Theknightwho's points regarding protected module edit requests: maybe requests would be more easily answered at a larger venue like the grease pit, but the current protected page editor note tells anyone to post an edit request at the talk page. As for other edit request threads that have been there for a long time without a response, those should be resolved.
For Theknightwho, until the protected page note is updated, they could have done better responding to the edit request than throwing accusations because they felt disturbed by the ping whose intention is to call attention to an issue.
I will insist the module's talk page remains the right venue. And in case requests go unanswered, they should try to bring this to the attention of someone who can edit the module. A secondary option is to create a new requested edit page, along the lines of RFC, RFV, RFD etc.: that will cover anything protected from editing, not just modules.
Moving any of those edit requests to the grease pit would have diverted attention away from much bigger technical issues that are usually posted there. TagaSanPedroAko (talk) 22:54, 18 August 2023 (UTC)[reply]
Agreed. If this is a systemic thing, then it makes sense to edit the interface rather than have constant back-and-forth on a talk page. I'm hopeful that this admin will answer why he wrote that we can't modify the appropriate interface to fix this confusion when we can. —Justin (koavf)TCM 22:56, 18 August 2023 (UTC)[reply]
Can we end the thread here where it is? The request should have been made at the grease pit, and no admin powers were abused. I don't want yet a bajillionth thread clogging up the BP wasting everyone else's time. Vininn126 (talk) 22:43, 18 August 2023 (UTC)[reply]
@Koavf @TagaSanPedroAko @Theknightwho I'm going to ask you three to stop. The questioning, doubling down on something that definitely seems wrong, and overall attitude of this thread is just uncomfortable to look at. Please move on and let's just focus on the actual requested change instead of this other BS. Vininn126 (talk) 22:57, 18 August 2023 (UTC)[reply]
@Vininn126 Thank you. Theknightwho (talk) 23:00, 18 August 2023 (UTC)[reply]
I'm going to edit the appropriate part of the interface so that we don't have these issues in the future, unless users deliberately choose to have these issues in the future. I'm hopeful this will be onwards and upwards, but who knows. —Justin (koavf)TCM 23:00, 18 August 2023 (UTC)[reply]
@Vininn126 Also thank you.
@Theknightwho As for the existing edit requests on the talk pages in question, those should be resolved pronto, but next time anyone requesting an edit should be directed to the appropriate discussion page (which, in the case of templates and modules, is the grease pit). TagaSanPedroAko (talk) 23:05, 18 August 2023 (UTC)[reply]
Please see the edit I just made to https://en.wiktionary.org/w/index.php?title=MediaWiki%3AProtectedpagetext&diff=75687182&oldid=57952499. Do you think this will help avoid these kinds of discussions in the future? —Justin (koavf)TCM 23:08, 18 August 2023 (UTC)[reply]
@Koavf +1 TagaSanPedroAko (talk) 23:17, 18 August 2023 (UTC)[reply]
Oh my god, are we seriously at a point where this deserves a BP thread?
TKW: Be nice to newbies. Also, let them ping whomever they want to ping, the people in question will tell them not to bother them themselves.
TSPA: Talk pages are usually not patrolled, so it's more effective to post requests such as this at GP.
Can we now focus on editing instead of bickering?
Thadh (talk) 00:00, 19 August 2023 (UTC)[reply]
@Thadh: Everyone was, to their credit, taking steps to de-escalate this. Please don't revive it! In general, we should find a way to gracefully shut down any discussion that morphs from an argument about something into an argument about the argument. Those tend to become interminable wastes of space that feed on themselves without accomplishing anything. Chuck Entz (talk) 00:09, 19 August 2023 (UTC)[reply]

Old Leonese → Old Astur-Leonese

What we have labelled Old Leonese serves as the ancestor of not just Leonese, but also Asturian. (Mirandese as well, of which we have no medieval record, I believe.)

To avoid privileging Leonese over Asturian, perhaps we could rename the language to Old Astur-Leonese, as we have done recently with Old Galician-Portuguese (< 'Old Portuguese').

There are very few lemmas to deal with, so this should be fairly easy. Nicodene (talk) 00:17, 19 August 2023 (UTC)[reply]

@Nicodene Sounds good to me. Benwing2 (talk) 01:28, 20 August 2023 (UTC)[reply]
How is it referred to in literature? Because Old Leonese doesn't necessarily refer to "Ancestor of Leonese", but rather "Ancient language formerly spoken in Leon" - just like Old Spanish isn't called Old Spanish-Ladino, and Old French isn't called Old French-Walloon-Gallo-Picard-and-many-other-Oil-languages (nor is it Old Oïl btw). Thadh (talk) 02:43, 20 August 2023 (UTC)[reply]
But "Astur-Leonese" or "Asturleonese language" is an actual term. Who has ever said "French-Walloon-Gallo-Picard-and-many-other-Oil-languages" or even "Spanish-Ladino/Spanyol/etc."? —Justin (koavf)TCM 02:47, 20 August 2023 (UTC)[reply]
Which is why I'm asking about the usage in literature, because I don't think we should be coining new terms for reasonably named and commonly accepted scientific terms. Thadh (talk) 02:50, 20 August 2023 (UTC)[reply]
Complicating things is the fact that Asturleonese seems to refer to a language that includes Asturian and Leonese (not to mention Mirandese) as dialects, not to a group of languages. Of course, the same has been said about Galician and Portuguese. Chuck Entz (talk) 03:11, 20 August 2023 (UTC)[reply]
1 2 3 4 5 6 7 8
9 10 11 12 13 14 15 16
Nicodene (talk) 18:40, 20 August 2023 (UTC)[reply]
Moreover, no source that I am aware of ever labels the ancestor of Asturian as 'Old Leonese'. Either we split up Old Leonese and Old Asturian, which seems unnecessary to me, or we unite them under a name that actually reflects both. Nicodene (talk) 18:42, 20 August 2023 (UTC)[reply]
@Nicodene This leads to the obvious question: should Asturian and Leonese be merged into Asturleonese, with Asturian and Leonese given as etymology-only variants? I don't know enough about the languages/dialects in question but they seem very similar. There are 5000+ Asturian lemmas but only 269 Leonese ones, so merging should be possible. Benwing2 (talk) 20:59, 20 August 2023 (UTC)[reply]
I am mostly neutral on that question, slightly leaning towards keeping the status quo because combining them would submerge the relatively few Leonese lemmas under 5,466 Asturian ones. (Granted, we could use regional labels to categorize them distinctly.)
@Oigolue is quite knowledgeable on the matter and can perhaps provide some input. Nicodene (talk) 21:17, 20 August 2023 (UTC)[reply]
It's quite a difficult matter, as these names are more geographical than anything: Leonese is basically occidental Asturleonese, and what they speak in occidental Asturias is closer to what they speak in León and Zamora than to what is spoken in central or oriental Asturias. There's even a tiny area in León (Riañu) that seems to speak oriental. Mirandese is also occidental, but it obviously needs to be mentioned apart anyway. My personal preference is the label Asturleonese, subdivided into occidental, central and oriental. Oigolue (talk) 21:45, 20 August 2023 (UTC)[reply]
@Nicodene I think using regional labels should solve all the issues. I imagine most of the lemmas will be shared, and the ones that aren't can easily be categorized appropriately using a regional label. The advantage of merging is it avoids a lot of duplication and makes the lives of Asturleonese editors easier. Benwing2 (talk) 22:17, 20 August 2023 (UTC)[reply]
Fair enough. Down the line, some revision of the categorization may be advisable, per what Oigolue has said. Certainly there are many lemmas that are shared under one spelling already.
Astur-Leonese would undoubtedly include Mirandese (410 lemmas). Extremaduran could possibly qualify as well, but this is less clear.
As this is a significant decision, I am pinging various users that have been involved in Asturian/Leonese/Mirandese:
Nicodene (talk) 23:49, 20 August 2023 (UTC)[reply]
@Nicodene I am not so familiar with the differences between Asturian, Leonese and Mirandese, but Mirandese appears to use a different (Portuguese-based) spelling system, so it could be argued that Mirandese should stay separate while Asturian and Leonese merged. I dunno. Benwing2 (talk) 00:12, 21 August 2023 (UTC)[reply]
I think the structural factors, which strongly align it with the neighbouring (on three sides) Leonese, and against Portuguese, outweigh the spelling differences. The quite different orthography would of course be mentioned under Wiktionary:About Astur-Leonese, perhaps as the very first topic. Nicodene (talk) 00:43, 21 August 2023 (UTC)[reply]
Asturian terms ending in -a have -es in plural, while in Leonese the plural form is still -as (like in Spanish) AFAIK. Rodrigo5260 (talk) 00:22, 21 August 2023 (UTC)[reply]
According to Spanish Wiki, feminine plural /-as/ is characteristic of the entire bloque occidental, which includes western Asturias.
Perhaps we could automate nouns and adjectives to display two different feminine plurals, labelled (western) and (central/eastern). Nicodene (talk) 00:51, 21 August 2023 (UTC)[reply]
That sounds like a good solution. Benwing2 (talk) 01:18, 21 August 2023 (UTC)[reply]
Are we sure that the differences between Leonese and central (normative) Asturian can be handled systematically in this way? There appear to be spelling differences in addition to systematic morphological ones: Llionés (in Leonese), Lleonés (in Asturian). So if for instance {{ast-noun}} and {{ast-adj}} were changed to create -es/-as feminine plural variants in every instance, we could risk displaying forms that don't exist (exclusively central Asturian stem, Leonese/occidental Asturian ending).
Mirandese should definitely be kept separate (based on my reading of the Mirandese Wikipedia, where the spelling differences are very apparent).
On the original question posed in this discussion, changing Old Leonese to Old Asturleonese seems fine to me. I would merely temporarily caution against folding Leonese into Asturian for the above reason. Voltaigne (talk) 01:48, 21 August 2023 (UTC)[reply]
@Voltaigne The spelling differences such as Llionés vs. Lleonés are easily handled using labels such as {{lb|ast|Leonese}} and {{lb|ast|Asturian}} respectively. These will categorize appropriately into Category:Leonese and Category:Asturian (or whatever we choose to name the categories). This is similar to how we handle spelling differences between European and Brazilian Portuguese, and between British and American English. As for {{ast-noun}}, they can and should have parameters to indicate that a term is exclusively Asturian or Leonese, so that we don't generate both plural variants. Benwing2 (talk) 02:23, 21 August 2023 (UTC)[reply]
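The labelled-plural idea discussed above can be sketched roughly as follows. This is only an illustration in Python, not the actual {{ast-noun}} implementation (which would live in a Lua module); the function name and the `only` parameter are hypothetical, and the sketch assumes the simple -as/-es split mentioned in this thread, which may not cover every dialect.

```python
from typing import List, Optional, Tuple


def feminine_plurals(lemma: str, only: Optional[str] = None) -> List[Tuple[str, Optional[str]]]:
    """Return (plural form, regional label) pairs for a feminine lemma ending in -a.

    only=None              -> both plurals, each with a regional label
    only="western"         -> just the -as plural, unlabelled
    only="central/eastern" -> just the -es plural, unlabelled
    """
    if not lemma.endswith("a"):
        raise ValueError("this sketch only handles lemmas ending in -a")
    stem = lemma[:-1]
    if only == "western":
        return [(stem + "as", None)]
    if only == "central/eastern":
        return [(stem + "es", None)]
    if only is not None:
        raise ValueError("unknown value for 'only': " + only)
    # Default: show both variants, labelled, as proposed in the discussion.
    return [(stem + "as", "western"), (stem + "es", "central/eastern")]


print(feminine_plurals("casa"))
print(feminine_plurals("casa", only="western"))
```

The point of the `only` parameter is the one made above: an entry known to belong to a single variety can opt out of the default, so no plural is displayed that does not exist for that stem.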
So each time an editor creates an entry for a (central/normative) Asturian word, they would need to check whether or not the word is spelt the same or differently in Leonese for the purpose of adding the appropriate parameter to the headword template. That might not always be so easy to do given the apparent relative paucity of resources in Leonese. Voltaigne (talk) 12:31, 21 August 2023 (UTC)[reply]
This is true for any poorly or pluricentrically standardised language. Splitting three unstandardised lects won't help with that. Thadh (talk) 12:35, 21 August 2023 (UTC)[reply]
It should be noted that the status quo, which claims that Leonese only has feminine plural -as, and Asturian only has feminine plural -es, is wrong. Both plurals exist in both zones. A merger, in which both types of plurals are represented by default, should actually increase the accuracy of plural depictions.
Naturally the merger bot would set any exclusively Mirandese lemma (with differentiated spelling) to use the western type plural with -as. And periodically we could run another bot to sweep through new lemmas and set any that are tagged as exclusively Mirandese to use the western type plural only. Nicodene (talk) 16:31, 21 August 2023 (UTC)[reply]
Something relevant about plurals: a couple of occidental dialects have plurals in -es (but just a couple); but as for oriental dialects, the most western ones use -es and the rest -as.
Another big difference is the neuter:
- in occidental it just doesn't exist
- in central it is with -o (la sidra bono)
- in oriental it's mainly with -u (la sidra bonu)
Pasiegu dialects of Cantabrian have their own system with neutralization, but that's a whole different matter. Oigolue (talk) 19:24, 21 August 2023 (UTC)[reply]
If the Leonese vs. Asturian distinction cross-cuts the actual dialectal divisions (Occidental/Central/Oriental), then definitely there should be a single Asturleonese language IMO; this is similar to Serbo-Croatian. Otherwise you end up with duplicating the Occidental vs. Oriental stuff in both "Leonese" and "Asturian". Benwing2 (talk) 19:29, 21 August 2023 (UTC)[reply]
I share the opinion of Voltaigne that changing Old Leonese to Old Asturleonese is a good measure, but I'm still not fully convinced the benefits of merging Asturian, Leonese, and Mirandese will outweigh the problems such a measure might create. - Sarilho1 (talk) 09:47, 21 August 2023 (UTC)[reply]
@Benwing2 It appears we all agree on the original proposal, at least, so perhaps we can proceed with that whilst the matter of possibly merging the modern languages remains to be decided. Nicodene (talk) 23:42, 21 August 2023 (UTC)[reply]

Adding Old Franco-Provençal[edit]

The ancestor of modern Franco-Provençal. Mentioned frequently by the FEW, hence often manually added on various Wiktionary entries.

There is no shortage of reliable academic sources that discuss it, as can be seen by searching "Old Franco-Provençal" or "ancien francoprovençal" on Google Books.

Suggested language code: roa-ofp.

It appears to have had an active case-system for nouns, incidentally, much like Old French and Old Occitan. Nicodene (talk) 01:36, 19 August 2023 (UTC)[reply]

Is it distinct enough from modern FP to be a separate language as opposed to an etymology language? —Mahāgaja · talk 08:39, 19 August 2023 (UTC)[reply]
At least as much as Modern French is separable from Old French and for essentially the same reasons: noun-case-system loss, radical sound-changes (leading to noticeably different spelling), various grammatical/syntactic changes. Nicodene (talk) 09:12, 19 August 2023 (UTC)[reply]

Visually indistinguishable character combinations[edit]

I remember reading through many previous discussions about visually indistinguishable (or difficult to distinguish) characters in headwords, including instances where characters from multiple script types were being mixed (here, here, here, here for Latin 'æ' vs. Cyrillic 'ӕ' in Ossetian, also here and here for Palochka, here for Hebrew Geresh and here for Chuvash; there are surely many more). Now for a question about difficult to distinguish letters within the same script type. I've noticed that there is at least "one" character combination in Yiddish which looks in most situations identical (an exception is when using monospaced fonts such as here in the source editor) but is apparently not: a combined diphthong and a simple sequence. For instance, the sequence וי (oy) (which for clarity is entered as two letters, "ו" ('vov') and "י" ('yud')) vs. the combined form ױ (oy) (a diphthong called 'vov yud', which is entered as one letter). I had assumed that the latter was maybe just a precombined form of the same diphthong - the diphthong's name suggests it to be a simple combination of the two basic letters, both wikilinks point to the same page, the transliteration is the same and a strict google search brings up the same wiktionary page (and I cannot see any redirects in place). Yet, we apparently treat words spelled with the two variants as distinct, for instance the bluelink געבוי (geboy) and redlink געבױ (geboy). Perhaps others can chime in: are these combinations to be treated (sometimes or always) as meaningfully distinct in the Yiddish language (or in other languages using the Hebrew script), either generally or in the computing space? And how should we treat them on Wiktionary: should we treat them as alternate spellings, or should we normalise all entries containing diphthongs? Helrasincke (talk) 10:47, 19 August 2023 (UTC)[reply]

See Wiktionary:About Yiddish § Alphabet and transliteration: "the only Unicode ligature used is ײַ". As I understand it, the other ligatures available in Unicode (vov-yud, tsvey-yudn, tsvey-vovn) are deprecated, not just at Wiktionary but generally. There may be a few hard redirects from forms using a deprecated ligature to the two-character forms, but that isn't our standard practice. The same goes for the precomposed Alphabetic Presentation Forms available for Yiddish: we use the two separate characters, not the precomposed one. For example, for pasekh-alef, we use U+05D0 U+05B7, not U+FB2E. —Mahāgaja · talk 11:01, 19 August 2023 (UTC)[reply]
Thank you @Mahagaja, that's good enough for me. I don't know how I missed that. Helrasincke (talk) 17:39, 19 August 2023 (UTC)[reply]
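For illustration, the convention described above (two separate characters instead of the deprecated ligatures and precomposed presentation forms) could be applied mechanically along these lines. This is a hedged sketch in Python, not an existing Wiktionary tool: the function name is hypothetical, and the handling of tsvey-yudn (kept only when a pasekh follows, i.e. in ײַ) is an assumption based on one reading of the quoted guideline.

```python
import re

# Deprecated single-code-point forms and their two-character replacements.
DECOMPOSE = {
    "\u05F0": "\u05D5\u05D5",  # tsvey-vovn ligature -> vov + vov
    "\u05F1": "\u05D5\u05D9",  # vov-yud ligature -> vov + yud
    "\uFB2E": "\u05D0\u05B7",  # precomposed pasekh-alef -> alef + pasekh
}

TSVEY_YUDN = "\u05F2"
PASEKH = "\u05B7"
# Assumption: the tsvey-yudn ligature is only kept when a pasekh follows it.
BARE_TSVEY_YUDN = re.compile(TSVEY_YUDN + "(?!" + PASEKH + ")")


def normalize_yiddish(text: str) -> str:
    """Replace deprecated ligatures with the corresponding two-character sequences."""
    for ligature, replacement in DECOMPOSE.items():
        text = text.replace(ligature, replacement)
    return BARE_TSVEY_YUDN.sub("\u05D9\u05D9", text)


redlink = "\u05D2\u05E2\u05D1\u05F1"          # geboy spelled with the vov-yud ligature
bluelink = "\u05D2\u05E2\u05D1\u05D5\u05D9"   # geboy spelled with vov + yud
print(normalize_yiddish(redlink) == bluelink)
```

With a normalizer of this kind, the redlinked ligature spelling of geboy maps onto the existing two-letter entry rather than being treated as a distinct page title.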

standardChars field in Module:languages/data/2 et al.[edit]

At the moment, languages can have a standardChars field, which ostensibly contains all the standard characters in that language's alphabet. In practical terms, this tells the headword template to track terms which contain characters not in the standardChars field, in categories like Category:English terms spelled with Æ. standardChars is not used for anything else.

Unfortunately, this causes some difficulties when languages have rare letters in their official alphabets. For example, the Mongolian alphabet officially contains the letter Щ (Šč), but as Category:Mongolian terms spelled with Щ shows, very few words actually contain it. This happens to a greater or lesser degree in many alphabets.

The current name standardChars is a source of confusion here, because some interpret "standard" to mean "official", while others interpret it to mean "common". Given the practical reality that standardChars actually means "don't track these characters", it would probably be better for it to be renamed commonChars. Pinging @Thadh @Fenakhay who have also shown interest in this. Theknightwho (talk) 02:56, 20 August 2023 (UTC)[reply]
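To make the practical meaning concrete, here is a minimal sketch of the "don't track these characters" behaviour described above. It is written in Python for illustration only, whereas the real logic lives in Lua modules; the function names, the `ENGLISH_COMMON` set, and the choice to uppercase the character in the category name are assumptions, with the category format modelled on Category:English terms spelled with Æ.

```python
import string
from typing import List, Set


def untracked_chars(term: str, common_chars: Set[str]) -> List[str]:
    """Return characters of `term` not in `common_chars`, in order of first appearance."""
    found: List[str] = []
    for ch in term:
        if ch.isspace():
            continue  # spaces between words are never tracked in this sketch
        if ch not in common_chars and ch not in found:
            found.append(ch)
    return found


def tracking_categories(term: str, lang_name: str, common_chars: Set[str]) -> List[str]:
    """Build one tracking category name per character outside the common set."""
    return [
        "Category:{} terms spelled with {}".format(lang_name, ch.upper())
        for ch in untracked_chars(term, common_chars)
    ]


# Hypothetical stand-in for an English commonChars value.
ENGLISH_COMMON = set(string.ascii_letters + "-'")

print(tracking_categories("encyclopædia", "English", ENGLISH_COMMON))
print(tracking_categories("beer parlour", "English", ENGLISH_COMMON))
```

Seen this way, whether a rare but official letter such as Mongolian Щ belongs in the set depends only on whether editors want it tracked, which is the argument for the name commonChars.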

I was mostly just noting that since the name is apparently confusing, we should change it to something less so.
I'm also not sure these tracking categories for 'rare' letters are useful, but that's a different discussion. Thadh (talk) 03:19, 20 August 2023 (UTC)[reply]
I'm not sure how useful they are either, but they've been around for a while and I have occasionally found them useful when needing to track down rare examples for testing stuff. Theknightwho (talk) 03:32, 20 August 2023 (UTC)[reply]
I would support a rename. Benwing2 (talk) 04:31, 20 August 2023 (UTC)[reply]
I support a rename since it is sometimes not used for the intended purpose.
It would be nice to have support for rare digraphs (trigraphs, etc...) tracking using trackChars, for example. — Fenakhay (حيطي · مساهماتي) 06:28, 20 August 2023 (UTC)[reply]
That isn’t necessarily straightforward to do because of compound words. Theknightwho (talk) 11:35, 20 August 2023 (UTC)[reply]
@Theknightwho This issue came up with palindromes in Hungarian, where digraphs should be treated as a single char for palindrome purposes but two characters next to each other that happen to look like a digraph should not be. My solution was to implement |nopalindromecat=1 that can be set on {{head}}, and something similar could be done here. (The only problem with this is that this param has to be threaded through all the templates that wrap {{head}}. I have long planned to implement a general headword-handling library in Module:headword utilities, so people don't have to keep rolling their own headword modules. If this is implemented and a given headword template is using it, adding new pass-through params is easy.) Benwing2 (talk) 20:37, 20 August 2023 (UTC)[reply]
@Benwing2 I wonder if it’s worth having a way of checking the etymology section which can determine stuff like this, which would save us from having an increasingly long list of parameters like these which will no doubt get forgotten about by most users. Theknightwho (talk) 21:18, 20 August 2023 (UTC)[reply]
@Theknightwho I suppose, although it seems fragile. Benwing2 (talk) 22:18, 20 August 2023 (UTC)[reply]
@Benwing2 I agree, but the issue of multigraphs also comes up with sorting, too. Japanese entries already do this via the kanjitab templates (given Japanese sorting is based on the reading), but I think it would help to have a more general-purpose way of handling the issue. It could also come in handy for automatic pronunciation as well, as there tends to be a phonemic break at the compound boundary (though of course there are many exceptions). Theknightwho (talk) 22:47, 21 August 2023 (UTC)[reply]
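The digraph problem raised in this thread can be illustrated with a small sketch. It is in Python rather than the Lua used on-wiki, and the function names and the `no_palindrome_cat` flag (standing in for |nopalindromecat=1) are hypothetical. It tokenizes greedily, longest multigraph first, which is exactly the approach that misparses compounds.

```python
from typing import Iterable, List

# Hungarian digraphs and the trigraph dzs.
HUNGARIAN_MULTIGRAPHS = ["dzs", "cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs"]


def tokenize(term: str, multigraphs: Iterable[str]) -> List[str]:
    """Split `term` into letters, greedily treating known multigraphs as single letters."""
    ordered = sorted(multigraphs, key=len, reverse=True)  # try longest first
    term = term.lower()
    letters: List[str] = []
    i = 0
    while i < len(term):
        for mg in ordered:
            if term.startswith(mg, i):
                letters.append(mg)
                i += len(mg)
                break
        else:
            letters.append(term[i])
            i += 1
    return letters


def is_palindrome(term: str, multigraphs: Iterable[str], no_palindrome_cat: bool = False) -> bool:
    """Letter-based palindrome check; the flag lets an editor veto a wrong automatic result."""
    if no_palindrome_cat:
        return False
    letters = tokenize(term, multigraphs)
    return letters == letters[::-1]


print(tokenize("szász", HUNGARIAN_MULTIGRAPHS), is_palindrome("szász", HUNGARIAN_MULTIGRAPHS))
print(tokenize("házsor", HUNGARIAN_MULTIGRAPHS))
```

The word szász is a palindrome only when sz counts as one letter. The compound házsor (ház + sor) shows the weakness: the greedy tokenizer reads z + s as the digraph zs, which is why a manual override, or some knowledge of the compound boundary, is needed.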

IPA transcription of voiceless consonants[edit]

Sorry if this is the wrong place (or wrong format) for this, because this is my first time contributing here.

On the page for einzig I noticed that the IPA symbol [ɡ̊] was used in a place that I expected to see [k]. To my knowledge, these represent the exact same phoneme. It seems to me that [k] is a generally simpler and easier to understand transcription, but I wanted to ensure that there was no policy reason for using [ɡ̊] before I edited the page. 24potato (talk) 04:10, 20 August 2023 (UTC)[reply]

Given that it's actually /ɡ̊/ in the entry (phonemic, not phonetic, transcription) and that the German pronunciation key does not list /ɡ̊/, I think it should probably be changed to /k/. BTW, the better place for general questions is Wiktionary:Information desk. The Beer Parlour is for policy discussions or issues with a broad reach, the Tea Room for discussions of particular words, and the Grease Pit is for technical/template/bug-related issues. Andrew Sheedy (talk) 04:22, 20 August 2023 (UTC)[reply]
I’ll update it to [k]. I wasn’t quite sure whether this qualified as a policy discussion and just figured that it was better to be too specific than not specific enough. I’ll keep that in mind for future questions that I have though! 24potato (talk) 04:28, 20 August 2023 (UTC)[reply]
Honestly, it's a bit fuzzy sometimes. The Beer Parlour isn't necessarily a bad spot for it and the Tea Room would probably have been fine too. The Information Desk tends to be a bit underused, though, compared to the often bloated Beer Parlour. Andrew Sheedy (talk) 04:30, 20 August 2023 (UTC)[reply]
Obviously, I’m new here, but it seems that maybe the site (and this page in particular) could benefit from a dedicated page for policy issues. 24potato (talk) 04:48, 20 August 2023 (UTC)[reply]
@24potato The Beer Parlour *IS* the place for policy issues. The problem is that it isn't always completely obvious what type of issue a particular question is, so sometimes non-policy issues get posted here because it's the place everyone looks. IMO creating another discussion page would just lead to confusion; things are fragmented enough as-is. Benwing2 (talk) 20:32, 20 August 2023 (UTC)[reply]

{{C.E.}} vs. {{CE}} and the like[edit]

Turns out, {{CE}} and {{C.E.}} aren't the same. The former displays CE and the latter displays C.E. with dots between the letters. Same with {{BCE}} and {{B.C.E.}}. There also exist {{BC}}, {{B.C.}}, {{AD}} and {{A.D.}}, which redirect to the common-era equivalents, but not in a sensible way, as can be seen in this table of uses:

Aliased template    Canonical template    #Uses
Template:BCE        Template:BCE          1172
Template:BC         Template:BCE          261
Template:B.C.E.     Template:B.C.E.       1420
Template:B.C.       Template:B.C.E.       25
Template:CE         Template:CE           2371
Template:C.E.       Template:C.E.         3102
Template:A.D.       Template:C.E.         29
Template:AD         Template:C.E.         208

In particular, {{AD}} redirects to {{C.E.}} and not to {{CE}}. So two questions:

  1. Do we need the BC/AD templates at all? As can be seen, they aren't used so much. I'd rather eliminate them.
  2. I also think we should eliminate the distinction between {{CE}}/{{C.E.}} and {{BCE}}/{{B.C.E.}} in favor of the dotless variants, which take less room, look better and are more in line with modern usage. Usage on Wiktionary [struck: is nearly equal between the dotful and dotless variants] already vastly prefers the dotless variants; this can be seen by the fact that the dotless variants are defined using the dotful variants, so you have to subtract the total of the dotless variants from the dotful count (and discount the aliases, as their counts are already included in the count of the non-aliased equivalent). There cannot reasonably be any objective criteria for why some pages use one or the other; it comes down to editor choice or pure randomness, and leads to inconsistency across the dictionary.

Thoughts? Benwing2 (talk) 04:28, 20 August 2023 (UTC)[reply]
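The subtraction described in point 2 can be worked through with the figures from the table. This is a rough sketch in Python, assuming (as the post states) that every use of a dotless template is also counted under the dotful template it is defined with, and that alias counts are already included in the count of their canonical template.

```python
# "#Uses" figures copied from the table above.
USES = {
    "BCE": 1172, "BC": 261, "B.C.E.": 1420, "B.C.": 25,
    "CE": 2371, "C.E.": 3102, "A.D.": 29, "AD": 208,
}

# Uses that actually display dots: the dotful total minus the dotless uses
# that are only counted there because {{BCE}}/{{CE}} call {{B.C.E.}}/{{C.E.}}.
dotted_bce = USES["B.C.E."] - USES["BCE"]
dotted_ce = USES["C.E."] - USES["CE"]

# Aliases ({{BC}}, {{B.C.}}, {{AD}}, {{A.D.}}) are not added again,
# since their counts are already part of the canonical template's count.
dotless_total = USES["BCE"] + USES["CE"]
dotted_total = dotted_bce + dotted_ce
dotless_share = dotless_total / (dotless_total + dotted_total)

print(dotted_bce, dotted_ce)
print(dotless_total, dotted_total, round(dotless_share, 2))
```

Under those assumptions roughly 3,543 uses display without dots against 979 with dots, so a little over three quarters of uses are already dotless.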

I think it's silly for these to exist. Writing "In 24 BC, a solar eclipse could be seen in Polynesia" is perfectly appropriate. The only reason I can see for having something like this is for trying to keep typographic styling for quotations, but even then, it's not terribly hard to apply that styling with more generic templates. —Justin (koavf)TCM 04:32, 20 August 2023 (UTC)[reply]
I thought it was for compatibility with some sort of gadget, so that people could choose to see "AD/BC" or "CE/BCE" according to preference, avoiding the issue of us possibly enforcing the use of one over the other for consistency. Andrew Sheedy (talk) 04:47, 20 August 2023 (UTC)[reply]
@Andrew Sheedy That's correct. There's a Preferences setting that says "Switch BCE/CE to BC/AD". Benwing2 (talk) 04:51, 20 August 2023 (UTC)[reply]
I think they should be kept. I don't think there are that many occasions for their use, so I wouldn't expect them to be used often. I agree with favouring the dotless variants. Andrew Sheedy (talk) 04:32, 20 August 2023 (UTC)[reply]
Since there is a gadget for switching B.C.E./C.E. to B.C./A.D., I guess we don’t need the latter set of templates. Yes, we could type the abbreviations without templates, but I suppose the templates serve the purpose of providing useful links which explain “B.C.E.” and “C.E.”. As for whether we should switch to using the version of the templates without dots, this raises the wider issue of whether we should drop full stops for all abbreviations throughout the dictionary. I concede that anecdotally the modern trend appears to be in favour of that, though there are some notable holdouts (like The New York Times and some academic journals) and personally I prefer the dots. I have a feeling this is one of those issues which will be difficult to achieve consensus on. (I’m not very convinced by the argument that eliminating the dots saves space—surely any space thus saved is de minimis.) — Sgconlaw (talk) 05:11, 20 August 2023 (UTC)[reply]
@Sgconlaw One other thing to consider is we could easily make dot vs. no-dot an additional choice in the Preferences gadget. Then we could consolidate to only {{BCE}} and {{CE}}, and you could set your preferences so that they show up as B.C.E. and C.E. (similar to the Oxford comma debate). Benwing2 (talk) 07:48, 20 August 2023 (UTC)[reply]
@Benwing2: I guess so, though the issue of consistency with other abbreviations (to dot or not to dot?) still remains. — Sgconlaw (talk) 18:36, 20 August 2023 (UTC)[reply]
@Sgconlaw Which other abbreviations are you referring to? Benwing2 (talk) 20:51, 20 August 2023 (UTC)[reply]
@Benwing2: as I mentioned yesterday, all other abbreviations in Wiktionary (like e.g., i.e., transl., U.S., viz., etc.). — Sgconlaw (talk) 21:09, 20 August 2023 (UTC)[reply]
@Sgconlaw Most of those are irrelevant as there's only one accepted way of writing them. From your list, the only relevant one is US or U.S., and that's a totally separate issue (IMO) because it's not even in the same wheelhouse and isn't conventionally written in small caps or any other formatting (and in any case I think Module:place/shared-data accepts both and canonicalizes them, although I'd have to check the code to be sure). I would say, let's not worry about this for the moment. Benwing2 (talk) 22:22, 20 August 2023 (UTC)[reply]
@Benwing2: not sure I’m following. Why should the dots in, say, B.C.E., be removed but those in e.g. and etc. be left in place? For consistency shouldn’t either all abbreviations have dots, or none? — Sgconlaw (talk) 01:15, 21 August 2023 (UTC)[reply]
No, because it's not standard to omit the dots in abbreviations or initialisms of Latin terms. It is common, however, to omit them in English-based acronyms and initialisms (more commonly the former than the latter). Andrew Sheedy (talk) 01:57, 21 August 2023 (UTC)[reply]
Is it not? I see it all the time. Theknightwho (talk) 22:41, 21 August 2023 (UTC)[reply]
I have very rarely, if ever, seen them omitted in formal/academic writing. It's common in casual writing, but so is nonstandard capitalization. Is that what you're talking about? Or is it becoming common in certain fields? I tend to read things in fields associated with more conservative writing styles. Andrew Sheedy (talk) 22:54, 21 August 2023 (UTC)[reply]
As far as I know it is much more common than not to omit them nowadays for the dating: Routledge, OUP, CUP (the recently-published Cambridge Histories also omit the dots), I can't imagine American usage is much different. —Al-Muqanna المقنع (talk) 23:03, 21 August 2023 (UTC)[reply]
Sorry, I misread you @Andrew Sheedy, I think we agree—these publishers will still write "e.g." and so forth for the standard non-capitalised Latin abbreviations, "AD/BC" is an exception. —Al-Muqanna المقنع (talk) 23:09, 21 August 2023 (UTC)[reply]
@Andrew Sheedy In legal drafting, I find it common to see etc or eg. It’s generally seen as a matter of house style. Theknightwho (talk) 14:19, 22 August 2023 (UTC)[reply]
I'm actually surprised a distinction would be drawn between e.g. and etc. on the one hand and B.C.E. and C.E. on the other, as I don't see any particular logic for that. I would have thought it would be an all-or-nothing situation. — Sgconlaw (talk) 14:28, 22 August 2023 (UTC)[reply]
@Benwing2 what is the proper use case (or cases) for these templates? E.g. from within a quote, I'd expect that we simply use whatever the quote uses - dots or no dots - but just using regular English letters instead of a template. If it is to date a quote, as in "13th century CE", keep in mind that templates such as {{roa-opt-cite-cantigas}} embed those, so if we were to remove the AD/BC equivalents, we should check we're not breaking anyone. Finally, if it's to provide extra information in, say, an etymology section, then we could continue to allow both, or reserve the right to have an editorial preference as a dictionary. But honestly, if it ain't broke, don't fix it :-) Chernorizets (talk) 06:03, 20 August 2023 (UTC)[reply]
@Chernorizets These are predominantly used to provide dates to quotes, not to indicate actual citations of BC/BCE/CE/AD inside of quotes. The latter would be inappropriate uses, since the preferences gadget allows for switching between BC/AD and BCE/CE formats, and presumably we don't want those preferences to apply to literal citations. @Sgconlaw It turns out that the definitions of {{CE}} and {{BCE}} make use of the {{C.E.}} and {{B.C.E.}} templates. This means that the vast majority of uses are actually dotless and my statement above about nearly equal use is incorrect. You can see this, for example, if you click on Special:WhatLinksHere/Template:B.C.E. and pick a page like km or that is listed; the BCE/CE instances that show up are all dotless. Also what I meant by "saving space" is that it looks aesthetically better without the dots taking up space; badly worded on my part. Benwing2 (talk) 07:42, 20 August 2023 (UTC)[reply]
@Benwing2: yes—in fact, I think it was me who made {{BCE}} and {{CE}} reliant on {{B.C.E.}} and {{C.E.}}. — Sgconlaw (talk) 18:39, 20 August 2023 (UTC)[reply]
Would there be an issue with just keeping AD and BC as redirects to CE and BCE respectively, in keeping with the preference settings, and just getting rid of the dotted versions? I have the AD/BC setting on and sometimes forget which one is the canonical form. —Al-Muqanna المقنع (talk) 11:41, 20 August 2023 (UTC)[reply]
I support this approach. This allows editors to use whichever they prefer, not just users. Andrew Sheedy (talk) 18:52, 20 August 2023 (UTC)[reply]
@Andrew Sheedy @Al-Muqanna Yeah that is OK with me. I'm more concerned with getting rid of the dotted versions since it creates needless inconsistencies across the dictionary. Benwing2 (talk) 19:02, 20 August 2023 (UTC)[reply]
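The arrangement the thread converges on (dotless canonical templates, AD/BC kept as redirects, display governed by reader preferences) might look something like the following. This is purely a sketch in Python: the real behaviour is split between templates and a preferences gadget, the function and parameter names are hypothetical, and the `dotted` option is only the suggestion floated earlier in the thread, not an existing preference.

```python
# Redirects: editors may type any of these; each resolves to a dotless canonical name.
CANONICAL = {"BCE": "BCE", "BC": "BCE", "CE": "CE", "AD": "CE"}

# What a reader sees, keyed by (canonical name, "Switch BCE/CE to BC/AD" preference).
DISPLAY = {
    ("BCE", False): "BCE",
    ("BCE", True): "BC",
    ("CE", False): "CE",
    ("CE", True): "AD",
}


def render_era(template_name: str, prefer_bc_ad: bool = False, dotted: bool = False) -> str:
    """Resolve the alias, then apply the reader's display preferences."""
    canonical = CANONICAL[template_name]
    text = DISPLAY[(canonical, prefer_bc_ad)]
    if dotted:
        # Hypothetical extra preference: put a full stop after each letter.
        text = "".join(ch + "." for ch in text)
    return text


print(render_era("AD"))
print(render_era("BCE", prefer_bc_ad=True))
print(render_era("CE", dotted=True))
```

In this model what an editor types and what a reader sees are independent, so {{AD}} and {{CE}} can never render differently for the same reader.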

Handling people writing in English but with a foreign-script name[edit]

Hi everyone. I have made some changes to quote-* templates to allow for inline modifiers specifying translations of authors, titles and the like. But User:Geographyinitiative points out an issue I didn't account for, which is books written in English by foreigners. In Chinese, for example, the foreigner has both a Chinese-script name and a Latin-script name. I suppose we should follow the source in deciding whether to put the author's name in Latin script or Chinese script, but (a) in the case where the Latin script name is given, do we also want to include the Chinese script? and (b) What if both names are given side by side or one above the other? If we want to support both names and the Latin script name is the one found in the source, how should it be notated? Normally, we notate like this: 王晰宁 [Wang Xining] and what I don't want is to have some cases where it's written exactly the same but with the names reversed, e.g. Wang Xining [王晰宁], which is what User:Geographyinitiative has been doing, either manually or by putting the Latin script in the |author= field and the corresponding Chinese script in the |trans-author= field (the opposite of how these are supposed to be used). Technically I can support this using a new inline modifier, something like Wang Xining<f:zh:王晰宁> where f means "foreign", but I don't know the best way to display this or even whether from a policy standpoint this is correct. BTW, the same issues apply to the title. Benwing2 (talk) 19:15, 20 August 2023 (UTC)[reply]

This came up at Qiemo and Quwo and happens elsewhere. I do not have any opinion on how these cites should work; my goal is just to make high-quality citations that reflect well on Wiktionary. I am glad benwing is making these changes and clean-ups and I will work to adapt to these changes.
Someone coming to an advanced minor geography page like those two above will not respect cites with bare-bones English in them: they will be bilingual, and they will want to know something about the Chinese character names and titles involved. Someone who has gotten to the level of getting to these entries has bona fide expertise and I want to show them everything I can so Wiktionary is seen as reliable to that expert. The expert is thereby encouraged to make their own edits and additions. I think this strategy has worked at least two or three times so far. --Geographyinitiative (talk) 19:31, 20 August 2023 (UTC)[reply]
It is fairly standard nowadays in academic works to give the Chinese characters of author names (in fact I had to do it in something I submitted recently). It can be a bit of a nightmare to track them down otherwise if they're not authors known in the West, even if the book they've written is in English. So I would support a functionality on these lines. —Al-Muqanna المقنع (talk) 09:15, 21 August 2023 (UTC)[reply]
@Al-Muqanna Right, so how do we format the names? That's my main question. Do we put the Chinese name first regardless of what shows up on the cover? Or do we put what's on the cover, and the Chinese name follows? If the latter, what's the formatting? Something like "Fang Kuei Li (Chinese name 李方桂)"? Benwing2 (talk) 18:36, 21 August 2023 (UTC)[reply]
@Benwing2: Just the name as on the cover and then something like [李方桂] I think, so similar to what GI was already doing. That's the printed standard and probably fine here too—people can see it's Chinese characters (and if they can't it's unlikely to be relevant to them anyway). (edit: I guess you could also use rounded brackets to distinguish it from transliteration into Latin, or with "Hanzi:" similar to what Wikipedia does or something. "Chinese name" comes off as vague—the point is to list the original Chinese characters, not simply to list other languages.) —Al-Muqanna المقنع (talk) 22:37, 21 August 2023 (UTC)[reply]
@Al-Muqanna I thought about this and I realize we need to have a prefix, but the prefix can variously be a language, a script or something else. I ran into this with the following case:
{{quote-book
|it
|title=L'œuvre complète de Tchouang-tseu
|tlr=Liou Kia-hway<fs:Hant:劉家槐><f:Pinyin:Liú Jiāhuái>
|original={{w|Zhuangzi (book)|Zhuang-zi}}
|by=莊周<t:Zhuang Zhou>
|origlang=och
|origyear=c. 300 {{BCE}}
|year=1969
|year2=1982
|year_published2=2010
|newversion=translated as
|tlr2=Carlo Laurenti; Christine Leverd
|title2=Zhuang-zi [Chuang-tzu]
|series2=Gli Adelphi
|seriesvolume2=41
|chapter2=Libertà naturale
|trans-chapter2=Natural Freedom
|edition2=6th
|location2=Milan
|publisher2={{w|Adelphi Edizioni}}
|page2=17
|isbn2=978-88-459-0950-4
|text=Se le acque si alzassero fino al cielo, non annegherebbero. Se la siccità '''liquefacesse''' i metalli e infiammasse le montagne, non ne sarebbero neppure sfiorati.
|t=If the waters were to rise to the skies, they wouldn't drown. If drought were to '''liquefy''' metals, and set fire to the mountains, they wouldn't even be touched by it.
}}
Here, we have an Old Chinese original, translated into French in 1969, and in turn translated into Italian in 1982, with the quote from the 6th edition of the Italian translation, published in 2010. The French author is given on the title as Liou Kia-hway, which is 劉家槐 in Traditional Chinese and Liú Jiāhuái in Pinyin. In order to render the French edition author's name, we need to at least be able to qualify a given version in a script (Traditional Chinese) and a transliteration system (Pinyin), and there are probably cases where we want to qualify using a language name. My thought here is that the <f:...> inline modifier is followed by either a language name, script name or arbitrary text, and will display something like "Liou Kia-hway (Traditional Chinese: 劉家槐; Pinyin: Liú Jiāhuái)". This means we can't do any validation on the qualifier, so that if you mistype the language or script name, it shows up as-is, but that is probably OK. The alternative is to have different inline modifiers, one that accepts a language or script name and one that accepts arbitrary text, but that might end up being more confusing than it's worth. I'm also not completely satisfied with the use here of |origlang= and |worklang=, but I suspect this is the best we can do unless I implement support for a third series of parameters, so that effectively we have multiple "newversions". Benwing2 (talk) 23:57, 22 August 2023 (UTC)[reply]
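The display rule described above can be sketched roughly as follows. This is only an illustration in Python, not the code behind the quote-* templates: the function name `render_name`, the `KNOWN_PREFIXES` table and the treatment of `fs` as a script-qualified variant of `f` are all assumptions made for the example. A recognised language or script code is expanded to a readable label, and anything else is shown as typed, which is the "no validation" trade-off mentioned in the post.

```python
import re

# Hypothetical lookup table; a real module would consult the site's
# language and script data instead of a hard-coded dict.
KNOWN_PREFIXES = {
    "Hant": "Traditional Chinese",
    "Hans": "Simplified Chinese",
    "zh": "Chinese",
}

# Matches one inline modifier such as <f:zh:王晰宁> or <t:Zhuang Zhou>.
MODIFIER = re.compile(r"<(\w+):([^<>]*)>")


def render_name(raw: str) -> str:
    """Render 'Name<f:Prefix:Value>...' as 'Name (Label: Value; ...)'."""
    base = MODIFIER.sub("", raw).strip()
    qualified = []
    for key, body in MODIFIER.findall(raw):
        if key not in ("f", "fs"):
            continue  # other modifiers, e.g. <t:...>, are ignored in this sketch
        prefix, sep, value = body.partition(":")
        if not sep:
            qualified.append(body)  # no prefix given: show the bare value
            continue
        # Unknown prefixes (including typos) are displayed exactly as typed.
        label = KNOWN_PREFIXES.get(prefix, prefix)
        qualified.append(f"{label}: {value}")
    if not qualified:
        return base
    return f"{base} ({'; '.join(qualified)})"


print(render_name("Liou Kia-hway<fs:Hant:劉家槐><f:Pinyin:Liú Jiāhuái>"))
```

With a single free-form prefix, a mistyped qualifier such as `<f:Pinyn:...>` would simply be displayed as "Pinyn: ..." rather than raising an error.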
@Benwing2: It's not as simple as you make out. For the author of a Handbook of Comparative Tai, which of the following do we want to give in the reference:
  1. Fang Kuei Li (his name on the front cover)
  2. Fang Kuei LI (as above, but with surname capitalised)
  3. Li Fang-Kuei (the name of the Wikipedia page about him)
  4. Name in Chinese characters (李方桂)
  5. Name in Mandarin pinyin
  6. Name in Cantonese transliteration
  7. something else yet again
There's a bit less variation in Thai names, as in Pittayawat Pittayaporn. Of course, we could have fun with mixed-race dual nationals, like Giles Ji Ungpakorn - and I wonder if he has a name in Chinese characters!
However, I believe the primary purpose of these names is to locate the work being quoted. To that end, I think we should only quote the names on the book etc. I suppose we could link to an appendix of alternative author names for authors not prominent enough to have Wikipedia tie their names together. --RichardW57m (talk) 09:12, 21 August 2023 (UTC)[reply]

There is quite a number of Burmese places lurking in this category. I think most need formatting with {{place}}, so if anyone interested in Burma wants to tackle them, please do. There's some other places and odds and sods in the category which need checking. I have cleared out quite a few. DonnanZ (talk) 14:12, 21 August 2023 (UTC)[reply]

@Donnanz: I've cleared out all the proper nouns from CAT:en:Towns and moved them to country-specific categories. —Mahāgaja · talk 22:22, 21 August 2023 (UTC)[reply]
@Mahagaja: Wow, you did a great job, more than I hoped for, a clean sweep. I will look at other place categories later. Thankyou! DonnanZ (talk) 22:50, 21 August 2023 (UTC)[reply]

Valid etymology source?[edit]

The Institute for Bulgarian Language at the Bulgarian Academy of Sciences provides a language reference service to the general public, available by phone or via an online question form. It can be used to ask questions about orthography, grammar, transliteration and other topics, including the etymology of individual words.

I've used the service to get etymologies for a few words missing from the Bulgarian Etymological Dictionary ({{R:bg:BER}}), and I've gotten replies over email. Can I use that in entries?

I've asked the Institute whether they could publish the etymologies on the language reference website, the way they've done for e.g. царевица. I've also asked them if they could provide me with a bibliography (which they don't include along with their answers). It seems as if they don't get such requests very often, because I can't seem to get much traction. In effect, I'm sitting on good etymological data that I'm not sure I can use on Wiktionary. Thoughts?

Thanks, Chernorizets (talk) 02:31, 22 August 2023 (UTC)[reply]

Sanskrit Lemmas[edit]

Made a longer post here, but as tl;dr—For Sanskrit verb entries, is there any consensus on which forms are actually verbs, and where the various conjugational forms should actually appear? Generally, it's obvious that the third-person singular present-tense (e.g. करोति (karoti)), is a lemma form, and the present-tense forms (e.g. the first-person singular करोमि (karomi)) should be presented below it within the "Conjugation" section's table. But beyond this, there are the imperfect, future, aorist, benedictive, conditional, perfect, optative, imperative, etc forms. For each of these tenses, there is an argument that:

  • Its third-person singular is a "lemma" (like how Proto-Indo-European *linékʷti, *lelóykʷe, and *léykʷt are all different lemma entries, each with a separate conjugational paradigm shown)
  • Or, they are all various non-lemma forms of करोति (karoti) (like how all the forms, including the imperfect, perfect, future, etc., are shown under Latin bibo)

If going with the first idea, it makes sense (at least to me) to display all the lemma verbal forms on the page of the root in Sanskrit (e.g. कृ (kṛ)) in some sort of table. Just interested to hear what other people's ideas are. (Notifying AryamanA, Bhagadatta, Svartava, JohnC5, Kutchkutch, Inqilābī, Getsnoopy, Rishabhbhat, RichardW57): Dragonoid76 (talk) 04:24, 22 August 2023 (UTC)[reply]

I'll start with an account of what has been done in Pali. To begin with, Pali roots are much less robust than Sanskrit roots. Assimilatory changes have made them less easy to pick out, and deciding on a root's form may be arbitrary. It's not for nothing that the PTS dictionary refers to the Sanskrit form for simplicity, though the Pali grammarians also identified roots, which are cited in a declinable form compatible with normal words in the language.
The present stems are quoted in the 3s present active - the issue of deponent verbs seems yet to have arisen. (The inflection template has been set up to handle them.) The past participles and gerundives (prototypically the verbal adjectives in -ta and -tabba) are treated as derived lemmas. However, while present stems may be derived from roots, the mode of formation has almost no bearing on how the verb is conjugated - there is little if any dependence in the conjugation from the present stem on whether the ending -ayati, -eti, -oti or -āti includes the root or not. (The Pali inflection templates are set up to handle limitless irregularities - all that is demanded is that identical forms transliterate the same.) The original plan was to treat aorists, futures and perfects, which are formally derived from the verb root rather than the present stem, as independent derivatives. However, in practice many aorists and futures are actually derived from the present stem, and it proved easier to just accommodate them in a single table. The perfect tenses conjugated in all persons are rare and the text books disagree on the forms. Wiktionary's principle of description therefore dictates that we must collect individual forms, and they are therefore currently treated as limitlessly irregular.
Synonymous present stem formations are accommodated in common tables from a single table invocation, e.g. Pali jeti (to win), to which the multiple stems are passed. The different 3s present active terms are treated as lemmas, using {{pi-verb}}; the non-lemma headword template {{pi-verb form}} would not record the mode of stem formation. Having made this design decision, multiple aorist formations are naturally combined. This is actually natural, as it seems that the first and second persons plural of aorists in -esi are mostly supplied by the parallel aorist in -ayi, e.g. Pali deseti (to teach). (Both these aorist stems are synchronically built on the present stem.)
Causative, (double causative) and passive verbs are treated as quasi-inflectional derivative verbs, being listed along with the inflections of the simple verb. The same will be done for intensives and desideratives, which are actually quite rare in Pali.
So far, no problems have been encountered with treating the past tenses as merely irregular derivations of the present tense stems. A past participle that served for both the simple verb and its causative would be straightforward to handle; they have their own entries like other adjectives, and it would simply have two senses, as participles of the simple verb and its derivative.
Pali roots have been severely neglected; derivation from roots is frequently not recorded. The current, undocumented practice is to record the verbal forms (present stems and forms immediately derived from the root) under the roots. I am undecided on recording verbal forms derived from the present stems, and may have been inconsistent.
For recording the relationship between Sanskrit and Pali, it is better to be able to record the 'descendants' of Sanskrit present stems. There may also be arguments for recording the relationships between aorist and future stems, though I don't know how strong the links are. The link between the synthetic futures may be more one of system than individual words; I haven't traced the relationships of Pali's irregular futures with Old Indic. There seems to be a lot of restructuring between Sanskrit aorists and Pali aorists; there may only be a little that meets our definition of inherited. Post-canonical perfect tense forms may relate well to Sanskrit forms, but this may be because they have been artificially borrowed. --RichardW57m (talk) 12:33, 22 August 2023 (UTC)[reply]
Couple ideas:
  • Makes perfect sense to record the secondary-derivation verbal forms (e.g. causative, desiderative, and intensive) under the root, as is done in Pali. This is happening right now at भृ (bhṛ), for instance. The causative third-person singular present tense form feels distinct enough from its non-causative counterpart in Sanskrit to deserve to be treated like a lemma.
  • My understanding is based on https://en.wikisource.org/wiki/Sanskrit_Grammar_(Whitney)/Chapter_XIV, but it does appear that tenses like the perfect, aorist, future, etc can be built on the secondary-derivation or denominative verb forms. In verbs like प्रियायते (priyāyate, holds dear), the tenses aren't so easily tied to a root and are developed from the actual nominative प्रिय (priya). This, I guess, makes the option of putting everything under the present tense seem appealing, but these examples are fairly rare compared to verbs derived from a root. We might be able to handle them differently (e.g. treating the third-person present as a quasi-root for the other forms like aorist, perfect, future, etc).
Are you actually for or against the idea of showing all the tenses underneath the third-person singular present tense lemma, for Sanskrit? Dragonoid76 (talk) 01:46, 23 August 2023 (UTC)[reply]
@Dragonoid76: I'm opposed to the notion of a third-person singular present tense lemma, though I think it should be the citation form for some lemma (or alternative form). There are two things I would accept it being a citation form for:
  • The present system of verb - those things derived from the present stem, such as the present and imperfect tenses and the imperative, subjunctive and optative, in both the active and middle voices, along with the clearly associated non-finite forms - definitely including the present active participle.
    I would cross-reference the citation forms for the other tense systems, and also the non-finite forms.
  • The whole system of a verb.
    Different present stems can still go on in different lemmas - Whitney reports that the stem formation used sometimes distinguishes meanings.
A critical feature for a lemma is that its set of meanings (strictly, for non-English, translations) should extend across its inflected forms. Complications that we need to cater for are:
  1. Inflected forms having 'additional' meanings or translations. Deverbal adjectives, particularly participles, are prone to this. I handle this for Pali by treating participles as derived terms listed in an adjunct to the inflection table, and then inherit the verb's meanings by giving one of its senses as a particular formation.
  2. Inflected forms lacking certain meanings or translations (mutatis mutandis).
If we make the whole system of a verb the lemma, and there are distinct present tense stems that share non-present system forms, then if we can't merge the present tense systems in a single lemma, I think we should call up the tense system table for the commoner lemma in the inflection of the other systems, so that we don't have duplicated information in need of improvement. --RichardW57m (talk) 10:36, 24 August 2023 (UTC)[reply]
Now, if we base lemmas around present tense stems (and perfect tense stem pairs, etc.), one thing I am not happy with is including morphologically passive forms under such stems. The present passive is not formed from the present active (or middle); for Pali, I treat it as a derived verb parallel with the causative. This also allows me to use the morphological contrast of active and middle in Pali, for Pali also has some middle forms for passive verbs, as well as the active forms. What is the plan for handling the active forms of the Sanskrit passive? (Notifying AryamanA, Bhagadatta, Svartava, JohnC5, Kutchkutch, Inqilābī, Getsnoopy, Rishabhbhat, Dragonoid76): : Are there any plans at all yet for handling BHS?
I'm not sure how one should handle the aorist system if one makes that a separate lemma. It seems the active and middle have different forms, not to mention the isolated passive. --RichardW57m (talk) 10:36, 24 August 2023 (UTC)[reply]
When thinking of implementation details that may have to affect the organisation of information, bear in mind that all the inflectional detail needs ultimately to be repeated for each tolerated Sanskrit writing system. (I thought we had a policy that restricted tolerance to Indic scripts, but I can't find it.) While the inflection templates I've examined can be generalised to systems faithfully represented by IAST, the choice of invocations and their parameters would currently have to be repeated manually, as is done for Pali and Prakrit. I'm slowly working on ways of easing the burden for Pali, where irregularities are a noticeable burden. --RichardW57m (talk) 10:36, 24 August 2023 (UTC)[reply]
@RichardW57m This is all interesting stuff—definitely agree that the passive is formed out of a completely different stem than the verb, e.g. क्रियते (kriyate, is done) versus करोति (karoti), and you'll note that right now both of those are lemmas. If we are splitting, it makes sense to group, as you mentioned, the "present system", "perfect system", "aorist system", etc. Dragonoid76 (talk) 00:29, 25 August 2023 (UTC)[reply]

> Whitney reports that the stem formation used sometimes distinguishes meanings
Are you talking about verb classes here (e.g. बिभर्ति (bibharti) vs भरति (bharati))? Is there an example of this? Dragonoid76 (talk) 00:29, 25 August 2023 (UTC)[reply]
Class II वेत्ति (vétti, to know) v. Class VI विन्दति (vindáti, to find). And I think there are a lot more, but I couldn't find a list. --RichardW57m (talk) 15:44, 25 August 2023 (UTC)[reply]
Notably, those are listed as two separate roots here (and in Monier-Williams), even if they have the same etymology. Whenever we get situations like this, we could also simply mark it as two roots. Dragonoid76 (talk) 07:13, 30 August 2023 (UTC)[reply]

> Inflected forms having 'additional' meanings or translations.
This is interesting. I don't think there are many examples where, for instance, an aorist system of a verb has a different core meaning than the present system. The ancestors of these two words would be considered different lemmas in Proto-Indo-European—but this isn't the case in Latin and Ancient Greek, and I'm starting to think that might be a better option. If we consider these different systems to be different principal parts of the same verb, we can show all these forms together under करोति (karoti). Dragonoid76 (talk) 00:29, 25 August 2023 (UTC)[reply]
Not so much core meanings, as subtle modifications. For example, the perfect वेद (véda) can have the same meaning as the present tense वेत्ति (vétti, to know). Examples in English include the perfect have got of get also meaning 'to have' and the perfect of be also meaning 'to have gone (and returned)'. --RichardW57m (talk) 15:56, 25 August 2023 (UTC)[reply]

> bear in mind that all the inflectional detail needs ultimately to be repeated for each tolerated Sanskrit writing system.
Reasonable. There's no reason the conjugation or declension modules wouldn't work with scripts besides Devanagari, and if so they should be fixed. Dragonoid76 (talk) 00:29, 25 August 2023 (UTC)[reply]
Yes, there is a subtle problem with unpredictable spellings. If SLP1 doesn't distinguish them, then inflected terms may show different spellings to the citation forms! Round AA v. tall AA gave Pali inflection problems when selecting the right form for the endings, and different options for encoding consonant stacks can be a problem - some of the shapes in Farther India are selected by the encoding. (This might actually be a problem in the Malayalam script.) We were told by Everson that in the Burmese script the encoding should control whether subscript 'v' is permitted to be triangular. That won't be a problem only because the encoding to force round 'v' is mostly treated as an error and therefore doesn't get used. --RichardW57m (talk) 16:23, 25 August 2023 (UTC)[reply]
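The round AA versus tall AA point can be illustrated with a small sketch. The two code points below are the actual Unicode characters for the Burmese-script vowel signs; the transliteration table, however, is a toy assumption made for the example and is not SLP1 or any scheme used by the inflection modules. The sketch only shows why a Latin-based intermediate form that maps both signs to the same letter cannot tell an inflection module which shape to write back.

```python
import unicodedata

TALL_AA = "\u102B"   # MYANMAR VOWEL SIGN TALL AA
ROUND_AA = "\u102C"  # MYANMAR VOWEL SIGN AA

# Toy transliteration (an assumption for illustration): both vowel signs
# collapse to the same Latin letter, as a phonemic scheme naturally would.
TOY_TRANSLIT = {TALL_AA: "ā", ROUND_AA: "ā"}


def transliterate(text: str) -> str:
    """Map each character through the toy table, leaving others unchanged."""
    return "".join(TOY_TRANSLIT.get(ch, ch) for ch in text)


for sign in (TALL_AA, ROUND_AA):
    print(f"U+{ord(sign):04X}", unicodedata.name(sign), "->", transliterate(sign))

# The two spellings differ in the script but are identical after
# transliteration, so the original choice cannot be recovered from it.
print(TALL_AA != ROUND_AA and transliterate(TALL_AA) == transliterate(ROUND_AA))
```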

Lastly, besides the active/middle participles, I now don't think we should display any non-finite forms underneath the actual verbal lemma. Like you say, forms like the past passive participle and gerundive are more derivations from the root, and should be displayed in some sort of table there. Dragonoid76 (talk) 00:29, 25 August 2023 (UTC)[reply]
@Pulimaiyi (tagging since you responded on your talk page) and @Benwing, what do you think of this? Basically, the main question is still about the "lemma"-ness of forms like the third-person singular aorist, perfect, future, conditional, benedictive, and imperfect.
To tl;dr some points we've come to
  • Most dictionaries list only the root as a lemma and all other verbal forms (sometimes also including the causative, desiderative, and intensive) as derivatives. We've already committed to making the third-person singular present base-form, causative, desiderative, and intensive forms all "lemma" forms.
  • As far as I know, the core meaning of the aorist, perfect, future, conditional, etc is not ever different from the core meaning of the third-person present. Based on the ideology that the non-lemma forms of a lemma should have the same core meaning as the lemma, this is evidence that we should include all primary verbal inflections (aorist, perfect, future, conditional, etc) under the third-person singular present verbal lemma.
  • However, this is also kind of the case for Proto-Indo-European *linékʷti, *lelóykʷe, and *léykʷt, and these are all treated as separate lemmas. In many cases, these so-called primary derivatives, like रिच्यते (ricyate, is left, passive of रिच् (ric)) and रिरेच (rireca, left, perfect of रिच् (ric)), are formed from the root in a completely different way than the third-person singular present, like रिणक्ति (riṇakti, leaves). If we've committed to making the third-person singular present a lemma, then this is the evidence for considering all these forms as separate lemma derivatives of the root itself.

By the way, a separate issue is that basically all Sanskrit verbs like वेत्ति (vetti) are defined in terms of the English infinitive (e.g. "to know"), when they are more accurately third-person singulars (e.g. "knows"). This is done properly in Latin, where duco is defined as "I lead" and not "to lead". If we're updating Sanskrit verbal entries en masse, this is something I think we should start fixing. Dragonoid76 (talk) 07:53, 30 August 2023 (UTC)[reply]
I strongly disagree. When we use वेत्ति (vetti) as the lemma or citation form, it stands for the entire verb and should be glossed with the English lemma or citation form, which is the infinitive. And that's true of Latin entries as well – if duco is currently glossed as "I lead", it should be corrected to "to lead" (or simply "lead", though I prefer glossing verbs with to since so many English verbs are spelled the same as nouns). —Mahāgaja · talk 08:08, 30 August 2023 (UTC)[reply]
I strongly agree with @Mahagaja. --RichardW57m (talk) 08:44, 30 August 2023 (UTC)[reply]
I also agree with Mahagaja. I noticed that about Latin last week. I guessed it must be the policy for Latin. —Caoimhin ceallach (talk) 15:53, 31 August 2023 (UTC)[reply]
@Mahagaja Alright, if it's the policy for Sanskrit then it's reasonable. Should definitely be in Wiktionary:About Sanskrit. Dragonoid76 (talk) 00:18, 3 September 2023 (UTC)[reply]
What do you mean by 'core meaning'? By it, I understand one or sometimes more senses, not the whole panoply of senses. However, for a lemma, the inflected forms should derive their meanings from the senses of the lemma by the rules of an adequate grammar. Our treatment of additional senses is probably inconsistent, but they could generally (always?) be promoted to derivative lemmas in their own right. --RichardW57m (talk) 09:26, 30 August 2023 (UTC)[reply]
I think we should be flexible with the concepts of non-lemma and lemma. Thus a derived term can inherit senses from the term it is derived from, as indicated with a sense definition as an inflection, and have senses of its own. A stressing case would be how to record the meaning 'to know' for वेद (veda), when it is not an automatic consequence of it being the perfect of वेत्ति (vetti, to know). --RichardW57m (talk) 09:26, 30 August 2023 (UTC)[reply]
As an example of Wiktionary having a derivative inherit the senses of its source, we have English composable, whose definition carefully employs the source compose, but composable is still categorised as a lemma, whereas the foremost entry of composing remains a non-lemma. --RichardW57m (talk) 09:26, 30 August 2023 (UTC)[reply]
@RichardW57m Sorry for the long post, but after having read that, I'm just gonna put something forward now and see what you all think of it (and mark what should be lemma):

Root page, e.g. भृ (bhṛ) (lemma) is defined as a simple gloss, e.g. "to bear".
  • There is a table on the root page containing/organizing link-outs to the following lemma verbal forms:
    • (The present system) The present tenses, e.g. भरति (bharati) (lemma) and बिभर्ति (bibharti) (lemma). The full verbal definitions (defined in the infinitive), usages, synonyms, etc are shown here. These contain the conjugation for the active and mediopassive (but NOT the passive, as we discussed) for the indicative, optative, imperative, participle, and imperfect (akin to *bʰéreti)
    • (The future system) The future tense at भरिष्यति (bhariṣyati) (lemma). This contains the active and mediopassive for the future and conditional.
    • (The aorist system) The aorist at अभार्षीत् (abhārṣīt) (lemma). This contains the active and mediopassive.
    • (The benedictive system) The benedictive at भ्रियात् (bhriyāt) (lemma). This contains the active and mediopassive. Should definitely be optional, as it's very rare in Classical Sanskrit.
    • (The perfect system) The perfect at बभर (babhara). This contains the active and mediopassive. Here, वेद (veda) would be a lemma, and we could note that the meaning is active "to know".
  • Then, in a separate related table, we show the secondary derivations (the passive, causative, desiderative, and intensive)
  • Finally, there's a table with the regular derived verbal forms (e.g. past participle वित्त (vitta, known), verbal noun भरण (bharaṇa, filling), infinitive, gerundive, etc). We should be able to manually add other forms through the template.
  • After this, we show the descendants on the root page directory, e.g. Category:Terms derived from the Sanskrit root भृ. @Benwing2 is working on that here.
The benefits of this are that each lemma only shows related conjugations. Under रिणक्ति (riṇakti, to leave), we should see अरिणक् (ariṇak, it was leaving) and रिङ्क्ते (riṅkte, mediopassive) since they're etymologically related. But we shouldn't see रेक्ष्यति (rekṣyati, it will leave), since it's formed directly from the root in a different manner than रिणक्ति (riṇakti, to leave). If we want to see all related forms, we have that information on the root page.
@RichardW57m I agree that the best method is to be flexible with the concepts of non-lemma and lemma. A page like भरिष्यति (bhariṣyati) can be a lemma but the page should just contain a definition section as "future of भृ (bhṛ)" and conjugation section. We can do things like composable if a particular form has special meaning, e.g. वेद (veda).
If all this is agreeable, I'll add or suggest it to Wiktionary:About Sanskrit. Dragonoid76 (talk) 01:02, 3 September 2023 (UTC)[reply]
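The proposed layout can be sketched as a small data model, purely to make the grouping concrete. The class and method names (`TenseSystem`, `Root`, `forms_shown_under`, `all_forms`) are hypothetical and do not correspond to any existing module; the forms are the रिच् (ric) examples quoted in this thread, given in transliteration. Each tense system has its own citation form and lists only the forms built on that stem, while the root page collects everything.

```python
from dataclasses import dataclass, field


@dataclass
class TenseSystem:
    name: str            # e.g. "present", "future", "perfect"
    citation_form: str   # third-person singular, treated as a lemma
    forms: list[str] = field(default_factory=list)  # forms on this stem only


@dataclass
class Root:
    root: str
    gloss: str
    systems: list[TenseSystem] = field(default_factory=list)

    def forms_shown_under(self, lemma: str) -> list[str]:
        """Forms listed on a lemma's own page: only its own system."""
        for system in self.systems:
            if system.citation_form == lemma:
                return system.forms
        return []

    def all_forms(self) -> list[str]:
        """Forms reachable from the root page: every system together."""
        return [form for system in self.systems for form in system.forms]


ric = Root(
    root="ric",
    gloss="to leave",
    systems=[
        TenseSystem("present", "riṇakti", ["riṇakti", "ariṇak", "riṅkte"]),
        TenseSystem("future", "rekṣyati", ["rekṣyati"]),
        TenseSystem("perfect", "rireca", ["rireca"]),
    ],
)

print(ric.forms_shown_under("riṇakti"))  # present system only, no future
print(ric.all_forms())                   # everything, as on the root page
```

In this model the future form is absent from the page of the present lemma but still reachable from the root, which is the behaviour the proposal asks for.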
One really needs to make it easy to address long posts point by point. I do it by signing individual paragraphs (with the downside of inflating the post count); another, incompatible method is by soft numbering of the points (but compatibly one can use hard numbering). --RichardW57m (talk) 09:19, 4 September 2023 (UTC)[reply]
@Dragonoid76: That probably works, though I think there may be more opportunities for multiplicity, and I'm not sure all the aorists should be lumped together. Suck it and see, I suppose. I fear that the output of {{rootsee}} will need manual reworking, and I know of no mechanism to know whether the reworking is up to date. (It's rather like keeping etymologies and descendants in sync.) For Pali roots with significant numbers of derivatives, I've taken to giving a reworking and then making a throwaway reference to the category. --RichardW57m (talk) 09:19, 4 September 2023 (UTC)[reply]
By 'regularly derived verbal forms', I think you mean something like 'non-finite forms'. Do we want the forms that inflect to have their own entries? I would. --RichardW57m (talk) 09:19, 4 September 2023 (UTC)[reply]
They definitely should. Forms like the infinitive, gerundive, etc are formed directly from the root like the other verbal and nominal forms, and they should have the same status. Dragonoid76 (talk) 21:57, 4 September 2023 (UTC)[reply]
@Dragonoid76: Adding pages for all inflected forms is generally a bad policy, and gets worse when inflection is ill-documented. (Some textbooks help one recognise an inflected form when one meets it, but don't tell one when the forms exist. Whitney gives the impression that Sanskrit may be like that.) Adding Pali a-declension nominative singulars for consistency with other dictionaries' citation forms already necessitates the creation of a lot of entries for Hindi vocative plurals and Sanskrit sandhi forms, and the apparently systematic inclusion of Finnish inessives calls forth entries for Pali genitive/dative singulars, e.g. matassa. To add to the problem, it seems that automatic 'transliteration' of Hindi vocative plurals often gets schwa deletion wrong. Wiktionary may not be short of space, but we don't have a limitless supply of editors' time. --RichardW57m (talk) 08:43, 5 September 2023 (UTC)[reply]
@RichardW57m I'm not sure what you mean. There are tons of cases in Finnish, Latin, etc where each inflected form has a page, and this is done automatically via bots. The issues with Hindi transliteration, sandhi variants, and whatever this Pali-Finnish thing is, are separate issues. Dragonoid76 (talk) 08:58, 5 September 2023 (UTC)[reply]
@Dragonoid76: They're issues that are made more salient by such bots. There are a lot of potential clashes in Indic languages - the situation is helped by Prakrit mostly recording inferred rather than attested script forms, and the former informal policy of demanding at least a Google hit for non-Devanagari Sanskrit lemmas coupled with a general failure to produce inflection tables for non-Devanagari Sanskrit terms. Notifying @AleksiB 1945, (Notifying AryamanA, Bhagadatta, Svartava, JohnC5, Kutchkutch, Inqilābī, Getsnoopy, Rishabhbhat, Dragonoid76): . --RichardW57m (talk) 09:42, 5 September 2023 (UTC)[reply]
The Pali-Finnish thing is that there are a lot of Finnish inessive singulars that are the same as a Pali non-feminine genitive/dative singular. Matters are OK so long as both stay in an inflection table. As soon as the form gets a page of its own, the other has to go there too, or else most users will see a blue link to a Pali term from a Finnish inflection table, or vice versa. That's bad content. --RichardW57m (talk) 09:42, 5 September 2023 (UTC)[reply]
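The blue-link problem described here can be sketched as a simple check. This is a hypothetical illustration, not an existing bot or report: the function name and both input structures are assumptions, and the sample data just follows the matassa example from this thread. Given the forms each language's inflection tables link to, and the language sections that actually exist on each page, it lists the links that would appear blue while leading to a page with no section for that language.

```python
def misleading_links(table_forms, page_sections):
    """Return (language, form) pairs where the page exists but has no
    section for the language whose inflection table links to it."""
    problems = []
    for language, forms in table_forms.items():
        for form in sorted(forms):
            sections = page_sections.get(form)
            # A missing page gives a red link, which is not misleading;
            # only an existing page without the right section is flagged.
            if sections is not None and language not in sections:
                problems.append((language, form))
    return problems


# Illustrative data only: forms linked from inflection tables, per language.
table_forms = {"Pali": {"matassa"}, "Finnish": {"matassa"}}

# Suppose a bot has created the page with a Finnish section only.
page_sections = {"matassa": {"Finnish"}}

print(misleading_links(table_forms, page_sections))
```

Once a Pali section is added to the same page, the check returns an empty list, which is the "the other has to go there too" requirement.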
It seems like currently, the infinitive and gerund are considered a non-lemma form of the root (See Category:Sanskrit infinitives and Category:Sanskrit gerunds). Not sure why or how this was decided. Dragonoid76 (talk) 09:01, 5 September 2023 (UTC)[reply]
@Dragonoid76, Benwing2 It's actually the category structure that deems infinitives and gerunds (= absolutives) to be forms of verbs. I think we may be into wave-particle duality here. This is a decision made across languages, and originally it was assumed that 'gerund' had the same core meaning for the languages to which it was applied. I stuck my oar in and got the description customised.
For Pali, an absolutive is associated with a verb (present + aorist) rather than a root - simple and causative tend to have different absolutives, and different simple stems can have different formally associated absolutives (cf. jeti). That said, it may be derived from the root rather than the present stem.
I don't know how it works out in Sanskrit, or, looking at the selection of gerunds categorised as such, Vedic Sanskrit. I think we want a link from the present stem lemma, the aorist lemma and the perfect lemma to the semantically associated absolutive. We will also want a link from the present stem lemma to the infinitive - I'm not sure about the other tenses. I wonder if @JainismWikipedian can reconstruct his thought processes for us. --RichardW57m (talk) 10:46, 5 September 2023 (UTC)[reply]
@RichardW57m Take a look at तप् (tap). I redid the page and most of the roots to reflect an instance of doing this properly. Everything is categorized properly.
@Benwing also, let me know what you think of this styling for Sanskrit roots. If it's good, most of the Sanskrit root pages would need to be manually edited based on a dictionary like Monier-Williams to reflect it. Dragonoid76 (talk) 08:15, 8 September 2023 (UTC)[reply]
@RichardW57m: I very much dislike the concepts of derivative roots as preposition plus root when such composition is fairly obvious. Eliminating this will take some thought. I do have much more to say; I have yet to sit down and address it.
Calling on the Sanskrit editors to make some contribution - (Notifying AryamanA, Bhagadatta, Svartava, JohnC5, Kutchkutch, Inqilābī, Getsnoopy, Rishabhbhat, Dragonoid76): . --RichardW57m (talk) 10:15, 11 September 2023 (UTC)[reply]
@RichardW57m where do you suggest they go then? They are derived "terms" from the base root. It seems only right that they should go there. Dragonoid76 (talk) 05:51, 12 September 2023 (UTC)[reply]
The problem is that they are not roots. The prefixes detach in Vedic Sanskrit, and I think they have peculiarities of accentuation that distinguish them from unprefixed verbs even when the prefix is not detached. They also form the absolutive differently.
As we have the senses of your example verb, which is a poor example because its forms show no interaction with semantics, on the present stem तपति (tápati), leaving the root's senses as a brief summary, there are several options:
1. Derive the present stems of the compounds from the simple verb. I haven't addressed the possible complication of the present tense of a compound verb not existing.
2. List all the compound present stems under the root, all the compound aorists, all the compound perfects, all the compound futures, all the compound passives etc.
There are numerous ramifications, which is why I haven't put a full response together yet.
I see we've hit a level of confusion between class IV तप्यते (tápyate) and passive तप्यते (tapyáte). Does the latter have a dual in the active? I also suspect that all the descendants listed under तप्यति (tapyati) actually belong under तप्यते (tapyáte) - in which case, technically, they might not be inherited!
The senses of the Class IV present seem to be a subset of the Class I present's senses. --RichardW57m (talk) 13:46, 12 September 2023 (UTC)[reply]
We haven't addressed when to make the active and when to make the middle the citation form for the present stem. There seem to be a number of verbs where Classical Sanskrit uses only the middle but epic Sanskrit also uses the active. Perhaps the solution is to have a principle that in such cases, the other should point to the citation form used for the lemma. For words occurring in Classical Sanskrit, we should normally make the 3s present of the middle form the citation form of the passive. --RichardW57m (talk) 09:19, 4 September 2023 (UTC)[reply]
@RichardW57m Based on my understanding, we should generally make the citation/lemma form the Classical Sanskrit form and make reference wherever Vedic differs. This could also be a case-by-case issue depending on the verb. Dragonoid76 (talk) 22:09, 4 September 2023 (UTC)[reply]
The issue is Buddhist Hybrid Sanskrit, where the passive usually has the active form rather than the middle form. Wiktionary isn't paper, so one wouldn't find the active form when looking up the usual middle form.--RichardW57m (talk) 08:18, 5 September 2023 (UTC)[reply]
@Dragonoid76: I quite like the derived forms as you have set it up. I am also in favour of separate pages for preposition + root since, while the composition is obvious, the meaning is not always obvious from the sum of the parts. That is presumably why MW lists them as separate entries, and if a paper dictionary does it then we have no excuse not to. —AryamanA (मुझसे बात करेंयोगदान) 22:12, 12 September 2023 (UTC)[reply]
I'd like to note that the Petersburger Wörterbuch does do it the way @Dragonoid76 is suggesting, with compounds arranged below the root. But as I understand it, compound verb stems (निष्टपति (niṣṭapati), etc.) are to get their own page anyway. I'd say since we have the abstract concept of the root, we might as well use it to its fullest extent as a hub for all derived terms. —Caoimhin ceallach (talk) 01:47, 13 September 2023 (UTC)[reply]
And come to think of it, Graßmann's dictionary of Vedic does the same. —Caoimhin ceallach (talk) 12:30, 13 September 2023 (UTC)[reply]
@AryamanA, Dragonoid76, Caoimhin ceallach: Have you noticed that MW shows its status as a quasi-root in the transliteration? For example, he transliterates संतप् (saṃtap) as saṃ-√tap. Using this convention would be consistent with the Wiktionary policy of showing accentual information, as well as tagging it as not really a root. The relevance to accentuation is mentioned in
William Dwight Whitney, 1885, The Roots, Verb-forms, and Primary Derivatives of the Sanskrit Language, Leipzig: Breitkopf and Härtel, page 6
Is perhaps a form of ā+√ṛ 'resort to, have recourse to' ; ā́ryanti in both occurrences is accented as if it contained a preposition.
Another option might be to simply use some term other than 'root', though I don't know any precedent. @Benwing2? --RichardW57m (talk) 12:33, 13 September 2023 (UTC)[reply]
@RichardW57m I don't think we have a term for this. I would call it a root compound but maybe we can call it a compound root by analogy e.g. with Category:Sanskrit compound nouns, which is defined as "Sanskrit nouns composed of two or more stems". Here, this is composed of two stems and is arguably a root. If you look in Category:Types of compound terms by language you can see we have lots of "Compound X's by language" for various parts of speech. I think if you use an etymology that reads {{af|sa|सं|तप्|pos=root}} you'll get this categorized into Category:Sanskrit compound roots, and we can modify the category system to recognize such categories. Benwing2 (talk) 20:00, 13 September 2023 (UTC)[reply]
@Benwing2 That makes sense to me. In a lot of cases, these compound roots are "genuine" roots (as in they have derived nominal forms, various verbal tenses). There should be a param in the {{sa-root}} template that classes them as compound roots. Dragonoid76 (talk) 21:43, 13 September 2023 (UTC)[reply]
@Dragonoid76: But it seems they have different accentuation rules, and they also usually exhibit separation in Vedic Sanskrit. The synthetic perfect is also rather different. They need to be tagged as different. --RichardW57m (talk) 15:40, 14 September 2023 (UTC)[reply]

Usage of dated/archaic/obsolete with Bulgarian terms

I've recently read Wiktionary:Obsolete and archaic terms, and I believe the way we've been applying the distinction between dated, archaic and obsolete for Bulgarian terms is likely incorrect.

The main normative Bulgarian dictionary is published by the Bulgarian Academy of Sciences. It covers vocabulary starting with the Primer with Various Instructions published in 1824, and continuing until the present. The dictionary uses a convention for indicating that some words are older and perhaps not in active contemporary use. I'm not sure how to map that convention to Wiktionary's divisions between dated, archaic and obsolete, so I'll translate the convention here and ask for your input.

  • marker остар. - words whose referents/realia have a different name, phonetic variant, or pattern of word formation (e.g. affixes) in the contemporary language. They are included in the Dictionary according to their prominence in older literature.
    • also applied to: words that are becoming part of the language's "passive vocabulary" due to political, economic and cultural change - e.g. 1) old names of institutions, jobs, honorary titles, educational concepts; 2) names of social and political movements from around the time of Bulgaria's liberation (1870s), and 3) names of certain social organizations, political parties and concepts, as well as their members and supporters, from the recent or further past
    • also applied to: "Church Slavicisms" - only those that were used in Revival (19th c.) literature and contributed to the formation of the modern literary Bulgarian language. Those tend to be 1) words found in significant Revival works that had viable modern synonyms even back then, or 2) words used by Revival and later authors for stylistic purposes.
  • marker старин. - archaisms found in the works of Revival and contemporary authors and used for stylistic purposes.

This is how I'd render those in Wiktionary nomenclature:

  • base use of остар. - archaic
    • "passive vocabulary" use - dated (requires personal judgment)
    • "Church Slavicism" use - archaic
  • старин. - archaic

I don't think that any of the word categories meet the bar for obsolete, because 1) the dictionary only goes back to the 1820s, and 2) words are chosen based on their attestation in non-marginal literature from that period. It's true that some words are perhaps known only to a handful of speakers with an interest in 19th century literature, but I feel like if those were English words, they'd still be classified as archaic rather than obsolete. I might be wrong.

Let me know what you think! Thanks a lot! Pinging @Benwing2, Kiril kovachev, but others are welcome to chime in.

Chernorizets (talk) 05:47, 22 August 2023 (UTC)[reply]

"Only" going back to 1820 is more than enough time for obsolete. Obsolete has more to do with recognizability and if the term sees use at all. A word can drop out in a matter of a few years, but more likely decades. 2 centuries is plenty of time. You might want a special label called Church Slavicism. Vininn126 (talk) 06:54, 22 August 2023 (UTC)[reply]
@Chernorizets User:Vininn126 is correct. There are even 20th century words that are now obsolete. sulphite "a person who is spontaneous and original in thought and conversation" is one such. We have quotes as late as 1922 but this word has now completely fallen out of even the passive knowledge of most English speakers, which is the criterion for an obsolete word. Similarly, the common 19th century term "nervo-bilious" is not found even in Wiktionary or any other modern dictionary and hence is totally obsolete. (See Phineas Gage, who was unfortunate enough to get an iron rod exploded through his head, leaving him still able to function but with profound personality changes:
Physician John Martyn Harlow, who knew Gage before his accident, described him as "a perfectly healthy, strong and active young man, twenty-five years of age, nervo-bilious temperament, five feet six inches [1.68 m] in height, average weight one hundred and fifty pounds [68 kg], possessing an iron will as well as an iron frame; muscular system unusually well developed—having had scarcely a day's illness from his childhood to the date of [his] injury". (In the pseudoscience of phrenology, which was then just ending its vogue, nervo-bilious denoted an unusual combination of "excitable and active mental powers" with "energy and strength [of] mind and body [making] possible the endurance of great mental and physical labor".)
) Benwing2 (talk) 07:56, 22 August 2023 (UTC)[reply]
@Chernorizets That said, I greatly appreciate your willingness to figure this stuff out. In Russian when translating устар. we tend to render it as "dated" when probably "archaic" would be better, but it's hard because the sort of distinctions made by English dictionaries between dated, archaic and obsolete aren't always very clearly distinguished in Russian dictionaries. Benwing2 (talk) 08:01, 22 August 2023 (UTC)[reply]
Seconded - there are similar issues sometimes with Polish dictionaries, notably PWN and "arch". Vininn126 (talk) 08:02, 22 August 2023 (UTC)[reply]
This reminds me how in Spanish dictionaries a word labelled as "anticuado" can mean either "dated" or "archaic" (an obsolete word is referred as "[en] desuso", so in that case there is no confusion). Rodrigo5260 (talk) 04:31, 23 August 2023 (UTC)[reply]
@Rodrigo5260 If only we had something like "en desuso" in our dictionaries, but unfortunately we don't. Chernorizets (talk) 06:53, 23 August 2023 (UTC)[reply]
@Benwing2 @Vininn126 it's not actually hard for me to agree with you. Just today I added a few words from the Bulgarian requested entries page, and as an educated Bulgarian speaker who has read 19th century lit for school and for fun, I'd never heard them before.
One good thing about the BG dictionary is that it provides quotations. So how about this:
  • words marked остар. with quotations either mostly or entirely from the period (arbitrarily) ending in 1918 (end of WWI, ~100 years back) - obsolete. Exceptions granted to historical terms and concepts that we still study about in school. The goal is to draw an invisible line around "stuff the average educated person of today will probably not recognize", and make that line consistent across editors.
  • words marked остар. related to Bulgaria's communist period (1945 - 1989) that are still fairly recognizable, but no longer culturally relevant - dated
  • everything else under остар. - archaic
  • старин. maps to either obsolete or archaic according to the same rules as for остар..
Chernorizets (talk) 08:27, 22 August 2023 (UTC)[reply]
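As a way of making that invisible line consistent across editors, the four rules above can be read as a small decision procedure. The following is only a sketch in Python, not an agreed tool: the function name `map_label`, its parameters, and the reading of "mostly or entirely" as a strict majority of the dictionary's quotations are all assumptions made for illustration, and words exempted as still-studied historical terms are assumed to fall through to archaic.

```python
OBSOLETE_CUTOFF_YEAR = 1918  # the (arbitrary) cutoff proposed above


def map_label(marker, quote_years, studied_in_school=False,
              recognizable_communist_era=False):
    """Return 'obsolete', 'dated' or 'archaic' for a dictionary marker.

    marker: 'остар.' or 'старин.' as used in the BG dictionary.
    quote_years: years of the quotations the dictionary gives for the word.
    The two flags stand in for editor judgment (hypothetical parameters).
    """
    if marker not in ("остар.", "старин."):
        raise ValueError(f"unknown marker: {marker!r}")

    # Assumption: "mostly or entirely" means a strict majority of quotations.
    old = sum(1 for year in quote_years if year <= OBSOLETE_CUTOFF_YEAR)
    mostly_old = bool(quote_years) and old * 2 > len(quote_years)

    # Rule 1: old quotations only, unless the term is still taught in school.
    if mostly_old and not studied_in_school:
        return "obsolete"
    # Rule 2: only остар. can map to dated; старин. is obsolete or archaic.
    if marker == "остар." and recognizable_communist_era:
        return "dated"
    # Rule 3: everything else.
    return "archaic"
```

For example, `map_label("остар.", [1870, 1895, 1901])` gives "obsolete", while the same call with `studied_in_school=True` gives "archaic". Individual judgment would still be needed to set the two flags.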
I think this isn't bad - you have to use your intuition as well as I'm sure there will be exceptions to these rules. I also still wonder if a category for Church Slavicism would be useful. Vininn126 (talk) 08:29, 22 August 2023 (UTC)[reply]
@Vininn126 oh, it won't be perfect for sure, and some individual judgment will be necessary. However, considering that today we often render остар. as dated rather than archaic or obsolete, it would hopefully be an improvement.
As for a "Church Slavicism" category or label - personally, I don't think it's either feasible or helpful. Church Slavicisms aren't marked as such in the dictionary, and it would be prohibitively difficult IMO to chase that information down. Besides, many terms that were re-borrowed into the modern language from Church Slavic live healthy, productive lives in the present, since they form a good chunk of the literary register. That's one of the factors contributing to the amount of lexical overlap between the higher registers of Russian and Bulgarian. Rather than create a sui generis parallel Church Slavic dictionary piggybacking on Bulgarian, I'd just take advantage of situations where the Bulgarian Etymological Dictionary provides that kind of detail, and put it in the relevant etymology section.
Chernorizets (talk) 08:48, 22 August 2023 (UTC)[reply]
In that case all seems to be a big improvement. Vininn126 (talk) 08:49, 22 August 2023 (UTC)[reply]
@Chernorizets I've always struggled to find the right label for "остар."... and confounded more by my blind assumption that "старин." meant "old-fashioned", and therefore "dated". I appreciate these definitions, though, I would be happy to abide by them. I notice, though, that nowhere in this framework is there a place for the marking of dated: what judgement should we employ to distinguish something dated from something properly archaic or outright obsolete? Is there ever a case when we should read any of those Bulgarian-language labels as "dated"? Or just stick to "archaic" and "obsolete"? Kiril kovachev (talkcontribs) 10:56, 22 August 2023 (UTC)[reply]
@Kiril kovachev Have you read the glossary explanations? Part of the issue is that each label is marked. "Dated" words have a certain "feel", archaic their own, and obsolete are not recognized. Vininn126 (talk) 11:06, 22 August 2023 (UTC)[reply]
@Vininn126 I did just now; I see what you mean; it's just that getting the right "feel" can surely be difficult at times. To know whether something's dated or archaic, moreover, you probably need to understand whether older people would still use the words in question or not. The only Bulgarians of that generation that I know are my grandparents, so judging that is very hard for me. The delineation of Communist-era terms as being "dated" is sufficient for me, since at least there is then a basis for what you can comfortably call dated. Indeed many people who lived through those times are now around, and are the only people still using such dated terms. Kiril kovachev (talkcontribs) 11:23, 22 August 2023 (UTC)[reply]
@Kiril kovachev Dated is in there :-)
  • words marked остар. related to Bulgaria's communist period (1945 - 1989) that are still fairly recognizable, but no longer culturally relevant - dated
The definition should be broadened to not just focus on the communist period. The examples given are words like поручик, фелдфебел, текезесар, политпросвета. Chernorizets (talk) 11:07, 22 August 2023 (UTC)[reply]
@Chernorizets Woop, sorry, I don't know where I was looking! Lol. Thanks for clearing that up. Kiril kovachev (talkcontribs) 11:19, 22 August 2023 (UTC)[reply]
@Chernorizets Additionally, about broadening the definition: I am happy to accept any enlarged definition, but I believe I'm not an adequate candidate for broadening it myself... the communist era was a while ago now, and per my above reply I find that it's probably the point in which many now-"dated" terms were in active use: exactly people from that era are now elderly, and so the user-base of such words is now getting constrained. Hence it seems adequate that terms from that time be called "dated". On the other hand, some of those terms are historical, since they are just no longer relevant post-communism.
My main shortcoming in this is that I'm not familiar with the older times, and mostly just know (vaguely) modern-day literature and vulgar language. Thus I'm not in any position to give a worthy opinion of what counts as what with regard to these categories... If you or @Bezimenen or other more educated fellows are able to define these things, I'm happy to follow :) Kiril kovachev (talkcontribs) 11:30, 22 August 2023 (UTC)[reply]
@Benwing2, Kiril kovachev, Vininn126 I've written up the provisional criteria for which label to use in Wiktionary:About Bulgarian#Marking a word dated, archaic or obsolete. The examples used are taken from the Dictionary itself when discussing the categories of included vocabulary. Lemme know if it looks good enough - I don't claim it's perfect. Chernorizets (talk) 12:21, 22 August 2023 (UTC)[reply]
Looks nice to me! Vininn126 (talk) 12:32, 22 August 2023 (UTC)[reply]
@Chernorizets Brilliant summary imo ^^ nice work again! Kiril kovachev (talkcontribs) 15:27, 22 August 2023 (UTC)[reply]

More languages in quotes

@RichardW57, Al-Muqanna I am planning on implementing some extra fields:

  1. |origlang=: The original language of the quotation, when the quotation given is a translation from some other language. Consider for example this:
    {{quote-book|en|year=1912|author={{w|Fyodor Dostoevsky}}|origlang=ru|tlr=[[w:Constance Garnett|Constance Garnett]]|title=[[s:The Brothers Karamazov/Book V/Chapter 5|The Brothers Karamazov]]|section=part II, book V, chapter 5|passage=“Is it simply a wild fantasy, or a mistake on the part of the old man — some impossible '''quid pro quo'''?”}}
    Here the quotation is in English but the original is in Russian. The "in Russian" will appear somewhere; still working that out. Sometimes that original title is given using e.g. |original=Братья Карамазовы, in which case translation of Братья Карамазовы (in Russian) could be given. If not, maybe either translation of original (in Russian) or just (original in Russian) can be given. This situation is extremely common, and comes up nearly any time we have a translation as a quotation used to illustrate a (usually English) term.
  2. |origworklang=: The original language of the work as a whole, in cases where a work is translated but the quotation itself is not a translation. This happens in a case like this:
    {{quote-book|az|tlr=Məmməd Qocayev|title=Cinayət və cəza|trans-title=Crime and Punishment|author=Fyodor Dostoyevsky|origworklang=ru|year=2004|chapter=Cinayət və onun cəzası|trans-chapter=The crime and its punishment (foreword)|publisher=Öndər Nəşriyyət|location=Baku|url=http://anl.az/el/latin_qrafikasi/kae/dfm_cc.pdf|text=Fyodor Mixayloviç Dostoyevksinin "Cinayət və cəza" [[roman]]ı XIX [[əsr]]in 60-cı [[il]]lərində (1866) [[meydan]]a [[gəlmək|gəlmişdir]] [[və]] [[rus]] [[milli]], [[mədəni]], [[ictimai]] [[tarix]]inin [[mürəkkəb]] [[dövr]]lərindən [[birini]] [[əks etdirmək|əks etdirir]].|t=Fyodor Mikhailovich Dostoevsky's novel "Crime and Punishment" appeared in the 60s of the XIX century (1866) and reflects one of the most complex periods of Russian '''national''', cultural and social history.|page=4}}
    Here, we have an Azerbaijani translation of a Russian work, again by Dostoyevsky, but the quotation comes from the foreword, which is written in Azerbaijani and not a translation of anything. I still think it's useful to indicate the language of the original work as a whole; maybe it will display as (original work as a whole in Russian).
  3. |origtext= (in {{quote-book}} et al.; |orig= in {{usex}} and {{quote}}): The original-language version when a translated quotation in a non-English language is used to illustrate a (usually non-English) term. In such a case, there are three languages and three versions of the quotation involved: (A) the original-language version; (B) the version translated into the language of the term in question (very rarely, it's possible for the term and quotation lang to differ in such a scenario, but let's not worry about that for the moment); (C) the translation of (B) into English. An example:
    {{quote-book|ota|title=ota:ادارهٔ حرب و سیاست<tr:Idare-i harb ve siyaset>|newversion=translation of|year2=1922|location2=Berlin|title2=Kriegsführung und Politik|2ndauthor={{w|Erich Ludendorff}}|publisher2=E. S. Mittler & Sohn|location=Istanbul|year=1922|publisher=ota:مطبعهٔ عثمانیه|page=[https://babel.hathitrust.org/cgi/pt?id=uc1.aa0006749931&view=1up&seq=182 176], originally [https://archive.org/details/kriegfhrungund00lude/page/199 199]|passage=شرقی غالیچیا، و بوقوونیاده شمندوفر نقاباتی تأسس ایتدكدن صكره شرقده '''صلحی''' تأمین ایچون دینیه‌ستر نهری جنوبندن روس — رومن جبهه‌سنه قارشو حربه دوام ایتمك فكرندن فراغت ایتمك لازمكلیورایدی. بر قصقاچ تأثیری یاپه‌بیلمك ایچون بری قلاص غربندن یوقاره‌سه‌ره‌ت اوزرینه و دیكری دینیه‌ستر جنوبندن اولمق اوزره مضاعف بر تعرض اجراسی لازمكلیور ایده.|tr=Şarki Galiçya, ve Bukovinyada‌ şömendüfer nakabatı tesis itdikden soñra şarkda sulhı temin içün Dinyester nehri cenubundan rus-rumen cebhesine karşu harbe devam itmek fikrinden feragat itmek lazım geliyor idi. Bir kıskaç tesiri yapabilmek içün biri Kalas garbından yukara-yı Seret üzerine ve diğeri Dinyester cenubundan olmak üzere muzaaf bir taarruz icrası lazım geliyor idi.|translation=Unsere weitere Absicht, nach Herstellung der Eisenbahnverbindungen in Ostgalizien und in der Bukowina zur Erzwingung des Friedens im Osten den Feldzug gegen die russisch-rumänische Front südlich des Dnjestr fortzusetzen, mußte aufgegeben werden. Er sollte in einem doppelten Angriff über den oberen Sereth westlich Galatz und hart südlich des Dnjestr bestehen und auch zangenartig wirken.<p>Our further intention, the military campaign against the Russian-Romanian front to be continued south of the Dniestr to force '''peace''' in the east, after procurement of the railroad connections in Eastern Galicia and in the Bukovina, had to be abandoned. It was supposed to consist in a double attack over the upper Siret west of Galați as well as hard south of the Dniestr and go like pincers.</p>}}
    Here, the original book is Kriegsführung und Politik ("Warfare and Politics") by Erich Ludendorff in German. It seems it was published around 1922, and translated that same year into Ottoman Turkish as ادارهٔ حرب و سیاست [Idare-i harb ve siyaset, Administration of War and Politics] by someone or other, and a quotation from this version is used to illustrate the Ottoman Turkish word صلح (sulh). As can be seen in the {{quote-book}} above, where I converted it to |newversion= style but left the quotation as-is, the "translation" actually contains both the German original and the English translation (which I suspect is based off of the German rather than the Ottoman Turkish). I'm planning on adding a field for the original in this circumstance, manually displayed in brackets like this:
    [original: Unsere weitere Absicht, nach Herstellung der Eisenbahnverbindungen in Ostgalizien und in der Bukowina zur Erzwingung des Friedens im Osten den Feldzug gegen die russisch-rumänische Front südlich des Dnjestr fortzusetzen, mußte aufgegeben werden. […]]
    The order would be the quotation followed by the original text followed by the translation. The original text field can also be used when a work was translated from English to some other language and the translation is used to illustrate a word in a foreign language. That happens frequently in Portuguese with Harry Potter works; it seems some Brazilian contributor just loves Harry Potter, and frequently uses quotes from the Brazilian translation of one or another Harry Potter book. Here I think it's better to put the English original in the |origtext= field, since it's not properly a "translation" of the Brazilian Portuguese quotation. Similarly if a translation of a work into English is used to illustrate an English word, we can put the original text in the |origtext= field.
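The display order described in point 3 can be sketched as follows. This is a rough Python illustration only, not the actual template or module code: the function name `render_quote_lines` is hypothetical, and the only behaviour taken from the proposal is the order (quotation, then original text, then translation) and the bracketed "original:" form.

```python
def render_quote_lines(passage, origtext=None, translation=None):
    """Return the display lines for one quotation, in the proposed order.

    passage: the quotation in the language of the term being illustrated (B).
    origtext: the original-language version (A), shown in brackets if given.
    translation: the English translation of the passage (C), if given.
    """
    lines = [passage]
    if origtext:
        # Proposed bracketed display of the original-language version.
        lines.append(f"[original: {origtext}]")
    if translation:
        lines.append(translation)
    return lines
```

With all three versions supplied, the function returns three lines in the order B, A, C; when the quotation is not a translation, `origtext` is simply left out and the output is the usual passage followed by its English translation.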

Thoughts? Benwing2 (talk) 07:47, 22 August 2023 (UTC)[reply]

Looks excellent to me. I know I could definitely make use of the |origlang= parameter fairly regularly, and the others look like progress too. Kiril kovachev (talkcontribs) 11:01, 22 August 2023 (UTC)[reply]
Looks good to me too, thanks for your work on this Benwing. —Al-Muqanna المقنع (talk) 11:03, 22 August 2023 (UTC)[reply]
Seems like a thorough solution. Fay Freak (talk) 12:58, 22 August 2023 (UTC)[reply]
Looking at the example for |origworklang=, namely:
2004, Fyodor Dostoyevsky, “Cinayət və onun cəzası [The crime and its punishment (foreword)]”, in Məmməd Qocayev, transl., Cinayət və cəza [Crime and Punishment], Baku: Öndər Nəşriyyət, translation of original (overall work in Russian), page 4:
Fyodor Mixayloviç Dostoyevksinin "Cinayət və cəza" romanı XIX əsrin 60-cı illərində (1866) meydana gəlmişdir və rus milli, mədəni, ictimai tarixinin mürəkkəb dövrlərindən birini əks etdirir.
Fyodor Mikhailovich Dostoevsky's novel "Crime and Punishment" appeared in the 60s of the XIX century (1866) and reflects one of the most complex periods of Russian national, cultural and social history.
the following thoughts occur:
We should have, I presume, |main author=Fyodor Dostoyevsky|author=Məmməd Qocayev. Shouldn't the main author's name be in English, so 'Dostoevsky', not 'Dostoyevsky'? Also, shouldn't we have |chapter=Foreword? Perhaps a quoted footnote would give a better example. Perhaps a footnote to a book of the Apocrypha?
If |origworklang= serves no purpose, why add it? The parameter |tlr= also seems redundant - is it intended to help locate a copy of the quoted foreword?
We potentially have a legal issue with the lack of a way of crediting the translator of the quotation. --RichardW57m (talk) 16:38, 22 August 2023 (UTC)[reply]
Looking at the pdf, the chapter title is specifically as given above, not just "Foreword", and Dostoyevsky is a normal rendering of his name in English, just not Wikipedia's preference. —Al-Muqanna المقنع (talk) 17:04, 22 August 2023 (UTC)[reply]
Hmm. If we quote a translation of NT Greek into Northern Thai, should we be offering translations of both the NT Greek and the Northern Thai? What about NT Greek into early Modern English? --RichardW57m (talk) 16:38, 22 August 2023 (UTC)[reply]

Really great work on the quote templates, thanks a lot. I have a couple of loosely related things in mind that might be worth standardizing (although I'm not sure whether they need separate parameters).

  • When illustrating a foreign term, if we give the corresponding passage from the English (literary) translation of the quoted work in |translation=, how should we indicate the name of the translator? Two different approaches can be seen at sóher and iskola. Should we also indicate the bibliographic details of the English edition?
  • When illustrating a foreign term (either with an original work or with a translation from another foreign language), what should go in |trans-title= if the English translation has a different title from the literal translation of the quoted work? An example is őrizkedik (sense 2), where the Hungarian translation (Csikóéveink, literally “Our Foal Years”) of Merle's novel En nos vertes années (literally “In Our Green Years”) is quoted, which was also translated into English as City of Wisdom and Blood. If we want to include each version, the output could be something like this: Csikóéveink [Our Foal Years] […] translation of En nos vertes années [In Our Green Years, translated into English as City of Wisdom and Blood] (in French).

These might look like special cases, although they are probably not too uncommon in non-English entries. Einstein2 (talk) 18:13, 22 August 2023 (UTC)[reply]

Yes, if I quote a Qurʾān translation in German then I add the Arabic original and the English translation also has a translator. Even in the individual cases where the Arabic is quoted in Arabic entries and we only have Arabic and English text and no third one then the translator can vary case by case. This is a separate issue not yet implemented, as I understand. I do it like at können or أَنْجَى (ʔanjā) which looks neat to me but is a hack. Fay Freak (talk) 20:57, 22 August 2023 (UTC)[reply]
@Einstein2 These are good questions. I would say, it's optional to give the bibliographic details of both original and translation. There are (at least) two ways of indicating a translation; one is with the |original= and |by= fields, which let you indicate the original author and title and otherwise give the bibliographic details of the translation (although I haven't yet worked out whether it makes more sense to put the original author in |author= or in |by=); the other is with |newversion=, along with |2ndauthor=, |title2=, |location2=, etc. which let you give all bibliographic details of the original and translation. In the case of sóher, one possibility is to put the more literal translation in the |lit= field and the idiomatic translation in |t=; the literal translation is of course needed because the term in question is rendered idiomatically as "stone broke" instead of literally as "pauper". Not sure how to indicate the translator of the literal version; the solution given is probably OK till we figure out a better one. Your case of En nos vertes années is interesting. You can now attach a translation of the original title using |original=En nos vertes années<t:In our Green Years> but I don't have a solution yet for giving both a literal and idiomatic rendering. Benwing2 (talk) 21:41, 22 August 2023 (UTC)[reply]

"corruption of" in Etymologies

There are more than 1100 occurrences of 'corruption of' in entries, about half in Etymology sections. Is there some technical meaning of this? Or is it just a way of privileging classical, standard or formal, and mainstream language usage over current, dialectal, and informal usage? DCDuring (talk) 15:56, 22 August 2023 (UTC)[reply]

It seems to be a catch-all for nonstandard changes being made. Off the top of my head, rotor is a corruption of rotator, because it’s an irregular elision. We shouldn’t use it purely because something is dialectal/colloquial/informal, because those kinds of changes can be regular as well. Theknightwho (talk) 16:00, 22 August 2023 (UTC)[reply]
I seem to recall seeing something similar in etymological dictionaries as well. Vininn126 (talk) 16:04, 22 August 2023 (UTC)[reply]
It's deprecated now in technical works, and should be replaced by whatever specific change is being described (e.g. clipping, apocope, etc.). In old works pretty much any sound change can be described as a "corruption". —Al-Muqanna المقنع (talk) 16:16, 22 August 2023 (UTC)[reply]
I would hope so. In the most charitable view it seems dated. DCDuring (talk) 16:19, 22 August 2023 (UTC)[reply]
If I can't figure out exactly what change has been made, I'll change "corruption of" to "alteration of", which is just as vague but less judgmental. —Mahāgaja · talk 17:18, 22 August 2023 (UTC)[reply]
A corruption is an "irregular change". Clippings, apocopations, and dissimilations are more specific and even sometimes predictable. (@DCDuring you calling me out? LMFAO) -- Sokkjō 04:14, 23 August 2023 (UTC)[reply]
IMO, the pejorative connotations make the word an undesirable way of communicating "irregular change", even worse than some technically correct potential substitute terms that will discourage normal users. It does seem hard to be both technically correct and understandable to a non-technical potential user population. DCDuring (talk) 13:59, 23 August 2023 (UTC)[reply]
It's not really a term used in any serious linguistic work these days (except in very specific contexts such as textual corruption). I would support its general replacement with more precise terms. Nicodene (talk) 16:23, 22 August 2023 (UTC)[reply]
In textual criticism even that is now deprecated in favour of "secondary readings" (an entry we don't have). —Al-Muqanna المقنع (talk) 16:59, 22 August 2023 (UTC)[reply]
I doubt this. In some circles only somebody may have tried to push the euphemism treadmill further. If someone still edits a text with Latin commentary then it will have to be corruptio, corruptēla, and they still teach to young German philologists the calques verderbt and entstellt. Latinisms have been conserved for eternity here. What is disapproved of in linguistics is the term Verfall, Sprachverfall. Korruption has never found as broad abuse as in such English-language etymologies though (another stubby foreign-language gloss in Wiktionary). It would probably be okay to write Entstellung in these places if you don’t want anything more specific. Fay Freak (talk) 20:45, 22 August 2023 (UTC)[reply]
Feel free to doubt, but that is how it is in contemporary English-language textual criticism at least: "Because it posits nothing about the correctness of a reading, the term “secondary reading” obviates the dispute and the distinction between variant and error" [6], "In textual criticism one should the term "secondary (or derived) reading" instead of 'error'" [7] etc. It is pretty easy to find authors who use both "secondary reading" and "corruption" nonetheless, sometimes even in the same sentence, so the value-judgement thing has obviously not stuck yet. —Al-Muqanna المقنع (talk) 20:54, 22 August 2023 (UTC)[reply]
These texts indicate it is not a synonym. But a different point of view. Apart from value judgment one can still just make a claim about originality, and the fashion in which the original has been converted, which is indeed expressed by the term as vaguely as intentionally. Maybe due to the principle of charity one should assume, in an etymological or textual context, that it is not actually a bias or value judgment that the author wanted to express, by using loaded language. Together with a new term not actually having the same meaning this is a reason why a term that was coined to avoid value implications rather than attain conceptual progress tends to not catch on. There is no clearcut solution for all these etymologies. DCDuring asked if there is a technical meaning: There was a technical understanding, but not too specific a meaning, but no, (hopefully) no privileging. Fay Freak (talk) 21:09, 22 August 2023 (UTC)[reply]
So I should feel free to find less pejorative and preferably more precise substitute terms, at least in those cases where I am not in over my head. DCDuring (talk) 00:40, 23 August 2023 (UTC)[reply]
I have nothing against "corruption of", and have used this on occasion. It can occur in place names, for example. The Oxford Dictionary of English has for corruption: "[count noun] the process by which a word or expression is changed from its original state to one regarded as erroneous or debased: a record of a word's corruption | the term 'hobgoblin' is thought to be a corruption of 'Robgoblin'." DonnanZ (talk) 08:19, 23 August 2023 (UTC)[reply]
That definition exactly rephrases @DCDuring's description above: "privileging classical, standard or formal, and mainstream language usage over current, dialectal, and informal usage". —Al-Muqanna المقنع (talk) 08:27, 23 August 2023 (UTC)[reply]
You could say, in one instance, Sugar Creek is a corruption of Sugaw Creek, rather than an alt form. I wonder which name appears on maps. DonnanZ (talk) 09:48, 23 August 2023 (UTC)[reply]
As often, the quotes given by the OED do not attest the definition given. Usually DCDuring is even more critical than anyone else about such cases. In that sentence with hobgoblin I would exactly not read it as being “regarded as erroneous or debased”; Sokkjō instead accurately indicated what it means. And then they don’t even define it at all by using the definiendum in the definition while they should explain what the word means in a specific context; we already knew that this sense is “a record” of a general idea of corruption. Fay Freak (talk) 20:02, 23 August 2023 (UTC)[reply]
(The Oxford Dictionary of English is not the OED, which has a different definition and set of citations. —Al-Muqanna المقنع (talk) 20:19, 23 August 2023 (UTC))[reply]
Yes. A maiore ad minus what is at fault with the OED is also with the ODE (an unintuitive abbreviation and it’s their fault), since it is not independent—an impression even which they sought to avoid with this branding. Surely the OED is the first dictionary ODE editors look into for inspiration. Fay Freak (talk) 21:02, 23 August 2023 (UTC)[reply]
With 2069 numbered pages, the ODE is a quite heavy tome. DonnanZ (talk) 21:31, 23 August 2023 (UTC)[reply]

"There are more than 1100 occurrences of 'corruption of' in entries, about half in Etymology sections. Is there some technical meaning of this? Or is it just a way of privileging classical, standard or formal, and mainstream language usage over current, dialectal, and informal usage?" YES, love this thought. --Geographyinitiative (talk) 08:24, 23 August 2023 (UTC)[reply]

I value recognition of 'standard' vs. non-standard in spellings, pronunciations, and definitions for the benefit of some common, well, standard for a language, but 'corruption' seems like a pejorative too far. DCDuring (talk) 13:40, 23 August 2023 (UTC)[reply]

North Korean pronunciations

I notice that even for North Korean words like the North Korean name of Korea, we only give the South Korean pronunciation. Even for words like 녀자 or where other references discuss how different the Northern pronunciation is from the Southern pronunciation (여자, ), it appears as if we're labelling the Northern pronunciation as a "South Korean Standard" pronunciation...? Do we have any interest in adding North Korean pronunciations? The only entry I can find which has one is 원쑤. - -sche (discuss) 17:41, 22 August 2023 (UTC)[reply]

@-sche: Hi. Korean orthography is mostly phonetic; even if there are predictable and less predictable phonetic changes in both South and North, the differences go hand in hand with the spelling. Labelling the regional pronunciation on 원쑤(怨讐) (wonssu) is redundant (the definition line is sufficient), since the geminated [s͈] is reflected in the gemination of the jamo ㅅ (s), which makes the spelling different from the South Korean 원수(怨讐) (wonsu). Notice that the transliteration, too, differs in the number of s's.
It's the same with 녀자(女子) (nyeoja) (North) vs 여자(女子) (yeoja) (South). The former starts with ㄴ (n), the latter starts with ㅇ (which represents no consonant at the beginning of a syllable). They say that some North Koreans (maybe speaking a similar dialect to that of South Koreans?) still have trouble clearly pronouncing [n] before "i" or iotated vowels, or [ɾ] (r), at the beginning of a word, but our system renders North Korean accents phonemically, as they are supposed to be.
Compare also 리유(理由) (riyu) (North) vs 이유(理由) (iyu) (South), with the North Korean [ɾ] in "unpronounceable" positions. Anatoli T. (обсудить/вклад) 03:15, 23 August 2023 (UTC)[reply]
@-sche: In short, difference in the orthography is mostly sufficient, since words that are pronounced differently, are also spelled differently. Anatoli T. (обсудить/вклад) 03:19, 23 August 2023 (UTC)[reply]
But isn't it inaccurate that we're telling readers that people in Seoul pronounce "reason" [ɾi(ː)ju]? We explicitly {{accent}}-label that as a South Korean, Seoul pronunciation, but... isn't the point that it's actually a North Korean pronunciation that people in the South/Seoul don't use? - -sche (discuss) 03:35, 23 August 2023 (UTC)[reply]
@-sche: South Koreans don't use 리유(理由) (riyu), they use 이유(理由) (iyu), but they "describe" the North Korean word (both the spelling and the pronunciation) with the same (North Korean) pronunciation. The initial "ri" would still be used in the South for loanwords, as in 리스본 (Riseubon, Lisbon). Isn't what you're asking something like describing how (north) Germans pronounce an Austrian word that's never used in Germany? Anatoli T. (обсудить/вклад) 04:20, 23 August 2023 (UTC)[reply]
Our entries say (and as far as I can tell this is broadly correct) that South Koreans write the word for girl "여자", and pronounce it /j-/. Our entries also say North Koreans write it "녀자". So far, so good, right? But according to other reference works, it's North Koreans who pronounce the word /ɲ-/ (agreeing with their spelling) ... whereas according to our entry, /ɲ-/ is the South Korean Standard pronunciation, the pronunciation used by people who speak the Seoul dialect/accent... which AFAICT is wrong, since people in Seoul actually pronounce the word for girl with /j-/ not /ɲ-/, and /ɲ-/ is a North Korean pronunciation. Hence, it seems to me as if your last sentence has it backwards; "describing how Germans pronounce an Austrian word that's never used in Germany" is what our entries are currently doing — they have "Seoul" pronunciations that are based on "well, if people with a Seoul accent didn't speak with a Seoul accent and instead spoke with a Pyongyang accent, this is how they'd say it" but then they present that as a (Seoul)-accent pronunciation even though AFAICT it's never used in Seoul; I'm suggesting we take the more obvious route of labelling the Pyongyang pronunciation as a Pyongyang pronunciation. - -sche (discuss) 14:00, 23 August 2023 (UTC)[reply]
I don't know Korean, but my questions are: (1) Are there words spelled the same in both countries but pronounced differently? (2) If a South Korean were to read aloud a text written in North Korea and encountered a word that is spelled and pronounced differently, such as 리유 (riyu) or 녀자 (nyeoja), how would the South Korean read that word? —Mahāgaja · talk 14:53, 23 August 2023 (UTC)[reply]
@Mahagaja: If they know the spelling was intentional to render North Korean, they would try to read as northerners. They just find these pronunciations hard or awkward. "ri" and "ni" do also occur in loanwords and southerners can handle them.
There are some minor differences, such as the vowel ㅓ (eo), but there's no agreement on whether they are real.
There are differences in the pitch accents but they are handled only for one specific South Korean dialect. Anatoli T. (обсудить/вклад) 22:35, 23 August 2023 (UTC)[reply]
@-sche: OK, I agree. I think there should be an option to remove the "SK Standard/Seoul" label (before the IPA) on entries that are North Korean only and/or add a North Korean label. It's more of a technical issue; no knowledge of North Korean would be required. Anatoli T. (обсудить/вклад) 22:27, 23 August 2023 (UTC)[reply]
(Notifying TAKASUGI Shinji, Atitarev, HappyMidnight, Tibidibi, Quadmix77, Kaepoong): I have my own opinions on this, but they're not yet well-formulated so I'll be back later on. AG202 (talk) 14:41, 23 August 2023 (UTC)[reply]
@AG202: Do you have any input on the matter? The Korean modules could use some improvements. Anatoli T. (обсудить/вклад) 08:03, 25 August 2023 (UTC)[reply]

Label (and perhaps add more) obsolete English genitives

Following this and this, we have a few scattered entries for apostropheless English genitives like kinges and kingis, but those aren't currently labelled at all: they should be labelled obsolete, and any like kings that are still found today should at least be labelled something like "obsolete or now nonstandard". Perhaps someone can generate a list of all English entries with "genitive of..." lines, and we can apply labels... and perhaps add more such entries? (BTW, in the other direction, I see we have index's as an obsolete plural of index.) - -sche (discuss) 20:00, 22 August 2023 (UTC)[reply]

@-sche you'd probably also want to distinguish between Middle English and (Early) Modern English (ModE) for those, which can be done with the help of dated quotations. But yes, they should all be labeled obsolete. I'm not sure how much effort it would be to generate such a list, as ModE spelling was a moving target through much of the 16th and 17th centuries. Chernorizets (talk) 07:08, 23 August 2023 (UTC)[reply]
Since the possessive -'s is diachronically the same ending I am not sure how helpful—albeit technically accurate—it is to define these separately in Modern English contexts as "genitive of" as opposed to just early modern spellings of king's etc. "king's", "kinges", and e.g. "kinge his" (the his genitive) are different orthographies for the same word. —Al-Muqanna المقنع (talk) 08:35, 23 August 2023 (UTC)[reply]
I suppose you could write both, so "genitive of king: king's", "Early Modern spelling of king's: genitive of king". or something, then it's clear for readers. —Al-Muqanna المقنع (talk) 08:53, 23 August 2023 (UTC)[reply]
Sure. Indeed, if we created a specific template for this, it could generate all that verbiage and even verbiage (whether in parentheses like a label, or as part of the non-gloss) and categorization about it being obsolete, and indeed could generate an "obsolete form..." category rather than an "obsolete term" or "obsolete sense" category like {{lb}}. Or perhaps we just use {{obsolete form of}} but with a parameter like {{standard spelling of}}'s from= so we could make it display "Obsolete Early Modern English form of king's"? Now the question: for ones like mans that are still used in nonstandard English today, do we want to handle EME and modern use on the same sense line, handle them on different sense lines, or try to exclude covering modern use as much as possible? (We do run into this issue, that something is both an Early Modern and a modern mis- or even intentional spelling, in other cases too: e.g. whyte.) - -sche (discuss) 13:40, 23 August 2023 (UTC)[reply]
@-sche: N.B. just recently I created {{en-early modern spelling of}} along the lines of deprecated spelling templates for other languages, which points to Appendix:Early Modern English spellings, a page I also made a while back (a discussion of the genitive forms should probably be added there, come to think of it). I haven't gone round adding it to entries beyond a handful of examples like ouer, vnder though. —Al-Muqanna المقنع (talk) 14:51, 23 August 2023 (UTC)[reply]
Have added two paragraphs on it at Appendix:Early Modern English spellings#Genitive forms. —Al-Muqanna المقنع (talk) 21:44, 23 August 2023 (UTC)[reply]

"ゐ" and "ゑ" should be respectively "wi" and "we"[edit]

The two obsolete kana "ゐ" and "ゑ" still see occasional use (though more commonly as a stylistic variant of "い" and "え" respectively, than their intended usage). Their katakana forms are still used as intended, such as "スヰーデン" for "Sweden", "うゐりあむあだむす" for "William Adams", and "スラシング ヱーブ" for Haruyuki Walker's "Slashing Wave".

Moreover, the kana with the i-glide such as "ヰャ" for "wya" are incorrectly romanized as "ya".

Should we romanize "ゐ" and "ゑ" as what is originally intended, like the historical forms? Romanizing "ゐ" as "i" is just confusing.

--MULLIGANACEOUS-- (talk) 20:24, 22 August 2023 (UTC)[reply]

In スヰーデン and うゐりあむあだむす, the "w" sound is represented already by the ス and う respectively, so I don't see how they count as evidence of ヰ and ゐ being used in modern Japanese to represent a pronunciation with w on their own. The example ヱーブ doesn't have this issue. How common is the use of ヱ in place of ウェ? The transcriptions "wi" and "we" seem to have the potential of confusing readers into thinking that archaic kana spellings would be pronounced with "w" by modern Japanese speakers, which I understand not to be the case. As far as I know, in English it's more common to use romanizations that tend towards transcription rather than transliteration for Japanese.--Urszag (talk) 20:55, 22 August 2023 (UTC)[reply]
If the spelling did not contain "w" for the former two, I bet they won't use the w-row kana to transliterate to Japanese.
Although "w" is basically the consonant form of "u".
--MULLIGANACEOUS-- (talk) 02:54, 23 August 2023 (UTC)[reply]
I just realized I don't even know in what context you are talking about changing the romanization. When I look at 社員 I see "社しゃ員いん • (shain) ←しやゐん (syawin)" with a link to the Wikipedia article on historical kana; using wi in that context seems right to me. Is this what you are talking about, or is there some other place where you would like it to be romanized as "wi" but where it currently isn't? (e.g. is this just about the articles ゐ, ゑ themselves?)--Urszag (talk) 17:25, 23 August 2023 (UTC)[reply]
@Urszag See ヰスキー (isukī) as an example where it really should be wi. Theknightwho (talk) 17:32, 23 August 2023 (UTC)[reply]
Got it. Well, I agree now that wi and we seem better for transcription in that context as well, so I'd support changing it. I'm confused about where stuff like ヰャ even turns up; more context at those articles would be helpful.--Urszag (talk) 17:37, 23 August 2023 (UTC)[reply]
Yeah, I’d support changing it as well. I also think we could set を to transliterate as wo when not used in isolation, too, because that guarantees it’s not being used as a particle. We did something similar in Bulgarian recently, where the accent in ѝ (ì) is ignored for pagenames except when it’s used as a word in its own right (ѝ (ì)). This feature works in multiword terms, too. Theknightwho (talk) 17:46, 23 August 2023 (UTC)[reply]
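The Bulgarian behaviour described in this comment can be sketched roughly as follows. This is only an illustration in Python, not the code Wiktionary actually runs (its modules are written in Lua); the function name `pagename_form` and the word-splitting rule are assumptions made for the example, and only lowercase ѝ is handled.

```python
import re
import unicodedata

def pagename_form(term: str) -> str:
    """Drop the grave accent from ѝ unless ѝ stands alone as a word.

    Hypothetical helper: the name and the definition of "word" used here
    (a maximal run of \\w characters) are assumptions for this sketch.
    """
    # Compose и + combining grave (U+0438 U+0300) into the single letter ѝ (U+045D)
    term = unicodedata.normalize("NFC", term)

    def fix_word(match: re.Match) -> str:
        word = match.group(0)
        if word == "ѝ":
            return word  # a word in its own right: keep the accent
        return word.replace("ѝ", "и")  # accent inside a longer word: ignore it

    # Handle each word separately so that multiword terms also work
    return re.sub(r"\w+", fix_word, term)
```

Under these assumptions, a standalone ѝ inside a multiword term keeps its accent, while an accented ѝ inside a longer word is reduced to plain и. The same shape of rule (look at the whole word before deciding how to treat one character) is what the を proposal in this comment would need.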
I also support this. I additionally disagree with our romanization even for を, は, and へ on their own anyway: why is our transliteration into Latin taking into account the pronunciation? The symbols are what they are, and when reading them in Japanese you have to discern what's "ha" and what's "wa" anyway, so why don't we always transcribe characters as literally as possible? Kiril kovachev (talkcontribs) 21:31, 30 August 2023 (UTC)[reply]
We follow a modified version of the Hepburn romanization scheme, which treats particle は as wa etc. in line with how they are pronounced.
For that matter, the romanization schemes I've encountered that insist that は is always ha tend to be academic in nature and more concerned with one-to-one conversion, rather than usability for readers trying to figure out how to pronounce things. ‑‑ Eiríkr Útlendi │Tala við mig 22:00, 30 August 2023 (UTC)[reply]
@Eirikr I understand it is less pragmatic for users, pronunciation-wise, if we write the letters how they appear vs. how they're pronounced, but that is to me the most accurate way to represent the original text. を and お are not the same letter, so they shouldn't, in my opinion, be rendered the same. The distinction between は (ha) and は (wa) in a Japanese text always falls on the reader when reading in Japanese, so can we not expect people to distinguish this in English-oriented transcriptions as well? Representing these phonological features in writing means that we're duplicating content: that information should already be reflected in pronunciation sections. Meanwhile, the accurate, equivalent Roman-alphabet transcription of the original symbols is absent.
Anyway, I'm not desperate to die on this hill; I accept the status quo. These are just my idle thoughts. However, one problem the current approach does face is ゑ, を, etc. being transcribed according to the same principles as え, を:
  1. The page for ゑ now transcribes this character as "e" rather than "we", which is odd, considering it falls under "wa-gyō". Wikipedia and Unicode both render it as "we", which reflect that it is a distinct entity to え, even if its modern pronunciation is usually the same.
  2. The transliteration of e.g. 末子, which I created today, is quite strange, in my opinion: in the old kana orthography, it was すゑこ, and would normally be rendered as 'suweko', but because of some change or another (earlier this year it was as above), it has changed to 'sweko'. My knowledge may be somewhat lacking in the accurate phonetic representation of this word whenever the historical spelling was contemporary, but this doesn't strike me as very accurate. If the cause for this change is to represent things like すゑーでん as Swēden, I certainly think the switch is unwarranted. The far more common use of these characters is historical, so their modern phonetic value is far less relevant.
The above 末子 is but one of many historical spellings that I'm sure is affected. I didn't recognize this when I posted my comment the other day, but this is imo a considerable concern. Kiril kovachev (talkcontribs) 14:29, 2 September 2023 (UTC)[reply]
The conflict here seems to be between transliteration and transcription: do we show the pronunciation for the individual kana or for the word as a whole? There are good reasons for both, but we need to be very clear about which it is we're showing. If we're showing the individual kana, we don't want the reader to interpret that as the final pronunciation: we need to avoid people saying "konnichi ha" to their Japanese friends, without implying that "は" is always the correct pronunciation of "wa". Likewise going the other way. Perhaps we need to format in clues like middots between characters, or even having both with an arrow in headwords and parentheses "ko·n·ni·chi·ha→konnichiwa". It would take up more space, but it would avoid confusion. Either that, or have one as a tooltip for the other. As for romaji entries, we need a standard note making clear that konnichi wa (konnichi wa) is based on the pronunciation and that *konnichi ha is based on the spelling. I apologize if my examples are a little off the mark, but this is one of the few Japanese phrases I know. I hope you get the idea, anyway. Chuck Entz (talk)

Like Kiril kovachev, I'm confused about すゑこ being transliterated as sweko rather than suweko: I'm not familiar with the historical pronunciation, but would it really have been two syllables? I think the current treatment of は = wa etc. is fine and in my opinion doesn't need to be clarified any further on the main entry pages; I've never seriously studied Japanese but I remember quickly encountering this convention when I looked at some introductory material. But maybe it would be good for all Japanese romanizations to be followed by an unobtrusive link to an Appendix that lays out the principles that we follow, like the appendices for pronunciation that we have linked after IPA transcriptions.--Urszag (talk) 21:45, 2 September 2023 (UTC)[reply]

My feelings are generally in line with yours. The impression I’m getting from this discussion is that there generally is a consensus to make the change in relation to the archaic kana, but with special consideration for when they don’t apply. We already apply contextual logic to distinguish between は (ha) and は (wa) anyway, so I don’t see the issue in applying something similar here. Theknightwho (talk) 22:07, 2 September 2023 (UTC)[reply]
Re: sweko, presumably you're talking about the hhira rendering at 末子? If so, that is an error -- this would never have been realized as two-mora swe.ko. Historically, it would have instead been three-mora su.we.ko (setting aside the timing of the merger between /we/ and /je/ and then /e/). ‑‑ Eiríkr Útlendi │Tala við mig 05:06, 5 September 2023 (UTC)[reply]
@Eirikr Thanks for clarifying this, I thought it would be su.we.ko but I wasn't confident to say so outright. @Theknightwho I agree with this conclusion, as long as all the historical spellings are clarified correctly. By the way, do you know the cause for すゑこ being rendered as "sweko"? In what case would this (I assume deliberate) change from the expected suweko be used? Kiril kovachev (talkcontribs) 13:18, 9 September 2023 (UTC)[reply]
@Kiril kovachev: This is to correctly spell historical yōon involving w like (くゑ) (kweru) and (くわつ)(よう) (kwatuyou). The correct pronunciation for words like (すゑ)() (suweko) can be easily shown by adding the dot. Mcph2 (talk) 07:29, 10 September 2023 (UTC)[reply]
@Mcph2 Ah, I see, that's perfect, thank you. I fixed the entry now. I didn't know you could do the dot trick. Kiril kovachev (talkcontribs) 11:12, 10 September 2023 (UTC)[reply]
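The interaction between historical w-yōon and the dot separator discussed above can be illustrated with a toy romanizer. This is a sketch under stated assumptions, not the logic of the actual Japanese modules: the kana table covers only the examples from this thread, and the function name `romanize_historical` and the merging rule (a kana ending in "u" fuses with a following w-row kana unless a dot intervenes) are hypothetical.

```python
# Toy table covering only the kana needed for the examples in this thread.
KANA = {"す": "su", "く": "ku", "こ": "ko", "る": "ru",
        "わ": "wa", "ゐ": "wi", "ゑ": "we"}
W_ROW = {"わ", "ゐ", "ゑ"}

def romanize_historical(text: str) -> str:
    """Romanize historical kana, fusing C+u with a following w-row kana.

    Assumed rule for illustration: くゑ is read as one yōon mora (kwe),
    and a "." between the two kana blocks the fusion.
    """
    pieces = []
    previous = None  # the kana immediately before the current one, if any
    for char in text:
        if char == ".":
            previous = None  # the separator prevents any fusion across it
            continue
        if char in W_ROW and previous is not None and KANA[previous].endswith("u"):
            pieces[-1] = pieces[-1][:-1]  # drop the "u": ku + we becomes kwe
        pieces.append(KANA[char])
        previous = char
    return "".join(pieces)
```

With this rule, くゑる comes out as kweru, すゑこ as sweko, and す.ゑこ as suweko, which matches the behaviour reported in the thread for the entry 末子 before and after the dot was added.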

Proposal for new set: organ stops

Equi told me this'd be the best place to ask this. Would it be possible to make a category/set for organ stops (i.e. Category:en:Organ stops)? We have lots of organ stop entries such as keraulophon, gemshorn, clarabella, corno di bassetto, sesquialtera, unda maris, voix céleste, &c. This isn't really a need, just a small thing I'm sure would be helpful to more people than just me. Jodi1729 (talk) 04:54, 23 August 2023 (UTC)[reply]

@Jodi1729 Yes, this is a good idea. English tends to borrow these in an unadapted form, which can create odd quirks like Blockflöte, where German capitalisation is still used (though it’s probably citable without). Theknightwho (talk) 16:42, 23 August 2023 (UTC)[reply]

Category:X terms spelled with numbers

In Module:headword, I'm thinking about including superscript and subscript numbers too: [0-9⁰¹²³⁴-⁹₀-₉]. Modifying the code is not hard. What do you think?

PS. I also think about non-Hindu-Arabic numbers but they should be detected in each language -- to reduce cross-language error. They do have the numbers in words. Octahedron80 (talk) 02:52, 24 August 2023 (UTC)[reply]
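As a rough illustration of the proposed check, here is a minimal Python sketch (the real Module:headword is written in Lua, and the function name `spelled_with_numbers` is made up for this example). It uses the character class quoted in the post as is; note that ¹, ² and ³ are encoded separately from ⁰ and ⁴ to ⁹, which is why the superscripts cannot be written as a single range.

```python
import re

# ASCII digits, superscript digits and subscript digits, as in the proposal.
# ¹ ² ³ sit in the Latin-1 Supplement block, so they are listed individually.
DIGIT_PATTERN = re.compile(r"[0-9⁰¹²³⁴-⁹₀-₉]")

def spelled_with_numbers(pagename: str) -> bool:
    """Return True if the page name contains at least one of the listed digits."""
    return DIGIT_PATTERN.search(pagename) is not None
```

Under this sketch, H₂O and CO₂ would be categorized, while digits from other numeral systems (for example Thai ๒ or Arabic-Indic ٣) would not match and would need the per-language handling mentioned in the PS.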

What code where? Also, are you referring to words using digits or words composed only of digits? --RichardW57m (talk) 13:15, 24 August 2023 (UTC)[reply]
OK, I've found it now - it was as simple as finding '0-9'. And you just mean words including one or more of them.
The code might actually involve a lot of fiddly exceptions. Do we, for example, have any languages where tones in terms are noted with superscript digits? Subscript and superscript digits have also been used to write Sanskrit in Tamil script. For generalisation, there are several Lanna script marks which are visibly just superscript digits - MAI SAM, TONE-4 and TONE-5, and we may need to review repetition marks.--RichardW57m (talk) 13:54, 24 August 2023 (UTC)[reply]
You've got the wrong point. Tone (or other) marks are not considered to be numbers. Examples to be included are like H₂O, CO₂, , إم بي ٣, ᧑᧒ᦗᧃᦓᦱ which could alternatively be written with normal numbers, but tone marks couldn't. --Octahedron80 (talk) 02:17, 28 August 2023 (UTC)[reply]
No, I was worrying about things that could be misinterpreted as numbers. What first came to mind was superscript numbers such as Jyutping tone marks; it seems that words spelt in Jyutping are not reckoned as words. I have seen Thai transcribed using superscript numbers, but it seems that Thai transcriptions are successfully excluded from Wiktionary's main space, again dodging that potential problem. I'm not sure about Tamil-script Sanskrit written with sub- or superscript numerals to distinguish the oral stops - I haven't yet seen such entries in Wiktionary. I've a feeling there are some reduplication marks that are indistinguishable from '2' - or would you include them? I remember reading a book that abbreviated Meles meles to Meles², but that notation seems to have died out.--RichardW57m (talk) 15:34, 31 August 2023 (UTC)[reply]
In Thai, yamok ๆ used to be the number ๒, but it is not considered a number today. The 4 tone marks are also based on the concepts of 1 (one line), 2 (two crossing lines, which became cursive), 3 (the shape of ๓ with a tail), and 4 (four directions), but they are not actual numbers. The same logic could apply to other languages. So, a language that uses '2' as a reduplication mark (like Malay) should not fall into the category. A language that uses small numbers as tone marks should not fall into the category either. They must have extra scripts to check on their own. Octahedron80 (talk) 00:20, 2 September 2023 (UTC)[reply]
Tamil-script Sanskrit with numbers is not in those two cases, that might be collected in the category. But I still wonder if it is widely written the same notation (or it only is the transcription not lemma). Is there full text of it in a book? And why don't they use local Tamil numbers? (IMO, these small numbers will mess up Tamil conjuncts a lot.) --Octahedron80 (talk) 00:50, 2 September 2023 (UTC)[reply]

Sanskrit Script Instruction

'About Sanskrit' says, in paragraphs 2 and 3 of Section 'Scripts':

"The same word in other Indic scripts may be referenced under the Alternative spellings header, see WT:ELE

"The headword/inflection line should show the Devanagari or other Indic script, with the IAST transliteration in parenthesis, with accent marks on vowels where present; example at अश्व (aśva):"

The entries ᡧᡵᡳᡳ (šrii) and ᠱᠷᢈᢈ do not comply, the terms being in the 'Manchu' and 'Mongolian' scripts respectively. Should we remove the word 'Indic' from these paragraphs? (We might want to substitute 'Asian' or 'non-European' instead.) Or should we take these words to WT:RFDN, as they do not comply? If we keep them, we also need to improve the transliteration schemes. --RichardW57m (talk) 11:57, 24 August 2023 (UTC)[reply]

@RichardW57m We should change the wording of WT:About Sanskrit, of course. What is the basis for changing the transliteration to match the Devanagari? That suggests a fundamental misunderstanding of how these scripts were used to transcribe Sanskrit. I also note the Mongolian wrongly uses ᢈ, which is generally discouraged as it's seen as a simple variant. That should be corrected. I've moved it to ᠱᠷᠢᠢ (šrii). Theknightwho (talk) 21:42, 24 August 2023 (UTC)[reply]
So which way should we change the two paragraphs in About Sanskrit?
  1. Delete 'Indic'?
  2. Change 'Indic' to 'Asian'?
  3. Change 'Indic' to 'non-European'?
The second and third options maintain the discouragement of Cyrillic for Sanskrit, at the price of language more readily identified as racist. --RichardW57m (talk) 13:01, 25 August 2023 (UTC)[reply]
@Theknightwho: It seems bizarre that headword lines for Sanskrit should show the IAST transliteration (plus accent) (WT:ASA#Scripts), but references elsewhere should show a different transliteration. I don't see what the fundamental misunderstanding is; is it like transliterating from the Bengali script or from the Burmese script or from the Velthuis system to IAST? --RichardW57m (talk) 13:01, 25 August 2023 (UTC)[reply]
@Theknightwho: Please give a reference for the use of MONGOLIAN LETTER ALI GALI I being discouraged. @AleksiB 1945. --RichardW57m (talk) 13:01, 25 August 2023 (UTC)[reply]
It's not enough to quote https://www.unicode.org/L2/L2017/17333-mong-mixed-scheme.pdf; it certainly hasn't made its way into the Unicode Standard; I'd want to see evidence it was on its way there. Anyone can submit a proposal. --RichardW57m (talk) 13:17, 25 August 2023 (UTC)[reply]
I can't read the Mongol script; I just added the word śrī́ to all Sanskritic scripts from the Alternative scripts template. @Mahagaja, AryamanA, Pulimaiyi AleksiB 1945 (talk) 13:32, 25 August 2023 (UTC)[reply]
@RichardW57m That’s not bizarre at all - it merely demonstrates that the script transcribed Sanskrit in a way that carried inherent changes. Using an identical transliteration to Sanskrit would therefore be misleading at best, or else simply wrong.
I’m not interested in pedantic debates over whether Unicode have or have not formally agreed to deprecate usage. What matters are the facts of the characters themselves, and the plain fact is that the Galik form is a mere variant. Also, you seem to be implying that the Unicode Script Ad Hoc Group just like to make things up, which is pretty bizarre. Is that really what you’re saying? Theknightwho (talk) 03:56, 26 August 2023 (UTC)[reply]
L2/17333 may include work and suggestions from the Ad Hoc Group, but it doesn't bear their approval. Indeed, it not only deprecated characters, but also proposed several new characters in their place, and they haven't been promulgated yet. It's really a different encoding, one in accordance with Unicode principles. Indeed, you seem to be saying that almost (or exactly?) every previously Unicode-encoded word with unreformed Mongolian spelling from the Mongolian language should be treated as invalid! To be precise, it deprecates the Mongolian-language (mn) Mongolian-script (Mong) usage of A, E, I, O, U, OE, UE, WA and YA inter alia, proposing a more glyphic approach, with new characters for the more glyphic approach. --RichardW57m (talk) 11:13, 26 August 2023 (UTC)[reply]
To exclude ALI GALI I, we need an agreed policy, such as the one for the more extreme exclusion of Old English Latin-script wynn. And have we now started excluding Arabic script letter variants? And what about the hornets' nest of Myanmar script variants? --RichardW57m (talk) 11:13, 26 August 2023 (UTC)[reply]
@RichardW57m You've massively jumped the gun in terms of what I'm saying. I never said that we should follow the encoding given in that document, and indeed you're the one who brought it up. What I actually said was that ᢈ is a stylistic variant of ᠢ (i), and said that document is evidence for it. It's not comparable to wynn, where we need an agreed policy, because it's literally just the same letter, and (as that document agrees) should never have been encoded separately in the first place. Theknightwho (talk) 15:22, 26 August 2023 (UTC)[reply]

Should we delete the Tibetan half numerals?

The ༱ symbol can be seen on the right side, third symbol from the top.

I'm talking about these guys: ༪, ༫, ༬, ༭, ༮, ༯, ༰, ༱, ༲, and ༳. These slashed Tibetan numerals, which purportedly represent a value one half less than their unslashed counterparts, are notoriously undocumented and of unknown provenance. I couldn't find any Google Books results which were not OCR errors. As far as I'm aware, there is only one known usage of any of these glyphs: a seven and a half skar stamp from 1918, shown on the right.
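To make the purported reading concrete, here is a minimal Python sketch of the "one half less than the unslashed digit" interpretation. It assumes the ten characters occupy U+0F2A to U+0F33 in the order half one through half nine, then half zero, as their Unicode names suggest; the function name is made up for illustration, and nothing here is evidence that the glyphs were ever used this way.

```python
HALF_DIGITS_START = 0x0F2A  # TIBETAN DIGIT HALF ONE (assumed start of the run)


def half_digit_value(ch: str) -> float:
    """Return the purported value of a slashed Tibetan digit."""
    offset = ord(ch) - HALF_DIGITS_START
    if not 0 <= offset <= 9:
        raise ValueError(f"not a Tibetan half digit: {ch!r}")
    # The block runs half one .. half nine, then half zero.
    digit = (offset + 1) % 10
    # Purported meaning: the unslashed digit minus one half.
    return digit - 0.5


if __name__ == "__main__":
    for code_point in range(HALF_DIGITS_START, HALF_DIGITS_START + 10):
        print(f"U+{code_point:04X}", half_digit_value(chr(code_point)))
```

Under this reading the character on the stamp (U+0F31, "half eight") comes out as 7.5, which matches the seven and a half skar denomination.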

Currently these glyphs each have a Translingual section, and their only contents are {{rfdef}}s. Should we delete them until more documentation is found? Or is it the consensus to just have a page for every Unicode character?

See also the discussion on deleting pages with just {{rfdef}} above. Jodi1729 (talk) 21:00, 24 August 2023 (UTC)[reply]

Well, if you go back to https://www.babelstone.co.uk/Blog/2007/04/numbers-that-dont-add-up-tibetan-half.html, you'll find reference to an eye-witness report of a slashed one. Thus, we have on the balance of probability, evidence of two of them. From the original proposal, we have evidence of a digit-modifying diacritic that "reduces a value by half", whatever that means. Unfortunately, it is not the diacritic that is encoded, but the precomposed form.
There's currently a moratorium on modifying one-character letter-like things, so we can't progress this beyond discussion. I would note that the usage note on ༱ (7.5) is currently wrong - this is adequately documented!
We've got two combinations that have been seen. The other eight are like inflections of rare words. They're regularly formed, but are so rare that we have little hope of actually encountering them. The precedent for inflected forms is to allow regular forms we have no reason to disbelieve, so I suggest we document what is known of their meaning, and slam {{LDL}} on them.
Actually, for the characters in isolation, we do seem to know their meaning. What is uncertain is the meaning or validity of the sequence of digit and then slashed digit. I think we have enough information to allow the slashed digits. Finally, if we add notes that discourage their use for unhidden communication, Wiktionary is more useful with them than without them. --RichardW57m (talk) 13:53, 25 August 2023 (UTC)[reply]
Alright, thank you. As for the 7.5 glyph's usage note being wrong, can you link some of the documentation you mentioned which debunks it? It's not that I don't believe you, I'm just curious. ~ Jodi1729 。・:*:・゚★,。 23:52, 25 August 2023 (UTC)[reply]
The information is in Andrew West's Babelstone blog entry linked to above. I can't link it at the character's page until the moratorium is lifted. --RichardW57m (talk) 11:22, 26 August 2023 (UTC)[reply]

Image upload rights

Currently, only sysops are able to upload local images (see Wiktionary:Images). There are many entries where I would like to include a fair-use image. Could this right be extended to autopatrollers? Ioaxxere (talk) 21:58, 24 August 2023 (UTC)[reply]

@Ioaxxere: That's what Commons is for. — Fenakhay (حيطي · مساهماتي) 22:06, 24 August 2023 (UTC)[reply]
@Fenakhay Commons doesn't allow fair-use images. Ioaxxere (talk) 22:10, 24 August 2023 (UTC)[reply]
@Ioaxxere The word 'many' raises a red flag. Relying on 'additional terms may apply' to prevent their aggregation already seems risky to me. Being an autopatroller does not assure scrupulousness - even allowing sysops to upload them is taking a risk. --RichardW57m (talk) 14:10, 25 August 2023 (UTC)[reply]
Yeah tbh I am not sure in what situation we would be needing lots of fair-use images, we don't deal with stuff like individual works or people that are what fair-use images are generally used for on Wikipedia. Are there some specific examples? —Al-Muqanna المقنع (talk) 14:42, 25 August 2023 (UTC)[reply]
@RichardW57m, Al-Muqanna Some of the most pressing examples are for entries about memes, where it's pretty ridiculous to have to describe the meme but not be able to show it. A very small selection: amogus, shocked Pikachu, gigachad, trollface, soyjak, The Dress. Ioaxxere (talk) 16:00, 25 August 2023 (UTC)[reply]
Reasonable but you can and do have a Know Your Meme link in any of these cases, where then you press I for a whole gallery, or the entry itself without pressing it will have a better maintained specimen. There is no actual, only a theoretical need here, as users know where to click/tap. Fay Freak (talk) 16:32, 25 August 2023 (UTC)[reply]

Please fix the archiving of WT:RF* to be like WT:BP again

The archives exist for half of them just like here in WT:BP. Somehow no automatic archiving is working. There should only be 2 months of history on those pages, possibly 1 year at most, with the rest in the archive.

Those pages are getting too big. They take a few seconds to load on my computer, and even longer to start editing / preview / save. It takes even longer to load on my phone. Please do something to lower the page size. Daniel.z.tg (talk) 05:14, 27 August 2023 (UTC)[reply]

We're working on it, but so far as I know, there is no automatic archiving on any of those pages, nor has there ever been, because every discussion requires human judgment to close. Every now and then I post on a very old discussion to try to bring attention to it, and I've seen others do the same .... I haven't closed any of the old discussions myself, but I think I'm still helping out in a small way. Soap 05:22, 27 August 2023 (UTC)[reply]

Hi, I noticed this category as a redlink on several entries, so I blithely created it... Now it complains that "[t]he label given to the {{topic cat}} template is not valid. You may have mistyped it, or it simply has not been created yet. To add a new label, please consult the documentation of the template."

I'm not super sure how to proceed (or what twice-borrowed terms are, in general), so some guidance would be helpful.

Thanks,

Chernorizets (talk) 05:19, 27 August 2023 (UTC)[reply]

The issue is that none of them are New Latin terms. {{af|[language]|[New Latin term]|[term]|lang1=NL.}} should not add this category. J3133 (talk) 05:28, 27 August 2023 (UTC)[reply]
@J3133 I haven't been able to figure out how {{af}} does that. Twice-borrowed term categories are created by Module:etymology and sub-modules, and I haven't found the call-chain that starts at Module:affix and reaches the etymology logic. Perhaps someone who's more familiar with the code can help. Chernorizets (talk) 08:46, 27 August 2023 (UTC)[reply]
I have posted a question in Grease Pit about it to get some tech assistance, hopefully. Chernorizets (talk) 09:09, 27 August 2023 (UTC)[reply]

Japanese いぃ, うぅ, イィ, ウゥ

To check whether they represent yi and wu or ī and ū, here are the top hits from Google. Hits of the same proper names, mojibake and cases where it is not possible to tell, like うぅ居酒屋 (name of an izakaya), are ignored.

  • いぃ
    1. 那珂宣伝部/いぃ那珂暮らし (a city promotion group of Naka 那珂, "good Naka"): ī
    2. 【音量注意】"樋口いぃいいぃいぃぃいいい" ("Higuchi gooooood"): ī
    3. 新喜劇アキ【いぃよぉ~講座】("good"): ī
    4. 海を感じる、エモいぃ~スポット ("emotional", inflection ending): ī
    5. いぃべあー楽天 (a company name, "e-Bear"): ī
    6. 台本のないコメディーvol.5 ~全部アドリブでいぃよぉ~~("good"): ī
    7. いぃ〜バンド(e-band)結束バンド、アソート: ī
  • イィ
    1. yee(イィ) (a fashion brand): yi
    2. イィの英訳 - 英辞郎: yi
    3. 10年後イィ女になるために!! ("good"): ī
    4. Satomi (演歌)/イィ...女 ("good"): ī
    5. 闇の洗礼をうけるがイィ! 暴君ハバネロ外伝 ("good"): ī
    6. ヴィエイィニュアンセ (a brand name, "vieille nuance", French "vieille" /vjɛj/): yi?
    7. 鳥イィ? | 上地雄輔 OFFICIAL SITE ("Like some birds?"): ī
  • うぅ
    1. うぅ・・・の人気イラストやマンガ/ううぅ、うぅ、あ・・(息が苦しい)/うぅぉおお/うぅぅ~暑いっ、コンビニ行こっ! (interjections "urgh", "hmm" or the like): ū (many hits from different sites)
    2. ひねくれ領主の幸福譚 性格が悪くても辺境開拓できますうぅ!(lengthening of the sentence's ending): ū
    3. ぶぅふぅうぅ農園 (@boohoowoofarm): wu
    4. さうすまうぅん sausumaUun (an artist from Sapporo?): ū
    5. 【ホットペッパービューティー】デフィ(defi)のフォトギャラリー:カラフルぅうぅううぅー!("colorfuuuuul"): ū
  • ウゥ
    1. 腕時計 ヴィヴィアンウゥストウッド (a brand name, "Vivienne Westwood"): wu
    2. ウゥ~~~ン、店を出てからは振り返りたくない店かもしんないな/ウゥ~ン・・・落ちた~??/ ウゥ〜(interjections "urgh", "hmm" or the like): ū (many hits from different sites)
    3. アズノウゥアズピンキー 美品 ロングシーズン ロゴ刻印ボタン (a brand name, "As Know As"): wu?
    (Many ウゥ hits seem to be misspelling of ウィ, ウェ and ウォ, like "ウゥストウッド" above and "ミニウゥレット" for "ミニウォレット".)

My conclusion is that these syllables are more often ī and ū. Especially when in hiragana, they are almost always ī and ū. Wiktionary transliterating all of them into yi and wu is a mistake. yi and wu should be taken as special cases, like we have ヲチ (芸能人ヲチ, etc) as unusual wochi instead of regular ochi. -- Huhu9001 (talk) 08:58, 27 August 2023 (UTC)[reply]
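For anyone who wants to check the arithmetic behind that conclusion, the numbered hits can be tallied with a short Python sketch. The readings are copied from the list as posted; entries marked with "?" are counted as uncertain, and each numbered line counts once even where it stands for "many hits". The names HITS and tally are just labels chosen for this example.

```python
from collections import Counter

# Readings assigned to each numbered hit in the list, in order.
HITS = {
    "いぃ": ["ī"] * 7,
    "イィ": ["yi", "yi", "ī", "ī", "ī", "yi?", "ī"],
    "うぅ": ["ū", "ū", "wu", "ū", "ū"],
    "ウゥ": ["wu", "ū", "wu?"],
}


def tally(hits):
    """Count long-vowel, glide and uncertain readings across all spellings."""
    totals = Counter()
    for readings in hits.values():
        for reading in readings:
            if reading.endswith("?"):
                totals["uncertain"] += 1
            elif reading in ("ī", "ū"):
                totals["long vowel"] += 1
            else:
                totals["glide"] += 1
    return totals


if __name__ == "__main__":
    print(tally(HITS))
```

On the listed hits this gives 16 long-vowel readings, 4 glide readings (yi or wu) and 2 uncertain ones, out of 22.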

I don’t think interjections should take precedence here, which is the only way that this works. They’re inherently a lot more woolly in their realisations, too. Theknightwho (talk) 09:01, 27 August 2023 (UTC)[reply]

Block

Sigh. Now Theknightwho blocked me for just these 2 edits: special:diff/75802934, special:diff/75803015.

  1. Didn't someone tell me there is a principle that one admin involved in a conflict should leave the judgement to other admins? Is this principle still in effect? Is there a consequence for breaking it?
  2. Theknightwho again broke mod:ja-translit so it does not correctly recognize わくゝゝ as わくわく but wrongly as わくくく. But I get attacked for the revert.
  3. @Chuck Entz Since you say you watched my user page, could you please tell me how this is fair?

-- Huhu9001 (talk) 16:15, 27 August 2023 (UTC)[reply]

@Huhu9001 You never, ever try to discuss things. Ever. You always silently revert and don't explain yourself, which is exactly the opposite of what you should do in a collaborative project such as this. It's clear you have zero intention of changing your behaviour, because every single time you make a personal complaint on the Beer Parlour instead. It's highly disruptive, and your approach makes any kind of resolution completely impossible. You are more than intelligent enough to realise this, given you've been told it several times by now, which means that I am forced to conclude that you are being intentionally provocative. There is no personal conflict on my part here, either, and I'd be delighted if you contributed productively. You're just trying the oldest trick in the book for evading accountability, which is to start attacking anyone who tries to hold you accountable so that you can claim they're biased. It's not going to work.
I have every confidence that you will either ignore this comment or you'll try your usual tactic of passive-aggressively responding by talking about me in the third person. Either way, that will only reinforce my point that you have proven time and again that you are incapable of handling disagreements in a sensible, productive or even rational way. Theknightwho (talk) 16:51, 27 August 2023 (UTC)[reply]
Also just to add: while the new version of Module:ja-translit does need fixing to properly support multi-kana iteration marks, the old version was broken because it didn't handle voicing marks correctly (e.g. じゝ was transliterated as jiji and not jishi). Any reasonable person would have pointed out the new issue so that both issues could be fixed, instead of silently reverting to the old version while not caring that it was still broken. Theknightwho (talk) 17:18, 27 August 2023 (UTC)[reply]
The issue with Module:ja-translit has been fixed: わくゝゝ (wakuwaku) and じゝ (jishi) both transliterate correctly. Even ridiculous things like わくゝゝとゝゝう (wakuwaku tokutō) instead of わくわくとくとう (wakuwaku tokutō), which no person in their right mind would actually write. Theknightwho (talk) 20:13, 27 August 2023 (UTC)[reply]
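For readers following the technical side, here is a minimal Python sketch of the behaviour described in the two comments above. It is not the Lua code in Module:ja-translit; it only assumes two rules inferred from the examples: a run of n iteration marks repeats the n kana before it, and the unvoiced marks (ゝ, ヽ) repeat the kana without its dakuten while the voiced marks (ゞ, ヾ) repeat it with one. The function names are made up for illustration.

```python
import unicodedata

DAKUTEN = "\u3099"        # combining voiced sound mark
UNVOICED_MARKS = "ゝヽ"   # repeat the kana without voicing
VOICED_MARKS = "ゞヾ"     # repeat the kana with voicing
ALL_MARKS = UNVOICED_MARKS + VOICED_MARKS


def devoice(kana: str) -> str:
    """Strip the dakuten from a kana, e.g. じ -> し."""
    decomposed = unicodedata.normalize("NFD", kana)
    return unicodedata.normalize("NFC", decomposed.replace(DAKUTEN, ""))


def voice(kana: str) -> str:
    """Add a dakuten where a voiced form exists, e.g. す -> ず."""
    voiced = unicodedata.normalize("NFC", devoice(kana) + DAKUTEN)
    # If no precomposed voiced kana exists, leave the kana unchanged.
    return voiced if len(voiced) == 1 else kana


def expand_iteration_marks(text: str) -> str:
    """Replace kana iteration marks with the kana they stand for."""
    out = []
    i = 0
    while i < len(text):
        if text[i] not in ALL_MARKS:
            out.append(text[i])
            i += 1
            continue
        # Collect the whole run of consecutive iteration marks.
        j = i
        while j < len(text) and text[j] in ALL_MARKS:
            j += 1
        run = text[i:j]
        source = out[-len(run):]  # the kana the run refers back to
        if len(source) < len(run):
            out.extend(run)       # nothing to repeat; keep the marks as-is
        else:
            for mark, kana in zip(run, source):
                out.append(voice(kana) if mark in VOICED_MARKS else devoice(kana))
        i = j
    return "".join(out)


if __name__ == "__main__":
    for sample in ("わくゝゝ", "じゝ", "わくゝゝとゝゝう"):
        print(sample, "->", expand_iteration_marks(sample))
```

With these assumptions the three examples from this thread expand to わくわく, じし and わくわくとくとう. Romanization would be a separate step, and whether the devoiced reading is right for a given word is a lexical question this sketch does not address.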
Lol. A month passed and no one else noticed this わくわくとくとう shit is wrong. Japanese experts here are really amusing. -- Huhu9001 (talk) 01:02, 27 September 2023 (UTC)[reply]
How is "わくわくとくとう" wrong? —Justin (koavf)TCM 01:49, 27 September 2023 (UTC)[reply]
I assume he means grammatically, which is totally irrelevant to the point at hand. Theknightwho (talk) 20:37, 27 September 2023 (UTC)[reply]
I'm asking Huhu9001 (talkcontribs), so I'm assuming he knows what he meant. —Justin (koavf)TCM 21:11, 27 September 2023 (UTC)[reply]
LMAO. I would like to see if "Japanese experts" here ever manage to find out the correct answer, despite it being extremely obvious. The "means grammatically" guess is wrong, of course. -- Huhu9001 (talk) 04:04, 28 September 2023 (UTC)[reply]
Okay, so what is correct and does it have any implications for the relevant module? —Justin (koavf)TCM 04:13, 28 September 2023 (UTC)[reply]

Review the Charter for the Universal Code of Conduct Coordinating Committee

Hello all,

I am pleased to share the next step in the Universal Code of Conduct work. The Universal Code of Conduct Coordinating Committee (U4C) draft charter is now ready for your review.

The Enforcement Guidelines require that a Building Committee be formed to draft a charter that outlines procedures and details for a global committee to be called the Universal Code of Conduct Coordinating Committee (U4C). Over the past few months, the U4C Building Committee worked together as a group to discuss and draft the U4C charter. The U4C Building Committee welcomes feedback about the draft charter now through 22 September 2023. After that date, the U4C Building Committee will revise the charter as needed and a community vote will open shortly afterward.

Join the conversation during the conversation hours or on Meta-wiki.

Best,

RamzyM (WMF), on behalf of the U4C Building Committee, 15:35, 28 August 2023 (UTC)[reply]

Proposal: Koavf is banned from blocking Wonderfool

(Anyone unaware of the background is directed to 1 and 2.) Multiple admins and 'crats have asked Koavf to stop making disruptive blocks of WF accounts that make valid edits. Koavf said he was doing it because WF was evading his block, and proposed we vote to unblock him if we thought he shouldn't be blocked... so we did... but Koavf continues blocking WF for now-spurious reasons.
Normally, when a user is disruptive and refuses to stop, we remove their ability to continue, especially if they misuse admin tools, but because Koavf makes good edits in other areas, I'd like to avoid desysop if at all possible. So, even though we just tried one alternative (the vote removing the grounds for the blocks), I'd like to try one more thing to avoid this going to a desysop vote. Per AG202's suggestion to tackle the blocks directly:
I propose that Koavf is banned from blocking Wonderfool. Let us return to the status quo ante. If Wonderfool does anything disruptive, there are ~83 other admins who will deal with it.
Let us decide whether to adopt this proposal. - -sche (discuss) 21:22, 28 August 2023 (UTC)[reply]

I vote yea. This whole thing is annoying people, and gets in the way of anything productive. CitationsFreak (talk) 21:26, 28 August 2023 (UTC)[reply]
  • As it's pretty buried in the wall of text above, I just want to point out that Koavf will not be able to respond to this thread for a week, as I've blocked him from the Wiktionary namespace for being extremely disruptive in this thread. Theknightwho (talk) 23:31, 28 August 2023 (UTC)[reply]

"if we do not have a local policy (which, as you just wrote, we don't), then I am deferring to the sock puppetry page on Meta": No, this is not how it works. No one is interested in "deferring" to rules from other wikis. Support Ioaxxere (talk) 23:54, 28 August 2023 (UTC)[reply]

  • Weak support I feel quite bad for Koavf. It seems he thinks he's still doing the right thing, and it hurts to see someone who is so deadset on doing what he thinks is right being surrounded by disagreement. But there is no way you can ban someone who is no longer banned at all. Though this isn't what the vote is about, I would also vote to amnesty all other past and future WF accounts, and accounts of any banned member for that matter, if they're exonerated once. There should be no argument anymore that WF has been allowed to return. Kiril kovachev (talkcontribs) 22:01, 30 August 2023 (UTC)[reply]
Support Chuterix (talk) 14:13, 29 March 2024 (UTC)[reply]

Counter-proposal: Create a policy for multiple user accounts

Instead of having a personality-based rule or bickering about a particular use case, how about we have a guideline or policy that just applies to everyone equally? Since Wonderfool is "allowed to use multiple accounts just like any other user" and we have no equivalent to w:en:Wikipedia:Sockpuppetry, then just make a page at Wiktionary:Sockpuppetry or Wiktionary:Alternate accounts and define when and how it's okay to have multiple accounts instead of having to defer to m:Sock puppetry. If the community agrees that it's okay to make 1,000 accounts just because you want to, then that's fine by me. Otherwise, there is no local policy about when multiple accounts are okay and under what circumstances. What say you to that, User:-sche and User:CitationsFreak? — This unsigned comment was added by koavf (talkcontribs) at 21:50, 28 August 2023 (UTC).[reply]

You should have proposed something like this in the first place instead of taking action. Vininn126 (talk) 21:52, 28 August 2023 (UTC)[reply]
Our blocking policy is to block sockpuppet accounts used to evade a block. —Justin (koavf)TCM 21:58, 28 August 2023 (UTC)[reply]
So you want to retroactively block any old socks? This is a case of "technically correct" while completely missing the point of the vote or policy and blindly following the letter of the law and shows poor critical thinking. Vininn126 (talk) 22:00, 28 August 2023 (UTC)[reply]
What old socks? Note also that he actually used a sockpuppet in the vote, which is obviously inappropriate. As I've written multiple times now, let's actually save your personality-based accusations for a vote instead. That will just formalize this and stop any pointless back-and-forth. If you think that writing that I have "poor critical thinking" is somehow liable to get me to agree with you, then you have poor critical thinking. —Justin (koavf)TCM 22:17, 28 August 2023 (UTC)[reply]
Sure. Makes sense, solidifies common practice. (Also, sign your comments!) CitationsFreak (talk) 21:54, 28 August 2023 (UTC)[reply]
Whenever you all make a decision or policy on whether users can have more than one account, I want you all to forget everything you know from between the years 2005 and 2023, and return to the time of the origins of the website. At that time, would multiple accounts for one user be okay? Nowadays, freedom is being rolled back everywhere and literally no one believes in freedom of speech or self-expression. A country like Denmark is considering banning burning a holy book, when this was a normal atheist activity not too long ago. Let the warm sunlight of the wonderful era that birthed these WMF sites shine down onto the wasteland of the dystopian now. --Geographyinitiative (talk) 21:57, 28 August 2023 (UTC)[reply]
What? —Justin (koavf)TCM 21:59, 28 August 2023 (UTC)[reply]
What Geographyinitiative means is that if you make a policy, you should consider what the people who birthed WMF would have thought of your proposal. CitationsFreak (talk) 22:03, 28 August 2023 (UTC)[reply]
What the hell are you talking about? Vininn126 (talk) 22:03, 28 August 2023 (UTC)[reply]
@Koavf regardless of the Wonderfool debate, I agree we should have local policies for sockpuppetry and using multiple accounts, with the understanding that:
  • the policies should solicit broad feedback, both during drafting and during an eventual vote. In particular, people who use multiple accounts for reasons they believe are legitimate should have the chance to share those reasons.
  • any enumerated set of valid use cases for multiple accounts will, at some point, need to be revised to keep up with the times. The proposal should include a bit of wording on its own change process.
Chernorizets (talk) 21:31, 29 August 2023 (UTC)[reply]

Determining language for a vote on multiple accounts

I propose that we have a vote on the legitimate uses of multiple accounts on en.wikt. Per m:sock puppetry, 1.) generally, users are requested to have one account and 2.) local communities can decide their own policies for multiple accounts. Since there is no local policy for this and there seems to be an appetite for defining it, I am motivated to make a vote to create the page Wiktionary:Alternate accounts which outlines when these are acceptable. The vote would cover the following clauses:

  • Privacy concerns: e.g. you may have more than one account if you are logging in to a public terminal versus a personal device.
    • Disclosure of privacy concerns alternate accounts: If you have a privacy-concern alternate account, it must be disclosed on at least one of the user pages.
  • Research or designated roles: you may have more than one account if you are acting in a designated role for the Wikimedia Foundation or for approved research on Wikimedia Foundation wikis (e.g. testing the software, publishing academic papers on Wiktionary, etc.)
    • Disclosure of designated roles alternate accounts: If you have a designated roles alternate account, it must be disclosed on at least one of the user pages.
  • Just because you feel like it. Users may make as many accounts as they want and edit from them all.
    • All alternate accounts must be disclosed on at least one user page of the alternate accounts.

And the following caveats:

  • Alternate accounts may never be used to game votes or discussions to appear like multiple users have consensus in a discussion, dispute, vote, or any other context requiring coordinated discussion and consensus building in Wiktionary.
  • Alternate accounts may never be used to circumvent blocks, bans, or locks of any kind.

With the concluding if–then:

  • If a user circumvents any of these stipulations, any of that user's accounts are subject to being blocked, consistent with our blocking policy and the judgement of the blocking admin.

The outcome of the vote (if any clauses are supported and there is a consensus) would be a local policy.

Thoughts? —Justin (koavf)TCM 22:18, 28 August 2023 (UTC)[reply]

I think we could just replace the first section of bullets with something like "User may make as many accounts as they want for any reason.". (Also, I'm a little uneasy with the second caveat. Feels like an excuse to keep doing what you're doing.) CitationsFreak (talk) 22:23, 28 August 2023 (UTC)[reply]
Agreed that if the third one passes, the former two are redundant. If someone breaks this policy, what do you propose be done? —Justin (koavf)TCM 22:26, 28 August 2023 (UTC)[reply]
I'm confused as to how that relates to my comment. What do you mean by "third one"? And I am reacting to your policy. I'd suggest changing it to "if a user makes few productive edits and is a pest", or something of that nature. CitationsFreak (talk) 22:31, 28 August 2023 (UTC)[reply]
I'm confused by what's confusing here. You wrote "User may make as many accounts as they want for any reason" which is functionally the same as the third one (that is, the third reason I had listed above): "Just because you feel like it. Users may make as many accounts as they want and edit from them all." Those are the same thing. And if the vote is in favor of "Just because you feel like it. Users may make as many accounts as they want and edit from them all."/"User may make as many accounts as they want for any reason", then the other two (i.e. privacy and designated roles caveats) are redundant and do not need to be formalized. I'm not sure that "being a pest" is particularly good language, since it's so open to interpretation. —Justin (koavf)TCM 22:39, 28 August 2023 (UTC)[reply]
Thanks for clarifying. Yeah, I agree. Not sure why 1 and 2 are needed in this case. They're already included with the meta-wiki page. (The thing about the pest can be better stated as "Alternate accounts may not be used if they are solely [or mostly] used to not contribute positively to the project.".) CitationsFreak (talk) 22:44, 28 August 2023 (UTC)[reply]
1 and 2 are needed because I would vote "yes" for 1 and 2 and "no" for 3, as would many other users. The language can say "if #3 passes, then #1 and #2 are implicit or redundant". Unfortunately, it seems like we are at loggerheads about whether the Meta page is relevant at all, so this just formalizes those caveats from Meta. We may, e.g. reject them. And I think your language proposal makes sense. —Justin (koavf)TCM 22:51, 28 August 2023 (UTC)[reply]
I don't see how having a policy on multiple accounts is a counter-proposal - we can have one while also banning Koavf from taking any action relating to WF. Theknightwho (talk) 22:51, 28 August 2023 (UTC)[reply]
It's a counter-proposal because it would render the concerns redundant. If we had some kind of policy or guideline about when users are allowed to have multiple accounts, then it wouldn't be a matter of interpretation on my part from the vote. The vote was that Wonderfool is allowed to use multiple accounts in the same manner as any standard user. Now I am asking you, Theknightwho: what are the conditions on which someone is allowed to use multiple accounts on en.wikt and where has this been discussed, documented, formalized, etc.? —Justin (koavf)TCM 22:58, 28 August 2023 (UTC)[reply]
It hasn’t been formalised. However, all that means is that you don’t have absolute discretion to ban users for doing it. You don’t get to start making up your own rules while endlessly arguing that everyone else needs to prove you’re in the wrong - especially when you refuse to even understand what the problem is. It just makes me think you lack the judgment required to have admin tools. Theknightwho (talk) 01:07, 29 August 2023 (UTC)[reply]
Blocking users without a policy to back you up is abuse of admin powers. Please stop doing so. If you want to block socks of unblocked accounts, create a vote first. Thadh (talk) 01:16, 29 August 2023 (UTC)[reply]
I agree with basically everyone other than koavf who's expressed an opinion here. It's one thing to be a stickler for rules, acting arbitrarily and justifying it with rules you've made up on the fly is another. —Al-Muqanna المقنع (talk) 07:56, 29 August 2023 (UTC)[reply]
I don’t think we need an additional rules text; only Koavf apparently does, being otherwise unable to mete out proportionate admin actions; not even Wonderfool had a problem with it. The “request” or “guideline” on Meta was enough; it is not supposed to mean e contrario that any imperfect account use, one less adhering to it, is an offence paving the way for some penalization: rather it is likely to acknowledge that there are people of disordered identity; of course editing Wikis, and participating on the internet, tends to have pathological character, that’s why people drop accounts and create new personae. This is not ideal, wherefore one rightly explicitly states the ideal as a guideline, but nothing more, because you cannot exactly ensure that everyone is right on the internet and the consequences would be too stifling.
Any further texts cause the impression of restriction, an environment suppressing creativity of editors, and decrease respect for rules in general, because that’s what happens when they are overly specific, rather than constituting cornerstones for procedure: There is a certain rationality, beyond their being long enough, why modern constitutions, and treaties constituting international organizations like the European Union, contain only election principles and general outlines of the workings of the organizational bodies, specifically for their symbolic, that is to say psychologic effect, which will happen whether or not you are aware of it, like most kinds of conditioning. I quote a law journal to demonstrate that I do not talk out of my arse. Talking about freedom and equality of parties and deputies and voters: Morlok, Martin (2022) “Das Recht des politischen Prozesses”, in JuS (in German), volume 62, number 1, page 3: “Diese Ansiedelung auf Verfassungsebene hat auch eine symbolische Bedeutung: Sie erhöht die Wertschätzung für die demokratischen Ecksteine.” (“This location on the constitutional level also has a symbolic meaning: It increases the appreciation for democratic cornerstones.”)
The Weimar Republic failed in part because specific arbitrary election regulations were given constitutional status, together with other mistaken forecasts of the constitution’s authors which caused lacking respect, namely the widespread belief that fundamental rights were only programmatic and not binding the legislation and consequentially its application, the easy application of emergency law and dissolution of Parliament by the Reich president, in combination with the idea that unconstitutional legislation could be reinterpreted as derogating the constitution materially if brought about with sufficient majority without explicitly changing the text of the constitution, and that’s how albeit ideologically pure, due to systemic error, you create Hitler: he is everywhere around the corner unless you discern the cornerstones he comes around. Don’t obscure the cornerstones behind piles of texts! Societies don’t fail because people are wrong but because they miss the methods to deal with people being wrong, failing to balance rules and principles by import. So due to lacking rational tradition, a hundred million members of Russian society are convinced to wage war on their neighbouring country because muh Nazis and NATO: vague ideas enticing up to everything because of being balanced by nothing. As Theknightwho tries to point out, judgment and discretion are necessary to deter people from damage, in opposition to rigid enforcement of some overvalued ideas, which is the principle of the wretched tyranny. As Geographyinitiative noted, twenty years turn around atmospheres. Stupid needed to reign between 1913–1933 in Central Europe, or 1999–2022 in Russia, people becoming dumber and smaller everywhere for great men to concede “I don’t want to be here anymore!” Have you also heard of reactance?
It’s why rascals need to experience treatment like anyone else, to have a golden bridge, if they can contribute; it’s actually a postulate of dignity, that humans can be viewed differently over time. Fay Freak (talk) 02:06, 29 August 2023 (UTC)[reply]
I really would like to have another normal account to test as how scripts etc. affect normal users. Is this eligible to have? (By the way, I do have a bot already.) --Octahedron80 (talk) 06:42, 29 August 2023 (UTC)[reply]
I think the vast majority of people would say yes. Only koavf might ban you, but I don't think people would support that action. Vininn126 (talk) 08:32, 29 August 2023 (UTC)[reply]
@Koavf I think the vote-gaming and block-evasion aspects of your proposal are likely more consequential than trying to list every "benign" case of having multiple accounts. The fundamental question is: what can go wrong with people having multiple accounts? How would we detect the wrong thing happening? What's the escalation path, and what are the applicable, correctly-scoped enforcement actions? It's probably worth spending a bit more time thinking through that collaboratively. Chernorizets (talk) 21:44, 29 August 2023 (UTC)[reply]
Is it actually worth it, though? The point of @Fay Freak's long essay above, I think, is that this sort of speculative construction of policy is unlikely to be useful and may even be counterproductive in the long run, which I broadly agree with. In this case the overwhelming sentiment so far (as Fay Freak also says) seems to be that the problem is not multiple accounts, it's how to restrain admin actions that are out of step with community consensus. Perhaps there will be serious issues with the use and abuse of multiple accounts in the future that go beyond the limits of administrative common sense. I don't see any now, particularly when this specific case has been explicitly pronounced on by the community, and would oppose formulating rules merely for the sake of having rules. —Al-Muqanna المقنع (talk) 21:56, 29 August 2023 (UTC)[reply]
@Al-Muqanna are you saying that things like block evasion and vote gaming are already covered by "administrative common sense"? If so, then great. I wasn't implying that there were convoluted, over-engineered other scenarios to be wary of - I was just encouraging people to think through the ramifications. I'd assume it's still overwhelmingly the case that Wiktionary contributors use a single account for their contributions, and IMO this discussion - whether intentionally or not - explicitly lends support to people having multiple accounts. Nothing wrong with that, but let's spend 5 minutes to think about a Wiktionary in which it's become more commonplace.
With all due respect to Fay Freak, I didn't read the whole thing :-) I don't think this is as deep as election regulations, and I'd prefer to keep the discussion grounded in practical concerns. Chernorizets (talk) 22:07, 29 August 2023 (UTC)[reply]
Well, those (sockpuppets, block evasion) in fact are covered by WT:Blocking policy and Wiktionary:Voting policy#Voting eligibility already so not just administrative common sense. There's just common sense required in applying the policy. I agree any discussion should be grounded in practical concerns, which is why I don't think idle speculation about what may or may not happen with multiple accounts in future is productive. If there are specific problem patterns that are actually observed, we can act on those. —Al-Muqanna المقنع (talk) 22:12, 29 August 2023 (UTC)[reply]
@Al-Muqanna some people prefer proactive policies, and some people prefer reactive policies. I'm more in the former camp. At any rate, I've presented my opinion - I'll go with whatever the community decides to do (or not do). Chernorizets (talk) 22:28, 29 August 2023 (UTC)[reply]
@Chernorizets While you have a point, the main reason for this proposal seems to be an attempt by Koavf to derail the conversation away from his abuse of power. While it might be good to have some kind of policy, Koavf is utterly wrong when he says If we had some kind of policy or guideline about when users are allowed to have multiple accounts, then it wouldn't be a matter of interpretation on my part from the vote. In fact, it often makes situations with rules lawyers like him worse, because they'll act like their personal interpretation is the voice of god. Theknightwho (talk) 23:22, 29 August 2023 (UTC)[reply]
@Theknightwho understood. I was trying to separate the proposal from the proposer, which I see might be difficult in this thread. I've seen it pointed out several times in other discussions that we don't have our own sockpuppet policy, and we "loosely" defer to the Wikipedia one, so in each of those instances I thought it would be nice if we had our own local version. What I gather from the discussion here is that either there ought not to be such a designation as "sockpuppetry" on Wiktionary (thou shalt have as many accounts as thou wishest), or that it's a behavioral distinction of sorts (what you do with your many accounts). If you think this is already pretty clear - including to newcomers - then fine, although I have my doubts. Chernorizets (talk) 02:28, 30 August 2023 (UTC)[reply]
I must say I'm very much persuaded by @Fay Freak's essay here, and I would prefer that we just keep things anarchist and don't write any laws about this. I believe common sense already prevails such that every user creates new accounts only for sensible reasons; in the case of abuse, we already know what to do when it's done to rig votes, troll, whatever. In the end, did not WF create all these accounts only because of the original block? There is literally nothing that will stop someone who wants to make loads of accounts from doing so; these rules only create a burden for the average user, who now has another policy document to rule over him. Yet another reason I prefer this site to Wikipedia: there're far fewer rules. And thus I reiterate: I believe in the prevailing common sense and reason. Kiril kovachev (talkcontribs) 22:18, 30 August 2023 (UTC)[reply]

Citing Translingual[edit]

Hey bastards. It just came to my tiny mind: how does one cite Translingual, and in what language? If I create an RFV of a Translingual term, how can it be fulfilled? Do let me know. Your friend, Equinox 07:22, 30 August 2023 (UTC)[reply]

The obvious answer is that one cites it in several languages using it. If one uses the quote-* templates, one can supply {{para|termlang=mul}}, as I have done at อ‍ย. For letters, one can usually use {{m+}}, as with the quotations in waiting at User:RichardW57/mul_evidence.
The nastier part is distinguishing loans from translingual words, and in part that seems to be done by convention. I have to admit I am puzzled as to why we don't have Translingual TV (television) or Translingual VDO (video); the latter seems to have close to a milliard google hits, but perhaps it's only Thai and English. There is precedent for adding pronunciation sections for multilingual words, but I neglected to note the example I encountered. --RichardW57m (talk) 10:19, 30 August 2023 (UTC)[reply]
@RichardW57m: Quite so. But then of course we are citing a "Translingual" term in another language (suppose we added three citations for the chemical symbol S for sulfur: perhaps one English, one French, one German). Is that good citing? What if we had some awful cite like homebrew Esperanto? Does anyone actually ever cite these things? I appreciate your comment, Richard, which I don't disagree with, but I suppose I'm rather asking about our policy regarding Txlingual and who cares, and who wants to care. (Wait until I get busy with the emoji.) Equinox 10:26, 30 August 2023 (UTC)[reply]
@Equinox This is a great question. Check out KHH and modify it/see how you like it. --Geographyinitiative (talk) 10:28, 30 August 2023 (UTC)[reply]
We've had |termlang= in {{quote-book}} since 2011. And there is precedent for quoting a use of a word in one natlang embedded in a quotation in another natlang, though one would mostly expect it to be for mentions. And for translingual words, that is exactly what one would expect. So yes, that is valid citing, though for natlangs, not the best. For translingual, I'd like to see a quotation in a non-Roman script, just to emphasise the translinguality. --RichardW57m (talk) 11:01, 30 August 2023 (UTC)[reply]
See nemo turpitudinem suam allegans auditur for a term that was RFV-resolved as Translingual (for whatever it's worth). —Al-Muqanna المقنع (talk) 11:05, 30 August 2023 (UTC)[reply]
I understand your question. Like everything else they are cited, but you want to know how you would know that the quotations show the term to be translingual rather than an international term borrowed without adaptation in multiple languages, since translingual quotations are not even in translingual usually. Well we can at least verify that it exists at all; in some cases I wanted to believe that a term is translingual in spite of assuming it only having been used in one language ever. Like some of the translingual translations of bejel, which look like taxonomy but in part may have only been used by German physicians, who themselves did not reckon the terms German, and the evidence for their being Latin otherwise is also tiny by default. Any proper taxonomic nomen novum was used in only one language at a point, and only colleagues in the same language may have reused it next or at all.
Seems like we just weigh what speaks for translinguality, it’s not about “how you cite it”. “Counting with language” can’t work in any arbitrary context, contrary to cherished Cartesian cliché looming with a demand of exact science and objectivation; the Vichist oratory art wins again. Fay Freak (talk) 11:55, 30 August 2023 (UTC)[reply]
Actually, if usage can be shown, surely a translingual entry still objected to should be taken to WT:RFM where a successful challenge would result in it being reassigned to a specific language or specialised to entries in multiple languages, or possibly to a subforum of WT:RFD if it can be argued that the entry is redundant. I'm not sure as to the latter approach; it seems to me that it comes under the merger remit of WT:RFM. --RichardW57m (talk) 14:58, 31 August 2023 (UTC)[reply]
Thanks everybody for your thoughts. We might benefit from a little written policy in this area, I don't know. Equinox 13:08, 8 September 2023 (UTC)[reply]

Correctly handling Bulgarian reflexive verbs[edit]

Pinging @Benwing2, and if you could point me to an experienced Spanish editor, that might be useful due to certain similarities about how reflexive verbs work.

Reflexive verbs in Bulgarian are formed with the reflexive pronouns "се" (se) or "си" (si), which don't decline for person and number as in Spanish. Wiktionary entries for Bulgarian verbs list different reflexive versions using labels, e.g.:

My question is: when should reflexive forms of a verb be part of the same Wiktionary entry, and when should they get entries of their own? In some cases, the reflexive meaning may be quite different from the non-reflexive one, e.g.:

FWIW, in this case, the normative Bulgarian dictionary lists senses for the reflexive and non-reflexive forms separately, although only the non-reflexive form is available for direct lookup online:

  • отказвам”, in Речник на българския език [Dictionary of the Bulgarian Language] (in Bulgarian), Sofia: Bulgarian Academy of Sciences, 2014

Is there a preference on Wiktionary about how this kind of thing is handled? I see for example that Spanish dormir also lists the senses of "dormirse", even though one means "to sleep" and the other "to fall asleep". English, in comparison, has separate entries for lose and lose oneself despite the semantic closeness.

Thanks,

Chernorizets (talk) 08:26, 30 August 2023 (UTC)[reply]

@Chernorizets This isn't consistent. For example, Italian and Russian always lemmatize reflexive and nonreflexive variants separately, whereas Spanish and Portuguese put the reflexive variant under the nonreflexive variant if it exists (but lemmatizing reflexive-only verbs as such), and I think Polish used to put even reflexive-only verbs under the corresponding nonreflexive verb. Not sure what the preference is cross-linguistically. Benwing2 (talk) 13:02, 30 August 2023 (UTC)[reply]
The situation in English is actually not consistently one way or the other, although it surely should be 😅 ... some entries are at foobar oneself and some are at the main verb, as discussed here and at the discussions linked there. (I mean to start a thread about that (English) once the Beer Parlour, erm, shows up on WT:BP again.) My 2c, for whatever they're worth: where do other dictionaries of the language normally lemmatize these things, and hence what form are speakers most likely to look up? Having all the senses in one place could make them easier to find if speakers can be expected to look up the 'root' verb (without the reflexive) in the same way they probably/hopefully know to strip the inflection and look up грехна if they see an unfamiliar use of гре́хнеш or find upon finding an unfamiliar use of he finds... but OTOH having senses at whichever form they're actually used in can also be intuitive, so there's no one obviously-correct answer (although obviously, yes, we should pick one and be consistent). If we put the reflexive content on a separate page, it and the unreflexive page must link to each other... and in my opinion, should link in a way that makes clear there's additional definitions there (plural: messages has technically always been linked in the headword line of message, but since it's just an inflected form, I doubt anyone realizes to look for more definitions there, and hence I agitated for more prominent linking / highlighting of the existence of other senses there). If we put all the content on the unreflexive pages, we should consider whether to have soft- ({{foobar form of}}) or hard- (#REDIRECT) redirections from the reflexive forms. - -sche (discuss) 15:33, 30 August 2023 (UTC)[reply]

Templates for related terms[edit]

This month, I started in Swedish Wiktionary to create templates for related terms. An example is the Ukrainian word ріка (river), where the related terms are found in the template sv:Mall:uk-besläktade-rika (as mixing alphabets in template names is prohibited, the template name contains the transcribed base word "rika"). Is any language of Wiktionary using this, other than Swedish and Russian Wiktionary? LA2 (talk) 10:53, 30 August 2023 (UTC)[reply]

@LA2 Since we have the ability to scrape a given page to pick up things like related terms, I think it's a bad idea to have related-term templates. We already scrape pages to find descendants, for example, and we could do the same for related terms. Benwing2 (talk) 13:04, 30 August 2023 (UTC)[reply]
I welcome parallel approaches. Let's see what you and others can do. Now I interwiki-linked sv:Mall:ru-besläktade-derevo (Swedish Wiktionary's template for Russian words related to derevo - tree) with ru:Шаблон:родств:дерев and the categories sv:Kategori:Wiktionary:Mallar för ryska besläktade ord with ru:Категория:Шаблоны родственных слов/ru. Since the two templates contain different sets of words, they should ideally be merged and moved to Wikidata. Perhaps Wikidata already has structures for related words? I present my approach, let's see alternatives to compare with. --LA2 (talk) 13:45, 30 August 2023 (UTC)[reply]
@LA2 I'm afraid I agree with Ben Wing here. I think the approach of scraping terms is preferable, because it's more user-friendly to not have to list them somewhere else. Theknightwho (talk) 14:08, 30 August 2023 (UTC)[reply]
@LA2 The approach you propose is very hard to maintain. It was the original approach tried here for descendants, and it didn't work; that's why we switched to page scraping. Benwing2 (talk) 14:12, 30 August 2023 (UTC)[reply]
On the Wikidata point, in general I don't think it's a good idea to rely on transcluding info from somewhere where it's outside of our editorial control. —Al-Muqanna المقنع (talk) 14:13, 30 August 2023 (UTC)[reply]
@LA2 I agree with Benwing and Theknightwho here. Wikidata, which might also have different inclusion criteria to consider, would deter people even more than having to work in the template namespace even for such basic things. It took me years to even surmount the psychological barrier to create reference templates quickly. And the comparison to Russian Wiktionary forgets to mention that their templates are botted, scraped from older dictionaries / scanned communist or crowdsourced projects, hence they are also full of Unicode homoglyph mistakes; someone really needs to sweep through ru.Wiktionary.org templates before anyone tries to mimic their structures or, worse, centralize content on their basis. Fay Freak (talk) 14:20, 30 August 2023 (UTC)[reply]
@LA2 details of implementation aside, I actually think this idea potentially has some real promise and I've been thinking about something similar for a while. Of course, there are some considerations, especially in the directions of folk etymology (the Russian wiktionary implementation is a real doozie in places). Slavic languages, like many others, have many homophonous roots. A perfect example comes right from your template: річник (ričnyk, annual, yearbook) is not related to the word ріка (rika, river) but to рік (rik, year). Some further examples for Russian from Townsend: ПАХ1 'smell; blow, sweep' ПАХ2 'plow'; МУК1 'flour' МУК2 'torture'; or ПОЛ1 'half; sex' ПОЛ2 'field' ПОЛ3 'floor'; add to that the pervasive series of consonant mutations. The introduction to Worth et al's Russian Derivational Dictionary outlines the kinds of complications introduced by consonant alternations: for instance, ж alternates with both д and з, so ВОД1 'lead, drive', ВОД2 'water' and ВОЗ 'lead' can all appear under certain circumstances (mostly morphologically conditioned) as ВОЖ: прово́женный (provóžennyj), past participle of both проводи́ть (provodítʹ, conduct, lead, guide) and провози́ть (provozítʹ, transport convey), обезво́женный (obezvóžennyj, dehydrated). Evidently there is also a more distant relation between ВОД1 and ВОЗ, but they are generally considered in the modern language as well as the scholarship to be distinct roots. There is no real way to automate this and there will be situations where it's not clear cut. Luckily there's already been some serious academic study done on the Russian root system (much of which will be easily transferred to other languages-Slavic and possibly further-by people with a healthy amount of humility and actual knowledge). 
If you're interested in doing this topic justice, see if you can get a hold of Roman Jakobson's original presentation of the single stem verbal system Russian Conjugation (1948), which later works build on and extend to other parts of speech, or Robert Channon's The Single-Stem Verb System Revisited, 1975, which assesses the single-stem system against the traditional two-stem approach. There are several more responses later for and against but I don't currently have access to the rest. In terms of a full-blown, almost exhaustive presentation of the system of roots as used in the derivational morphology of verbs, nouns and adjectives, Charles E. Townsend's Russian Word Formation cannot be recommended highly enough. I own a print copy and it is my intention to create at least an introductory appendix based on the stem and root system in the near future (see a WIP here). Less in-depth is Charles Gribble's Russian Root List With a Sketch of Word-Formation and the least nuts and bolts but still useful to get an idea is Browning et al's Leveraging Your Russian With Roots, Prefixes, And Suffixes 2001. There's also the above mentioned Russian Derivational Dictionary as well as Tikhonov's Новый словообразовательный словарь русского языка (particularly interesting in this context), as well as some of his morphemo-orthographic dictionaries here, which don't appear to distinguish between homophonous roots nor have much in the way of what I'd be looking for.
I can think of three distinct (possibly complementary) ways to implement such a thing:
  • following the general centralised approach of the Russian wiktionary, gathering all forms of a given root under a template.
  • in a separate morphology section with roots also receiving their own individual entries as we already do for affixes or else in the appendices, basically as a synchronic equivalent for what we already do diachronically in the form of reconstructed forms for etymologies. This would have the added benefit of removing surface analyses from etymologies, something which has bugged me for a while. Both approaches are liable to user error and overzealous misapplication.
  • through an in-depth overhaul of the module:ru-verb, etc. (as well as uk-verb, etc.). This would be my preference, though I recognise that it may be a long term goal, as @Benwing2, @Atitarev and countless others have already invested much love and energy into the current systems, and the Zaliznyak systematisation no doubt has its own validity even for our purposes, but ultimately I think that we could kill two or three birds with one stone here: simplifying the code-base and ease of input for new verbs, nouns, etc (only requires knowing the stem type and the infinitive or the third person plural; for stress notation we already have a pretty workable system, if I understand it properly); elucidating morphological transparency, for use in a system for morphologically related words; such an abstraction would possibly also increase transferability to other languages (see here for instance Appendix:Navajo roots and stems derivation; such a system would no doubt also be transferable to many Austronesian languages too). As to how prone to user error it is, this is going to be an issue in any system. No doubt this is where citations to reputable morphological works can come in.
Although I take the point that scraping has some distinct advantages, perhaps we can find a way to use both (populating some appendices with conjugations using our XYZ templates?). Some word nests number in the hundreds or possibly thousands (which would be a reason to use appendices or even categories, though the latter seems less elegant). Possibly, there is value in following several of the approaches and allowing them to inform each other. Helrasincke (talk) 17:56, 1 September 2023 (UTC)[reply]
@Helrasincke I have a copy of Townsend's book behind me on the shelf; it is a great book. As for changing Module:ru-verb etc., my long-term plan is to switch it to use inline modifiers similarly to how Module:uk-verb and Module:be-verb already work; but I'm not sure how "simplified" it can be. Slavic verbs are complicated and you need to be able to specify all the principal parts one way or another. There are different systems for how these verbs are analyzed but it's not clear to me that switching to a different system would get you anything really, and it's a lot of work. For Russian, using Zaliznyak's system has the advantage that there's an exhaustive grammatical dictionary that Zaliznyak wrote that lists all the verbs (and nouns, adjectives, etc.) and their conjugation using his system. I don't know if other systems have anything comparable, and typical bilingual dictionaries don't give enough information in many cases to completely conjugate the verb (e.g. they omit literary forms like the present passive participle, don't tell you whether a past passive participle exists, and omit many irregularities even in that participle despite its ubiquity, etc.). Benwing2 (talk) 19:48, 1 September 2023 (UTC)[reply]
This is only marginally related, but I feel we could create pan-Slavic headwords - nouns, adjectives, adverbs, and verbs usually end up needing some similar information. Vininn126 (talk) 19:50, 1 September 2023 (UTC)[reply]
Thanks @Helrasincke. I had found річник under related terms in Polish pl:ріка, where it is a red link, and just copied it to sv:Template:uk-besläktade-rika, but now when you pointed this out, I moved that word to sv:Template:uk-besläktade-rik instead. I also removed it from pl:ріка. --LA2 (talk) 16:50, 2 September 2023 (UTC)[reply]
@LA2 actually I have to apologise, I spoke too soon. I just happened to notice at Горох that there is also a sense for річник as a synonym for річковик (ričkovyk, river transport worker), which would thus make it related to both ріка and рік. A little knowledge is dangerous! Helrasincke (talk) 12:06, 3 September 2023 (UTC)[reply]
@Benwing2 these are all very fair points and I will admit you've probably got a much better understanding of how it's ultimately going to translate into code. I still think that such a system could achieve things currently not possible, but they also require a certain amount of familiarity with the system to use - and as you say, Zaliznyak's system is very tempting simply because it's there and ready made. Perhaps since a working module based on that is already in place, we could instead expand on it - using the Zaliznyak classification to additionally determine a morphological breakdown and assign a ROOT category (manually specifiable of course), though as mentioned there is still the complication of what do we do with homophonous roots. For roots which have been truncated such as об-(В)ЯЗ–а–ть (обяза́ть (objazátʹ)) and вы-ну-ть (вы́нуть (výnutʹ)), perhaps we use a manually specified list of known roots and anything which doesn't fit gets added to a category where it could be analysed/discussed on a case by case basis? If you think that sounds worthwhile, I could have a play around in the next while to give an idea of how that might look. P.s. you're probably well aware already but another great resource for checking the existence and forms of participles, double checking past stress, etc. is the Малый академический словарь. Bilingual dictionaries sadly never seem to offer the level of grammatical detail I somehow assume they should. Helrasincke (talk) 13:02, 3 September 2023 (UTC)[reply]
@Helrasincke Not sure I completely understand you but I think you want to classify verbs according to their root? I think that should be possible with a little help in some cases. Benwing2 (talk) 05:19, 4 September 2023 (UTC)[reply]
@Benwing2 yeah pretty much - ideally also nouns, adjectives and maybe adverbs too. These root classes could then form the basis of @LA2's templates. Helrasincke (talk) 09:31, 4 September 2023 (UTC)[reply]
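To make the homophonous-root problem discussed above concrete, here is a rough Python sketch. It assumes a hand-maintained root list and the single alternation mentioned in this thread (surface ж reflecting underlying д or з); the names `ALTERNATIONS`, `KNOWN_ROOTS`, `candidate_roots` and `needs_review` are hypothetical and do not correspond to any existing Wiktionary module.

```python
from typing import Dict, List

# Alternation taken from the discussion: surface "ж" may reflect "д" or "з".
# A real table would be much larger; this is illustrative only.
ALTERNATIONS: Dict[str, List[str]] = {"ж": ["д", "з"]}

# Hypothetical hand-maintained root list. Homophonous roots are kept apart
# by an index, following the notation quoted above (ВОД1, ВОД2, ВОЗ).
KNOWN_ROOTS: Dict[str, List[str]] = {
    "вод": ["ВОД1 'lead, drive'", "ВОД2 'water'"],
    "воз": ["ВОЗ 'lead'"],
}

def candidate_roots(surface: str) -> List[str]:
    """List every known root that could underlie a surface root form."""
    if not surface:
        return []
    forms = [surface]
    # Undo the final-consonant alternation, if the table knows one.
    for underlying in ALTERNATIONS.get(surface[-1], []):
        forms.append(surface[:-1] + underlying)
    found: List[str] = []
    for form in forms:
        found.extend(KNOWN_ROOTS.get(form, []))
    return found

def needs_review(surface: str) -> bool:
    """True when the root is unknown or ambiguous and needs a human decision."""
    return len(candidate_roots(surface)) != 1

print(candidate_roots("вож"))
print(needs_review("вож"))
```

For the surface form вож the lookup returns all three candidates, so the term would be flagged for manual review rather than assigned a root automatically, which matches the point that this cannot be fully automated.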

For me, for Swedish Wiktionary, the new templates are a clear improvement over what we had before, which was long lists duplicated in each article. As far as I can see, the English article on бачити has a long list of derived terms, that appears to be manually maintained in wiki code, similar to what Swedish Wiktionary has also used until now.

Can you explain how "scraping" is used for descendants? I know entries like anti- have a section "Derived terms" (using {{prefixsee|en}}) that presents a long listing of a Category:English terms prefixed with anti- where all such words are included, because they use particular templates. For example antialgebra in the Etymology section says {{prefix|en|anti|algebra}}. But this wouldn't work for related terms, as there is no relevant category. --LA2 (talk) 14:25, 31 August 2023 (UTC)[reply]

@LA2 The scraping code, if invoked on a given page and told to find the descendants of given term, loads the page text of that term and looks for a ==Descendants== section, and if so, copies all the text of that section into the invoking page. It would work similarly by looking for a ==Derived terms== section. Benwing2 (talk) 18:43, 31 August 2023 (UTC)[reply]
All this said, sometimes it would be nice to have an easier way to do this - unless we more regularly scrape. Vininn126 (talk) 18:46, 31 August 2023 (UTC)[reply]
@Vininn126 Do you mean there should be an easier way than scraping, or we should have an easier way of scraping (e.g. a scraping library)? Benwing2 (talk) 18:51, 31 August 2023 (UTC)[reply]
@Benwing2 Either or. I'd be fine with not having a template if scraping happened more regularly. Vininn126 (talk) 18:52, 31 August 2023 (UTC)[reply]
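As a rough illustration of the scraping step described above (load the page text, find a ==Descendants== section, copy its text), here is a minimal Python sketch. It is not the actual module code; the function name `extract_section` and the sample page text are assumptions made for the example, and fetching the page text is left out.

```python
import re
from typing import Optional

# Matches a wikitext heading line such as "====Descendants====".
HEADING_RE = re.compile(r"^(=+)\s*(.*?)\s*(=+)\s*$")

def extract_section(wikitext: str, heading: str) -> Optional[str]:
    """Return the body of the first section titled `heading`, or None.

    The section ends at the next heading of the same or a higher level;
    deeper subsections are kept as part of the body.
    """
    level = None
    body = []
    for line in wikitext.splitlines():
        m = HEADING_RE.match(line)
        if m and len(m.group(1)) == len(m.group(3)):
            this_level = len(m.group(1))
            if level is None:
                if m.group(2) == heading:
                    level = this_level  # found the section; start collecting
                continue
            if this_level <= level:
                break  # next sibling or parent heading ends the section
        if level is not None:
            body.append(line)
    if level is None:
        return None
    return "\n".join(body).strip()

# Hypothetical page text, for illustration only.
sample = (
    "==Ukrainian==\n"
    "===Noun===\n"
    "ріка\n"
    "====Descendants====\n"
    "* foo\n"
    "* bar\n"
    "===References===\n"
    "<references/>\n"
)
print(extract_section(sample, "Descendants"))
```

The same function would serve for a ==Derived terms== or ==Related terms== section by passing a different heading, which is the extension suggested in this thread.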

can we please use {head|en|phrase} to avoid cluttering the headword line in long phrases?[edit]

Take shit or get off the pot for example ... do we really need to list any conjugated forms at all? Consider that someone using this expression or even looking it up is certainly familiar with the word shit, and should they be not, it's only a click away. Consider also that this expression, should we decide to call it a verb, is essentially defective, as it contains a clause-breaking conjunction or ... there's almost no context in which anyone is going to use it in any form other than the imperative or what I believe is the subjunctive (in the use-example). There's no *Well, I really shat or got off the pot.

I would much rather change the header to {{head|en|phrase}} and leave it unconjugated.

Now, I would extend this to some other phrases as well, such as my perennial love/hate target, get one's panties in a bunch, BUT for a different reason (since that one actually can be conjugated), so I don't want to tie these two ideas together. The reason I'd want to trim down the panties header is because it's cluttersome, whereas the entry I linked above is not only cluttersome, but potentially misleading and confusing as it lists forms that are at best extremely uncommon and perhaps ungrammatical.

Thanks, Soap 10:49, 31 August 2023 (UTC)[reply]

It looks messy but a lot of these—at least the basic ones—do seem to be well-attested so I'm not sure they should all be removed. ("Shat or got off the pot" isn't well-attested, but for example "shit or got off the pot" and "shitting or getting off the pot" are.) It might be worth going through and checking them individually first. —Al-Muqanna المقنع (talk) 10:55, 31 August 2023 (UTC)[reply]
There might be use in having a section that directs the user to the main verb for conjugation. Compare Polish verbal phrases such as robić z igły widły. Vininn126 (talk) 10:57, 31 August 2023 (UTC)[reply]
Are you suggesting adding an inflection section? I can't work out which main verb you have in mind that would help with the perfect tense, which we don't actually document. Both verbs inflect. --RichardW57m (talk) 14:18, 31 August 2023 (UTC)[reply]
You... can input multiple verbs, Richard. Vininn126 (talk) 14:19, 31 August 2023 (UTC)[reply]
@Vininn126: To what? And conjunctions and conjugation interact, at the very least optionally. And when you have two verbs whose conjugation varies regionally, the interaction can get very complicated. --RichardW57m (talk) 15:51, 31 August 2023 (UTC)[reply]
@RichardW57m In the conjunction box. That's why just listing "for conjunction, see X and Y" would be easier than every combination in the headword. Vininn126 (talk) 15:55, 31 August 2023 (UTC)[reply]
What's the 'conjunction box'? It makes little sense to me even if you meant 'conjugation box', and 'conjugation table' makes no sense to me as something to input to. Do we have some tool that one can input verbs to to get their conjugation? That would be interesting news to me. Likewise, do we have some tool that applies the reduction that occurs when two phrases are joined by a conjunction? That too would be interesting news.
The phrase you suggest is easier because it ducks the issues. It's an editor-friendly approach as opposed to a user-friendly approach, but it does follow the Wiktionary policy of staying silent rather than being wrong. (I'm beginning to think I don't control the phrase well enough to conjugate it correctly.) --RichardW57m (talk) 16:16, 31 August 2023 (UTC)[reply]
What on Earth are you prattling about? Did you even open the link I mentioned, or are you replying to something you think I said? You're the only one having a hard time grasping what I'm referring to. Vininn126 (talk) 16:18, 31 August 2023 (UTC)[reply]
Yes, I looked at the example of Polish robić z igły widły, which has just one verb in it. Did you not notice that the English expression given as an example has two co-ordinate verbs in it? --RichardW57m (talk) 08:48, 1 September 2023 (UTC)[reply]
Do you understand you can put two terms in the template? That is what I am saying. You can list multiple things in one template to point at different things. Everyone else understood that but you. Vininn126 (talk) 08:54, 1 September 2023 (UTC)[reply]
@Vininn126: As there are some strange vocabulary usages floating around here, starting with @Soap's calling what I would regard as the infinitive (in '...need to shit or...') a subjunctive, perhaps I need to point out that by 'conjunction' I particularly had in mind the coordinate conjunctions or and and.
By 'section', did you mean portion of the invocation of the template {{en-verb}}, and thus an enhancement of {{head|en|verb|[[shit]] or [[get]] off the pot}}, which itself yields "shit or get off the pot". I took it to mean a section of the page. If you mean the former, then the answer to 'To what?' could simply have been 'The headword line template'. (It's particularly unfortunate that the word 'you' doesn't distinguish between the rôles of editor and user.)
So, if we do have the term 'inline modifier' for what goes in ASCII angle brackets at the end of a parameter, what Vininn126 is suggesting is an inline modifier to tell the template to tell the user to follow the link (to the verb or whatever) for inflections. This could also be used to indicate (or hint?) whether 'pot' becomes 'pots' with plural subjects. --RichardW57m (talk) 09:46, 1 September 2023 (UTC)[reply]
I did mean conjugation - however none of this is relevant to the conversation at hand and very little relates to what has been said thus far. Vininn126 (talk) 09:49, 1 September 2023 (UTC)[reply]
  • Would it be unreasonable to refer users to the component terms for inflections, e.g., "For inflected forms see shit, get, pot."? — This unsigned comment was added by DCDuring (talk • contribs).
@DCDuring This is essentially my proposal above. Vininn126 (talk) 15:02, 31 August 2023 (UTC)[reply]
I think it would be useful to only link to non-lemmas that actually exist for multi-word terms. CitationsFreak (talk) 15:25, 31 August 2023 (UTC)[reply]
Combining the forms gets quite complicated for non-native speakers, and even for native speakers. When does one pluralise 'pot' for a plural subject? --RichardW57m (talk) 15:57, 31 August 2023 (UTC)[reply]
Consider my proposal a mere kluge, pending the better future when all such things are fully investigated, with attestation, so we can give all the incontrovertible advice any English learner could possibly want. Or we can first do a shortcut for these and then get around to improving entries for the most common words or some other bright shiny object. DCDuring (talk) 16:36, 31 August 2023 (UTC)[reply]
@Soap The {{en-verb}} code lets you control which inflections of individual verbs are displayed, e.g. you could change it so the past tense only shows as shit rather than shat. You can also control the individual inflections themselves and suppress some of them. Maybe it should do this automatically if there's an or in the expression, although that might get complicated depending on how regular it is. I do think it should be changed not to link to the entire inflected expression but to the individual parts; that's a change I've been thinking of making. Benwing2 (talk) 18:54, 31 August 2023 (UTC)[reply]
It does not even make much sense to have those inflections created since there can be objects and more between. Like bust down can be bust it down, bust man down etc. For searchability it would be advisable for them to be an invisible part of the entry; they are more useful for computers than humans directly, who will refer to the entry of the simple verb anyway if they don’t know it yet. Fay Freak (talk) 01:54, 1 September 2023 (UTC)[reply]
@Fay Freak: Whether such reference happens depends on the competence of users, though it has been argued that users who lack the competence need not be supported. What would help is a statement of what part of the expression changes with content, though that can be partly addressed with usage examples in lieu of mark-up. Also, usage notes might help, such as:
The words shit (verb) and get within the expression are verbs with a common subject and get inflected, while pot may agree in number with the subject of the verb.
For navigating to the entry, it will help to put the phrase under the derived terms of shit (verb). --RichardW57m (talk) 10:22, 1 September 2023 (UTC)[reply]
The usage note that you suggest would seem to be one that could be templatized and include use of headword parsing to identify inflectable component words in the headword. The same headword parsing could provide a good default for the inflection line, directing users to the component-word entries for inflection information. @User:Benwing2 would know about feasibility/difficulty of implementation. DCDuring (talk) 15:00, 1 September 2023 (UTC)[reply]
@DCDuring @RichardW57m If I understand what you're saying, this can all be automated with appropriate specs (inline modifiers) in the param given to the headword. So for example an inline modifier can be attached to 'pot' to indicate that it would pluralize with a plural subject, and the word 'one' in get one's panties in a bunch can be marked to indicate that it agrees (his/her/their) with the subject. With this information at hand, the usage note can be auto-generated. Benwing2 (talk) 19:29, 1 September 2023 (UTC)[reply]
My thought was to have as the inflection line:
shit or get off the pot (For inflected forms see shit, get, pot.)
Moreover I thought this should be the default if no inflection was manually specified. This has the advantages of economy and superior aesthetics, especially if the inflected forms are redlinks, as they often are. It seems silly to spend even a minute making entries for the inflected forms, not to mention that they are not likely to be entered in the search box. DCDuring (talk) 00:06, 2 September 2023 (UTC)[reply]
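As a rough illustration of the default inflection line proposed above, here is a minimal Python sketch. It assumes the inflectable component words are marked as wikilinks in the headword; the function names are hypothetical, and this is not how {{en-verb}} or any existing module works.

```python
import re

# Matches [[target]] or [[target|display]].
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")

def linked_components(headword: str) -> list[str]:
    """Return the link targets in a headword, in order of appearance."""
    return [m.group(1) for m in LINK.finditer(headword)]

def default_inflection_line(headword: str) -> str:
    """Build 'HEADWORD (For inflected forms see A, B, C.)' from the links."""
    # Plain display text: use the display part if given, else the target.
    plain = LINK.sub(lambda m: m.group(2) or m.group(1), headword)
    parts = linked_components(headword)
    if not parts:
        return plain
    return f"{plain} (For inflected forms see {', '.join(parts)}.)"

# Hypothetical headword with 'pot' linked as well as the two verbs.
print(default_inflection_line("[[shit]] or [[get]] off the [[pot]]"))
```

Unlinked function words such as "or", "off" and "the" are left out of the note, which fits the wish not to link bare function words.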
It might be silly for the user to make entries, however I do feel like this should be recorded somewhere on Wikt, if only because I feel like they should be on a wordlist of English terms on Wikt. CitationsFreak (talk) 05:37, 2 September 2023 (UTC)[reply]
As I mentioned above, I am going to change the inflected forms so the individual words are linked rather than the overall inflections. This might allay some people's concerns. In general I disagree with User:DCDuring's approach except maybe for very long expressions, because if we have the full information on how to inflect a verb phrase, we may as well list the inflections directly rather than requiring a presumably non-native speaker to figure them out by looking up the individual inflections and pasting them together. Most of the phrasal verbs are short in any case, e.g. take out, put up with, give up the ghost. (I should add, 'pot' doesn't inflect in the expression shit or get off the pot AFAIK; you would say "they need to shit or get off the pot" rather than *"they need to shit or get off the pots". I think the past tense is possible in the form "shat and got off the pot".) Benwing2 (talk) 05:57, 2 September 2023 (UTC)[reply]
One possibility in handling long verbal expressions that I could imagine is to truncate parts using "...". Benwing2 (talk) 05:59, 2 September 2023 (UTC)[reply]
Pots could inflect. In addition the is not the only determiner that could fit in its slot. And adjectives can be inserted as well, as in shit or get off the fucking pot (well attested and the first possibility that came to mind). We rely on the failed-search page to take language learners from all of these variants to our lemma page. We should test to see whether failed-search pages work for the inflected forms of such terms rather than continue to go down the path of conjecturally useful inflected form entries. DCDuring (talk) 14:32, 2 September 2023 (UTC)[reply]
@Benwing2: Inflecting the verb shit may be on the edge of a native speaker's competence. I didn't pick up the word until I went to secondary school. --RichardW57m (talk) 09:55, 4 September 2023 (UTC)[reply]
This is exactly what the conjugation "see" box that I suggested does. Vininn126 (talk) 11:07, 2 September 2023 (UTC)[reply]
We have hardly any conjugation boxes in English (only be comes to mind), because we have only three or four inflected forms for verbs, excluding archaic and obsolete forms. We don't ask or expect most of those using enwikt as a monolingual dictionary to even know what conjugation and inflection mean. DCDuring (talk) 14:32, 2 September 2023 (UTC)[reply]
@DCDuring: I think that by 'box' @Vininn126 means a section of text with borders as an instruction to the user, rather than a table.
None of this addresses issues with conjunctions in such phrases as shat and got off the pot, which is likely to need a separate entry if it is to be related to the phrase we've been discussing. --RichardW57m (talk) 09:44, 4 September 2023 (UTC)[reply]
Roger that. I always hope that we can avoid linking bare function words in these kinds of entries except as part of phrasal verbs or unless they are used unusually. DCDuring (talk) 15:41, 4 September 2023 (UTC)[reply]
We do have some like for speak and run and the like. Mostly for the common ones, not for the rare verbs like drunkpost. CitationsFreak (talk) 16:42, 4 September 2023 (UTC)[reply]
But they add only obsolete forms to what appears on the inflection line. Don't they appear after definitions? I wonder how many monolingual English users of enwikt would care to know about these things and/or would know where to look for them or would not be put off having accidentally opened the Conjugation box. Just sayin'. DCDuring (talk) 16:55, 4 September 2023 (UTC)[reply]
You have to factor in the people who like them, such as those who are teaching English through enwikt. CitationsFreak (talk) 17:38, 4 September 2023 (UTC)[reply]

{{ja-readings}} changes[edit]

Hello, I apologize for having done this if it's a problem, but I was running a bot last night to scrape reading data from our coverage of Japanese kanji readings of each kind via {{ja-readings}}. Alongside this, I noticed that we now automatically hyperlink both the kana and romaji readings that are provided, which means certain old syntax (which I saw in my scraped readings output) such as [[えだなし]] (edanashi) is redundant, and also differs from the output (e.g. in italics of the romaji) of the template as of now. @Theknightwho Might you perhaps be the one who made this improvement? If so, could you please confirm that this is going to stay? I'd like to run a bot to remove these redundant annotations if possible. Thank you, Kiril kovachev (talkcontribs) 15:49, 31 August 2023 (UTC)[reply]

@Kiril kovachev Yes, this should be permanent. Theknightwho (talk) 15:58, 31 August 2023 (UTC)[reply]
Thanks! Kiril kovachev (talkcontribs) 16:01, 31 August 2023 (UTC)[reply]
@Theknightwho By the way, there may be an issue with hyperlinking the romanization for katakana readings: see 𬼄. If you change the reading to hiragana, the romaji gets hyperlinked, but with the katakana reading it currently does not. Kiril kovachev (talk・contribs) 17:09, 31 August 2023 (UTC)[reply]

P.S. You can see the list of kanji that need amendment (if I have calculated them correctly) at User:KovachevBot/Kanji syntax removal. Each kanji is arrayed in a big list; they each have either manual romaji, manual links, or, most likely, both. Kiril kovachev (talkcontribs) 15:49, 31 August 2023 (UTC)[reply]

Also, @Theknightwho, @Benwing2: do you have any advice as to the best way to remove links from wikitext? My current method is to parse using mwparserfromhell, filter all wikilinks; for each, either select its text (second parameter) if specified, else its title; and then use parsed.replace(link, selected text content). For the manual romaji specification, I'm using a regex: \(\w+?\). My regex engine supports ō, etc. as part of its \w character class, so I assume this should work. Finally, I strip whitespace left as a result of this removal. This should be applied to each element of each reading type, e.g. kun=[[ひとつ|ひと-つ]] (hitotsu),[[ひと|ひと-]] (hito), etc. Is this okay? Kiril kovachev (talkcontribs) 16:00, 31 August 2023 (UTC)[reply]
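The romaji-stripping step described above could be sketched roughly as follows in Python. The function name is hypothetical, and the sketch assumes readings are comma-separated and that manual romaji always appears as a single parenthesised run of word characters.

```python
import re

# Optional leading whitespace, then a parenthesised run of word characters.
# In Python 3, \w is Unicode-aware for str patterns, so precomposed
# letters such as "ō" are matched.
ROMAJI = re.compile(r"\s*\(\w+?\)")

def strip_manual_romaji(value: str) -> str:
    """Remove '(romaji)' annotations from each comma-separated reading."""
    items = [ROMAJI.sub("", item).strip() for item in value.split(",")]
    return ",".join(items)

print(strip_manual_romaji("[[ひとつ|ひと-つ]] (hitotsu),[[ひと|ひと-]] (hito)"))
```

One caveat of this pattern: romaji containing a hyphen or full stop, such as (hito-tsu), is not matched by \w and would be left in place, so the scraped list may be worth checking for such cases before a bot run.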
@Kiril kovachev This all sounds good. For me I just use the following code to remove links, which is similar to what we do internally:
import re

def remove_links(text):
  # eliminate [[FOO| in [[FOO|BAR]], and then remaining [[ and ]]
  text = re.sub(r"\[\[[^\[\]|]*\|", "", text)
  text = re.sub(r"\[\[|\]\]", "", text)
  return text
Benwing2 (talk) 18:37, 31 August 2023 (UTC)[reply]
@Benwing2 Ah, right, that looks a fair bit simpler, thanks for the help :). Kiril kovachev (talkcontribs) 19:03, 31 August 2023 (UTC)[reply]
@Kiril kovachev I realize though this doesn't handle the "pipe trick" correctly, which is when people write [[FOO|]], which links to FOO. Benwing2 (talk) 19:09, 31 August 2023 (UTC)[reply]
@Benwing2 It's regrettable that the parser also doesn't recognize this afaik, perhaps because of cross-wiki differences or similar, but I tried the parser method on [[FOO|]] and it just returned an empty string. Fortunately, this can still be amended in the parser case: if the left and right are both specified, but the right is empty, then just return the left. However, I'm confused about the syntax: why would anyone write that instead of just leaving out the bar? That would surely still link to the same thing, right?
P.S. The code I am using right now for this link-stripping purpose is
import mwparserfromhell

def convert_link_to_plaintext(link: mwparserfromhell.wikicode.Wikilink) -> str:
  # [[FOO|BAR]] -> BAR; [[FOO]] and [[FOO|]] (empty display text) -> FOO
  if link.text is not None:
    if link.text == "": return str(link.title)
    else: return str(link.text)
  else:
    return str(link.title)

def remove_links(text: str) -> str:
  parsed: mwparserfromhell.wikicode.Wikicode = mwparserfromhell.parse(text)
  links = parsed.filter(forcetype=mwparserfromhell.wikicode.Wikilink)
  # replace each wikilink node with its plain display text
  for link in links:
    plain = convert_link_to_plaintext(link)
    parsed.replace(link, plain)
  return str(parsed)
Kiril kovachev (talkcontribs) 19:41, 31 August 2023 (UTC)[reply]
@Kiril kovachev I *think* the pipe trick deletes prefixes, so that [[w:FOO|]] displays FOO as well. You'll have to ask User:Theknightwho, who reimplemented Module:links. Benwing2 (talk) 21:10, 31 August 2023 (UTC)[reply]
@Benwing2 Yes that's correct. @Kiril kovachev mwparserfromhell is very sophisticated, but it has a number of inaccuracies that tend to show up when dealing with unusual cases. I don't think it handles the pipe trick properly, and it also doesn't handle link suffixes properly either (e.g. [[apple]]s displaying as apples). There are actually some fundamental problems with its implementation when you delve into the template logic, but that's probably not going to be relevant here. Theknightwho (talk) 21:21, 31 August 2023 (UTC)[reply]
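For reference, a regex-only variant that also covers the pipe trick might look like the Python sketch below. It assumes the pipe trick only needs to drop a single leading prefix (as in [[w:FOO|]]); it does not claim to reproduce the full MediaWiki or Module:links behaviour, and the helper name is hypothetical.

```python
import re

# [[title]] or [[title|text]]; text may be empty (the pipe trick).
LINK = re.compile(r"\[\[([^\[\]|]*)(?:\|([^\[\]]*))?\]\]")

def _display(match: re.Match) -> str:
    title, text = match.group(1), match.group(2)
    if text is None:
        return title                      # [[FOO]] -> FOO
    if text == "":
        # Pipe trick (assumed behaviour): drop an optional leading colon
        # and one prefix, so [[w:FOO|]] -> FOO.
        return title.lstrip(":").split(":", 1)[-1]
    return text                           # [[FOO|BAR]] -> BAR

def remove_links(text: str) -> str:
    """Replace every wikilink with its display text."""
    return LINK.sub(_display, text)

print(remove_links("[[w:FOO|]] and [[apple]]s"))
```

Because only the brackets are removed, a link suffix such as the "s" in [[apple]]s stays attached to the word without any special handling.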
@Theknightwho @Benwing2 I see... the suffix part is accounted for by the GitHub description, which says that the supported link suffixes differ between wikis. So it makes sense on the one hand that it doesn't process those, but it does make things unpleasant when you just want things to work on here.
However, as you said, I doubt there should even be any alternate texts in this particular endeavour, as it's just linking to different hiragana spellings.
Thanks for your help both, Kiril kovachev (talkcontribs) 21:30, 31 August 2023 (UTC)[reply]

cleaning up Template:cite-book et al.[edit]

In my cleaning up of the {{quote-*}} templates I forgot about the {{cite-*}} ones. I'm planning on cleaning them up as follows:

  1. Remove aliases of {{cite-*}} templates except the primary ones.
  2. Rewrite {{cite-*}} using Module:quote instead of {{cite-meta}}.
  3. Harmonize the arguments so that they are broadly the same as the corresponding {{quote-*}} templates.

In general it's a bit strange that we have both {{quote-*}} and {{cite-*}} (it is especially strange in that {{cite-*}} lets you provide a quoted passage as well), but the least we can do is make them as similar as possible. Note also that we have {{quote-hansard}} and no {{cite-hansard}}, as well as {{quote-us-patent}} and no {{cite-us-patent}}, {{quote-mailing list}} and no {{cite-mailing list}}, and {{quote-wikipedia}} and no {{cite-wikipedia}} (arguably we should never cite Wikipedia); and conversely {{cite-thesis}} but no {{quote-thesis}}.

The following table shows the current {{cite-*}} templates and aliases, and what I plan to do with them.

Aliased template | Canonical template | #Uses | Outcome
Template:cite-book | Template:cite-book | 340612 | Keep.
Template:Cite book | Template:cite-book | 2534 | Orphan and deprecate.
Template:Cite-book | Template:cite-book | 174 | Orphan and delete.
Template:cite book | Template:cite-book | 4738 | Orphan and deprecate.
Template:cite-text | Template:cite-book | 1158 | Keep; used by User:AutoDooz when converting raw-text citations (revised from "Orphan and delete").
Template:cite-journal | Template:cite-journal | 10593 | Keep.
Template:Cite news | Template:cite-journal | 50 | Orphan and delete.
Template:cite-magazine | Template:cite-journal | 13 | Orphan and delete.
Template:cite-paper | Template:cite-journal | 202 | Orphan and delete.
Template:cite-newspaper | Template:cite-journal | 9 | Orphan and delete.
Template:cite paper | Template:cite-journal | 2 | Orphan and delete.
Template:cite journal | Template:cite-journal | 305 | Orphan and delete.
Template:cite news | Template:cite-journal | 87 | Orphan and delete.
Template:Cite-journal | Template:cite-journal | 36 | Orphan and delete.
Template:Cite journal | Template:cite-journal | 15 | Orphan and delete.
Template:cite-web | Template:cite-web | 186391 | Keep.
Template:cite web | Template:cite-web | 746 | Orphan and delete.
Template:cite-av | Template:cite-av | 83 | Keep.
Template:cite-song | Template:cite-song | 70 | Keep.
Template:cite-newsgroup | Template:cite-newsgroup | 64 | Keep.
Template:Cite newsgroup | Template:cite-newsgroup | 0 | Delete.
Template:cite newsgroup | Template:cite-newsgroup | 5 | Orphan and delete.
Template:cite-usenet | Template:cite-newsgroup | 5 | Orphan and delete.
Template:cite-video game | Template:cite-video game | 48 | Keep.
Template:cite video game | Template:cite-video game | 0 | Delete.
Template:cite-thesis | Template:cite-thesis | 74 | Keep.
Template:cite-meta | Template:cite-meta | 496479 | Obsolete and delete.

Benwing2 (talk) 22:43, 31 August 2023 (UTC)[reply]
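To illustrate what "orphaning" an alias amounts to, here is a minimal Python sketch that rewrites alias names to their canonical template in a piece of wikitext. The mapping is only a subset of the table above, the function name is hypothetical, and the sketch assumes the aliases are plain redirects so that renaming the invocation is sufficient.

```python
import re

# Subset of the alias -> canonical mapping from the table above.
ALIASES = {
    "Cite book": "cite-book",
    "cite book": "cite-book",
    "Cite-book": "cite-book",
    "cite journal": "cite-journal",
    "Cite news": "cite-journal",
    "cite news": "cite-journal",
    "cite web": "cite-web",
    "cite-usenet": "cite-newsgroup",
}

# "{{", the template name, then either "|" or "}}".
TEMPLATE_NAME = re.compile(r"(\{\{\s*)([^{}|\n]+?)(\s*(?:\||\}\}))")

def canonicalize(wikitext: str) -> str:
    """Rename aliased cite templates; leave every other template alone."""
    def swap(m: re.Match) -> str:
        name = m.group(2)
        return m.group(1) + ALIASES.get(name, name) + m.group(3)
    return TEMPLATE_NAME.sub(swap, wikitext)

print(canonicalize("{{cite book|title=Testing 123}}"))
```

A real bot run would probably work on parsed templates rather than a regex, but the lookup table is the core of the change.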

Generally agree to merge the proposed mergers, but I'll comment that it makes perfect sense to have both - I'll use a {{cite-*}} for example in {{etydate}} - I wouldn't want a template for quotations that also adds the page to a category - sometimes the cited work is a dictionary or something. Furthermore, most R templates are built on cite-*. Vininn126 (talk) 23:19, 31 August 2023 (UTC)[reply]
(e/c)OK, so I guess the {{cite-*}} templates use a rather different output format from the {{quote-*}} templates. Potentially Module:quote could be modified to output the arguments rearranged to the {{cite-*}} standard; this wouldn't be so hard probably. The question is, though, do we want to maintain two totally different output structures, or do we just want to switch the {{cite-*}} templates to use the citation format of {{quote-*}}? Benwing2 (talk) 23:22, 31 August 2023 (UTC)[reply]
@Vininn126 It is very easy to turn off categories in Module:quote. The bigger question is about the output format differences, what do you think? Benwing2 (talk) 23:24, 31 August 2023 (UTC)[reply]
The different output is also rather important, I feel. Vininn126 (talk) 07:07, 1 September 2023 (UTC)[reply]
Agreed, references should be formatted as references. It's also useful for references to display short quotations in-line for example. —Al-Muqanna المقنع (talk) 08:49, 1 September 2023 (UTC)[reply]
@Al-Muqanna @Vininn126 OK, that is fine. I need to look at the differences between the output formats but I think I can just pass in a flag to Module:quote and code it up to conditionally display in the citation order. Benwing2 (talk) 08:55, 1 September 2023 (UTC)[reply]
Seconded. --RichardW57m (talk) 10:35, 1 September 2023 (UTC)[reply]
@Benwing2: no objection to having both reference and quotation templates working off the same backend, provided that the different formats for the templates can be detected. For instance, reference templates do not have the date or year of publication at the beginning of the reference and in boldface, nor is the quoted text automatically set out in a separate paragraph (unless it is quite long).
One thing that we may wish to try and resolve for reference templates is the position of the date or year of publication. At the moment, if a work has an author the year is indicated in parentheses after the author's name (for example, "Joe Bloggs (2023), Testing 123"), but in the absence of an author it appears after the title and publisher (for example, Testing 123, New York, N.Y.: Testing Inc., 2023). I tried to standardize this by shifting the year after the publisher to align it with {{quote-book}} et al., but this was objected to by another editor. — Sgconlaw (talk) 13:48, 1 September 2023 (UTC)[reply]
@Sgconlaw Hmm, I see your point about standardizing the position of the year but at the same time I like the idea of putting it next to the author if possible. This is based off of the way that references are typically referred to in shorthand, which is author's last name + year. That would actually suggest to me the year should follow the title if there's no author, e.g. Testing 123 (2023), New York, N.Y.: Testing Inc. but maybe that is too radical a change. Benwing2 (talk) 19:12, 1 September 2023 (UTC)[reply]
I'd support such a change, for what it's worth, @Sgconlaw, Benwing2. This, that and the other (talk) 09:06, 3 September 2023 (UTC)[reply]
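The year-placement options discussed above can be compared with a small Python sketch. The function and its parameters are hypothetical and are not the interface of Module:quote or {{cite-meta}}; italics and other markup are omitted.

```python
from typing import Optional

def format_reference(title: str, year: int,
                     author: Optional[str] = None,
                     publisher: Optional[str] = None,
                     year_after_title: bool = False) -> str:
    """Place the year after the author if there is one; otherwise either
    at the end (current behaviour as described above) or after the title
    (the suggested alternative)."""
    if author:
        parts = [f"{author} ({year})", title]
    elif year_after_title:
        parts = [f"{title} ({year})"]
    else:
        parts = [title]
    if publisher:
        parts.append(publisher)
    if not author and not year_after_title:
        parts.append(str(year))
    return ", ".join(parts)

pub = "New York, N.Y.: Testing Inc."
print(format_reference("Testing 123", 2023, author="Joe Bloggs"))
print(format_reference("Testing 123", 2023, publisher=pub))
print(format_reference("Testing 123", 2023, publisher=pub, year_after_title=True))
```

The third call shows the suggested change: with no author, the year follows the title, so the year still sits next to whatever would be used for a shorthand reference.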