Wiktionary:Beer parlour/2023/August: difference between revisions

RichardW57m (talk | contribs)
:IMO no it is not in order. To me, {{tl|rfdef}} means that a term (or glyph as I suspect you're talking about) is verifiable, but the author of the entry isn't sure what it means or how to phrase a sense. If someone thinks it ''isn't'' verifiable or that there's some convincing reason why it's unsuitable for inclusion then it should go through the normal RFV and RFD process. "No content" is not a good reason for speedying in that case. —[[User:Al-Muqanna|Al-Muqanna]] المقنع ([[User talk:Al-Muqanna|talk]]) 14:44, 4 August 2023 (UTC)
::Sorry, I forgot to confess that the question was prompted by the deletions of Unicode characters with minimal research and their imminent deletions. I am actually bothered by some of the recently invented characters that have been encoded, but proper research can be hard work - and even harder if one doesn't have easy access to a good specialist library. It occurred to me that I found quite a few alleged Pali words without definition, and no referenced evidence of their existence. Some of these I've condemned to RfV because I can find no evidence and I hope they were just typos or misanalyses; fortunately, most I was able to find at least in dictionaries. I was wondering how many simply got speedied because they irritated whoever tried to expand them, a waste of everyone's effort. --[[User:RichardW57m|RichardW57m]] ([[User talk:RichardW57m|talk]]) 16:53, 4 August 2023 (UTC)
::What would be in order in most cases would be an RfV. Defining an obscure or otherwise hard-to-define term without some real evidence is impossible.

Revision as of 17:19, 4 August 2023


Early thoughts on Banat Bulgarian

The Banat Bulgarian dialect (Glottolog bana1308), unlike other Bulgarian dialects, has a codified written norm, and an independent literary tradition going back to the 19th century. The norm is recognized by the Institute for Bulgarian Language - our language regulator - as a literary variety of a pluricentric Bulgarian language. Banat Bulgarian is spoken in the area of Banat, which is mostly in Romania, and partly in Serbia and Hungary. It is classified as a Rup dialect of Bulgarian.

There are several differences between Banat Bulgarian and standard Bulgarian that present potential challenges for how one would properly add such entries to Wiktionary:

  • Banat Bulgarian is written in the Latin alphabet, unlike standard Bulgarian which is written in Cyrillic. I'm assuming we could draw inspiration from Serbo-Croatian, which is also written in both alphabets, and where some of its codified varieties (e.g. Croatian) only use one of the alphabets.
  • As an Eastern Bulgarian dialect, Banat Bulgarian has phonological features that set it apart from standard Bulgarian, such as more pronounced vowel reduction for unstressed vowels, consonant palatalization between front vowels, the vowel /ɨ/, etc. Those differences are reflected in spelling. This means that {{bg-IPA}} wouldn't work for Banat Bulgarian unless we effectively double the code size and complexity.
  • As a consequence of the above two factors, inflection templates like {{bg-ndecl}} and {{bg-conj}} would similarly be insufficient for handling Banat Bulgarian unless they become a lot more complex.

I'm not personally a speaker of Banat Bulgarian, so this is me trying to get an idea of the groundwork that would need to be laid before we even start adding entries for it on Wiktionary. Here's a non-exhaustive list of questions that come to mind:

  • would it need its own language code - e.g. bg-ban - or would we instead change some of the configuration of the bg language code, e.g. to allow it to recognize the Banat Bulgarian Latin alphabet in addition to Cyrillic?
    • if it were its own language code, what would its {{inh}} relationships be with Bulgarian and Old Church Slavic?
    • if it were its own language code, would that effectively double the {{topics}} space, by having e.g. bg:Cats alongside bg-ban:Cats? Or is there a way for things to all go under bg:Cats?
  • what would be the proper form-of template to relate a Banat Bulgarian form like ugništi to its corresponding standard Bulgarian form огнище (ognište)?
  • would it be preferable to modify existing {{bg-FOO}} templates to handle Banat Bulgarian, or develop new templates like {{bg-ban-FOO}}?
  • what other important considerations are there that I'm not asking about?

Thanks,

Chernorizets (talk) 03:04, 1 August 2023 (UTC)[reply]

@Chernorizets There are various considerations here:
  1. Whether to make it an etymology-only variant of Bulgarian or a full-fledged separate language.
  2. How to handle the different script. IMO Serbo-Croatian isn't so good an example because the Latin and Cyrillic scripts map almost one-to-one onto each other and the same underlying dialect is represented, modulo a few trivial differences. Maybe a better example is Tajik vs. standard Persian vs. Dari Persian. These are linguistically three dialects of the same language, but Tajik uses Cyrillic while the other two use Perso-Arabic script, so maybe for that reason Tajik is a full-fledged language. Somewhat similar is Ottoman Turkish vs. standard Turkish, where there is a script difference as well as a ton of Persian and Arabic loans in Ottoman Turkish that aren't in regular Turkish, even though the dialect base is very similar; in this case again, Ottoman Turkish is its own full-fledged language. Hindi and Urdu are another case where the dialect base is the same but the scripts are different and we've chosen to go with two separate full-fledged languages based partly on the fact that the higher registers are markedly different. Mongolian is another case where there are two scripts and the two scripts do not at all map one-to-one; they represent etymologically different approaches to spelling the language (one much "deeper" than the other). I suspect the dialect base is the same for both scripts, and here we have chosen to unify the two scripts into a single language. I'm not sure in this case how the different spellings are handled in Mongolian inflection tables; maybe User:Theknightwho or User:Atitarev can comment. So it sounds like maybe we need a separate full-fledged language code; although User:Theknightwho has recently added the ability to have things like per-etym-language and per-script transliteration, so the technical considerations forcing a separate full-fledged language are less than before.
Regardless, because of the different scripts I would recommend having different headword templates like {{bg-ban-FOO}}; whether you can reuse a single inflection module like Module:bg-verb and Module:bg-nominal depends on how well the scripts and morphology map between the two of them. If the mapping is close, maybe you can have one module using an underlying representation that's then mapped to a surface representation in either Cyrillic or Latin; but if there are lots of differences, you might want two verb modules and two nominal modules, with common code factored out into Module:bg-common or Module:bg-verb-common, Module:bg-nominal-common modules. User:Theknightwho can you comment further?
  3. As for the etymological relationships, I dunno. If Banat Bulgarian is a dialect, maybe it should derive from Middle Bulgarian, same as standard Bulgarian? Presumably terms from standard that end up in Banat are borrowings or calques, and vice-versa? Benwing2 (talk) 05:09, 1 August 2023 (UTC)[reply]
BTW if we have a full-fledged language for Banat, it gets its own topic space. That probably makes sense because of the different script IMO, but I'd accept the other way as well (which you'd get if Banat were an etymology-only language). Benwing2 (talk) 05:11, 1 August 2023 (UTC)[reply]
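To make the "one underlying representation, two surface representations" idea concrete, here is a minimal sketch in Python rather than Lua. Everything in it is an assumption for illustration: the stress-marked underlying form, the tiny Latin-to-Cyrillic table, and the vowel-reduction rule are invented to reproduce the огнище / ugništi pair quoted in this thread, and none of it is taken from Module:bg-nominal or from the actual Banat orthography.

```python
# Toy sketch: one underlying form rendered in two scripts.
# Every table and rule below is an illustrative assumption,
# not code from Module:bg-nominal or any real Wiktionary module.

ACUTE = "\u0301"  # combining acute accent marking the stressed vowel

# Hypothetical Latin-to-Cyrillic pairs; the digraph must come first.
TO_CYRILLIC = [("št", "щ"), ("o", "о"), ("g", "г"), ("n", "н"), ("i", "и"), ("e", "е")]

# Hypothetical reduction applied to unstressed vowels in the Banat rendering.
REDUCED = {"o": "u", "e": "i"}


def render_standard(underlying: str) -> str:
    """Drop the stress mark and transliterate to Cyrillic."""
    text = underlying.replace(ACUTE, "")
    for latin, cyrillic in TO_CYRILLIC:
        text = text.replace(latin, cyrillic)
    return text


def render_banat(underlying: str) -> str:
    """Reduce unstressed vowels, keep Latin letters, drop the stress mark."""
    out = []
    for i, ch in enumerate(underlying):
        if ch == ACUTE:
            continue
        # A vowel is stressed when the combining acute directly follows it.
        stressed = underlying[i + 1 : i + 2] == ACUTE
        out.append(ch if stressed else REDUCED.get(ch, ch))
    return "".join(out)


UNDERLYING = "ogni" + ACUTE + "šte"  # hypothetical underlying form, stress on "i"
print(render_standard(UNDERLYING))
print(render_banat(UNDERLYING))
```

With this toy input the two calls print огнище and ugništi. If the real mapping needs many more rules than this, that would be an argument for the two-module layout with shared code factored out.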
Thanks for the comments, @Benwing2! I've sent an email to the Institute for Bulgarian Language to see if they'd be willing to submit a change request to SIL so that Banat Bulgarian gets its own ISO 639-3 code. I'll update this thread if/when I get a response.
The Banat dialect has speakers, blogs, books and newspapers in the present day, so it doesn't make sense to me to treat it as an etymology-only variety - it's in active use. Considering that the Banat population is descended from a migration in the 17th century, the argument for having it descend from Middle Bulgarian is compelling.
As for how full-fledged it is, the delta between it and standard Bulgarian is not nearly as large as between e.g. Chakavian and Kajkavian, or between the dialects of Slovene. As a speaker of an Eastern Bulgarian dialect myself, I have quite an easy time understanding it, except for the extensive loanwords from Romanian, German and a few other languages that have replaced native terms. It is officially considered a regional norm of Bulgarian; how that maps to the concept of "language" on Wiktionary is something I'm still ramping up on.
But yes, the comparison with Serbo-Croatian is not the best, since in SCr the different scripts do encode the same orthography and pronunciation, which wouldn't be the case here. I'm looking forward to learning more about the technical details of handling languages where script variation is accompanied by variation along other dimensions as well.
Cheers,
Chernorizets (talk) 05:38, 1 August 2023 (UTC)[reply]
@Chernorizets Etymology-only languages are not at all restricted to dead languages. E.g. we have en-US (American English) and en-GB (UK English) as etymology-only variants of en (English). Benwing2 (talk) 06:11, 1 August 2023 (UTC)[reply]
@Chernorizets We don't necessarily need an ISO code to create a new langcode. Vininn126 (talk) 08:04, 1 August 2023 (UTC)[reply]
@Vininn126 good to know; I was curious about the viability of the ISO route. Maybe nothing happens - I'm a random guy who wrote to the Bulgarian Academy of Sciences :) But if we do get an ISO code, we don't have to make one up.
@Benwing2 I guess I'll have to read up more on "etymology-only" languages and how they are used. Thanks for the example. Chernorizets (talk) 08:39, 1 August 2023 (UTC)[reply]
@Chernorizets An example of an Ety-only language might be Middle Polish, for which we have a label, category, special infrastructure, and a code, but it's nested under Polish and any links generated by zlw-mpl point to Polish. I'll also point out there is no ISO code zlw-mpl. Vininn126 (talk) 08:42, 1 August 2023 (UTC)[reply]
And it is in flagrant breach of BCP47, which does have provision for private codes. --194.74.130.171 08:48, 1 August 2023 (UTC)[reply]
What the hell are you talking about? This is something regularly done. Vininn126 (talk) 08:50, 1 August 2023 (UTC)[reply]
@Vininn126: Which makes the breaches even more flagrant. --RichardW57m (talk) 09:15, 1 August 2023 (UTC)[reply]
@RichardW57m There is no way that only having languages with ISO codes is a good idea. You will find essentially no one that would support that. It's a sort of moot point to claim that it's wrong. Vininn126 (talk) 09:16, 1 August 2023 (UTC)[reply]
@Vininn126: Private use codes should have an 'x-' in them, denoting private use. Most of our codes would be fine with that subtag inserted in second place. --RichardW57m (talk) 09:27, 1 August 2023 (UTC)[reply]
@RichardW57m Good luck convincing people of that. Vininn126 (talk) 09:30, 1 August 2023 (UTC)[reply]
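For readers unfamiliar with the BCP 47 point being argued, the suggestion of inserting the private-use subtag "in second place" can be sketched as follows. This is only an illustration of the proposal as worded above; the function name is hypothetical, and nothing here reflects how Wiktionary actually stores or validates its codes.

```python
def to_private_use(code: str) -> str:
    """Insert the BCP 47 private-use singleton 'x' after the first subtag.

    Hypothetical helper: it only illustrates the proposal in this thread.
    """
    first, sep, rest = code.partition("-")
    if not sep:
        return code  # a bare code such as "bg" has no invented subtag to mark
    if rest.split("-")[0] == "x":
        return code  # already marked as private use
    return f"{first}-x-{rest}"


for code in ("zlw-mpl", "bg-ban", "bg"):
    print(code, "->", to_private_use(code))
```

Under this scheme zlw-mpl would become zlw-x-mpl and the bg-ban floated earlier would become bg-x-ban, while plain bg is left alone.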
@Vininn126 @Chernorizets Yeah, were it not for the script issue I would definitely recommend that Banat Bulgarian be treated as an etym-only language given its apparent similarity to standard Bulgarian. Maybe even with the script issue we should do that but it definitely complicates things; I'd like to get some thoughts from others who can provide comparable situations with other languages (e.g. Malay allows either "Jawi" = Arabic or "Rumi" = Latin script, Kazakh supports Cyrillic, Arabic and Latin, Mongolian supports Mongolian script, Cyrillic and maybe Latin, etc.; but all of these have the same dialect base). Benwing2 (talk) 08:50, 1 August 2023 (UTC)[reply]
Sounds like the situation of South Azerbaijani – and some other Caucasian languages which have a major script we standardize to and an alternative one used in another country, like Laz in Turkey. Lexical peculiarities do not even give rise to the idea that there should be an etymology code, more like code separation only causes annoyance. You just add “Latin spelling of” entries and if not, if there are isolated full entries in the regiolectal script, nothing bad happens either. Of course inflection tables for Azerbaijani in Arabic script work differently. Vulgar Latin reconstruction entries have different Proto-Romance tables. Fay Freak (talk) 10:30, 1 August 2023 (UTC)[reply]
@Fay Freak is it possible to have custom relationships or labels besides {{spelling}} and the like, e.g. "Banat term for"? It's not the case that Banat Bulgarian is just a Latin-alphabet transliteration of standard Bulgarian - the Banat forms adhere much more closely to phonetic spelling, reflecting the dialect's peculiarities. So, for instance, Bulgarian огнище (ognište) doesn't correspond to Banat **ognište, but to Banat ugništi, which is how it's pronounced in Eastern Bulgarian dialects (including Banat, and coincidentally my own). Chernorizets (talk) 11:39, 1 August 2023 (UTC)[reply]
@Chernorizets {{alt form}} with |from=. Fay Freak (talk) 11:44, 1 August 2023 (UTC)[reply]
@Fay Freak Just found it too as I was looking at some Laz examples :) I guess, if we wanted to take a cue from Serbo-Croatian after all in terms of at least supporting multiple scripts under a single language code, there is Module:labels/data/lang/bg where one could add "Banat Bulgarian" with a proper Wikipedia link and categories. Chernorizets (talk) 11:51, 1 August 2023 (UTC)[reply]
@Benwing2: Where is the per-etym transliteration capability documented? It sounds like something I should align Pali to. --RichardW57m (talk) 09:07, 1 August 2023 (UTC)[reply]
@Chernorizets: Per-script variation in inflection is supported for Pali, with the simplest implementation in Module:pi-decl/noun. This treats stems and inflection separately, with provision for utter irregularity. (The price of separation is some complexity in gluing stem and inflection together.) The inflection tables are stored in per-script data modules, with fallback by transliteration from the Wiktionary-main script. There are flags to handle variations in spelling system within the scripts. Prakrit works almost similarly, but with tables of inflections by regional dialect, but no provision for gross irregularity and the use of transliteration from Roman to target script for stem and inflection together, exhibiting a touching belief in the fidelity of transliteration. (The Prakrit scheme was developed from the Pali scheme with a fair amount of tl;dr.) --RichardW57m (talk) 09:07, 1 August 2023 (UTC)[reply]
@RichardW57 It's not documented yet AFAICT; ask User:Theknightwho. Benwing2 (talk) 20:54, 1 August 2023 (UTC)[reply]

formatting last=/first= in authors in quote templates

@Sgconlaw, -sche, DCDuring Currently if you separate the authors in quote-* templates into |first=Joe, |last=Schmoe, |first2=Jane, |last2=Doe, etc. you get the authors displayed as:

  1. 2025, Schmoe, Joe; Doe, Jane; Roe, Richard, FUBAR: A Memoir of Wiktionary, BF Egypt: Wonderfool Publishing, Inc.

but if you specify the authors more naturally as |author=Joe Schmoe, |author2=Jane Doe, etc., you get

  1. 2025, Joe Schmoe; Jane Doe; Richard Roe, FUBAR: A Memoir of Wiktionary, BF Egypt: Wonderfool Publishing, Inc.

I propose to change this to use the consistent First Last; First2 Last2; etc. order regardless of how the authors are specified in the wikicode. Any objections? (In general I think we should avoid the |first=/|last= format, as it doesn't make sense in many non-English-speaking cultures.) Benwing2 (talk) 05:31, 2 August 2023 (UTC)[reply]
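The normalization proposed above can be sketched in a few lines. This is Python rather than the Lua the quote-* templates actually use, and the function name and the dict-of-parameters input are assumptions made for illustration; only the parameter names (|author=, |first=, |last=, and their numbered variants) come from the discussion.

```python
def author_names(params: dict[str, str]) -> list[str]:
    """Return authors as 'First Last' however they were supplied.

    Hypothetical helper: numbered parameters are read as author/first/last,
    then author2/first2/last2, and so on until a number has no values.
    """
    names = []
    n = 1
    while True:
        suffix = "" if n == 1 else str(n)
        author = params.get(f"author{suffix}")
        first = params.get(f"first{suffix}")
        last = params.get(f"last{suffix}")
        if author:
            names.append(author)  # already in natural order
        elif first or last:
            names.append(" ".join(p for p in (first, last) if p))
        else:
            break
        n += 1
    return names


split_style = {"first": "Joe", "last": "Schmoe", "first2": "Jane", "last2": "Doe"}
whole_style = {"author": "Joe Schmoe", "author2": "Jane Doe"}
print("; ".join(author_names(split_style)))
print("; ".join(author_names(whole_style)))
```

Both calls print the same line, "Joe Schmoe; Jane Doe", which is the consistency the proposal is after.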

@Benwing2: personally I don't see much point in using the "last name, first name" format in quotation templates used in entries, so I wouldn't object to your proposal. (It makes more sense to order a list by last names in a bibliography, but I don't see that happening frequently here at the Wiktionary.) — Sgconlaw (talk) 06:01, 2 August 2023 (UTC)[reply]
I agree, I'm not really sure why that display is an option for quote templates since it just creates an opportunity for formatting inconsistency for no real reason. Benwing's proposal sounds good. —Al-Muqanna المقنع (talk) 08:26, 2 August 2023 (UTC)[reply]
@Benwing2, Sgconlaw, Al-Muqanna: Seeing as an approved request to delete {{red}}, partly on the grounds of its rareness, was used as justification for deleting the commoner {{lime}}, I will address the effect on citations, as in {{cite-book}}. There is the problem that Wiktionary documentation is inadequate, but the parameter |last= appears to call for a person's surname, the meaning Wiktionary gives for last name. I use it to good effect in {{R:th:Li}}, where the author's surname is 'Li', the name on the book appears as 'Fang Kuei Li', and the Wikipedia page for the author is w:Li Fang-Kuei. In the <last>, <first> convention, this works well.
There is also the issue of Japanese names. In English, they have hitherto appeared as <personal name> <family name>, but there has recently been a Japanese ordinance that in English they should appear in the Japanese order, <family name> <personal name>. Again, the format <last>, <first> is proof against such a changeover. --RichardW57m (talk) 09:41, 2 August 2023 (UTC)[reply]
Of course, there is the problem that the ill-documented parameter 'first' might be mistaken for the first personal name, so were we to quote a work by the former PM Boris Johnson, someone might decide on |first=Alexander! There are a lot of us who use our second forename rather than our first.
Benwing2's proposal would change the interpretation of |first= and |last= to mean the first and last parts of a person's name. --RichardW57m (talk) 09:41, 2 August 2023 (UTC)[reply]
@RichardW57m: This is about quote templates, not citation templates. Fyi, though, the standard in Wikipedia's style where last/first is mandatory is that the entire name should be in the "last" parameter for Chinese names et al. —Al-Muqanna المقنع (talk) 09:49, 2 August 2023 (UTC)[reply]
That works if a Chinese name is always given in Chinese order - it's the occasional Anglicisation of the order that causes problems. The same goes for any names with inconsistent ordering. --RichardW57m (talk) 12:05, 2 August 2023 (UTC)[reply]
Indeed, Anglicisation of a Chinese-order name causes such problems. I should further elaborate that the ordering of names of people from Hong Kong is very complicated, see this Wikipedia article (though it's still pretty lacking). My practice is generally to use the ordering and spelling according to the source (or use the Chinese name directly when it is not available), and link to the Wikipedia article if possible. Note that since the surname is in between the given names, there are a number of cases where this becomes awkward to deal with - Li, David C. S. is one such example at Wiktionary:About Chinese/references#L, where the typical ordering would be David LI Chor-sing (note the use of capitalisation to denote which part is the surname), but he publishes academically as David C. S. Li (one may say C. S. is the middle name), while Google Scholar also lists LI Chor Shing David and Li Chor-Shing as aliases. (If I remember correctly, he has even written a paper on the name phenomenon itself). But I digress. Basically the name situation is so complicated that I would say that keeping |first= and |last= only adds to the complications, and so I think they should be deprecated/deleted. – Wpi (talk) 18:18, 2 August 2023 (UTC)[reply]
I wonder whether it is worth the effort, but I certainly wouldn't object to the result. DCDuring (talk) 13:08, 2 August 2023 (UTC)[reply]
I agree with the proposal, and I might also suggest that if we eliminate "Last, First; Last2, First2", then I wonder if we could just use commas between the names; it strikes me as weird to have "Year comma First Person semicolon Second Person comma Work Title" as if there is some break in content and the year is more closely associated with only the first person, and the Work more closely associated with only the second person. - -sche (discuss) 18:57, 2 August 2023 (UTC)[reply]
@-sche Yeah it seems backwards. Usually semicolons indicate a higher-level grouping than commas, but here we have it the other way. The only issue comes when we have editors or translators. E.g. currently if you write
{{quote-book|en|author=Hayden Carruth|author2=Joe Schmoe|editor=Mary Bloggs|translator=Richard Roe|title=The Hudson Review|location=New York, N.Y.|publisher=Hudson Review, Inc.|year=2025}}
You get this:
2025, Hayden Carruth, Joe Schmoe, translated by Richard Roe, edited by Mary Bloggs, The Hudson Review, New York, N.Y.: Hudson Review, Inc.:
which is a muddle of names, and if it's all commas you get this:
2025, Hayden Carruth, Joe Schmoe, Richard Roe, transl., Mary Bloggs, editor, The Hudson Review, New York, N.Y.: Hudson Review, Inc.:
which is maybe even worse. If we use "and" like I proposed at some point above, this becomes
2025, Hayden Carruth and Joe Schmoe, Richard Roe, transl., Mary Bloggs, editor, The Hudson Review, New York, N.Y.: Hudson Review, Inc.:
which is a bit better. Maybe we should reword the translator and editor so you get this:
2025, Hayden Carruth and Joe Schmoe, translated by Richard Roe, edited by Mary Bloggs, The Hudson Review, New York, N.Y.: Hudson Review, Inc.:
With this, it's also possible to dispense with the word "and":
2025, Hayden Carruth, Joe Schmoe, translated by Richard Roe, edited by Mary Bloggs, The Hudson Review, New York, N.Y.: Hudson Review, Inc.:
Benwing2 (talk) 19:16, 2 August 2023 (UTC)[reply]
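The last two variants above differ only in whether the authors are joined with "and". A rough sketch of that assembly step follows, in Python for illustration; the function names and the use_and switch are hypothetical and this is not the quote-* module code.

```python
def join_names(names: list[str], use_and: bool) -> str:
    """Join names with commas, optionally putting 'and' before the last one."""
    if use_and and len(names) > 1:
        return ", ".join(names[:-1]) + " and " + names[-1]
    return ", ".join(names)


def byline(authors, translators=(), editors=(), use_and=True) -> str:
    """Build 'authors, translated by ..., edited by ...' (hypothetical helper)."""
    parts = []
    if authors:
        parts.append(join_names(list(authors), use_and))
    if translators:
        parts.append("translated by " + join_names(list(translators), use_and))
    if editors:
        parts.append("edited by " + join_names(list(editors), use_and))
    return ", ".join(parts)


authors = ["Hayden Carruth", "Joe Schmoe"]
print(byline(authors, ["Richard Roe"], ["Mary Bloggs"]))
print(byline(authors, ["Richard Roe"], ["Mary Bloggs"], use_and=False))
```

The first call gives "Hayden Carruth and Joe Schmoe, translated by Richard Roe, edited by Mary Bloggs"; the second gives the same line with a comma in place of "and". Because the role phrases come before the names they govern, the comma-only form stays readable.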
If it was up to me I would go with either your last option or an abbreviated version ("transl. Richard Roe, ed. Mary Bloggs"). I also find the current style quite confusing with the comma/semicolon mess and "transl." and "ed." added afterwards. —Al-Muqanna المقنع (talk) 21:23, 2 August 2023 (UTC)[reply]
Yeah, putting "edited by" in front of the editor(s) strikes me as the clearest option, your last option above. (I wouldn't object to also using semicolons to set authors vs translators vs editors apart, if anyone wants to, it's just using semicolons to set authors apart from other authors that seems weird to me.)
One other idea: for technical works that have like twenty-nine authors or editors, do we want to display only the first N and hide the rest behind an "et al." that displays their names if you hover over it, like [] ? (They were hidden in an HTML comment here.) Or is it fine to list them all? I don't really mind listing them all, but it does tend to make the bibliographic data take up as much or more space than the quote. In any case, looking through the results for searches like insource:"author17" makes me notice that in Nopeville and science-fictionish we're citing collections of stories by lots of authors and just listing all the authors whose works are in the collection, when we should only list whoever authored the quote we're quoting... - -sche (discuss) 05:29, 3 August 2023 (UTC)[reply]
There might be the occasional issue with licensing (e.g. CC-by-SA) if we truncate the list of authors and quote the text. I'm not confident that quoted text will also be supplied through |text= rather than other means. For quotations, it often isn't. --RichardW57m (talk) 09:51, 3 August 2023 (UTC)[reply]
Scrub the above - putting the less visible names in a tool tip should be good enough. --RichardW57m (talk) 10:11, 3 August 2023 (UTC)[reply]
I have always manually truncated lists of more than three authors or editors with et al. The information should be enough to identify the book, I'm not sure who it's actually benefiting to provide vast lists of authors when just the ISBN or the lead author and title (and edition number if necessary) will be sufficient to identify it anyway. Afaik et al. with a tooltip is what Wikipedia CS1 does. —Al-Muqanna المقنع (talk) 10:07, 3 August 2023 (UTC)[reply]
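The truncation being discussed is simple to state in code. The sketch below is Python for illustration only; the function name and the default limit of three (taken from the manual practice described above) are assumptions, and the hidden names are returned separately so that a template could put them in a tooltip.

```python
def truncate_authors(names: list[str], limit: int = 3) -> tuple[str, list[str]]:
    """Show at most `limit` names; return the display text and the hidden names.

    Hypothetical helper: lists at or under the limit are shown in full.
    """
    if len(names) <= limit:
        return ", ".join(names), []
    shown, hidden = names[:limit], names[limit:]
    return ", ".join(shown) + " et al.", hidden


display, hidden = truncate_authors(["A. One", "B. Two", "C. Three", "D. Four", "E. Five"])
print(display)
print(hidden)
```

Here the display text is "A. One, B. Two, C. Three et al." and the two remaining names are available for the hover text.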

gl-IPA

I think someone who knows about Lua (and who has enough time to make it) should make a pronunciation template for Galician. Anyway, its phonology is not that different from that of Portuguese (except maybe the presence of /θ/ in the Eastern dialects, the merger of voiced fricatives with their voiceless counterparts, and the lack of deaffrication of ch, which remains an affricate like in Spanish). Rodrigo5260 (talk) 13:59, 2 August 2023 (UTC)[reply]

@Rodrigo5260 Unfortunately all of these modules take time to write. The Portuguese module, for example, runs to 2,001 lines of code, due to all sorts of complications in the Portuguese spelling-to-phonology mapping as well as there being multiple region-specific pronunciations. Galician might not be quite so bad but it also has multiple regional pronunciations. If anything you'd want to start with the Spanish rather than the Portuguese module, since Galician (at least in the standard rather than the reintegrationist spelling) largely uses Spanish spelling conventions. Benwing2 (talk) 19:24, 2 August 2023 (UTC)[reply]
Ah, yes, you're right, but the standard spelling fails to distinguish the open e and o from their closed counterparts (tho the minimal pairs are fewer than in Portuguese). Rodrigo5260 (talk) 19:37, 2 August 2023 (UTC)[reply]
The few regional pronunciations I know of are the seseo found in Western Galician and the gheada (when the /g/ sound becomes a /h/-like sound, like in Ukrainian). Rodrigo5260 (talk) 21:24, 2 August 2023 (UTC)[reply]
@Rodrigo5260 I have heard that there are lots of East-Central-West differences in whether e and o are open or closed in specific words (due to different handling of metaphony), as well as differences in the handling of original nasal vowels, etc. Benwing2 (talk) 21:35, 2 August 2023 (UTC)[reply]
Agreed. I actually started a sandbox a while ago, where I copied and condensed transcriptions from the Manual of Romance Phonetics and Phonology, which has a chapter dedicated to Galician. I never did finish the task, but perhaps what material I gathered can be of some use. Nicodene (talk) 22:22, 3 August 2023 (UTC)[reply]
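As a rough illustration of how a module might derive the two regional features named above from a single broad transcription, here is a toy sketch in Python. It is nowhere near a real gl-IPA module: the input strings are simplified stand-ins rather than verified transcriptions, the function name is hypothetical, and plain "h" is used only as a placeholder for the /h/-like gheada sound.

```python
def regional_variant(phonemes: str, seseo: bool = False, gheada: bool = False) -> str:
    """Apply seseo and/or gheada to a broad transcription (toy rules only)."""
    table = {}
    if seseo:
        table["θ"] = "s"  # seseo: /θ/ merges into /s/
    if gheada:
        table["ɡ"] = "h"  # IPA ɡ (U+0261); "h" is a placeholder symbol
        table["g"] = "h"  # also accept the ASCII letter
    return phonemes.translate(str.maketrans(table))


print(regional_variant("θinko", seseo=True))
print(regional_variant("gato", gheada=True))
print(regional_variant("gato"))
```

The open/closed e and o differences and the treatment of original nasal vowels mentioned above cannot be handled by simple substitution like this, since they vary word by word.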

Naming the "ecclesiastical" pronunciation in Latin entries

The pronunciation currently named "ecclesiastical" in Latin entries—actually what is called Italianate in the literature—is not universally used in the church either historically or, as @Andrew Sheedy recently noted on my talk page, geographically (e.g. it is not used by German Catholics). In fact its dissemination is a relatively recent phenomenon dating to the late 19th and early 20th centuries, and to some extent it has also now receded—though salient features like /t͡ʃ/ for ⟨c⟩ before certain vowels are still typical for Catholics in English-speaking countries, there is much less effort to exactly mimic Italian pronunciation in the way that some prelates enforced 100 years ago. ([z] for ⟨s⟩ used to be strictly proscribed, for example; not so much now.)

Calling the pronunciation we provide "Ecclesiastical Latin" with no qualification is misleading. Some editors have apparently inferred from the name that it applies to Medieval Latin, for instance, but in fact there's no convincing reason why either a modern Italianate pronunciation or a classical pronunciation should be invented for terms in Medieval Latin without any later ecclesiastical usage. My preference would just be for it to be renamed "Italianate", but if that term is too obscure there might be a more comprehensible alternative like "Italianizing", or simply "Italian Ecclesiastical". Tagging @Urszag, Nicodene who I think have discussed EL pronunciation before. —Al-Muqanna المقنع (talk) 18:21, 2 August 2023 (UTC)[reply]

I would suggest a label beginning with 'modern' in order to clear up any chronological ambiguity. That can be followed by 'Italian(-ate/-izing)', 'Roman', or similar. Nicodene (talk) 18:28, 2 August 2023 (UTC)[reply]
That's a good point actually, maybe "Modern Italianate" or something. —Al-Muqanna المقنع (talk) 18:51, 2 August 2023 (UTC)[reply]
"Italianate" seems fine. Several documents presenting this style (e.g. the Liber Usualis) refer to it as the "Roman" pronunciation (referring to contemporary, not ancient, Rome) but this would obviously be undesirably ambiguous without further qualifiers. Ganss 1951 ("Pronunciations of Latin in Church") refers to it simply as "the Italian pronunciation" and reserves "the Roman pronunciation" for the reconstructed ancient pronunciation. In theory there are presumably some differences between a) an idealized standard based on accepted traditional Italian usage (but this hypothetical standard may not even have a definite existence); b) the various pronunciations used by native Italian speakers today when they speak Latin and c) the pronunciations used by non-Italian speakers who try to imitate what they think Italian pronunciation is like, but in practice I don't think it's feasible to clearly distinguish these.--Urszag (talk) 19:06, 2 August 2023 (UTC)[reply]
I like User:Nicodene's suggestion of including the word "Modern"; otherwise it wouldn't be obvious to someone like me (a semi-informed outsider) that "Italianate" refers to a modern Church pronunciation. Benwing2 (talk) 19:18, 2 August 2023 (UTC)[reply]
Yeah, I'm inclined to think "Modern Italianate" is clear enough. It's worth noting that it's spilled out of ecclesiastical contexts at this point—people routinely pronounce Latin borrowings like in excelsis in "Italianate" style now (see [1] @ 1:46:22)—so I'm not sure how important it is to flag up the church aspect explicitly. I added a sense at Italianate which summarises the history. —Al-Muqanna المقنع (talk) 20:06, 2 August 2023 (UTC)[reply]
I oppose naming it "Modern Italianate". Almost anywhere you look on the internet and in many reference books refers to two pronunciation systems: Classical and Ecclesiastical. The latter is inherently prescriptive--the Roman pronunciation was officially proclaimed the standard for the Catholic Church by Pope Pius X. All other pronunciations are not Ecclesiastical, they are simply regional/historic (e.g. Germanic, English, Baroque French). I think the labels are fine as they are, but I would support additional pronunciations, which would be helpful for some users (for instance, performers/directors of Medieval/Renaissance music often try to reflect the historical pronunciations used by the composer, which it would be useful to include for that purpose). For the Roman pronunciation, I would also be fine with "Modern Ecclesiastical" to distinguish from the historical pronunciations. Andrew Sheedy (talk) 21:05, 2 August 2023 (UTC)[reply]
Also, thanks for pinging me. I'm keenly interested in this discussion, but I would have missed it otherwise, because I'm travelling. Andrew Sheedy (talk) 21:07, 2 August 2023 (UTC)[reply]
I'm happy to accept "Modern Ecclesiastical" as a compromise, since it at least accounts for the historical aspect, which is IMO the more important point and, as it stands, the misleading one. However, I'm not convinced by the prescriptive point: yes, Pius X instituted it as a standard, but John XXIII also obliged seminarians to learn Latin, and nobody with the pertinent authority has been much interested in canonically enforcing either of those provisions for many years now. As it stands today Italianate pronunciation is generally (at least for clergy) something that has been learned informally rather than something enforced. —Al-Muqanna المقنع (talk) 21:14, 2 August 2023 (UTC)[reply]
While that is true, the Ecclesiastical pronunciation is what most Catholics are interested in learning when it comes to praying/liturgical usage. Unlike Pope John XXIII's injunction, Pius X's was actually widely taken up, if not universally, at least enough to be considered "the" Ecclesiastical pronunciation. I would be fine with "modern Italianate Ecclesiastical", as long as we don't drop "Ecclesiastical", which is what many people will recognize. Not everyone interested in Ecclesiastical pronunciation will realize that it is Italianate. Andrew Sheedy (talk) 20:52, 3 August 2023 (UTC)[reply]
'Modern Italianate Ecclesiastical' is quite long. I like the suggestion of 'Modern Italianate', and I propose using this precise link to clear up any confusion. (At the moment, we link to the article Ecclesiastical Latin, which is rather broad and does not focus on the modern pronunciation in question.) Nicodene (talk) 02:10, 4 August 2023 (UTC)[reply]
I object to the label 'ecclesiastical' for two reasons:
1) It isn't specific enough. The catholic church has decreed that the Italian norms are to be the standard pronunciation, certainly, but many catholics carry on using different styles; see here, for instance, for various Polish examples. It should also be pointed out that the catholic church does not 'own' Latin, even as far as christians are concerned, since there exist churches, mainly protestant ones, which both employ Latin and reject the Pope's authority (which would include, naturally, the authority to decide on the 'proper' rendition of Latin).
2) It's too specific. The described pronunciation is not specifically 'ecclesiastic'; it is simply how most Italians pronounce Latin, in any context.
Nicodene (talk) 22:05, 2 August 2023 (UTC)[reply]
As an alternative to "Ecclesiastical", I would be fine with "modern Catholic" or "Italianate Catholic" or "prescribed Catholic" as a label. It's true that other Catholics use different pronunciations, but those aren't really ecclesiastical pronunciations, they are simply national pronunciations (which I would also like to include, although collapsed by default). Andrew Sheedy (talk) 20:54, 3 August 2023 (UTC)[reply]
How are they 'not really ecclesiastical' when they are used by other christians, whether Catholic or otherwise, in ecclesiastical contexts? Nicodene (talk) 01:54, 4 August 2023 (UTC)[reply]
Having looked up the specific statement by Pius X in question btw it was not a formal regulation, just an informal recommendation in a letter pertaining specifically to France (p. 169 in these acts), so it was never prescriptive in more than a loose sense. I don't think there needs to be specific emphasis on its Catholicity. —Al-Muqanna المقنع (talk)
I was going to suggest this even before Andrew's comment, but what about "Italianate Ecclesiastical" or "modern Italianate Ecclesiastical" (or lowercase "e", or uppercase "m", whatever)? I assume whatever we do we'll wikilink it to somewhere that explains the nuances. It does strike me as potentially slightly confusing that we currently seem to mean two distinct things when we say "Ecclesiastical" in {{a}} vs in {{lb}}; as far as I can tell, the {{label}} would also be used in the case of peculiarities of Latin as used by German/French/British/etc Catholics or Protestants, while the {{accent}}-label wouldn't, which makes it a little weird to use the same name for both. - -sche (discuss) 01:18, 3 August 2023 (UTC)[reply]
That's a good point about the Ecclesiastical sense label vs. pronunciation qualifier. I think that ultimately goes back to my main point above, that labelling the pronunciation "Ecclesiastical" (not to mention linking it to the Ecclesiastical Latin wp page) implies it pertains to "Ecclesiastical Latin" as a whole, i.e. the entire body of Christian usage stretching back to late antiquity referred to by the gloss label and the ety language, when it doesn't by a long shot. —Al-Muqanna المقنع (talk) 21:00, 3 August 2023 (UTC)[reply]

Issues regarding the Inuit languages

(moved from Wiktionary:Grease pit/2023/August)

Recently I managed to get a copy of the Comparative Eskimo Dictionary so I wanted to start adding some entries and checking entries that have already been made. And I noticed a few (potential/possible) issues.

  1. Currently we have Inupiaq (ik), Northwest Alaska Inupiatun (esk), and North Alaskan Inupiatun (esi) all listed as languages on the same level. This feels like a strange choice considering esk and esi are the two sub-groupings of ik. So would it not be better if these were something like etymology-only languages under ik? Or some other solution, I do not remember if there is a precedent for this on here that we could stick to.
  2. As of now only Inuktitut iu has Canadian syllabics (Cans) as a script code, while the other varieties used in Canada do not have it as a listed script (even though it should be). These languages are Inuvialuktun (ikt) and Inuinnaqtun (esx-inq).
  3. Same as with the Alaskan Inuit varieties I mentioned before, currently, we also list Inuinnaqtun (esx-inq) as being on the same level as Inuvialuktun (ikt). This would be incorrect as Inuinnaqtun is one of various varieties of Western Canadian Inuit (Inuvialuktun). So a) it feels strange that we only have a special lang-code for that variety and not for other varieties like Siglitun, Netsilik, or Kivallirmiutut and b) that it is listed on the same level and not one level down.

BartGerardsSodermans (talk) 06:31, 2 August 2023 (UTC)[reply]

I can answer point 1: WT:LT says, for Inupiaq, "only the macrolanguage is treated as a language". We have no entries under the Cat:Northwest Alaska Inupiatun language and Cat:North Alaskan Inupiatun language trees. All words should be created under Inupiaq. Where do you see that they are "listed as languages on the same level"?
(As a side note, I wonder why the lects are named "Northwest Alaska Inupiatun" and "North Alaskan Inupiatun" - it seems we could be more consistent.)
In general this discussion might be better suited to the beer parlour, as it does not concern purely technical issues. This, that and the other (talk) 07:22, 2 August 2023 (UTC)[reply]
@This, that and the other Yeah, I agree this is a better fit for the beer parlour too, don't know why I posted this here. I made a mistake with the languages being on the same level too. But what exactly is the purpose of having categories for Northwest and North Alaska(n) Inupiatun languages if they don't include any entries and all words should be created under Inupiaq? I also noticed the strange naming of the lects; though it isn't a big issue, it is inconsistent. Apart from that I'll add the other questions I had to the beer parlour. BartGerardsSodermans (talk) 09:09, 2 August 2023 (UTC)[reply]
I was wondering that too. If these languages aren't languages according to WT:LT, why are they listed in WT:LOL at all? Can anyone else help? This, that and the other (talk) 11:05, 2 August 2023 (UTC)[reply]
@This, that and the other, BartGerardsSodermans I moved this discussion to the Beer Parlour. User:-sche can you answer some of these questions? Also presumably we should rename 'Northwest Alaska Inupiatun' to 'Northwest Alaskan Inupiatun' (rather than the other way around), do you agree? Benwing2 (talk) 23:21, 2 August 2023 (UTC)[reply]
If we're treating only the macrolanguage(s) as a language, and not the dialects, then yeah, let's either move the dialect codes to the etymology-languages module (ike is currently invoked in several etymologies), or comment them out (like is done for e.g. "tw" in Module:languages/data/2 — I started doing that after seeing various people readd codes, unaware of earlier discussions and assuming they were just missing; perhaps that's even what happened here). BartGerardsSodermans, how many languages do you think it's sensible to have language headers for? Just ik and iu, or are you proposing ik, iu, and ikt as co-level languages? (I'm not sure, from your division into three points above.) For "Northwest Alaska(n) Inupiatun" both forms seem about equally rare; no objection to renaming if it's kept in some capacity. - -sche (discuss) 07:10, 3 August 2023 (UTC)[reply]
I am personally in favor of treating the two Inupiaq lects as etymology-languages. Especially because words often show variation between the lects and neither of the two can really be seen as the standard Inupiaq, they also use slightly different orthographies, and this way you can easily show both languages in the descendant section of reconstructions while still having all words be in the Inupiaq language category (where they can then be sorted into the correct regional forms).
I also think it might be a good idea to use ikt for headers as well, simply because there are some differences between it and iu. There is also the issue of script, in that ikt is rarely written using Canadian syllabics and nearly exclusively uses the Latin script, while for iu the situation is different in that both the syllabic and Latin scripts are used (though the syllabic script does seem more common to me). I am not fully aware of how different varieties have to be to qualify for language header status, or whether something like orthography should be taken into account. I do see one issue possibly arising in having iu and ikt be different co-level languages, in that iu is also used by some people for the Western Canadian lects and not exclusively for the Eastern Canadian lects, which might confuse people not aware of how we decided to treat the different language codes. So if we do split them it should be made clear that iu is only used for the Eastern Canadian lects. BartGerardsSodermans (talk) 08:05, 3 August 2023 (UTC)[reply]
If we want to have ikt for the Western lects and then make clear that the other language is only intended to encompass the Eastern lects, we should just keep using ike for that (for those Eastern lects), since ike is the code that denotes "the Eastern lects specifically", and iu is the code that encompasses both Western and Eastern together, AFAIK. (Wikipedia does not make this as clear as it could, since it redirects w:ISO 639:ike to an article it titles bare "Inuktitut [...] also known as Eastern Canadian Inuktitut", and gives iu as the code for both that lect and "Inuvialuktun (part of Western Canadian Inuktitut)", in each one's infobox.) - -sche (discuss) 09:36, 4 August 2023 (UTC)[reply]
In translations, should dialects like Inuinnaqtun be entered as separate languages, or should they be under the umbrella language? e.g. for arvaq, linked from the English page hypothenar eminence, should it be listed under Inuinnaqtun (esx-inq) or Inuvialuktun (ikt)? Or neither? Thanks, Soap 20:14, 3 August 2023 (UTC)[reply]

English anagram run

@Ioaxxere @CitationsFreak Hello, regarding the English anagram run you asked for in the Bulgarian thread: I basically got everything running, i.e. I have the full Wiktionary wordlist for English and can run the bot with the same capabilities as with the Bulgarian version. I'd just like your advice on a few parameters of the run, mainly what to do about duplication on the page, i.e. do we skip adding an anagram if:

  • It's already in the see-also section at the top of the page?
  • It's already in the alternative forms section:
    • As an L3
    • As any header level, so long as it's under the English header?
  • It's linked to whatsoever in the English entry?

I see the previous anagram bot author appears to have skipped all the ones in the see-also. Check out MacBees: my bot tried to add Macbees as an anagram, although it's already linked in the see-also. I later reverted it, because evidently both terms existed well before the previous anagram run, yet the author decided not to duplicate them regardless. Similarly, on 1000-metre, 1000 metre, 1,000-metre were not linked before, but 1,000 meter, 1000 meter, 1000-meter all were. This time, they do appear in the alternative forms, but they're also duplicated as anagrams.

Do we want anagrams to overlap with other obvious links like these? Specifically, in what configuration?

Also, here is the process by which I normalize words for the anagram calculation.

  • Remove all whitespace at the start and end.
  • Convert to lowercase.
  • Convert æ to ae, œ to oe, and ı to i. These are but a few of the possible equivalences we could have; I don't know what mappings we'd like, nor what the previous author had, but even without them we get a lot of anagrams anyway.
  • Decompose all characters to their simplest, e.g. é becomes e + ACUTE. Specifically, this is Unicode NFKD normalization, which may convert e.g. № to No, which is why we need to convert to lowercase again after this.
  • Remove all irrelevant elements (non-alphanumeric characters: the alphanumeric characters I recognize as unique are abcdefghijklmnopqrstuvwxyz0123456789βðπø).
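The steps above could be sketched roughly as follows. This is only an illustration of the described procedure, not the bot's actual code: the names `FOLDS`, `KEEP`, `normalize` and `anagram_key` are made up for the example, and the fold table holds only the three mappings mentioned in the list.

```python
import unicodedata

# Hypothetical fold table: only the three equivalences named in the list above.
FOLDS = str.maketrans({"æ": "ae", "œ": "oe", "ı": "i"})

# Characters treated as distinct "letters", as given in the last step.
KEEP = set("abcdefghijklmnopqrstuvwxyz0123456789βðπø")


def normalize(term: str) -> str:
    """Apply the listed steps in order and return the normalized string."""
    s = term.strip().lower()                # trim, then lowercase
    s = s.translate(FOLDS)                  # æ -> ae, œ -> oe, ı -> i
    s = unicodedata.normalize("NFKD", s)    # é -> e + combining acute, № -> No
    s = s.lower()                           # NFKD can reintroduce capitals
    return "".join(ch for ch in s if ch in KEEP)


def anagram_key(term: str) -> str:
    """Two terms are anagram candidates when their keys are equal."""
    return "".join(sorted(normalize(term)))
```

With this sketch, `anagram_key("1000-metre")` equals `anagram_key("1000 meter")`, while `normalize("café")` and `normalize("cafe")` are identical strings rather than rearrangements of each other.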

If you have any concerns about this, please let me know; and also I'd love your input on the affair with deciding what to duplicate. Kiril kovachev (talkcontribs) 18:56, 3 August 2023 (UTC)[reply]

Why "βðπø" in particular? —Justin (koavf)TCM 18:59, 3 August 2023 (UTC)[reply]
@Koavf It was originally a rather long list of characters. I checked all English entries, and first got rid of all alphabetic characters, which left a number of symbols and diacritics. In the end, the only alphabetic characters that may make any difference (i.e. have a lot of terms where they're used) are those 4. In truth they might not matter, maybe they can be excluded. Kiril kovachev (talkcontribs) 21:32, 3 August 2023 (UTC)[reply]
Actually, this makes me realise that what I did was inadequate, because when I remove those four, we suddenly get around 60 more anagrams than before. Which makes sense if you think about it—removing those characters makes those letters disappear in any word which features them, which can lead to collisions with words that don't have them at all. I think I need to re-introduce those other characters as well, but are there any other ideas as to how to deal with this to make sure there are no mistakes, e.g. β-carotene becoming an anagram of carotene? Kiril kovachev (talkcontribs) 22:00, 3 August 2023 (UTC)[reply]
UPDATE: I changed it to use the regular expressions \w and \d for alphabetic and numerical characters respectively, which I'm hoping means that any alphabetic characters that are used will be respected. However, there may still need to be exceptions made for e.g. 🧢 (cap), because "🧢ing" would be an anagram of "-ing", which is definitely not right. Kiril kovachev (talkcontribs) 22:16, 3 August 2023 (UTC)[reply]
@Kiril kovachev: At minimal analysis, all symbols should be taken into account. There may be exceptions, but I can't think of any. --RichardW57 (talk) 05:44, 4 August 2023 (UTC)[reply]
I think you want to skip anything where normalized(a) == normalized(b). JeffDoozan (talk) 21:19, 3 August 2023 (UTC)[reply]
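A minimal sketch of that rule, for illustration only: `normalized` here is a simplified stand-in (NFKD, lowercase, keep alphanumerics) rather than the bot's real normalization, and `anagrams_for` is a hypothetical helper name.

```python
import unicodedata


def normalized(term: str) -> str:
    """Simplified stand-in: decompose, lowercase, keep letters and digits."""
    decomposed = unicodedata.normalize("NFKD", term.strip().lower())
    return "".join(ch for ch in decomposed if ch.isalnum())


def anagrams_for(term: str, wordlist: list[str]) -> list[str]:
    """Return words with the same letters as `term`, skipping any word
    whose normalized form is identical (e.g. cafe / café / Cafe)."""
    norm = normalized(term)
    key = sorted(norm)
    result = []
    for word in wordlist:
        other = normalized(word)
        if other == norm:
            continue  # same normalization: a spelling variant, not an anagram
        if sorted(other) == key:
            result.append(word)
    return result
```

Under these assumptions, `anagrams_for("cafe", ["café", "Cafe", "face"])` keeps only `face`, while a pair with a genuinely different letter order, such as theater of war and theatre of war, would still be listed.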
I see, this is a good idea. I believe I've implemented this now. I forgot to mention that the code is here, as I've usually been sharing it for other scripts.
Given that, is there anything else you think needs doing? Kiril kovachev (talkcontribs) 21:37, 3 August 2023 (UTC)[reply]
I think we do want anagrams that are alt forms with different spelling (eg not cafè and cafe but theater of war and theatre of war). For, say, Scrabble players, since (as Eq says) maybe you want an R in the Double Letter Score section. cf (talk) 21:24, 3 August 2023 (UTC)[reply]
@Kiril kovachev Just for reference, User:OrphicBot used to update the {{also}} sections at the top of each page and worked off of an equivalence list that's documented on the bot's page. Also I agree with User:CitationsFreak about including alt forms with a different normalization (which is also essentially the converse of what User:JeffDoozan is saying). Benwing2 (talk) 23:43, 3 August 2023 (UTC)[reply]
Also when computing equivalences, it sounds like you want to convert to NFD form and remove diacritics (in the range U+0300 through U+036F), as well as maybe anything identified by Unicode as punctuation, but not all symbols; that will avoid the issue you mentioned above with the 🧢 symbol. Benwing2 (talk) 23:46, 3 August 2023 (UTC)[reply]
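A rough sketch of this suggestion, assuming Python's standard `unicodedata` module: `fold` is a hypothetical name, and dropping whitespace is an extra assumption that is not part of the suggestion itself.

```python
import unicodedata


def fold(term: str) -> str:
    """NFD-decompose, drop combining diacritics (U+0300..U+036F) and
    punctuation, but keep symbols such as emoji."""
    kept = []
    for ch in unicodedata.normalize("NFD", term.lower()):
        if "\u0300" <= ch <= "\u036f":
            continue  # combining diacritical mark
        if unicodedata.category(ch).startswith("P"):
            continue  # any Unicode punctuation category (Pd, Po, Ps, ...)
        if ch.isspace():
            continue  # assumption: whitespace is ignored as well
        kept.append(ch)
    return "".join(kept)
```

Because the cap emoji is a symbol and not punctuation, `fold("🧢ing")` keeps it and so differs from `fold("-ing")`, where the hyphen is removed; likewise β survives in `fold("β-carotene")`, so it does not collide with carotene.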
@Benwing2 Thanks, this is a good idea. I'll try it out tomorrow when I can. As for alt forms: I'm not sure I understand what we're after. Should we put words with the same normalisation (e.g. on the page cafe, café) under an Alternative forms header by default? Or in see-also? Kiril kovachev (talkcontribs) 23:52, 3 August 2023 (UTC)[reply]
By "see also" are you referring to the "See also" hatnote at the top of the page? That is supposed to be for orthographically-similar terms in a language-independent fashion, whereas "Alternative forms" are language-dependent forms that are related by both form and meaning. So they are logically independent/orthogonal. Benwing2 (talk) 23:58, 3 August 2023 (UTC)[reply]
@Benwing2 Yes, that's what I thought, e.g. on 1000 meter, placing 1000-meter in that hatnote. Although it's language independent, I believe it still works fine to put such variants in there because they're just small variations in representation of the same letters.
Anagrams that are actually rearrangements should clearly go under ==Anagrams==; and so, would it be fitting to put forms with the same normalization in the {{also}}, or somewhere else? (Or not anywhere at all?) Kiril kovachev (talkcontribs) 12:02, 4 August 2023 (UTC)[reply]
@Kiril kovachev Why are you folding ı and i? --RichardW57 (talk) 05:49, 4 August 2023 (UTC)[reply]
Because they are considered to be the same kinda letter in English. cf (talk) 05:50, 4 August 2023 (UTC)[reply]
@CitationsFreak What? 'ı' is a blatantly foreign letter. Or are you claiming a subtractive diacritic? --RichardW57m (talk) 08:43, 4 August 2023 (UTC)[reply]
Yes, it is treated as a subtractive diacritic in English. Among the handful of place names in Turkey for which we have English entries using it, it is freely replaced with 'i' in actual English writing (e.g.) and has already been treated as equivalent to 'i' in anagrams, as at Ağrı. —Al-Muqanna المقنع (talk) 09:36, 4 August 2023 (UTC)[reply]
Thank you, I was going to point to the exact same example. That's where I originally got that mapping from. I don't know how to find others like it, but it's one of the few I could find. Kiril kovachev (talkcontribs) 11:56, 4 August 2023 (UTC)[reply]

Family tree of the Slavic languages

Hello. Please update the Slavic language tree. It is necessary to eliminate errors and inaccuracies.

Slavic language tree
  1. Old Novgorodian (zle-ono) is not a descendant of Old East Slavic (orv). This is a "sister language", which has very archaic features that were not in the OES. Therefore, it is necessary to make the Old Novgorodian a descendant of East Slavic (zle) family;
  2. Carpathian Rusyn (rue) should be made a descendant of the etymological Old Ukrainian (zle-ouk), together with the Ukrainian (uk).
  3. Etymological Old Ukrainian (zle-ouk) and Old Belarusian (zle-obe) should be renamed to "Middle Ukrainian" & "Middle Belarusian". Because the forms of these languages with "Old" belong to the period of Old East Slavic, like dialects.
  4. Bulgarian (bg) and Macedonian (mk) are very related. Why then is only one of them listed as a descendant of Old Church Slavonic (cu)? Either make Macedonian & Bulgarian descendants of OCS, or reconsider the position of Bulgarian as a descendant of OCS.

ZomBear (talk) 12:46, 4 August 2023 (UTC)[reply]

Interslavic language

How about adding a language code for a constructed language, Interslavic (isl)? Make it a descendant of Proto-Slavic (sla-pro). It supports both Latin & Cyrillic script (just like Serbo-Croatian). After all, Wiktionary has Esperanto. ZomBear (talk) 13:03, 4 August 2023 (UTC)[reply]

Use of {{rfdef}}

Is it in order to request the deletion of entries whose sole definition is an invocation of {{rfdef}} on the grounds that there is no content? The documentation of the template rather suggests that this is a request for content, and that we should not simply implement a policy of refusing requests point blank. I appreciate that some linger because no-one chooses to make the effort - which for a good definition can be considerable. --RichardW57m (talk) 14:30, 4 August 2023 (UTC)[reply]

Some such entries have been speedied, which suggests to me that the template's documentation needs to be changed, perhaps to suggest a time after which such requests may simply be rejected on the basis that the editors can't be bothered to create a definition. --RichardW57m (talk) 14:30, 4 August 2023 (UTC)[reply]

IMO no it is not in order. To me, {{rfdef}} means that a term (or glyph as I suspect you're talking about) is verifiable, but the author of the entry isn't sure what it means or how to phrase a sense. If someone thinks it isn't verifiable or that there's some convincing reason why it's unsuitable for inclusion then it should go through the normal RFV and RFD process. "No content" is not a good reason for speedying in that case. —Al-Muqanna المقنع (talk) 14:44, 4 August 2023 (UTC)[reply]
Sorry, I forgot to confess that the question was prompted by the deletions of Unicode characters with minimal research and their imminent deletions. I am actually bothered by some of the recently invented characters that have been encoded, but proper research can be hard work - and even harder if one doesn't have easy access to a good specialist library. It occurred to me that I found quite a few alleged Pali words without definition, and no referenced evidence of their existence. Some of these I've condemned to RfV because I can find no evidence and I hope they were just typos or misanalyses; fortunately, most I was able to find at least in dictionaries. I was wondering how many simply got speedied because they irritated whoever tried to expand them, a waste of everyone's effort. --RichardW57m (talk) 16:53, 4 August 2023 (UTC)[reply]
What would be in order in most cases would be an RfV. Defining an obscure or otherwise hard-to-define term without some real evidence is impossible.