User talk:Benwing2

Archive

Catalan inflections

Hi Ben, any chance we could have automatic Catalan inflections? There's User:DTLHS/catalan bot requests, but it doesn't seem to be running very often, and it's tedious to add them manually to a list. Jberkel 18:12, 11 December 2023 (UTC)

@Jberkel Yeah I have looked into this. The thing is that I'd probably have to rewrite Module:ca-verb to work like Module:es-verb or Module:pt-verb. The Spanish, Portuguese and Galician modules were all written mostly by me and implement JSON fetching of the inflections as well as {{es-verb form of}} and similar to automatically fetch the correct inflections for a given verb form. The former wouldn't be too hard to add to the existing module but the latter would be painful, and it would probably be better to rewrite the module instead. I have looked into doing this but I don't have that good a handle on Catalan verbs, esp. those in -er/-re. Do you have any good references that explain how Catalan verbs work, especially focusing on the -er/-re verbs, which is where the irregularities seem to be? The current module seems to push a lot of the complexity down into the template call, e.g. veure's invocation looks like this, which is a mess:

{{ca-conj-ure-ia2|v|e<!--
-->|past_part=vist<!--
-->|past_part_mpl=vistos<!--

-->|pres_ind_1_sg=veig<!--

-->|pret_ind_stem=vei<!--
-->|pres_sub_stem=veg<!--
-->|impf_sub_stem=vei<!--

-->|pret_ind_1_sg=viu<!--
-->|pret_ind_2_sg2=veres<!--
-->|pret_ind_3_sg2=véu<!--
-->|pret_ind_1_pl2=vérem<!--
-->|pret_ind_2_pl2=véreu<!--
-->|pret_ind_3_pl2=veren<!--

-->|impr_2_sg=veges|impr_2_sg2=ves<!--
-->|impr_2_pl2=veieu<!--
-->}}

I'd want to have this stuff all in the module itself, similarly to what's being done for Spanish, Portuguese, Italian, French, etc. Benwing2 (talk) 23:15, 11 December 2023 (UTC)
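A hypothetical first step toward moving those overrides into the module: the sketch below (Python rather than the Lua the modules are written in; the function name is made up) strips the <!-- --> comments from a flat call like veure's and splits it into positional and named parameters, which could then be copied into a per-verb table. It assumes there are no nested templates, links or escaped pipes inside the values.

```python
import re


def parse_template_call(text):
    """Split a flat {{name|...}} call into (name, positional, named).

    Assumes no nested templates, links or escaped pipes inside values.
    """
    # The <!-- ... --> comments in the call only serve as line breaks.
    text = re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL).strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("expected a single {{...}} template call")
    name, *args = [part.strip() for part in text[2:-2].split("|")]
    positional, named = [], {}
    for arg in args:
        if "=" in arg:
            key, value = arg.split("=", 1)
            named[key.strip()] = value.strip()
        else:
            positional.append(arg)
    return name, positional, named


# A shortened version of the veure call above.
call = """{{ca-conj-ure-ia2|v|e<!--
-->|past_part=vist<!--
-->|pres_ind_1_sg=veig<!--
-->|impr_2_sg=veges|impr_2_sg2=ves<!--
-->}}"""
name, positional, overrides = parse_template_call(call)
print(name, positional, overrides["pres_ind_1_sg"])
```

The named parameters come out as a plain dictionary of slot name to form, which is the shape a per-verb override table inside a module would most naturally take.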

OK, thanks for looking into it; I sent you some reference material via email. Jberkel 09:01, 12 December 2023 (UTC)

@Jberkel Thanks, I received it. Benwing2 (talk) 21:12, 12 December 2023 (UTC)

@Jberkel I have a question, not sure if you know the answer. In -ar verbs whose root vowel is e or o, is that vowel pronounced è or é (or ò or ó for roots in o) in root-stressed forms (e.g. first-singular present indicative), or does it vary from verb to verb? In Proto-Romance it varied from verb to verb, and this is still the case in modern Italian. Spanish has a reflex of that in verbs that unexpectedly have ie or ue in root-stressed forms, but Portuguese has regularized the vowel quality (for example, using low-mid vowels in -ar verbs). I think in conservative varieties of Occitan at least, it varies from verb to verb, and this is reflected in the spelling. Benwing2 (talk) 08:41, 13 December 2023 (UTC)

Pinging @Vriullop from ca.wikt. Ultimateria (talk) 23:28, 13 December 2023 (UTC)

@Vriullop @Ultimateria It appears that it varies from verb to verb in Catalan, at least based on the two verbs pegar, which ca.wikt says has /ɛ/ in Central Catalan (consistent with its origin from Latin short ĭ), and membrar, which ca.wikt says has /e/ in Central Catalan (again, consistent with its origin from Latin short ĕ). But the situation is complicated by the dialects, many of which have /e/ for both verbs. I'm interested in finding a dictionary that indicates these vowel qualities so that maybe we can include them in the conjugation table, similarly to how the French and Italian conjugation tables give pronunciation; this would only be for Central Catalan for now (maybe forever), since the dialects are complicated. Benwing2 (talk) 00:19, 14 December 2023 (UTC)

BTW, if what I've said is correct, where in Catalan dictionaries can I find an indication of how the stressed vowel is pronounced for a given verb? Benwing2 (talk) 05:32, 14 December 2023 (UTC)

For variation across dialects, see the notation used with {{ca-IPA}}: ê stands for /ɛ/ in Central, /e/ in Valencian and /ə/ in Balearic. Similarly with ô, while è, é, ò and ó have no variation. This is fairly consistent, with few exceptions.

It is etymological: ê comes from Latin ĭ or ē, with some exceptions.

The only dictionary that indicates the rhizotonic vowel is the DNV; for example, for membrar it says é, but it covers only Valencian, so that é could correspond to either ê or é. It is only helpful for è and ò. I have not found any other source that systematically indicates the rhizotonic vowel; even the pronunciation dictionary on my bookshelf only includes some paradigmatic verbs. Frankly, there are some verbs whose pronunciation I don't know, apart from my personal perception, which is not a good sample. The only clues are a noun related to the verb and, for inherited verbs, the etymology. On ca.wikt I add a rhizotonic parameter verb by verb using ca-IPA notation. Vriullop (talk) 09:25, 14 December 2023 (UTC)
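The etymological defaults stated in this thread (ê from Latin ĭ or ē, é from ĕ, as with pegar and membrar; later replies add ǒ giving ò for fossa and ǔ giving ó for copsar) could be kept as a small lookup table for a first guess. This is a hypothetical sketch with made-up names, not how Module:ca-IPA works, and it ignores the consonant-context effects and other exceptions discussed further on.

```python
# Hypothetical default: Latin stressed vowel -> ca-IPA hint, per this thread.
LATIN_TO_HINT = {
    "ĭ": "ê",  # e.g. pegar, /ɛ/ in Central
    "ē": "ê",
    "ĕ": "é",  # e.g. membrar, /e/ in Central
    "ŏ": "ò",  # e.g. fossa (expected ò)
    "ŭ": "ó",  # e.g. copsar
}

# Some replies mark short vowels with a caron rather than a breve.
CARON_TO_BREVE = {"ě": "ĕ", "ǒ": "ŏ", "ǔ": "ŭ"}


def etymological_guess(latin_vowel):
    """Return the expected hint, or None for vowels not covered here."""
    vowel = CARON_TO_BREVE.get(latin_vowel, latin_vowel)
    return LATIN_TO_HINT.get(vowel)


print(etymological_guess("ě"), etymological_guess("ĭ"))
```

Returning None for anything not listed keeps the guess explicit: a vowel the thread says nothing about gets no default rather than a silent wrong one.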

@Vriullop Thank you! I wonder why Catalan dictionaries are so bad at including the rhizotonic vowel quality patterns. Pretty much all monolingual Italian dictionaries list the rhizotonic quality (and position) for all verbs. What about the pronunciation of other forms, such as verbs with pres 3s in -ou or -eu? Are there any dictionaries indicating the vowel quality of these and other endings? Thanks for any help you can give. Benwing2 (talk) 09:57, 14 December 2023 (UTC)

I'm not sure what you mean; 'mou' from 'moure' and 'veu' from 'veure' have the same stressed vowel as the infinitive.

Endings without a written accent that may be ambiguous:

-em, -eu, as in cantem, canteu, cantarem, cantareu: ê
-essis, -essin, as in cantessis, cantessin: é
-eres, -eren, as in temeres, temeren: é
infix -eix- (-eixo, -eixes, -eix, -eixen, -eixi, -eixis, -eixin): ê, but not used in Valencian, which changes it to -ix-

This is a summary from different sources, consistent with the etymology. Vriullop (talk) 12:38, 14 December 2023 (UTC)
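The ending rules above can be written as a suffix table checked longest-first. This is an illustrative sketch with made-up names. It assumes the input is a verb form whose stress falls on the ending, so it returns None whenever a written accent shows the stress is elsewhere (e.g. cantaríem); root-stressed forms such as veu are outside its scope.

```python
# Unaccented verb endings and the hint for their stressed e, as listed above.
ENDING_HINTS = {
    "em": "ê", "eu": "ê",                      # cantem, canteu, cantarem
    "essis": "é", "essin": "é",                # cantessis, cantessin
    "eres": "é", "eren": "é",                  # temeres, temeren
    "eixo": "ê", "eixes": "ê", "eix": "ê", "eixen": "ê",
    "eixi": "ê", "eixis": "ê", "eixin": "ê",   # -eix- infix (not Valencian)
}
STRESS_MARKS = set("àèéíòóú")


def ending_hint(form):
    """Hint for the stressed e of an ending-stressed verb form, else None."""
    if STRESS_MARKS & set(form):
        return None  # a written accent already marks the stress
    # Longest suffix first, so -eixen wins over shorter candidates.
    for suffix in sorted(ENDING_HINTS, key=len, reverse=True):
        if form.endswith(suffix):
            return ENDING_HINTS[suffix]
    return None


for form in ["cantem", "cantessin", "temeren", "serveixen", "cantaríem"]:
    print(form, ending_hint(form))
```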

@Vriullop OK thanks, I suppose that the DCVB dictionary gives the infinitive pronunciation of words like moure. This is very helpful; if I have other questions I'll let you know. Benwing2 (talk) 19:55, 14 December 2023 (UTC)

DCVB is fine for pronunciation, but in some cases it is incomplete or confusing. If necessary, you can compare it with the GDLC via the "francès" link, which includes a Catalan-French translation and also the pronunciation in Central Catalan, and with the DNV for Valencian. Vriullop (talk) 20:57, 14 December 2023 (UTC)

@Vriullop Thanks! Benwing2 (talk) 21:41, 14 December 2023 (UTC)

@Jberkel I wrote a preliminary Catalan conjugation module; see User:Benwing2/test-ca-conj for examples. It has a few bugs in it that I'm working out, but it's close. Benwing2 (talk) 22:13, 17 December 2023 (UTC)

Already looking good, thanks for working on this! Jberkel 22:26, 17 December 2023 (UTC)

The pronunciation of feu is correct: the regular 2nd person plural has -eu, and the irregular past was spelled féu in the pre-2016 orthography, which is more helpful.

The pattern /e/ in Central and /ɛ/ in Valencian is possible, but rare. It can appear for different reasons:

Pronunciation of stressed e is not as uniform in Central Catalan as in other dialects. For example, some words can be /e/ in Barcelona and /ɛ/ in Girona or vice versa. In general, one of the two is considered formal and the other local or dialectal. The formal one is usually the expected one, or the same as in Valencian and Balearic.
Recent loanwords may vary in how they are adapted. They are usually adapted with è, but Spanish ones with é.

The DCVB records these local details, but in this case I trust the GDLC more. The DCVB comes from fieldwork in the 1920s, and some of its pronunciations have not been recorded in later 20th c. fieldwork. The GDLC compiles the pronunciation from the main reference work used by radio and TV speakers for formal Central speech. In short, this pattern is rare in formal pronunciation. As far as I can remember, it doesn't happen with verb forms, and it can be treated like other irregular cases that do not follow an expected pattern. --Vriullop (talk) 18:00, 19 December 2023 (UTC)

Although the /e/-/ɛ/ pattern above is rare, the reverse is more common: /ɛ/ in Central and Balearic, /e/ in Valencian. On cawikt this is written ë (double e), a variant of ê (triple e). Stressed schwa in Balearic is used in inherited words and inflections. In cultisms or loanwords (e.g. cafè), or just words perceived as literary (e.g. mestre), the vowel is /ɛ/ as in Central instead of schwa. There are indeed verb forms with rhizotonic ë. There is no equivalent with stressed o, but for consistency it could be written ö (double o) instead of ô. Vriullop (talk) 08:02, 21 December 2023 (UTC)
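Read as data, the hints described so far map to per-dialect phonemes roughly as below. Two assumptions: ô is treated as the o counterpart of ë, as the end of the post above suggests, and the optional overrides argument is a hypothetical stand-in for the per-dialect parameters that come up later in the thread (e.g. menjar: é, but /ə/ in Balearic). This is a sketch, not the actual Module:ca-IPA code.

```python
# Per-dialect value of each stressed-vowel hint, as described in this thread.
HINTS = {
    "è": {"central": "ɛ", "valencian": "ɛ", "balearic": "ɛ"},
    "é": {"central": "e", "valencian": "e", "balearic": "e"},
    "ò": {"central": "ɔ", "valencian": "ɔ", "balearic": "ɔ"},
    "ó": {"central": "o", "valencian": "o", "balearic": "o"},
    "ê": {"central": "ɛ", "valencian": "e", "balearic": "ə"},
    "ë": {"central": "ɛ", "valencian": "e", "balearic": "ɛ"},
    "ô": {"central": "ɔ", "valencian": "o", "balearic": "ɔ"},  # assumed like ë
}


def stressed_vowel(hint, dialect, overrides=None):
    """Phoneme for `hint` in `dialect`; `overrides` maps dialect -> phoneme."""
    if overrides and dialect in overrides:
        return overrides[dialect]
    return HINTS[hint][dialect]


# menjar: é everywhere except Balearic, given as a per-dialect override.
print(stressed_vowel("é", "balearic", overrides={"balearic": "ə"}))
```

Keeping the override separate from the hint means a word like menjar still carries an ordinary é hint, and only the one dialect that deviates needs extra data.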

@Vriullop Thanks for all your help. I have implemented ë in Module:ca-IPA. Can you help me by fixing the default rules in the module that currently default to ê to instead default to ë when it's correct? For example, cens defaults to cêns when it should be cëns. This is in the mid_vowel_e() function of Module:ca-IPA. I don't know Catalan well enough to fix it myself, and the corresponding cawikt module in ca:Module:ca-pron/AFI seems to have the same rules we currently have. Benwing2 (talk) 20:49, 21 December 2023 (UTC)

Since stressed schwa depends on whether a word is inherited or a cultism, there is too much variation in the -ens, -ena, -enes endings to redefine the rule. I have added tracking and checked where it was being applied by default. After adding the hint ê or ë, I think it is safer to remove this rule: Special:WhatLinksHere/Template:tracking/ca-IPA/ens-ena-enes. Later, I'll look at other rules with default ê. Vriullop (talk) 09:19, 22 December 2023 (UTC)

@Vriullop Thank you. I agree about removing the rule. In general I'm not much in favor of rules like this that are wrong a significant fraction of the time, and prefer to be explicit except when it's nearly completely predictable. Benwing2 (talk) 11:06, 22 December 2023 (UTC)

@Vriullop I just discovered that cerndre is irregularly missing the first r in pronunciation. Does this carry through to inflected forms like cerno, cerns or are they pronounced regularly with /r/? Benwing2 (talk) 03:05, 24 December 2023 (UTC)

BTW there is a bug in cawikt's handling of Balearic pronunciation with ê; hard /k/ shows up as /c/ in the first of two alternants. See ca:cerca for an example. Benwing2 (talk) 03:08, 24 December 2023 (UTC)

@Vriullop OK, I have several more questions. I'll try to list them all here and avoid pinging you individually.

cors "privateering campaign" and cors "Corsican" are given without the /r/ in Eastern Catalan pronunciation both here and in cawikt. However, GDLC says /kórs/ for the former and /kɔ́rs/ for the latter. Which is correct, and if the /r/ is correct, do we need to update Module:ca-IPA?
I am going through mid-vowel verbs trying to update the inflected forms to have the correct vowels. I am probably going to implement something soon in {{ca-conj}} and/or {{ca-verb}} to let you specify the mid-vowel quality and display it, similar to what cawikt does. I cannot determine the vowel quality of the following verbs so far: cessar, conrar, copar, copsar, crepar, dopar, drenar, gestar. Can you help?
I am going to update Module:ca-IPA so you can individually specify the pronunciation of different dialects, as I have found some need for this. Apropos of this, I notice that the cawikt version of {{ca-pron}} supports ẽ; do you think we should support this, or just use the per-dialect support I am going to add?
Also, I'm more and more convinced that we should have few default rules for mid-vowel quality, and require it to be given explicitly in all cases that don't involve a well-known affix.
fossa "pit, grave, etc.": does it have /o/ [per GDLC] or /ɔ/ [per DNV, DCVB and cawikt]?
llei "law": does it have /e/ or /ɛ/ in Eastern Catalan, or some complex mixture? cawikt says /ɛ/, GDLC says /e/, DCVB says a complex mixture.

Thanks for your help, Benwing2 (talk) 06:41, 24 December 2023 (UTC)

Lots of stuff here, but I'm happy to help.

'Cerndre' loses its first r when followed by the sequence -ndr-, i.e. in the infinitive, future and conditional. All other forms have regular pronunciation. The same happens with prendre and its derived verbs; see ca:Categoria:Rimes en català -ɛndɾe, which includes 14 verbs ending in -prendre. The sequence -rndr- only occurs in 'cerndre', and no term other than these 14 verbs has the sequence -rendr-.
/c/ in Mallorcan is an allophone of /k/, e.g. the local pronunciation [məˈʎɔ̞ɾ.ca̟]. You're right, this is phonetic and not phonemic. Catalan works often include some phonetic symbols in phonemic representations to show dialectal contrasts, but that does not apply to [c], whose use is restricted. I plan to remove it because it is misleading.
I've fixed 'cors' on cawikt. This r really is retained; it is respelled 'corrs'. The module should not assume the loss of final -r(s) in monosyllables: while most polysyllables lose it, most monosyllables don't. The problem is how to handle that.
My guesses on rhizotonic vowels:
- cessar: é; inherited from Latin ě not followed by an opening context, and DNV é.
- conrar: ó; from unstressed o, reduction of conrear, DNV ó.
- copar: ó; from French /u/ and analogous to noun copa, DNV ó.
- copsar: ó; inherited from Latin ǔ, DNV ó.
- crepar etym 1: ë; as noun crep from the same French root, neologism not attested in Balearic, DNV é.
- crepar etym 2: é; from Latin ě, only used in Balearic.
- dopar: ó; neologism as in Spanish, close to the English original, DNV ó.
- drenar: é; idem.
- gestar: é; from Latin ě, as the noun gesta from the same root, DNV é.
Notation ẽ is hardly used. It is better to handle that with per-dialect parameters: ca:Special:Diff/2245937. I'll remove it on cawikt.
Some mid-vowel rules are theoretically justified, but I still need to review their unwanted side effects. I agree that they shouldn't lead to erroneous results.
Fossa should be ò from Latin ǒ, but there have been some modern changes during the 20th c. that I am still unable to explain. The DCVB shows the situation in the first third of the 20th c., in accordance with etymology. Central today is probably hesitant. In this case, I would say ó in Central and ò in Balearic and Valencian, the two more conservative dialects.
I've fixed llei on cawikt. From Latin ē it should be ê, but the diphthong has changed it: é in most of Central, retained è in northern Central, /ə/ in Balearic, é in Valencian.

Vriullop (talk) 18:27, 26 December 2023 (UTC)
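The cerndre/prendre rule can be expressed as a spelling-level rewrite. Per the reply above, -rndr- and -rendr- occur only in cerndre and the 14 -prendre verbs, so a blanket regex like this hypothetical one would only be safe on that set. The output is a respelling fed to the transcription, not IPA, and the function name is made up.

```python
import re

# An r followed (directly, or via one e) by -ndr-: cerndre, prendre, ...
_R_BEFORE_NDR = re.compile(r"r(e?)(?=ndr)")


def drop_r_before_ndr(form):
    """Respell a form of cerndre or a -prendre verb without its silent r."""
    return _R_BEFORE_NDR.sub(r"\1", form)


for form in ["cerndre", "cerno", "prendre", "comprendria"]:
    print(form, drop_r_before_ndr(form))
```

Only the infinitive, future and conditional contain -ndr-, so forms like cerno come through untouched, which matches the reply.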

@Vriullop Thank you! I have applied the changes offline to the specific verbs and other words mentioned above, and I will push them soon. Still working on Module:ca-IPA. A few more questions:

More verbs where I'm not sure of the rhizotonic vowel quality: menar "to lead" (is this ê?), menjar "to eat" (apparently it uses now-deprecated ẽ?), mentir "to lie" (?), molar "to rock" (from Spanish; ó?).
mesa "altar, mense, table": cawikt says /e/ for both East and West, which agrees with DCVB, but GDLC says /ɛ/. Mistake?
messes "harvest time": again, cawikt says /e/ for both East and West, which agrees with DCVB, but GDLC says /ɛ/.

Benwing2 (talk) 05:46, 27 December 2023 (UTC)

'menar': ê.
'menjar': é but Balearic ə. I'll modify the rizo parameter to accept an explicit /e/, /ə/, only used here.
'mentir': é in forms without -eix-.
'molar', to rock, from Spanish: ó.
'mesa' as a noun has two etyms with different pronunciations, but GDLC only shows one in translations. Here DCVB is correct.
'messes', I would say é but irregular è in Central.

Vriullop (talk) 09:47, 27 December 2023 (UTC)

@Vriullop Thanks for your quick response! I have made the offline updates. Some more questions (for N and O) ...

noble: I already pinged you about this. DNV says /o/ for Valencian but DCVB says /ɔ/.
nombre: cawikt and DCVB say /o/ for Eastern Catalan but GDLC says /ɔ/. /o/ is etymologically expected.
odre: Same. cawikt and DCVB say /o/ for Eastern Catalan but GDLC says /ɔ/. /o/ is etymologically expected.
ofi "office": Vowel quality? Maybe /o/ since the o is unstressed in oficina?
oi: DCVB splits the interjection into /ɔj/ "yes" from Latin hoc and /oj/ (expression of pain or surprise). GDLC and DNV group these two meanings and say the pronun for both is /ɔj/. Who is right?
orla "border, fringe": DCVB and cawikt say /ɔ/ for Valencian. DNV says /o/.
oro "suit in a Spanish deck of cards": Same as previous: DCVB and cawikt say /ɔ/ for Valencian. DNV says /o/. (Not in GDLC.)

Benwing2 (talk) 01:03, 28 December 2023 (UTC)

For P:

peli "film" (clipping of pel·lícula): cawikt says pel·li has ê, so I assume this is the same, but it seems strange to have ê for a recent coinage.
perca "perch (fish)": cawikt says /ɛ/ for Valencian but DNV says /e/. DCVB doesn't give a pronunciation.
pesta "plague": cawikt and DCVB say /ɛ/ for Central but GDLC says /e/ (mistake?).
pleca "vertical bar": Balearic vowel? Is it ê?
poblar "to populate": DNV says stressed vowel is /o/ despite poble having /ɔ/. Mistake?
porro "leek; spliff": cawikt and DCVB say /ɔ/ but both GDLC and DNV say /o/.
posa "pose" (not in cawikt): GDLC says /o/ despite this being derived from posar, which has /ɔ/. (Are there two different pronuns/etyms here?)
postres "dessert": cawikt and DCVB say /ɔ/ for Valencian but DNV says /o/.
pregar "to pray": Presumably /e/ (same as prec)?

Benwing2 (talk) 05:23, 28 December 2023 (UTC)

For P:

'peli' is an informal spelling of 'pel·li'. The latter is used in the press and has become established, unlike other clippings. I spontaneously pronounce it with è, just like any word beginning with consonant + stressed e + l, including words inherited from both Latin ě and ē. Since it is in general use and not exclusively colloquial, I would say ê: fully adapted in Central, and with the same value as unstressed e in Balearic and Valencian.
'perca': ë. Expected é but è per context C+ě+r, not fully changed in learned borrowings.
'pesta' is odd: é is expected, but there is some irregular è not fully explained by the context C+ě+s. From the sources: è, but irregular é in Central, although the irregularity is the other way around.
'pleca': ë, as a technical word, schwa is improbable in Balearic.
'poblar': I can't find any explanation for the difference between 'poble' and 'pobla'. Without any confirmation, for now I would say ò.
'porro': ó. Expected ò but usually changes to ó before -rr-.
'posa': noun ó and verb ò. Expected ò both from 'pausa' and 'pausare', but most current senses of the noun are calques of French or Spanish, both ó.
'pregar': é.

Vriullop (talk) 13:30, 29 December 2023 (UTC)

On cawikt, pronunciations were first added according to DCVB. The revision against GDLC is partial, not complete. The pronunciation in DNV is a recent addition and has not been checked yet. Your guesses are usually correct.

For N and O:

'noble': ô. Expected ó, changed to ò in the first syllable due to the consonant context, except in areas with Mozarabic influence, such as Valencian.
'nombre': ô. The same case, but I trust DCVB for Balearic with irregular ó.
'odre': ô, but Balearic ó.
'ofi', I've never heard it in Catalan. My guess is ó either from an unstressed vowel or from Spanish.
'oi': both ò and ó. I trust DCVB with its three groups, the last one used especially in Balearic. The two authors of the DCVB were Balearic, and both 'oi las' (surprise) and 'ois' (moans) sound familiar to me from Balearic speakers. Outside the Balearic Islands, people probably don't care about the difference, as these senses are barely used.
'orla': ô. Again, an expected ó changed to ò except in Valencian, confirmed in descriptive works.
'oro': ô, hesitant by analogy with inherited 'or'.

Vriullop (talk) 15:32, 28 December 2023 (UTC)

For R:

reble "rubble": cawikt and DCVB say ê, but GDLC says /e/ for Central Catalan.
recar "to regret": DNV says /e/; DCVB suggests /e/ everywhere, is that right?
regar "to water": Etymologically should be ê, is that right? (OTOH reg has /e/ everywhere per GDLC and DNV)
regna, regne, regnar: These seem to have [ŋn]. Do all words in -gn- have this? If so we should fix Module:ca-IPA to do this automatically. (Is this Eastern Catalan only? Valencian seems to have [gn].)
reptar: In the meaning "to reprimand; to challenge" it seems to have rhizotonic /e/. In the meaning "to crawl" I am not sure.
resar "to pray": Since this is a Spanish borrowing, does it have /e/? res "prayer" seems to have /e/.
retre "to give back, to return": cawikt and DCVB say /e/ in Eastern Catalan but GDLC says /ɛ/.
rosca "screw thread": cawikt and DCVB say /ɔ/ for Valencian but DNV says /o/.
rosta "fried bacon, fried bread": cawikt says /ɔ/ for both Eastern and Western; DNV says /ɔ/ for Valencian but GDLC says /o/. DCVB has /ɔ/ and /o/ dialectally.
rosta (feminine of rost "steep"): Same. cawikt says /ɔ/ for all, DNV says /ɔ/ but GDLC says /o/. Here, DCVB has only /ɔ/.
rotar: Two etyms: (1) "to belch": Does it have /o/ like rot "belch"? (2) "to rotate": Does it have /o/ because it's borrowed from Spanish?
rotllo "roll; annoyance": DNV says it has /o/ but rotlle has /ɔ/. Mistake? cawikt and DCVB say both forms have /ɔ/ everywhere, and GDLC agrees that both forms have /ɔ/ in Central Catalan. Note also rotlo, where again DNV has /o/; here again, DCVB says /ɔ/ everywhere, but in this case cawikt uses ô to get /o/ in Valencian.

Benwing2 (talk) 08:37, 28 December 2023 (UTC)

@Vriullop Thanks again for your detailed responses; I really appreciate the work you're putting into them. Issues I found involving terms with S:

seca "mint": GDLC says /ɛ/, DNV says /e/ and cawikt says ê, which are all compatible, but DCVB says /ɛ/ everywhere. In this case I wonder if DCVB is actually correct while both DNV and cawikt are mistaken.
sedar "to sedate": DNV says /e/ for the root vowel, but the Central Catalan quality is unknown.
sense "without": cawikt and DCVB say ê, DNV says /e/ but GDLC says /e/ rather than expected #/ɛ/.
sentir "to feel": DNV says /e/ root vowel. No dictionary attests the Central Catalan root quality, although /e/ is expected.
serva "serviceberry": cawikt and DCVB say ê, DNV says /e/ but GDLC says /e/ rather than expected #/ɛ/.
setge "siege; figwort": cawikt and DCVB say é, DNV says /e/ but GDLC says /ɛ/.
soga "rope": DNV and GDLC both say /ɔ/ but DCVB says variously /o/ or /ɔ/ for a bunch of obscure places that I'm not familiar with but seem mostly Northwest Catalonian. I assume Balearic must have /ɔ/ but not sure.
sonso "clumsy, gauche": cawikt and DCVB say /o/ for both East and West; DNV agrees with /o/, but GDLC says /ɔ/ for Central Catalan. Maybe this is a case of change over the last century?
sorna "sarcasm": cawikt says ô, but both DNV and GDLC say /o/. DCVB doesn't give pronun.
sosa "saltwort, soda ash": cawikt and DCVB say ô, but both DNV and GDLC say /o/.
sostre "ceiling": cawikt says ó and DNV says /o/, but GDLC says /ɔ/. DCVB maybe has the real story: /ɔ/ in Barcelona, /o/ elsewhere. I'm going with the idea that Western Catalan (Northwestern and Valencian) have /o/, while Central has /ɔ/ and Balearic has /o/. Correct?
sotjar "to spy on": DNV says /o/ root vowel. No dictionary attests the Central Catalan root quality, but I am guessing /o/ based on the proposed etymologies. Correct?

Note that I'm now 87% through the set of 2,722 terms that I identified for auditing the mid-vowel quality, and have finished with S. T represents about 7% of the total, V represents 4-5%, and the remaining letters around 1%. So I'm quite close to finishing, with lots of help from you :) ... Benwing2 (talk) 09:20, 30 December 2023 (UTC)

For S:

seca: I think the correct one is ë, although I'm not sure about its evolution from Arabic.
sedar: expected ê from Latin sēdō.
sense and sens: expected ê, but words like these, often used as proclitics, tend to close. So é, but schwa in Balearic.
sentir: é as expected.
serva: ê is correct. As in other similar cases, the GDLC does not distinguish properly different pronunciations from different etyms.
setge: expected é, but è in Central per context subject to openness.
soga: ò in general. Coromines identified a handful of about 40 words that changed an etymological ó to ò, except in some specific areas. It is known as the Coromines law, and it is still unknown why it includes certain words and not others.
sonso: ó, but ò in Central, for reasons unknown to me.
sorna: ó in general.
sosa: ó in general.
sostre: it is one of the Coromines law words: expected ó changed to ò. This law may apply to different extents. Probably the most conservative areas, Balearic and Valencian, maintain the old ó, while most of Central has changed to ò. Northwestern usually also changes by attraction to Central, to be confirmed.
sotjar: not sure, but ó is the best guess.

Vriullop (talk) 08:31, 5 January 2024 (UTC)

For R:

reble: expected é. The DCVB's ê seems to be by analogy with other words. I would say é, but with an irregular ə in Balearic.
recar: é as expected from an earlier 'a'.
regar: ê as expected. Nouns 'rec' and 'reg' are interrelated and are not a good indicator for the verb.
All intervocalic -gn- sequences are pronounced [ŋn]. Likewise, -n- before /k/ or /ɡ/ is [ŋ], but that change was reverted because it is not phonemic.
reptar: é from Latin rěp(u)tō and ê from rēptō.
resar: é as noun 'res'.
retre: I really don't know which process applies here. For now I'd say ë, pending confirmation.
rosca: ô.
rosta, a slice of bacon usually fried with bread, is a typical dish of the Pyrenees. Although it is the feminine form of 'rost', from the old sense "roasted", in the Pyrenees this ò usually changes to ó. In the DCVB, I read that the northernmost localities say ó, and ò is found quite far from the Pyrenees. In short, as a noun: ó in Central, ò in Valencian and Balearic. As an adjective form: ò, although the GDLC does not distinguish it properly.
rotar: ó for both etyms.
rotllo: what a mess! It was not attested in Valencian until recently, probably from Spanish rollo. This ó is archaic and not accepted in the other areas, where the word has been used since Old Catalan. 'Rotlle' is the inherited form, hardly used in Valencian, where the spelling 'rotle' is preferred; both have ò. 'Rotlo' is only used in Balearic; to me, how outsiders try to pronounce it, with a range of alternative spellings, is anecdotal.

Vriullop (talk) 11:29, 4 January 2024 (UTC)
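The intervocalic -gn- rule from the reply above could be applied to the respelling before transcription, along the lines of this hypothetical helper (the name and vowel set are assumptions). The reply doesn't address Benwing's question about Valencian [gn], so no dialect distinction is made here.

```python
import re

VOWELS = "aeiouàèéíïòóúü"
# gn with a vowel on both sides; word-initial gn (gnom) is left alone.
_INTERVOCALIC_GN = re.compile(rf"(?<=[{VOWELS}])gn(?=[{VOWELS}])")


def gn_to_ngn(respelling):
    """Mark intervocalic -gn- as [ŋn], e.g. regne -> reŋne."""
    return _INTERVOCALIC_GN.sub("ŋn", respelling)


for word in ["regne", "regna", "regnar", "gnom"]:
    print(word, gn_to_ngn(word))
```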

@Vriullop Thank you again! BTW I have gone through and added (offline) stressed root vowels to all enwikt Catalan verbs with e or o where I could determine it, using some combination of cawikt, DNV, GDLC and DCVB. (It looks like I was able to figure out the vowel for 1,174 verbs in -ar, 33 verbs in -ir and all relevant verbs in -re and -er, and only couldn't figure out the vowel for 72 verbs in -ar and 2 verbs in -ir.) I am mostly done coding the changes I want to make to Module:ca-IPA and I'll use the new code to support displaying the root vowel info. I'll post the list of undetermined verbs soon. Benwing2 (talk) 19:55, 4 January 2024 (UTC)

BTW I have finished the changes to Module:ca-IPA and Module:ca-headword and pushed all the root vowel additions. You can see them in action e.g. in flirtejar, besar, adreçar, annexar and several others. Benwing2 (talk) 07:45, 5 January 2024 (UTC)

Also, I added tracking for all terms with defaulted mid vowel quality, with the plan of removing some of the defaults. The first word I looked at, for example, is amulet, a recent borrowing that claims to have ê, which seems unlikely. Benwing2 (talk) 08:07, 5 January 2024 (UTC)

Here is the list of now 68 -ar verbs where I couldn't identify the Central Catalan root vowel (sometimes only in one etymology out of several): afogar, agregar, al·legar, alterar, amonestar, ancorar, atemptar, celebrar, col·laborar, commemorar, compensar, condensar, confessar, congregar, conrear, contemplar, crebar, delegar, denegar, depredar, desagregar, desintegrar, deteriorar, devorar, discrepar, dreçar, dropar, edulcorar, elaborar, elevar, encetar, engegar, enllumenar, ennuegar, ensopegar, entaforar, entollar, entrenar, esborrar, esbotzar, esmicolar, espitregar, esverar, evaporar, exacerbar, expectorar, explorar, gofrar, impetrar, increpar, integrar, interpretar, isolar, laborar, negar, perforar, prolongar, rememorar, retolar, rosegar, secretar, segregar, somorgollar, temptar, tomar, trafegar, trepar, trepollar. Benwing2 (talk) 08:12, 5 January 2024 (UTC)

In some cases I can't be completely sure; these are my best guesses: afogar ó, agregar é, al·legar ê, alterar é, amonestar é, ancorar ó, atemptar é, celebrar é, col·laborar ó, commemorar ô, compensar ê, condensar ê, confessar é, congregar é, conrear ë, contemplar é, crebar é, delegar é, denegar é, depredar é, desagregar é, desintegrar é, deteriorar ó, devorar ô, discrepar é, dreçar ë, dropar ó, edulcorar ô, elaborar ó, elevar é, encetar é, engegar é, enllumenar ê, ennuegar ë, ensopegar ê, entaforar ó, entollar ò (both), entrenar é, esborrar ó, esbotzar ó, esmicolar ô, espitregar ë, esverar é, evaporar ó, exacerbar é, expectorar ó, explorar ó, gofrar ó, impetrar é, increpar é, integrar é, interpretar é, isolar ô, laborar ó, negar é (both), perforar ó, prolongar ó, rememorar ó, retolar ó, rosegar ê, secretar ë, segregar é, somorgollar ó, temptar é, tomar ó, trafegar ê, trepar é, trepollar ó. Vriullop (talk) 08:23, 10 January 2024 (UTC)
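To fold a reply like this back into the offline data, a small parser could turn the "verb vowel" list into a lookup, keeping notes such as "(both)" alongside the hint. The regex and function name are illustrative assumptions. The function expects just the comma-separated list, without the lead-in sentence, trailing period or signature. Comparing the result against the list of undetermined verbs would then show whether any verb is still missing.

```python
import re

# "verb hint" plus an optional parenthesized note, e.g. "entollar ò (both)".
_GUESS = re.compile(r"(\S+) ([èéêëòóôö])(?: \(([^)]*)\))?")


def parse_guesses(text):
    """Turn a comma-separated 'verb vowel' reply into {verb: (hint, note)}."""
    guesses = {}
    for item in text.split(","):
        match = _GUESS.fullmatch(item.strip())
        if match is None:
            raise ValueError(f"unrecognized item: {item!r}")
        verb, hint, note = match.groups()
        guesses[verb] = (hint, note)
    return guesses


reply = "afogar ó, al·legar ê, entollar ò (both), negar é (both)"
guesses = parse_guesses(reply)
print(guesses)
```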

Review of the tracked mid-vowel defaults:

e/u: doesn't make sense; it was probably intended for the diphthong -eu-.
o/u: also nonsense.
e/ct-cts-cts-ctes: too many variations: è, with some cases of é only in Central.
e/dre-dres: mostly ë instead of é.
e/final-l: it is stable but needs to exclude -ell(s).
e/l-ls-ll: it's ok, I haven't found any problem.
e/ma-mes: too many variations
e/ens-ena-enes: too many variations ê/ë
e/nse-nses: not worth it for a few words
e/nt-nts: mostly é with few exceptions, widely used
e/r-rs-ra-res: too many variations é/ê
e/rC: it's ok
e/sos-sa-ses: it's ok
e/t-ts-ta-tes: too many variations
è/s-blank: FIXME: should apply only when the last syllable is stressed; currently it includes tèbia, època, ...
o/r-rs-ra-res: too many variations

Vriullop (talk) 09:20, 8 January 2024 (UTC)

@Vriullop I have finished everything up through T and pushed the offline changes to Wiktionary. Issues I found with T:

teca three etyms: (1) "food"; (2) "teak"; (3) "theca". All three have /e/ per DNV, and (1) and (3) have /ɛ/ per GDLC. (1) has /ɛ/ per DCVB, otherwise not indicated. I am guessing then that (1) and (3) have ë, and (2) must have either é or ë.
temprar: Exactly parallel to emprar. cawikt says ê but DNV says /e/ for tempre. Is /e/ more recent for Central?
temptar "to try": /e/ per DNV, I'm guessing é per etymology.
tesla "tesla": /e/ per DNV, I'm guessing é.
testar "to witness": /e/ per DNV, I'm guessing é per etymology.
teu "your": /ɛ/ per GDLC for Central Catalan but /e/ per cawikt. GDLC says /e/ for meu "my" so I wonder if this isn't a mistake in GDLC.
text "text": /tekst/ per GDLC, /tɛkst/ per DNV. Correct? DCVB says /test/ for everywhere, which may be antiquated.
tomar (1) "to catch"; (2) "to knock down". Root vowel?
tondre "to shear": /o/ for Central in cawikt and DCVB, but /ɔ/ in GDLC (DNV says /o/). However, note that tosa has /o/ in GDLC. What's going on here?
tora "aconite": GDLC and DNV both say /o/ but DCVB says /ɔ/ for both Western and Eastern. Is /ɔ/ antiquated?
torbar "to disturb", torba "disturbance" and torba "peat": GDLC and DNV both say /o/ but cawikt says /ɔ/ for Central Catalan (/o/ for Valencian). Is /ɔ/ wrong or antiquated?
tors "torso": cawikt says /o/ (dialect not indicated), but GDLC says /ɔ/ for Central (and DNV says /o/ for Valencian). I am assuming GDLC is correct.
trempa, trempar: cawikt says ê everywhere, in agreement with DCVB for tremp and trempa, but GDLC gives /e/ for both tremp and trempa; maybe /e/ is more modern as DCVB's fieldwork is ~ 100 years old.
trenca "duffel coat": A borrowing from Spanish trenca. The other meaning of the noun "breakage; lesser grey shrike" has ê but this seems unlikely for a Spanish borrowing. I'm guessing ë.
trepa "trimming; stencil" also "mob, riffraff, rabble" also a form of trepar "to drill, to perforate". DNV says /e/ for all three etyms; GDLC says /e/ for the first two, but DCVB says /ɛ/ for the meaning "mob, rabble". I am not sure whether all three are etymologically related.
tropa "troop; crowd": cawikt says /ɔ/ everywhere (and DNV says /ɔ/) but GDLC says /o/. DCVB says /ɔ/ for Eastern but /o/ for Girona; maybe /o/ for Central is more recent.
trotllo "medusafish": cawikt says /ɔ/ everywhere but DNV says /o/, so I'm assuming ô.

Also a few other issues:

alliberar: cawikt says /ɛ/ everywhere but DNV says /e/.
beca and derived becar "to give a scholarship to": cawikt says ë but DNV says è.
clon: cawikt says ò but DNV says /o/. I am guessing ô then.
emprar: cawikt says ê but GDLC says /e/ for empre. Is /e/ more recent for Central?
perseverar: cawikt says ê, are we sure? sever has é.

Benwing2 (talk) 01:53, 6 January 2024 (UTC)

One more question (sorry for the barrage of questions): Currently the module section for Central Catalan unconditionally removes final single -r, whether absolutely word-final or followed by an -s. I'm thinking of making this less absolute, as follows:

Don't remove final -r(s) in monosyllables.
In non-monosyllables, remove final -r(s) in -ar, -er, -ir and in -[dtsç]or, but not otherwise. This is based on the fact that most words in -[dtsç]or are agent nouns and seem to fairly consistently remove the -r, while the remaining words in -or often (but not always) preserve the -r per GDLC. Here is a long list of such words: amor, humor, anterior, vapor, rumor, labor, major, tenor, tumor, terror, inferior, superior, clamor, posterior, furor, ulterior, tricolor, temor, rigor, vigor, menor, decor, olor, llavor, suor, licor, rubor, petricor, negror, remor, millor, albor, cremor, claror, grogor, blavor, maror, pitjor, frescor, senyor, finor, incolor, rojor, vermellor, blancor, lletjor, amargor, primor, favor, picor, escalfor, tremolor, esgarrifor, llacor, raor, xafogor. The idea is that to force the preservation of -r, write 'rr', and to force the non-preservation, write '-ó' (although if all these words preserve the -r in Valencian, we'd want some other signal, e.g. '-(r)'). Thoughts? Benwing2 (talk) 09:50, 6 January 2024 (UTC)

This plan sounds fine, assuming:

The non-preservation happens when the final syllable is stressed. When it is unstressed, it only affects some words, like créixer, càntir.
In Valencian it is always preserved. Writing '-(r)' or '-(r)s' to force non-preservation in Central and Balearic is intuitive. In fact, this matches the notation used in the rhymes, e.g. Rhymes:Catalan/a(ɾ).
In Balearic there are more losses of final -r than in Central Catalan. See amor: although the result is correct, it is not consistent when preservation is forced by writing 'rr', and it is not reasonable to assume that in Balearic no final -r is ever pronounced. Maybe it should be fixed with a per-dialect parameter.

There are many pending things above that require more time. Vriullop (talk) 11:57, 6 January 2024 (UTC)

@Vriullop Thanks for your comments. I'm thinking that writing rr should force the pronunciation of final -r everywhere, while writing something like rh should cause it to be pronounced in Central Catalan but not Balearic. This is based on looking through the DCVB with a sample of the above nouns, some of which appear to have pronounced -r in the Balearics, some not, and for some it depends on where in the Balearics. More complex scenarios can be handled using dialect-specific params (which are now implemented; see llei for an example). Benwing2 (talk) 21:23, 6 January 2024 (UTC)

Another possibility in place of rh for "pronounced everywhere but Balearics" is (rr). This sets up a hierarchy of pronunciation: rr > (rr) > (r) > nothing. Benwing2 (talk) 01:13, 7 January 2024 (UTC)
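The hierarchy proposed above can be pictured as a lookup from respelling marker to the dialects that pronounce final -r. This is a minimal sketch, assuming three dialect labels and assuming that (r) keeps the -r only in Valencian (per the earlier remark that Valencian always preserves it); the names `FINAL_R` and `final_r_pronounced` are hypothetical, not taken from Module:ca-IPA.

```python
# Hypothetical sketch: in which dialects is a final -r pronounced, by marker?
# Marker strings follow the discussion; data layout and names are illustrative.
DIALECTS = ("central", "balearic", "valencian")

FINAL_R = {
    "rr": {"central", "balearic", "valencian"},  # pronounced everywhere
    "(rr)": {"central", "valencian"},            # everywhere except Balearic
    "(r)": {"valencian"},                        # dropped in Central and Balearic
    "": set(),                                   # no -r written: never pronounced
}


def final_r_pronounced(marker: str, dialect: str) -> bool:
    if dialect not in DIALECTS:
        raise ValueError(f"unknown dialect: {dialect}")
    return dialect in FINAL_R[marker]


# Each step down the hierarchy pronounces -r in a subset of the dialects above it.
ORDER = ["rr", "(rr)", "(r)", ""]
assert all(FINAL_R[lower] <= FINAL_R[higher] for higher, lower in zip(ORDER, ORDER[1:]))
```

Encoding the levels as nested sets makes the "hierarchy" property checkable rather than implicit.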

BTW I am planning on making it required to specify the way final -r is pronounced, using one of rr, (r), (rr) [or maybe rh if we decide on that] or omitting it, except in the circumstances where it defaults to (r), which are multisyllabic words ending in stressed -ar, -er, -ir or -[dtsç]or. In all other circumstances, the pronunciation seems far too irregular to provide a default.
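The default described in this paragraph can be sketched as a small predicate. This is an illustration only: the syllable count and stress detection are crude assumptions (runs of vowel letters stand in for syllables, and a written accent on an earlier vowel is taken to mean the final syllable is unstressed), and `default_final_r` is a hypothetical name.

```python
import re

# Hypothetical sketch: return "(r)" when the default applies, else None
# (None meaning the respelling must state how the final -r behaves).
VOWEL_RUN = re.compile(r"[aeiouàèéíïòóúü]+")
ACCENTED = set("àèéíòóú")
DEFAULT_ENDING = re.compile(r"(?:[aàeèéií]|[dtsç][oòó])rs?$")


def final_syllable_stressed(word: str) -> bool:
    # Crude assumption: a word in -r(s) is oxytone unless an earlier vowel
    # carries a written accent (créixer, càntir).
    nuclei = VOWEL_RUN.findall(word)
    return not any(ch in ACCENTED for nucleus in nuclei[:-1] for ch in nucleus)


def default_final_r(word: str):
    w = word.lower()
    if len(VOWEL_RUN.findall(w)) < 2:  # monosyllables: no default
        return None
    if not final_syllable_stressed(w):
        return None
    return "(r)" if DEFAULT_ENDING.search(w) else None
```

Under these assumptions cantar and menjadors get "(r)", while amor, mar and créixer return None and need an explicit respelling.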

Note that I have already removed the majority of defaults for mid vowel o and added the vowel explicitly, and I'm planning on doing the same for mid vowel e. For the defaults I removed, either there were few places that made use of the defaults or there were many but with lots of errors, e.g. o and e in the penultimate syllable with -i or -is in the last syllable were defaulting to ò and è respectively, which makes sense for adjectives of this form but doesn't work for subjunctive verb forms, and there were lots of places where this default was being used for subjunctives, producing incorrect results.

One other thing: the pronunciation given in GDLC for meteor is [məteɔ́ɾ], with unstressed [e]. Is this correct? If so I'll need to add a special symbol to allow for unstressed unreduced vowels. However, maybe it's a mistake; I found a pronunciation on Forvo here [1], which sounds more like [mətəɔ́ɾ] (BTW cawikt says [mətəóɾ] with /o/, which may be wrong as well for Central Catalan). Benwing2 (talk) 05:08, 7 January 2024 (UTC)

I forgot to add, I'm implementing a shortcut notation to make it easier to specify things like the pronunciation of final -r without having to repeat the entire word. If you write [FROM:TO] where FROM is part of the spelling and TO is the corresponding respelling, it will make that substitution in the respelling as long as it's unambiguous. So you can write [or:ôrr] for meteor. To make it even shorter, in cases where the spelling and respelling are similar enough, you can just write the respelling, hence [ôrr], and the code knows that ô should match either o or ô in the original spelling and rr should match either r or rr. Another common example is [ks], which is equivalent to [x:ks] and can be used to respell x as ks in words like boxejador. This will all be documented in {{ca-IPA}} as soon as I push the code. Benwing2 (talk) 05:16, 7 January 2024 (UTC)
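A rough sketch of how the [FROM:TO] notation and its shorthand could be resolved against the spelling, assuming the only similarity rules are "an accented vowel matches its plain vowel", "a doubled letter matches a single letter" and "(r)/(rr) matches a written r". The function names are hypothetical; the actual implementation in {{ca-IPA}} may differ.

```python
import re

# Hypothetical sketch of the substitution notation; not the module's code.
ACCENT_BASE = {"à": "a", "è": "e", "é": "e", "ê": "e", "ë": "e",
               "í": "i", "ò": "o", "ó": "o", "ô": "o", "ú": "u"}


def shorthand_pattern(to: str) -> str:
    """Regex for the spelling that a shorthand respelling can stand for."""
    parts, i = [], 0
    while i < len(to):
        ch = to[i]
        close = to.find(")", i) if ch == "(" else -1
        if close > i + 1:
            # (r) or (rr) stands for a written r or rr
            parts.append(re.escape(to[i + 1]) + "{1,2}")
            i = close + 1
        elif ch in ACCENT_BASE:
            parts.append(f"[{ACCENT_BASE[ch]}{ch}]")  # ô matches o or ô
            i += 1
        elif i + 1 < len(to) and to[i + 1] == ch:
            parts.append(re.escape(ch) * 2 + "?")  # rr matches r or rr
            i += 2
        else:
            parts.append(re.escape(ch))
            i += 1
    return "".join(parts)


def apply_substitution(spelling: str, spec: str) -> str:
    if ":" in spec:
        frm, to = spec.split(":", 1)
        pattern = re.escape(frm)
    else:
        to = spec
        pattern = shorthand_pattern(to)
    matches = list(re.finditer(pattern, spelling))
    if len(matches) != 1:  # substitute only when the match is unambiguous
        raise ValueError(f"[{spec}] matches {len(matches)} times in {spelling!r}")
    m = matches[0]
    return spelling[:m.start()] + to + spelling[m.end():]
```

With these assumptions, apply_substitution("meteor", "ôrr") and apply_substitution("meteor", "or:ôrr") both give meteôrr, and a spec that matches more than once raises an error. The [ks] shortcut for x is not derivable from these similarity rules and would need its own special case.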

Great.

For final -r, I like the hierarchy rr > (rr) > (r)
'meteor' with unstressed [e] is correct. No need to do anything in the module: the function reduction_ae does not apply any reduction in the groups 'eà' and 'eò'.
A shortcut for respelling is useful.

Vriullop (talk) 10:27, 8 January 2024 (UTC)
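To make the 'eà'/'eò' point concrete, here is a toy version of Central Catalan unstressed-vowel reduction, operating on a respelling in which the stressed vowel carries an accent. It is not the module's reduction_ae; the tables, the function name, and the decision to ignore diphthongs, secondary stress and consonants are all simplifying assumptions.

```python
# Hypothetical toy sketch; only vowels are rewritten, consonants pass through.
STRESSED = {"à": "a", "è": "ɛ", "é": "e", "í": "i", "ò": "ɔ", "ó": "o", "ú": "u"}
UNSTRESSED = {"a": "ə", "e": "ə", "o": "u", "i": "i", "u": "u"}


def reduce_central(respelling: str) -> str:
    out = []
    for i, ch in enumerate(respelling):
        nxt = respelling[i + 1] if i + 1 < len(respelling) else ""
        if ch in STRESSED:
            out.append(STRESSED[ch])
        elif ch == "e" and nxt in ("à", "ò"):
            # no reduction in the groups 'eà' and 'eò' (the meteor case)
            out.append("e")
        elif ch in UNSTRESSED:
            out.append(UNSTRESSED[ch])
        else:
            out.append(ch)
    return "".join(out)
```

For example, reduce_central("meteòr") returns "məteɔr", keeping the [e] that GDLC shows, while reduce_central("pòble") returns "pɔblə".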

@Vriullop I have implemented everything described above and fixed up all terms in final -r(s) appropriately. The use of the respellings for -r is documented in {{ca-IPA}}. The substitution notation like [ó(r)] is still being documented. Benwing2 (talk) 02:29, 10 January 2024 (UTC)

@Vriullop Thanks for your comments! I have added the root vowels you specified and am going through the defaulted mid vowel conditions and fixing them up. One thing I notice is that written bl pronounced /b.bl/ and similarly written gl pronounced /g.gl/ aren't correctly handled. For bl at least it seems not all occurrences of bl result in this doubling, e.g. doblar does but sublim doesn't, yet they have the same structure in terms of number of syllables, word shape, position of the accent, etc. What do you recommend? I tried manually adding written g.gl to segle, writing it as seg.gle, but then Valencian also gets the doubling, which is wrong. I see two approaches: (1) Manually require all doubled bl and gl to be written as bbl and ggl except maybe in certain suffixes (e.g. -able(s), -ible(s)), and have the Valencian-specific code remove the doubling and convert it back to single stops; (2) Double bl and gl by default. This would mean we'd need some method of indicating the non-doubled occurrences, maybe by writing sub.lim or something (although this might be problematic when we start providing phonetic output with fricative [βɣð], which I'd like to do soon; not actually sure though if there will be an issue). Thoughts? Benwing2 (talk) 07:02, 12 January 2024 (UTC)

The groups -bl- and -gl- are geminate in Central and Balearic in post-stressed position: poble /ˈpɔb.blə/, regla /ˈreɡ.ɡlə/, including endings -able, -ible. That can be coded in the module. It doesn't happen in Valencian, nor in pre-stressed position, as in sublim. But derivatives of geminate words are also geminate, even in pre-stressed position: poblar, població, reglar, reglament, ... These need to be respelled pobblar, pobblació, regglar, regglament, and then undone in Valencian. Vriullop (talk) 08:22, 12 January 2024 (UTC)
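The rule described here can be sketched as follows, assuming the caller already knows the index of the stressed vowel (stress detection is out of scope). The exceptions bíblic and Bíblia are the ones identified later in this thread; the function name and the doubled-letter output format are hypothetical.

```python
import re

# Hypothetical sketch of -bl-/-gl- gemination; not the module's code.
GEMINATING_DIALECTS = {"central", "balearic"}
EXCEPTIONS = {"bíblic", "bíblia"}


def geminate_bl_gl(word: str, stress: int, dialect: str) -> str:
    """stress: index of the stressed vowel in `word` (assumed to be known)."""
    w = word.lower()
    if dialect not in GEMINATING_DIALECTS:
        # Valencian: undo explicit respellings such as pobblar -> poblar
        return re.sub(r"([bg])\1l", r"\1l", w)
    if w in EXCEPTIONS:
        return w

    def double_after_stress(m):
        # geminate only clusters that follow the stressed vowel
        return m.group(1) * 2 + "l" if m.start() > stress else m.group(0)

    # (?<![bg]) skips clusters already respelled as bbl/ggl
    return re.sub(r"(?<![bg])([bg])l", double_after_stress, w)
```

For example, geminate_bl_gl("poble", 1, "central") gives pobble and geminate_bl_gl("sublim", 4, "central") leaves sublim unchanged, while a respelled pobblar passes through in Central and is reduced back to poblar in Valencian.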

@Vriullop Got it, thanks. I'll implement this. What do you think of just providing phonetic output and changing the /.../ to [...]? This seems consistent with what the various dictionaries do; or at least, they explicitly show the fricative allophones [βɣð]. This would mean, for example, that the issue of whether to display [ŋ] goes away: we just display it whenever it's pronounced as such. Benwing2 (talk) 20:04, 12 January 2024 (UTC)

I have implemented what you said for -bl- and -gl-. I am currently working on auto-adding secondary stress to adverbs in -ment. (In the process I'm adding a quick shorthand to indicate a part of speech for a given term, e.g. n/RESPELLING or just n/ for a noun, a/RESPELLING or just a/ for an adjective, etc. The idea here is that terms in -ment default to adverbs, which means they get secondary stress by default, but you can override this by specifying n/ for a noun like desembarcament or a/ for an adjective like vehement. Some terms need both a part of speech and respelling, e.g. desdoblament needs n/[bbl] to indicate that it's a noun and the -bl- is pronounced /bbl/.) I have a question about this, though. Adverbs in the DNV are indicated with *primary* stress on the preceding component and no stress on -ment, e.g. see [2] for feliçment. This seems rather strange to me and it's contrary to what the Wikipedia article on Catalan phonology says. Is this really true or is it just something weird in the DNV? Benwing2 (talk) 23:40, 12 January 2024 (UTC)

BTW I found an exception to the rule that post-stressed -bl- is geminate: bíblic (and Bíblia). Are there others? If so and given how many exceptions there are in the other direction, I wonder if we shouldn't just make all -bl- and -gl- geminate by default in Central Catalan and Balearic, and require that all cases where this doesn't happen get rewritten using [b.l] or [g.l]. Benwing2 (talk) 04:01, 13 January 2024 (UTC)

I implemented the auto-adding of secondary stress to adverbs in -ment, along with the part of speech hints described above, and fixed up all nouns and adverbs in -ment appropriately. (I actually added pronunciations to all or almost all nouns and adverbs in -ment that were missing them; this took several hours for adverbs because there are around 800 of them in -ment, and many of them have secondarily stressed e or o, which needed looking up.) The mid vowel hint now applies to the part preceding the adverbial -ment, not to the -ment itself (which is always pronounced /men(t)/ with /e/). Note also that in the future, these part of speech hints can also help with things like terms in -ar, where adjectives in Central Catalan pronounce the final -r but nouns and verbs generally don't. Benwing2 (talk) 07:33, 13 January 2024 (UTC)
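A minimal sketch of the part-of-speech hint parsing and the -ment adverb default described above. Only the n/ and a/ codes are included because those are the ones named in the thread; the function names and the (stem, "ment") return shape are assumptions for illustration.

```python
# Hypothetical sketch; only the two codes named in the thread are listed.
POS_CODES = {"n": "noun", "a": "adjective"}


def parse_pos_hint(spec: str):
    """Split 'n/RESPELLING', 'n/' or a bare respelling into (pos, respelling)."""
    prefix, sep, rest = spec.partition("/")
    if sep and prefix in POS_CODES:
        return POS_CODES[prefix], (rest or None)
    return None, (spec or None)


def split_adverb(word: str, spec: str = ""):
    """Return (stem, "ment") when the term defaults to an adverb in -ment,
    so that the stem can receive secondary stress; otherwise None."""
    pos, _respelling = parse_pos_hint(spec)
    if pos is None and word.endswith("ment") and len(word) > len("ment"):
        return word[: -len("ment")], "ment"
    return None
```

With no hint, feliçment splits into feliç + ment; desembarcament with n/ and vehement with a/ opt out of the adverb treatment.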

OK, from the GDLC it looks like there are actually three ways that -bl- can be pronounced: obligar has [βl], doblar has [bbl], and obliterar has [bl]. Is that correct? If so I'll need to come up with some notation to distinguish these three. Maybe we should write o-bliterar to get [bl]; this is consistent with words like hipoglucèmia, which have hard single [gl] following a prefix with secondary stress [ìpuglusɛ́miə]. This would suggest a respelling hípo-glusèmia. Then if we need post-stressed [βl], we write e.g. Bíb.lia, and if we need post-stressed [bl] for some reason we'd write e.g. Bí-blia or something, and to get post-stressed [bbl] we'd write e.g. Bíbblia (or rely on the default). Make sense? Sorry to dump so much text on you. Benwing2 (talk) 09:25, 13 January 2024 (UTC)

Great work here.

The inclusion of allophones βɣðŋɱ does not require changing the transcription to brackets [...]. In fact, /β/ is not a voiced bilabial fricative but a simplification, without the diacritic, of the approximant [β̞]. Catalan works follow a convention of "broad transcription" that includes what is considered relevant, without any claim about phonemic values. A purely phonemic transcription is a theoretical discussion: according to different authors, between 25 and 31 phonemes can be considered in Catalan. For example, the schwa is a predictable dialectal allophone, but it is relevant in contrast with other Romance languages. If it were necessary to mark that a transcription is not strictly phonemic, frwikt uses \backslashes\; Merriam-Webster also uses them to enclose its own pronunciation notation. The criteria followed in enwikt do not seem consistent enough to me.
The DNV does not distinguish primary and secondary stress, not even in compound words. Secondary stress is more noticeable in Eastern dialects, where a secondarily stressed vowel is not reduced to schwa. The stress shown in adverbs with -ment is misleading.
'Bíblic' and 'Bíblia' are the only exceptions to geminate bl.
I have not found any explanation for 'obliterar' and 'hipoglucèmia'. See https://giec.iec.cat/textgramatica/codi/4.4.3.3. Maybe it is a cultism in very formal speech, but I don't think it's worth making exceptions here. On the other hand, note that /β/ does not occur after a vowel in Balearic and formal Valencian, that is, in dialects that distinguish /b/-/v/.

Vriullop (talk) 09:17, 15 January 2024 (UTC)

@Vriullop Thanks for your response, this is very helpful. I am currently working on fixing up terms with written x (there are a lot of mistakes) but I'm almost done with the offline portion and I think next I'll focus on adding the fricative allophones and correctly handling multiple words. For handling multiple words I need to know the following:

What are the unstressed words? I assume they are all the proclitic object pronouns em, et, es, el, la, els, les, li, ens, us, ho, hi, en; plus the enclitic ones -me, -te, -se, -lo, -la, -los, -les, -li, -nos, -vos/-us, -ho, -hi, -ne (which might already be handled correctly); the contracted ones with apostrophe (which may already be handled correctly); maybe the unstressed possessives mon, ma, mos, mes, ton, ta, tos, tes, son, sa, sos, ses; the prepositions a, de, per, amb (and obsolete ab?), en (what about cap, des?); the prepositional contractions al, als, del, dels, pel, pels; articles el, la, els, les (already handled as proclitic pronouns), personal articles en, na (what about indefinite articles un, u, uns?); maybe salat articles es/ets, sa, ses, so, sos; the conjunctions i, o (what about si?). Any others?
Which assimilation rules apply across words? The Wikipedia article Catalan phonology says that final -s voices before a vowel, which seems to cause a preceding consonant to voice as well, hence tots els has /dz/ in the middle. I assume that lenition of written b d g occurs across word boundaries as well. What about final omitted -r? Does it reappear before a vowel in the next word, e.g. in a phrase like vaig amar una dona? (And for that matter, does the -ig in vaig become voiced in this phrase?) Do you have any references on this?

Thanks again. Benwing2 (talk) 09:57, 15 January 2024 (UTC)

The list is correct: proclitic and enclitic pronouns, unstressed possessives, prepositions but not 'cap', 'des', contractions, articles including personal ones and salats, indefinite articles but not 'u', conjunctions including 'si' and 'ni', and also que as a pronoun and conjunction.
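The confirmed list can be held as a plain set, as in this hypothetical sketch. A bare lookup cannot distinguish homographs (e.g. mes "month" or son "sleep", which are stressed), so a real implementation would need a way to override it; enclitics attached with a hyphen and apostrophe contractions are also left out.

```python
# Hypothetical sketch of the unstressed-word list confirmed above.
UNSTRESSED_WORDS = {
    # proclitic object pronouns (also covers the articles el, la, els, les, personal en)
    "em", "et", "es", "el", "la", "els", "les", "li", "ens", "us", "ho", "hi", "en",
    # unstressed possessives (also covers salat sa, ses, sos)
    "mon", "ma", "mos", "mes", "ton", "ta", "tos", "tes", "son", "sa", "sos", "ses",
    # prepositions (not cap or des), obsolete ab, and contractions
    "a", "de", "per", "amb", "ab", "al", "als", "del", "dels", "pel", "pels",
    # personal article na, indefinite articles (not u), remaining salat articles
    "na", "un", "uns", "ets", "so",
    # conjunctions and que
    "i", "o", "si", "ni", "que",
}


def stress_flags(phrase: str):
    """Pair each whitespace-separated word with True if it carries stress."""
    return [(word, word.lower() not in UNSTRESSED_WORDS) for word in phrase.split()]
```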

In general, words in contact undergo the same processes of assimilation, voicing, or devoicing as within words. Typical examples are els avis /əlz/ vs. els savis /əls/; tots els is really /ˈtodz.əls/, and vaig amar is /ˌbad͡ʒ.əˈma/. The final -t reappears when followed by a vowel (sant Antoni /ˌsan.tənˈtɔ.ni/). The final -r of infinitives only reappears when followed by a pronoun (anar-hi /əˈna.ɾi/). You can find many examples from chapter 4.4 onwards of the IEC grammar. Vriullop (talk) 12:37, 15 January 2024 (UTC)
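Voicing of a word-final sibilant across a word boundary, as in the els avis / els savis pair, can be sketched like this. Assumptions: the check is purely orthographic (a silent initial h is skipped, and initial b d g v j l m n r z count as voiced), ç behaves like s (as confirmed later in the thread for Llorenç), and voicing of a preceding stop (tots els) is not modelled. The function name is hypothetical.

```python
import re

# Hypothetical sketch: does a word-final s/ç surface as [z] before the next word?
VOWEL_INITIAL = re.compile(r"h?[aeiouàèéíïòóúü]", re.IGNORECASE)
VOICED_INITIAL = re.compile(r"[bdgvjlmnrz]", re.IGNORECASE)


def final_sibilant(word: str, next_word: str = "") -> str:
    if not word.lower().endswith(("s", "ç")):
        raise ValueError(f"{word!r} does not end in s or ç")
    if next_word and (VOWEL_INITIAL.match(next_word) or VOICED_INITIAL.match(next_word)):
        return "z"  # voiced before a vowel or a voiced consonant
    return "s"      # voiceless before a voiceless consonant or utterance-finally
```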

@Vriullop Thanks again for your help. I finally finished most of the work on multiword support. Still to go is approximant allophones of b/d/g, correct handling of apostrophes (represented with ‿), and ‿ as an indicator of liaison in respelling for cases like Sant Antoni respelled Sànt‿Antòni (which should produce /ˌsan.tənˈtɔ.ni/). I (more or less) read chapter 4.4 in the IEC grammar and I notice it also talks about certain cases of total assimilation where maybe cap de is pronounced /kad də/ or something, but I'm not sure we should implement that. I have some questions though:

Brunsvic (as in e.g. Nova Brunsvic) given as [bɾunzvík] in GDLC, is the v correct?
For drets humans, the module currently generates /ˈdɾɛdz uˈmans/, is that correct?
fer cas, fer acte de presència: Is the <r> pronounced in Central Catalan?
Sant Llorenç de la Salanca: the module currently generates /ˈsaɲ ʎuˈɾɛnz də lə səˈlaŋ.kə/ for Central and /ˈsand ʎoˈɾɛnz de la saˈlaŋ.ka/ for Valencia; correct? In general, does final -ç voice when the next word begins with a vowel?
The IEC grammar is equivocal about whether b/d/g become fricatives after /r/, /ɾ/ and /z/, what should we do in this case?
It appears double schwa /əə/ is often compressed to single schwa /ə/ in Central and maybe Balearic, but not in Valencian. This is indicated in GDLC and seems to operate fairly consistently if the second schwa is in a closed syllable (sobreescalfament, contraescarpa), but only sometimes in an open syllable (centreafricà, contraatac). Can you comment here? Likewise, /i/ or /u/ followed by schwa seems to elide the schwa in aeroespacial, autoescola, antiespasmòdic, but only sometimes if the schwa is in an open syllable (hence not in autoerotisme, antiemètic but yes in fotoelèctric, fotoelectricitat, macroeconomia). Likewise /uu/ seems to compress to /u/ if the second /u/ is in a closed syllable (microorganisme), but only sometimes in an open syllable. How do you think we should handle these cases?
I am trying to figure out what to do for written <tn>, <tm>, <tl>, <tll>. It seems that these tend to be pronounced as geminates in native words (e.g. cotna, setmana) but with [d] in cultisms/learned words. I'm thinking maybe we should make the cultism behavior the default and require respelling for the remainder, at least for <tm> where there are more terms like ritme, aritmètic, atmosfera than terms like setmana. But maybe this should differ depending on the different spellings, e.g. <tl> even in a cultism like atlàntic seems to have a geminate in it in Central Catalan but not in Valencian. Can you comment on what you think should be done?

Benwing2 (talk) 22:45, 26 January 2024 (UTC)

Note, I also revamped the testcases, see Module:ca-IPA/testcases (which demonstrate there's still a lot to fix). Benwing2 (talk) 23:26, 26 January 2024 (UTC)

Brunsvic is strange. The GDLC supposedly includes pronunciations from the Diccionari ortogràfic i de pronúncia (DOP), but it turns out that the DOP does not include proper names. For non-Catalan place names I check ésAdir, a website for radio and TV journalists, and it shows /'bɾunz.βik/ as I expected.
'Drets humans' is correct.
'Fer cas' and 'fer acte' are correct. The r of infinitives only reappears when followed by pronouns: fer-se /ˈfer.sə/, fer-hi /ˈfe.ɾi/, fer-t'ho /ˈfer.tu/...
'Sant Llorenç de la Salanca' is correct. Final /s/ of Llorenç is voiced to /z/ when followed by a voiced consonant or a vowel.
The IEC grammar is too descriptive about approximants, i.e. about when they may or may not appear. Considering that /β/ is rare in dialects with the /v/-/b/ contrast, that is Balearic and Valencian, and trying to be consistent with GDLC and DNV:
- No approximants r/s + b/d/g in Central.
- No approximants r/s + b in Balearic and Valencian.
- Approximants r/s + d/g in Balearic and Valencian.
In general, the concurrence of two identical vowels /əə/ (or /aa/, /ee/), /uu/ (or /oo/) is reduced to a single vowel. Variations may depend on formal v. informal, or common use v. cultism, or emphasis of some prefixes. It is hard to define any exception.
Written <tm> and <tn> are geminated in a handful of inherited words: cotna, reguitnar, setmana and its derivatives. But 'setmana' with a single /m/ in Valencian. 'Vietnamita' and 'sotmetre' are hesitant. Others like 'ritme', 'ètnic', 'algoritme' are cultisms /dm/.
Written <tl> is always /ll/ in Central and Balearic. In Valencian it is /ll/ in inherited words and /dl/ otherwise. Valencian inherited words include those with alternative spelling <tll>: ametla > ametlla, butla > butlla...
Written <tll> as alternative spelling of inherited <tl> is pronounced /ʎʎ/ in Central and /ll/ in Balearic and Valencian. Although the DNV includes 'ametlla', 'butlla'... it is not really used, and if written it is still pronounced as <tl>. As a cultism, like 'ratlla', 'bitllet' or 'butlletí', it is pronounced /ʎʎ/ in Central and /ʎ/ in Balearic and Valencian.

Vriullop (talk) 10:54, 29 January 2024 (UTC)
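Two of the rules in this reply lend themselves to simple tables. A hypothetical sketch follows; the word-class labels ("inherited", "tl_variant", "other") are assumptions, since the class of a word cannot be read off its spelling and would have to come from a respelling or similar hint.

```python
# Hypothetical sketch of two dialect rules from the reply above.

# b/d/g after r or s: True means an approximant [β ð ɣ] is used.
APPROXIMANT_AFTER_RS = {
    "central":   {"b": False, "d": False, "g": False},
    "balearic":  {"b": False, "d": True,  "g": True},
    "valencian": {"b": False, "d": True,  "g": True},
}

# Reflexes of written <tl>/<tll>, keyed by (spelling, word class).
TL_TLL = {
    ("tl", "inherited"):   {"central": "ll", "balearic": "ll", "valencian": "ll"},
    ("tl", "other"):       {"central": "ll", "balearic": "ll", "valencian": "dl"},
    # <tll> as an alternative spelling of inherited <tl> (ametlla, butlla)
    ("tll", "tl_variant"): {"central": "ʎʎ", "balearic": "ll", "valencian": "ll"},
    # other <tll> words (ratlla, bitllet, butlletí)
    ("tll", "other"):      {"central": "ʎʎ", "balearic": "ʎ",  "valencian": "ʎ"},
}


def approximant_after_rs(stop: str, dialect: str) -> bool:
    return APPROXIMANT_AFTER_RS[dialect][stop]


def tl_reflex(spelling: str, word_class: str, dialect: str) -> str:
    return TL_TLL[(spelling, word_class)][dialect]
```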

@Vriullop Thanks. I have (already) implemented most of the above things. I haven't yet implemented reduction of adjacent unstressed vowels or redone the implementation of <tl> and <tll>. As for Sant Llorenç de la Salanca, the module formerly generated [ˈsand ʎoˈɾɛnz ðe la saˈlaŋ.ka] for Valencia (note the [d] in /sand/) but I am guessing this is wrong, so I changed it so it now generates [ˈsaɲ ʎoˈɾɛnz ðe la saˈlaŋ.ka]. Basically I am guessing that elision of stops after nasals happens in Valencia before a consonant but not a vowel or utterance-finally. Is this correct? Benwing2 (talk) 01:53, 30 January 2024 (UTC)

I didn't notice 'sant'. It is correct, elision of t and assimilation of the nasal before a consonant, not before a vowel or isolated.--Vriullop (talk) 08:00, 30 January 2024 (UTC)
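The Valencian rule confirmed here can be sketched for the sant case, assuming an orthographic check on the following word. Nasal place assimilation (saɲ before ʎ) is not modelled, and the function name is hypothetical.

```python
import re

# Hypothetical sketch: Valencian elision of a final t after n.
VOWEL_INITIAL = re.compile(r"h?[aeiouàèéíïòóúü]", re.IGNORECASE)


def valencian_final_nt(word: str, next_word: str = "") -> str:
    if not word.lower().endswith("nt"):
        return word
    if next_word and not VOWEL_INITIAL.match(next_word):
        return word[:-1]  # elide t before a consonant
    return word           # keep t before a vowel or utterance-finally
```

So valencian_final_nt("sant", "Llorenç") gives san, while sant before Antoni or at the end of an utterance keeps its t.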

Your bot is removing valid categories

e.g. {{C|de|Western Sahara}} at Westsahara. —Justin (koavf)❤T☮C☺M☯ 00:55, 1 January 2024 (UTC)

@Koavf This is unavoidable. When you add a page to a category, sometimes it takes a little while for the category to register having the page in it, and in the meantime it shows up in CAT:Empty categories, which is what I use periodically to delete empty categories. I check that category before deleting the empty categories referenced, but I can't notice everything. Any non-empty categories so deleted will get re-created in a few days in any case. Benwing2 (talk) 01:06, 1 January 2024 (UTC)

What are you talking about? That category was on that page for 5.5 years and your bot removed it for no reason. How is that unavoidable? Are you telling me that your bot is going to re-add all of these categories and undelete them as well? —Justin (koavf)❤T☮C☺M☯ 01:09, 1 January 2024 (UTC)

Dude, fuck off. Seriously. Yelling at me is not going to get me to help you any quicker than writing nicely.

As for my response, I thought you were referring to my recent deletion of empty categories (as of a few hours ago) rather than a bot change from a month and a half ago. In the future I'd recommend you link to the specific diff. My removal of the category at that time was a by-hand change, not a script change, even though the bot pushed the change; that's what "manually assisted" means (and I have a strong feeling I've already explained this to you). The reason for the removal is that Module:place normally auto-adds categories of this nature, and I thought it would in this case; the reason it didn't is apparently because Western Sahara is listed in Module:place/shared-data as a country, but its definition identifies it (correctly) as a territory rather than a country. I'll fix this so it gets correctly auto-added. Benwing2 (talk) 01:30, 1 January 2024 (UTC)

I was much nicer than you were just now and was in no sense "yelling". There was no reason for that language. I didn't realize that what I wrote was ambiguous and I thought that referring you to the entry would be sufficiently clear where you can see what your bot (or script or by-hand you) did. Thanks for agreeing to fix this and undelete all of these categories. When will this happen? —Justin (koavf)❤T☮C☺M☯ 22:18, 1 January 2024 (UTC)

When will you or your bot undo these category removals? —Justin (koavf)❤T☮C☺M☯ 22:42, 15 January 2024 (UTC)

@Koavf Which removals are you referring to? Specifically to do with Western Sahara, or are there any others? Benwing2 (talk) 22:44, 15 January 2024 (UTC)

The only ones I am aware of are removals of the sort {{C|CODE|Western Sahara}} which emptied several categories that were then deleted. I'm not familiar with any others. —Justin (koavf)❤T☮C☺M☯ 22:46, 15 January 2024 (UTC)

When will you or your bot undo these category removals? —Justin (koavf)❤T☮C☺M☯ 01:37, 21 January 2024 (UTC)

@Koavf Did you not get my ping? I did this days ago. Benwing2 (talk) 02:37, 21 January 2024 (UTC)

I see that it did and no, I didn't. For some weird reason, I also did not get updates for this thread even after subscribing. :/ Thanks a lot. —Justin (koavf)❤T☮C☺M☯ 10:16, 21 January 2024 (UTC)

Twice-borrowed terms

I looked up παλάβρα, which is from παραβολή after passing through Ladino, and found out that, after moving all the "twice-borrowed terms" categories to "terms borrowed back into", there are still lots more Greek twice-borrowed terms than Greek terms borrowed back into Greek. This may also be true of other languages. Can you look into it? PierreAbbat (talk) 16:43, 1 January 2024 (UTC)

@PierreAbbat It’s because they were added manually due to the origin being Ancient Greek, which is a misuse of the category imo. Theknightwho (talk) 19:17, 1 January 2024 (UTC)

Yeah @Pierre, if I may expand on what Theknightwho said, it is indeed because of Ancient Greek being considered a separate language, and this is discussed at Wiktionary:Beer_parlour/2023/November#Does_'terms_borrowed_back_into_LANG'_include_cases_where_the_borrowing_was_from_an_ancestor? (and actually quite a few other places over the years, e.g. Wiktionary:Etymology_scriptorium/2016/June#Twice-borrowed_term_or_term_derived_from_an_older_stage_of_the_same_language?, Wiktionary:Beer_parlour/2011/October#Twice-borrowed_terms), and ... it's tricky. Because ... while I'm sympathetic to the potential complaint that it's somewhat arbitrary that a word used in the modern form of Hebrew or Latin (or Chinese) and derived from the variety spoken two thousand years ago can be automatically categorized as "borrowed back" while a word in modern Greek or English can't be, just because we decided it was most convenient to handle the changes those languages underwent as still being ==Hebrew==, ==Latin== (or ==Chinese==) but decided to split the changes Greek underwent between two languages ... we do have to draw a line somewhere or else we get into absurdities (e.g. a term from Proto-Indo-European, which went into French, and was borrowed into English, is twice-borrowed/borrowed-back?), and if we draw the line anywhere other than "whatever we've decided to consider a separate full language", it gets fuzzy and messy fast. But please comment in the November BP discussion linked above if you have suggestions. - -sche (discuss) 19:45, 1 January 2024 (UTC)
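The line drawn here amounts to comparing language codes: a term counts as "borrowed back" only if the entry's own code appears earlier in its borrowing chain, so an ancestor with a separate code (Ancient Greek for Greek, Proto-Indo-European for English) never counts. A hypothetical sketch using Wiktionary codes, with the παλάβρα chain simplified to the stages mentioned above:

```python
# Hypothetical sketch of the "separate full language" criterion.
def borrowed_back(chain) -> bool:
    """chain: language codes from the oldest source to the entry's language,
    e.g. ["grc", "lad", "el"] for παλάβρα (intermediate stages omitted)."""
    *sources, lang = chain
    return lang in sources
```

Under this rule ["grc", "lad", "el"] and ["ine-pro", "fr", "en"] are not borrowed back, while a chain like ["en", "fr", "en"] is.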

New :toBcp47Code() method

If I interpret this recent change to Scribunto correctly, it provides a way to convert from MediaWiki langcodes to proper langcodes directly. Might be worth incorporating, as I imagine it’ll simplify some of our code, and I think you’re more familiar with that side of things than me. Theknightwho (talk) 15:50, 2 January 2024 (UTC)

@Theknightwho Unfortunately I'm not sure this is useful for our purposes. Wiktionary language codes aren't always the same as MediaWiki language codes and I don't think we ever need to convert MediaWiki -> BCP47; instead if anything we'd need to convert MediaWiki <-> Wiktionary and Wiktionary -> BCP47. Benwing2 (talk) 22:47, 15 January 2024 (UTC)

Addition to quotation-template documentation

I just fixed a module error caused by WF converting a quote to {{quote-book}} without checking what goes where. The template documentation is thoroughly organized, voluminous, and useless for figuring out how to fix parameter values in the wrong slots. I was going to add a little index of positional parameters, but that would have required reverse-engineering your documentation module. Instead, I'm just going to dump a mockup here, and let you deal with it:

Positional parameters

| Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Description | Language code(s) | Year | Author | Title | URL | Page | Quote | Translation |
| Equivalents | | `\|year=` | `\|author=` | `\|title=` | `\|url=` | `\|page=` `\|pages=` | `\|text=` `\|passage=` | `\|t=` `\|translation=` |
| See group | Quoted text | Date | Author | Title | Title | Page and line | Text | Text |

An alphabetical index of parameter names might also be nice.

And, no, I don't want fries with that...

Thanks! Chuck Entz (talk) 06:14, 5 January 2024 (UTC)
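The mockup suggests a simple normalization step: map positional parameters 2 to 8 onto their named equivalents and report when the same slot is also given by name, which is the situation that produces the errors described above. A hypothetical sketch in Python; the real {{quote-book}} support is a Lua module and may handle parameter 1 and the aliases differently.

```python
# Hypothetical sketch based on the mockup table; not the template's actual code.
POSITIONAL_NAMES = {
    2: ("year",),
    3: ("author",),
    4: ("title",),
    5: ("url",),
    6: ("page", "pages"),
    7: ("text", "passage"),
    8: ("t", "translation"),
}


def normalize_args(positional, named):
    """positional: values for parameters 1..n in order ("" or None = empty)."""
    result = dict(named)
    for pos, value in enumerate(positional, start=1):
        if not value:
            continue
        if pos == 1:
            result["1"] = value  # language code(s): no named equivalent in the mockup
            continue
        if pos not in POSITIONAL_NAMES:
            raise ValueError(f"unexpected positional parameter {pos}")
        names = POSITIONAL_NAMES[pos]
        clash = [name for name in names if name in named]
        if clash:
            raise ValueError(f"parameter {pos} conflicts with |{clash[0]}=")
        result[names[0]] = value
    return result
```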

@Chuck Entz Yeah there are so many params that organizing them properly is a very challenging task. For this reason I tried to do away entirely with positional params but some people squawked loudly enough that they are kept for {{quote-book}} and {{quote-journal}}, and disallowed for the rest. I think your mockup is a good idea. Benwing2 (talk) 06:17, 5 January 2024 (UTC)

Using the Old French conjugation table as an inspiration

I was trying to create a more complex conjugation table for the Old Spanish language. Then I started viewing other templates and learned that the one used for the Old French language is perfect. I might be able to make some basic edits to adapt it to the Old Spanish conjugation system. However, I couldn't get a sample of that template to edit, as so many pieces are linked together. So would you please share with me a simple, editable sample of the Old French template so I can apply it to this page: Cantar? Besides, it'd be helpful to better standardize Wiktionary. Thalyson2019 (talk) 05:42, 6 January 2024 (UTC)

@Thalyson2019 The Old French conjugation tables aren't implemented using templates but rather using a module: Module:fro-verb. I agree that it's a good base to start with when designing a conjugation system for a language that wasn't really standardized. I'm not sure if you are comfortable working in Lua, because the module is written in Lua and it's not really possible to do what it does just using template syntax. Benwing2 (talk) 05:57, 6 January 2024 (UTC)

Is there any solution for that? I already have the verbs and their positions in mind. I'm not familiar with Lua, even though I create basic templates. Thalyson2019 (talk) 06:08, 6 January 2024 (UTC)

@Thalyson2019 You'd have to get someone to create the Lua module for you. I can't commit to something like this right now as I have already committed to several other projects. However if you create some mockups and link them here, then if/when I or someone else is able to contribute, the mockups can be a good starting point. Benwing2 (talk) 06:10, 6 January 2024 (UTC)

Should such mockups be in the form of code or pictures? Thalyson2019 (talk) 06:14, 6 January 2024 (UTC)

@Thalyson2019 Maybe some sample template calls for some simple verbs like cantar and some complicated verbs as well (tener? ir?). I or anyone working on this would in addition need some good resources on Old Spanish verb conjugation. Benwing2 (talk) 06:18, 6 January 2024 (UTC)[reply]

Finnish inflections

Hey Benwing, I know that WingerBot is used to mass-create the inflection pages for Romance verbs. Is there any way that it could do similar work with Finnish noun forms? According to Jberkel's last data dump there are literally millions of Finnish redlinks, most of which appear to be nouns, so bot help is probably necessary to make a real dent. Thanks for your time! Vergencescattered (talk) 20:01, 6 January 2024 (UTC)

@Vergencescattered: have you talked to @Surjection about this? As a native speaker with a bot, they would be a more logical choice, and more likely to be aware of potential problems. Chuck Entz (talk) 20:35, 6 January 2024 (UTC)

@Vergencescattered I agree with Chuck. Also pinging @Hekaheka. E.g. there may be a reason these forms aren't created (too many of them?). Benwing2 (talk) 21:20, 6 January 2024 (UTC)

There are probably somewhere around 200,000 nouns in Finnish, and each has 30 inflected forms (15 cases in singular and plural) without taking any suffixes into account. This is the rough number found in Nykysuomen sanakirja. Adding dialects and slang, one gets to roughly half a million or more. That would give 6 to 15 million entries. If we add the six possessive suffixes (the third-person possessive suffix is the same for singular and plural, but to compensate for this potential simplification it has two variants), the number of potential entries increases to 40 to 100 million. Some of the forms might be unattestable, as the abessive, comitative and instructive are quite rarely used, but that does not cut more than 20% of the total. On top of this, each verb has close to one hundred inflected forms if we take into account the possessive forms of some infinitives and participles.

This leads me to think that we might need a new approach to inflected forms in general. Perhaps they should have an entry of their own only in such rare cases in which the inflected form has a meaning or meanings that cannot be readily derived from the lemma form. In most cases the system would work so that a search for an inflected form would redirect to the article of the lemma form. --Hekaheka (talk) 23:33, 8 January 2024 (UTC)
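The arithmetic behind these estimates can be checked with a quick Python sketch. It assumes (a reading of the post, not something it states outright) that each of the 30 case/number forms can appear either bare or with one of the six possessive suffixes, i.e. a multiplier of seven; the up-to-20% reduction for rarely used cases is not applied.

```python
# Back-of-the-envelope count of Finnish noun forms, using the figures above.
# Assumption: each case/number form occurs bare or with one of six
# possessive suffixes, so possessives multiply the count by 1 + 6 = 7.
CASES = 15
NUMBERS = 2  # singular and plural
POSSESSIVE_SUFFIXES = 6


def noun_form_count(nouns: int, with_possessives: bool = False) -> int:
    """Return the number of potential inflected-form entries for `nouns` lemmas."""
    forms = nouns * CASES * NUMBERS
    if with_possessives:
        forms *= 1 + POSSESSIVE_SUFFIXES
    return forms


if __name__ == "__main__":
    for nouns in (200_000, 500_000):
        print(f"{nouns:>7} nouns: {noun_form_count(nouns):>11,} forms, "
              f"{noun_form_count(nouns, with_possessives=True):>11,} with possessives")
```

This yields 6 and 15 million bare forms, and about 42 and 105 million with possessives, which matches the "40 to 100 million" range quoted above.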

@Hekaheka It would be great if MediaWiki could autogenerate the text of an inflected form, but in its current state it can't do either that or redirect from an inflected form to a lemma form. IMO the most useful thing about having inflected forms entered as such is when you have homophones or homographs between different inflected forms. This occurs fairly often in the Romance languages, for example, between noun and verb forms or between adjective and verb forms. It also occurs fairly often in Russian between noun and verb forms but rarely for adjectives except for short forms of adjectives; for this reason I have never done a bot run to create Russian adjective forms (besides the fact that there are a lot of them). If Finnish grammar is largely regular and doesn't have a lot of homonyms, I would think it's not useful to have inflected forms generated. I suppose for the moment we need to use our judgment as to whether it's worth it to create such forms. Benwing2 (talk) 23:38, 8 January 2024 (UTC)

I would definitely appreciate their input! I didn't know about Surjection or their bot before you mentioned them, so I apologize for bothering you about it. Thank you! Vergencescattered (talk) 23:27, 6 January 2024 (UTC)

Request to deploy `{{szy-pron}}`

I've created a Sakizaya pronunciation template, and I need help deploying it to all Sakizaya language entries on Wiktionary. Could you assist with this using your bot account? --TongcyDai (talk) 17:29, 7 January 2024 (UTC)

@TongcyDai What needs to be done here? Are there any cases where manual respelling or other help for the template is needed? Benwing2 (talk) 22:54, 7 January 2024 (UTC)

When adding the template, simply insert {{szy-pron}} into each Sakizaya entry; no parameters or respelling are needed. TongcyDai (talk) 10:16, 8 January 2024 (UTC)

Please let me know if there's anything else you need from me to deploy the template. --TongcyDai (talk) 18:38, 1 March 2024 (UTC)

@Benwing2 Is there anything I can help with? --TongcyDai (talk) 07:06, 17 April 2024 (UTC)

@TongcyDai Apologies for the delay, I am working on this now. However, the template should be called {{szy-IPA}} for consistency with other pronunciation templates. Do you mind if I rename it? Benwing2 (talk) 23:26, 17 April 2024 (UTC)

Thank you for the update. I appreciate your help and have no objections to renaming the template; please go ahead. --TongcyDai (talk) 07:33, 18 April 2024 (UTC)

@TongcyDai Done. Benwing2 (talk) 23:18, 18 April 2024 (UTC)

Relational -> demonym

Could you clean up Spanish demonyms like in this diff? Categorizing them as demonyms makes more sense than categorizing 900+ demonyms as relational adjectives just because they don't have a one-word translation in English. Ultimateria (talk) 19:23, 7 January 2024 (UTC)

@Ultimateria Hi, I actually wrote a script a while ago to do exactly this but never ran it. I don't remember why; maybe it needed a few fixes. I'll go ahead and finish this. Benwing2 (talk) 22:52, 7 January 2024 (UTC)

Revert adding acceleration forms to `{{pl-conj-ai}}`

Hi @Benwing2. You just reverted the changes to the template {{pl-conj-ai}}. Could you please elaborate on what was broken, so I can see how it could be fixed while preserving the benefit of the acceleration forms? Incidentally, similar changes have been made to other templates, so the same error could arise for other verbs. You are referring to active adjectival participles, for which only a single form was used before, even though these participles have different forms depending on number and gender. Maybe the breaking tool needs to be updated to cater for those other forms. @Vininn126 JuChelou (talk) 14:04, 25 January 2024 (UTC)

@JuChelou For one thing, the specific value of 'active adjectival participle' (along with various other specific values) is processed specially in Module:accel/pl and causes the inflection to be set to 'actv|adj|part'. By changing this you broke this support, and caused it to use an invalid inflection tag set 'm|s|active adjectival participle'. The other inflections of the participle were similar. The correct thing to do is to leave the masc sing participial forms unchanged and if you want to add acceleration to the other forms, they should cause the form to be created as e.g. {{feminine singular of|pl|PARTICIPLE}} rather than as an inflection of the verb. You can see an example of how to handle this correctly by looking at the lines starting at Module:accel/pt#L-21. Benwing2 (talk) 22:50, 25 January 2024 (UTC)
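The split described above can be sketched in Python as follows. The real logic lives in the Lua modules Module:accel/pl and Module:accel/pt; the function name, the "feminine singular" label and the one-entry table below are illustrative assumptions. The masculine singular participle keeps its special tag set as an inflection of the verb, while the other gender/number forms are defined relative to the participle.

```python
from typing import Optional

# Labels the accelerator maps to a fixed tag set. Only the one mentioned in
# the discussion is listed; the real module handles several more.
SPECIAL_TAG_SETS = {"active adjectival participle": "actv|adj|part"}


def accel_definition(verb: str, label: str, participle: Optional[str] = None) -> str:
    """Return the definition line for a form created from a conjugation-table link.

    verb:       the verb lemma, e.g. "wyrzucać"
    label:      the table label of the form, e.g. "active adjectival participle",
                or "feminine singular" (a hypothetical label for other participle forms)
    participle: the masculine singular participle, required for the other forms
    """
    if label in SPECIAL_TAG_SETS:
        # Masculine singular participle: an inflection of the verb itself.
        return "{{inflection of|pl|%s||%s}}" % (verb, SPECIAL_TAG_SETS[label])
    if participle is None:
        raise ValueError("a participle is needed to define form %r" % label)
    # Other forms point at the participle rather than at the verb.
    return "{{%s of|pl|%s}}" % (label, participle)
```

For example, `accel_definition("wyrzucać", "feminine singular", "wyrzucający")` gives `{{feminine singular of|pl|wyrzucający}}` instead of an invalid tag set like 'm|s|active adjectival participle'.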

Thank you @Benwing2 for your reply. @Vininn126

I tried something in Module:accel/pl and {{pl-conj-ai}} to add proper accel form support for the adjectival participles.

However, I am not fully satisfied with the result because:

1/ on the masculine singular form, it could add 2 forms, for example for wyrzucający (from wyrzucać)

2/ the result would differ depending on whether the new wiki page is triggered from the conjugation chart or from the adjective declension chart (which I also added recently). For example, for wyrzucające, the new wiki page triggered from the verb link would "miss" the fact that it is also the form for the accusative neuter and accusative non-virile.

Any advice? Or should I just ditch the extra accel forms for the participle and contributors would use the new accel links from the adjective declension module? JuChelou (talk) 16:18, 26 January 2024 (UTC)

In theory you could generate wyrzucający and from there generate the others, but it's less than ideal. Vininn126 (talk) 16:40, 26 January 2024 (UTC)

@JuChelou Hmmm, I'm not quite sure how to handle #2; either you'd have to add all the non-nominative forms of the participles to the verb table so that the accelerator code knows about them automatically, or you'd have to hack the code in MOD:accel/pl somehow to add the remaining inflections in. (This latter thing is possible, as I think I added a hook that you can define in the accelerator module that operates at the end after all the inflections have been combined.) As for #1, the general principle I've followed is not to include definitions for non-lemma forms that are identical in spelling to the lemma. I followed this principle, for example, when I created a bot script to add Russian noun inflections. This also happens in Portuguese verbs (where the 1st and 3rd singular future subjunctive usually looks the same as the infinitive), and for Latin feminine nouns (where the ablative singular is spelled the same as the nominative singular, although the pronunciation is different as the ablative ends in long -ā while the nominative ends in short -a). I actually removed the cases where Portuguese verbs were defined normally but had an additional definition as the 1st/3rd singular future subjunctive, but I may have left alone the Latin ablative cases because of the different pronunciation. In the Polish case, the pronunciation is the same and so you could fix this by just not having an accelerator defined on the forms that look like the lemma.

In general, I would actually argue that instead of including only the nominative case forms, it's best not to include anything but the masculine nominative singular of the various participles in the verb table, and require that the remaining forms be defined using accelerators on the participle table, even though User:Vininn126 thinks this is non-ideal. This is how we handle participles in Russian, for example, which is similar in many ways to Polish. I think the main benefit to having non-lemma participle forms defined in the verb table is if there are irregularities in their formation, but I don't think this is the case in Polish. Benwing2 (talk) 23:20, 26 January 2024 (UTC)
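The "don't define forms spelled like the lemma" principle amounts to a simple filter. Below is a minimal Python sketch, not the actual bot or accelerator code; the tag names are illustrative. It uses the Portuguese future subjunctive mentioned above, where the 1st and 3rd singular coincide with the infinitive.

```python
from typing import Dict


def forms_to_define(lemma: str, forms: Dict[str, str]) -> Dict[str, str]:
    """Keep only the forms that should get their own non-lemma definition.

    Forms spelled identically to the lemma are skipped, so no accelerator
    (or bot-created definition) is generated for them.
    """
    return {tag: form for tag, form in forms.items() if form != lemma}


# Future subjunctive of Portuguese "cantar"; the tag names are illustrative.
CANTAR_FUT_SUB = {
    "1s": "cantar", "2s": "cantares", "3s": "cantar",
    "1p": "cantarmos", "2p": "cantardes", "3p": "cantarem",
}

if __name__ == "__main__":
    print(forms_to_define("cantar", CANTAR_FUT_SUB))
```

The same filter would drop a Polish participle form spelled identically to the masculine nominative singular, which is the fix suggested for issue #1.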

An additional thought is maybe we shouldn't be defining non-lemma forms of participles at all, since AFAIK they're quite regular and there are a lot of them. See the discussion above about #Finnish inflections. This is the policy we follow for Russian, for example. Benwing2 (talk) 23:22, 26 January 2024 (UTC)

Where do we define non-lemma participles? Vininn126 (talk) 10:17, 27 January 2024 (UTC)

@Vininn126 Sorry, can you clarify what you mean? Benwing2 (talk) 10:37, 27 January 2024 (UTC)

I simply didn't understand your last message Vininn126 (talk) 10:59, 27 January 2024 (UTC)

Thank you @Benwing2 for your very detailed answer.

Basically, regarding your recommendations for #1, it would be easy to remove the accel form for the version identical to the lemma form.

For #2, however, it would be trickier, as it would require duplicating the generation of all the forms, opening room for discrepancies between the pl-adj module and the Polish accel module.

If I understand correctly, your overall recommendation is to remove all the other forms of the participles in conjugation templates. Basically, we would just have "active adjectival participle: masculine singular nominative form".

It would be similar to what is done for the verbal noun, where there is only the masculine singular nominative form, even though other forms exist.

@Vininn126 what would be your opinion on removing the additional forms of the adjectival participles from the conjugation templates? JuChelou (talk) 17:02, 28 January 2024 (UTC)

Sounds fine to me; it's not typical to have them. Vininn126 (talk) 18:03, 28 January 2024 (UTC)

On the `{{quote-book}}` template

Hi,
I was wondering what exactly the combined use of the parameters |start_year= and |year= is supposed to communicate.
It's supposed to mean a range of dates, but, taking 1390–1400 as an example, is the range meant:

in the sense of "the composition of this work started in 1390, and ended in 1400"?

or in the sense of "this work was probably completed (or brought to its current state, if unfinished) somewhere between 1390 and 1400"?

Thanks in advance for any clarification. I've recently discovered these parameters, and I'm not sure I've been using them properly. —— GianWiki (talk) 15:24, 25 January 2024 (UTC)

@GianWiki These parameters were there before I started to clean the template up, so you might ask User:Sgconlaw, but I'm thinking it's used for works that took several years to create. Benwing2 (talk) 23:45, 25 January 2024 (UTC)

I see, I hadn't noticed that. I'll try asking them just to make sure.

Thank you for your time. —— GianWiki (talk) 08:18, 26 January 2024 (UTC)

@GianWiki: I don't think the parameters were clearly defined at the time when I first tidied up the {{quote-}} templates. Personally, I use them to mean a range of publication dates (for example, if a novel is originally published in parts in a magazine over many months), and if I intend a range of dates to mean anything else I add a qualification in parentheses for clarity like this: |year=c. '''1597–1600''' (date written). — Sgconlaw (talk) 10:54, 26 January 2024 (UTC)

WingerBot and Welsh animal genders

Hi, your bot edited garan ("crane") and petris ("partridge") so they would be “m or f by sense”, which isn’t correct. I've corrected them, but can you amend the bot so it doesn’t edit other animals like this please?

Garan is usually a masculine noun, that can be feminine due to dialect, rather than the sex of the animal (e.g. in Iolo Williams’s Llyfr Adar and the Geiriadur yr Academi) and petris is feminine.

I’ve consulted a bit with other Welsh speakers and the only source I can see for petris ever being masculine is the Geiriadur Prifysgol Cymru, which could easily be due to one or two examples from centuries ago. “A small cock partridge” would be ceiliog petris bach – where bach modifies ceiliog, not petris.

Cheers, Arafsymudwr (talk) 15:54, 30 January 2024 (UTC)

@Arafsymudwr This was a one-off run where I manually made the changes in question in a text editor and only used the bot to push the changes (that's what "manually assisted" means in the changelog message). So there's no script to amend but I'll make sure not to change the genders of animal terms in Welsh (or generally in any language, I think) in the future. Benwing2 (talk) 06:11, 31 January 2024 (UTC)

Links to English possessives in inflection-line templates

I wish I had included this in my request about links to components of hyphenated terms in English inflection templates. (How's that coming, BTW?) Many vernacular names of organisms are like Gundlach's hawk (see Gundlach's hawk). It would be better, especially for me, if the link were to Gundlach rather than the possessive. I can't think of any instances for which the possessive would be a better link target and believe that any such instances are relatively rare exceptions. DCDuring (talk) 16:29, 31 January 2024 (UTC)

@DCDuring Yes, in fact my concerns over how to handle apostrophes are why this hasn't already gotten done. I'm thinking that we should split any term with a trailing 's except for one's and someone's (with exceptions also maybe for he's, she's, it's), but not split other terms with apostrophes (e.g. I'm, don't, haven't). BTW I notice that we've split apostrophe-s into two terms, 's for the contraction and -'s for the possessive. Personally I think this is confusing and probably they should be merged into 's (without the hyphen). It also makes auto-linking more difficult; probably we should link all occurrences of 's to -'s since this is the more common case. Benwing2 (talk) 22:07, 31 January 2024 (UTC)
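The splitting rule proposed here can be sketched in Python (the actual headword code is Lua, in Module:en-headword). The exception list, including the tentative he's/she's/it's, and the choice to link the ending to -'s follow the post above; the function names are hypothetical.

```python
# Possessive-looking words that are left unsplit. "one's" and "someone's"
# come from the proposal above; he's/she's/it's were only floated as
# possible exceptions and are included here as an assumption.
NO_SPLIT = {"one's", "someone's", "he's", "she's", "it's"}


def link_word(word: str) -> str:
    """Link one space-separated word of a headword."""
    if word.endswith("'s") and len(word) > 2 and word.lower() not in NO_SPLIT:
        # Trailing 's: link the stem and send the ending to the -'s entry.
        return "[[%s]][[-'s|'s]]" % word[:-2]
    # Everything else, including other apostrophes (I'm, don't), is linked whole.
    return "[[%s]]" % word


def link_headword(term: str) -> str:
    """Link each space-separated word of a multiword term."""
    return " ".join(link_word(word) for word in term.split())
```

With this rule, `link_headword("Gundlach's hawk")` produces `[[Gundlach]][[-'s|'s]] [[hawk]]`, so the first link goes to Gundlach as requested.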

This 's/-'s distinction gets to how to indicate the distinction between an inflectional ending and a contraction, doesn't it? On one level, one needs a linguistics or philosophy degree to be qualified and/or motivated to argue this, but I don't hold the right degrees. On another level, how to help users, it would seem both should be on the same page, almost certainly 's. It probably should go to BP, but you may be able to go ahead with what is convenient to implement and rely on links between 's and -'s to help users in the meantime. DCDuring (talk) 22:22, 31 January 2024 (UTC)

@DCDuring Please see User:Benwing2/test-en-multiword for some examples of the new headword link handling system that I'm testing. It includes the ability to change the link of one (or several) of the words of a multiword expression without having to write out the entire expression; see the examples that specify |head=~.... (This functionality was already implemented for Italian and later extended to other Romance languages.) Note that if there are both hyphens and spaces, the default behavior is to link the space-separated components but not break up hyphen-separated components, although this can be changed using |splithyph=1. Possibly the default should be reversed and hyphen-separated components broken up by default unless |nosplithyph=1 is given; what do you think? Benwing2 (talk) 00:01, 2 February 2024 (UTC)

I will look at it in about 16 hours. DCDuring (talk) 00:04, 2 February 2024 (UTC)

@DCDuring: OK, thanks. BTW I'm thinking we should indeed change the default when there are both hyphens and spaces, and maybe make an argument to convert hyphenated terms to space-separated terms, e.g. for cases like civil-rights movement and claw-hammer coat that should be linked as [[civil rights|civil-rights]] [[movement]] and [[claw hammer|claw-hammer]] [[coat]] (likewise closed-circuit television, clock-face timetable, coffee-table book, etc.), although there are also examples like close-up lens, coin-operated laundry, context-free grammar, co-occurrence network, etc. where we do want to link the hyphenated component as such. Benwing2 (talk) 00:58, 2 February 2024 (UTC)

I really like the more hyphenated forms because they reduce certain kinds of possible misreading of MWEs, but contemporary relative frequency may indicate that hyphenated forms are already much less frequent. For three-part English vernacular names of organisms, I often find that the hyphen is in the wrong place or is not useful. But black billed amazon is not a good substitute for black-billed amazon. DCDuring (talk) 01:10, 2 February 2024 (UTC)

@DCDuring I have redone the handling of terms with both hyphens and spaces so that it now looks up the hyphenated term to see whether it exists in order to determine how to link it. Specifically:

1. If the term exists as a space-separated compound, link to that. (We prefer space-separated compounds because the hyphen-separated form often exists as a soft redirect.)
2. Otherwise, if the term exists as a hyphen-separated compound, link to that.
3. Otherwise, link the hyphenated terms separately.

This handles most cases properly, although there are occasional situations where it fails; for example, close up and close-up both exist and are different, and by default close-up lens links (wrongly) to the former. For this reason I've provided params to override the default handling: |hyphspace=1 forces case (1) above, |nosplithyph=1 forces case (2) above, and |splithyph=1 forces case (3) above.

Benwing2 (talk) 05:27, 2 February 2024 (UTC)
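The three-step lookup and its overrides can be sketched in Python for a single hyphenated component. This is not the Lua code in Module:en-headword: the page-existence check is passed in as a plain callback, and `SAMPLE_PAGES` is a made-up set used only to illustrate the close-up lens problem.

```python
from typing import Callable, Optional


def link_hyphenated(word: str, exists: Callable[[str], bool],
                    override: Optional[str] = None) -> str:
    """Link one hyphenated component of a multiword headword.

    exists:   page-existence check (a stand-in for the real lookup)
    override: None, "hyphspace", "nosplithyph" or "splithyph", mirroring
              the |hyphspace=1, |nosplithyph=1 and |splithyph=1 params
    """
    spaced = word.replace("-", " ")
    # Case 1: link to the space-separated compound, displaying the hyphenated form.
    if override == "hyphspace" or (override is None and exists(spaced)):
        return "[[%s|%s]]" % (spaced, word)
    # Case 2: link to the hyphenated compound as a whole.
    if override == "nosplithyph" or (override is None and exists(word)):
        return "[[%s]]" % word
    # Case 3 (also forced by "splithyph"): link each hyphen-separated part.
    return "-".join("[[%s]]" % part for part in word.split("-"))


# Hypothetical page list for illustration only.
SAMPLE_PAGES = {"civil rights", "close up", "close-up", "context-free"}

if __name__ == "__main__":
    has_page = SAMPLE_PAGES.__contains__
    for w in ("civil-rights", "close-up", "context-free", "scaly-headed"):
        print(w, "->", link_hyphenated(w, has_page))
```

With these sample pages, close-up goes to close up by default (the failure described above), and passing `override="nosplithyph"` links it to close-up instead.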

I hope we will never have entries for terms like scaly-headed. So I'll have to use nosplithyph=1 for a vast number of vernacular names. I may as well not have asked for this favor. I suppose I could create a new template to wrap {{en-noun}} or {{head}}, specifying the parameter, to save keystrokes for these vernacular name entries. DCDuring (talk) 13:41, 2 February 2024 (UTC)

@DCDuring If you need to use |nosplithyph=1 for a large number of vernacular names, that is defeating the purpose of things. Can you explain why you think you need to use this for so many? Things like scaly-headed are SOP so should be split, IMO. Benwing2 (talk) 20:37, 2 February 2024 (UTC)

I misread in haste, I think. DCDuring (talk) 22:43, 2 February 2024 (UTC)

@DCDuring I have implemented the various changes to the linking behavior of Module:en-headword. They are documented on the module documentation page Module:en-headword/documentation (although the section on link modifiers is still to be written). There is text in the documentation of {{en-noun}}, {{en-verb}} and {{en-adj}} pointing to the module documentation page for the specifics about multiword linking and suffix handling. Let me know if there's anything else needed documentation-wise. Benwing2 (talk) 00:10, 7 February 2024 (UTC)

The section on link modifications (renamed from link modifiers for clarity) is written. Benwing2 (talk) 00:46, 7 February 2024 (UTC)

devil's own

I reverted WingerBot's edit to this entry not just because of the module error (I think you added |def= to the noun and proper noun code, but not to the adjective), but because it looks to me like the syntax is more along the lines of "[the devil's] own" rather than "the [devil's own]". Not that I would get into an edit war over this; I just wanted to make sure you were aware of that dimension before deciding how to fix things. Chuck Entz (talk) 04:23, 4 February 2024 (UTC)

@Chuck Entz Thanks. Yeah I forgot about handling adjectives with the in them. As for the syntax issue, all that |def=1 does is add the before the head; it doesn't assert any particular way of parsing the constituents. I suppose it could be interpreted as asserting an analysis like the [devil's own] but that wasn't my intention (and I'm not quite sure how we'd indicate such an analysis in the head). Benwing2 (talk) 04:47, 4 February 2024 (UTC)

But adjectives don't have the in them. We should review the entries that so claim and determine whether there is good reason to ever have the inside the headword template for adjectives. DCDuring (talk) 14:14, 4 February 2024 (UTC)

Never mind. I was thinking of leading the. We have numerous entries of purported adjectives with the embedded. Some of them seem like attributive use of a noun, but not all. DCDuring (talk) 14:23, 4 February 2024 (UTC)

Category:LANG nouns with other-gender equivalents

Hello Benwing. I hope that this does not take too much of your time. How should CAT:Telugu nouns with gendered forms be added to MOD:te-headword? I tried looking at MOD:hi-pa-headword, but could not figure out what and where to add the equivalent of:

table.insert(data.categories, data.langname .. " " .. plpos .. " with other-gender equivalents")

to MOD:te-headword. I noticed that this feature was missing for Telugu when I saw

Synonym: (female) రచయిత్రి

at the entry for రచయిత (racayita). ~~The Lua-fication of {{te-noun}} means adding features such as this is not as easy as adding~~

~~{{#if:{{{m|}}}{{{f|}}}{{{n|}}}|{{cln|te|nouns with other-gender equivalents}}}}~~

~~to {{te-noun}}~~. Kutchkutch (talk) 00:46, 5 February 2024 (UTC)

Adding

{{#if:{{{m|}}}{{{f|}}}{{{n|}}}|{{cln|te|nouns with other-gender equivalents}}}}

at the end of {{te-noun}} seems to work for categorisation but not for the headword line. Kutchkutch (talk) 00:59, 5 February 2024 (UTC)

@Kutchkutch Glad you figured it out. IMO Module:te-headword needs to be rewritten; it wasn't written by me and doesn't really follow the standard structure for such modules, which is probably why you had difficulty figuring out how to add the appropriate code. Benwing2 (talk) 22:33, 8 February 2024 (UTC)
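The wikitext condition `{{#if:{{{m|}}}{{{f|}}}{{{n|}}}|...}}` and the Lua `table.insert(data.categories, ...)` line quoted above express the same rule: add the category whenever any of m=, f= or n= is given. A minimal Python sketch of that rule follows. It is not code from Module:te-headword; the function name and the representation of each parameter as a list of values are assumptions.

```python
from typing import Dict, List


def gender_equivalent_categories(langname: str, plpos: str,
                                 args: Dict[str, List[str]]) -> List[str]:
    """Categories triggered by other-gender equivalents in the headword args.

    Like {{#if:...}}, which trims whitespace, a parameter only counts if it
    has a value that is non-empty after stripping.
    """
    categories = []
    if any(value.strip() for key in ("m", "f", "n") for value in args.get(key, [])):
        categories.append(langname + " " + plpos + " with other-gender equivalents")
    return categories
```

For the రచయిత (racayita) example, passing `{"f": ["రచయిత్రి"]}` yields `["Telugu nouns with other-gender equivalents"]`; displaying the equivalent on the headword line would still need separate handling.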

Email

Btw, idk if you have notifications turned on for emails, but I sent you one. Vininn126 (talk) 22:24, 8 February 2024 (UTC)

Thanks, I responded. For some reason I didn't get an email notification here on Wiktionary even though I do have email notifications turned on. Benwing2 (talk) 22:32, 8 February 2024 (UTC)

bùzháodiào

Hello. Could you help me fix the Traditional Chinese conversion here? Thanks. ---> Tooironic (talk) 00:31, 11 February 2024 (UTC)

@Tooironic What's the exact issue? BTW in general I am not too familiar with how the Trad <-> Simp conversion works; User:Theknightwho knows more. Benwing2 (talk) 00:32, 11 February 2024 (UTC)

Thank you User:Theknightwho! ---> Tooironic (talk) 00:39, 11 February 2024 (UTC)

hmm

How much longer is it going to take you to finally finish making this new pronunciation module for Polish? You've been doing it for several months now, hurry up, or someone might think you're getting a little lazyyy :) Gugugagasraniewbanie (talk) 08:30, 13 February 2024 (UTC)

@Gugugagasraniewbanie Yeah it will happen soon. Benwing2 (talk) 08:32, 13 February 2024 (UTC)

OK, then you have my forgiveness Gugugagasraniewbanie (talk) 08:35, 13 February 2024 (UTC)

Mon-Burmese script

I changed some letters defined for specific languages (e.g. "X is a letter of the Shan alphabet") to that language (i.e. Shan), then added a request for definition to the translingual entry. If this is somehow considered vandalism, I'll revert myself, but I'm assuming obvious fixes like this are acceptable, and it parallels other entries that only have definitions for specific languages. (A definition might be as simple as stating that it's a letter of the Mon-Burmese script corresponding to a certain letter in Sanskrit, but I didn't do that myself as I thought I might be accused of vandalism.)

I also removed a couple pronunciations that were for the wrong entry. kwami (talk) 04:25, 14 February 2024 (UTC)

@Kwamikagami "Vandalism" doesn't seem like the right word for changes that are in good faith. As to whether they are wrong or counterproductive I don't know but they seem generally fine to me. User:RichardW57 do you have any comments? Benwing2 (talk) 04:45, 14 February 2024 (UTC)

Okay, "blockable offense" then. kwami (talk) 04:47, 14 February 2024 (UTC)

Yeah I understand. BTW I think blocking is only likely if you edit-war or keep making changes of a specific nature after people have objected to them. (Also editors who don't know what they're doing but think they do; editors of this nature can do a lot of damage.) Wikipedia seems generally more tolerant of edit-warring, maybe because of the number of editors relative to how many articles there are. Benwing2 (talk) 04:57, 14 February 2024 (UTC)

@Benwing2: Which Shan alphabet? There are several Shan languages, which often makes the letters translingual because shared by several Shan languages! The change seems backwards - I would have said that the thing to do was to waste space by adding the Shan entry. As Burmese-script words easily consist of a single letter, cloning letters to each language using them makes Wiktionary more difficult to find by eye, in accordance with the apparent aim of difficulty of use. --RichardW57 (talk) 08:49, 14 February 2024 (UTC)

If there are other Shan languages besides [shn], and they use the same letter, then they should be listed. But as it was, they were not listed -- only [shn] was.

And yes, I know you want to lump all languages together, but that's not the consensus for Wikt. kwami (talk) 18:49, 14 February 2024 (UTC)

We have Shan (shn), Khamti Shan (kht), Aiton (aio), Phake (phk) and Tai Laing (tle) that use the Burmese script. The Tai Nuea (tdd) (= Tai Le /Tai Dehong / Chinese Shan) (not to be confused with Northern Tai or Northern Thai) and Tai Khuen (kkh) (though their speech is more akin to Northern Thai, but they identify as Shan) use different scripts. There's also Khamyang (ksu or nrr). Tai Ahom should arguably be included, but again it has its own script. --RichardW57 (talk) 23:32, 14 February 2024 (UTC)

And when we say a letter is used by [shn], do we necessarily know that it's also used by the others? E.g. in Lik-Tai for Khamti? The label "Shan" may cover multiple languages in some usage, but when Wikt has an entry for Shan [shn], we mean specifically that language. When we mean Khamti, we say Khamti. Etc. But sure -- if we can demonstrate that a letter is used by multiple languages, we can say that it's used for multiple languages. Though when giving the pronunciation and orthographic rules, we need to be careful not to present [shn] as representative if it isn't. kwami (talk) 01:23, 15 February 2024 (UTC)

Seeking template help

Hi, we find your Hindi language templates very helpful. Could you assist us with essential Sylheti templates (language code: syl) on English Wiktionary? We could contribute with translations, although we are still familiarizing ourselves with Wiktionary policies. -- ꠢꠣꠍꠘ ꠞꠣꠎꠣ (talk) 07:52, 16 February 2024 (UTC)

@ꠢꠣꠍꠘ ꠞꠣꠎꠣ Hi I'm up to my ears in requests so I won't be able to get to this soon, although if someone else wants to work on it using the Hindi modules as a starting point, I can provide guidance. Benwing2 (talk) 09:55, 16 February 2024 (UTC)

Category:Romance terms inherited from Latin nominatives

Hi. Sorry, I think I was a bit too 'bristly' with how I responded earlier. I really do support removing these categories and sticking the relevant content into 'Appendix: Romance terms plausibly inherited from Latin nominatives'. Nicodene (talk) 17:21, 18 February 2024 (UTC)

@Nicodene This sounds good to me and "plausibly" sounds like a good term to use, and I apologize if I also was a bit in-your-face. If you can write the appendix and put the terms there in a list, I can remove the categories from the terms by bot. Benwing2 (talk) 19:56, 18 February 2024 (UTC)

Done. This should actually make it easier for me to reorganise/restructure it all, which I've been meaning to do. Nicodene (talk) 20:39, 18 February 2024 (UTC)

@Nicodene Thanks! Benwing2 (talk) 00:45, 19 February 2024 (UTC)

@Nicodene I am going to remove the pages listed in the appendix from the '... inherited from Latin nominatives' categories. Just checking that this is OK with you. Benwing2 (talk) 04:56, 19 February 2024 (UTC)

Yes, go for it please. Nicodene (talk) 05:01, 19 February 2024 (UTC)

@Nicodene OK it's done. BTW the appendix is looking good and I'm glad you have included detailed notes. Benwing2 (talk) 05:26, 19 February 2024 (UTC)

Macrolanguages

Hi - do you have any ideas for how we could handle macrolanguages in the data (Chinese being the most obvious example, given how we handle Chinese L2s). I’m not keen to create a whole new type of object, since this situation comes up in loads of places, as we don’t have a coherent distinction between “is a type of” and “is a descendant of”, leading to the issues I mentioned in WT:RFM#Converting Min Nan into a family, where Teochew and Leizhou Min are “descended from” Min Nan, whereas they’re actually types of Min Nan.

I suspect you’ve noticed similar things with how Persian and Latin are handled. One common situation which stands out are language periods: we list Old Latin as ancestral to Latin, but as it’s an etym-only language of Latin that technically means we’re saying it’s ancestral to itself. Same for Early Modern English and English, and so on. We get round it by adding an explicit check to Module:languages to prevent a language being ancestral to itself, but that’s a kludge which is symptomatic of our poorly defined language model.

Also see the Japonic family tree at Category:Proto-Japonic language, where the periodisation of Japanese is all messed up because they’re all treated as etym-only languages part of Japanese, even though Early/Late Middle Japanese have Middle Japanese as their immediate parent. (They currently display in the wrong order, since Middle Japanese should not be listed before Early Middle Japanese if we were to follow the same system as Latin; the data is correct but Module:family tree is bugged.) A much bigger issue is that we imply Middle Japanese is split into three periods, and that the central period is somehow representative. This is confusing at best, and outright misleading to anyone who isn’t familiar with the nuances of our data modules. Theknightwho (talk) 18:29, 18 February 2024 (UTC)[reply]

@Theknightwho Since you have merged etym-only and full languages to the point that both are more or less just types of Language objects, can we not just have a "type" field identifying something as a macrolanguage? That way it will still work as a language for most purposes. IMO we do need to properly distinguish is-a-X and is-a-descendant-of-X, and it seems you've provided a way with the ancestors field. As for the issue of Old Latin vs. Latin, we do have a "Classical Latin" etym language and ultimately we need to push more in this direction, although it will require some thinking. These are just the thoughts off the top of my head. Benwing2 (talk) 19:54, 18 February 2024 (UTC)[reply]

@Benwing2 Thanks - that's helpful to think about.

I'd rather not have a specific macrolanguage field, since it's superfluous to whether or not something is set as being a "type of" that language. I think the handling of Chinese, Latin, Persian, English and (one I missed above) Norwegian should probably all be done in the same way. At the most extreme end, the Sinitic family and Chinese are in fact the same thing, so I'm more inclined towards having a way to set one language as a type of another (as we do with etym-only languages), fully merging etym-only languages into languages, and then having a flag which sets whether it should be treated as a full language. That way, we also get rid of the weird half-and-half situation going on with Classical Persian and the arbitrary distribution of Chinese lects between language and etym-only language, while making it more straightforward to switch something from one to the other (e.g. the Prakrits). It may also be worth doing the same with families, since (as Chinese shows) macrolanguages and families are basically the same thing in most situations.

I think we probably need some kind of periodisation mechanism. In the case of Latin, if we're treating Old Latin as a "type of" Latin, then strictly speaking Latin's ancestor should be Proto-Italic. However, within that we could have the various periods, including Classical Latin, and there should be a way to set a default period for situations when only the generic language code is provided. For most languages that would be the standard language; in the case of Latin, it would be Classical. This would also potentially address the issue of cross-overs between regional lects and periods (e.g. Northern Early Modern English), and it should also help avoid the silly Japanese situation, since periods should be possible to nest inside each other. Theknightwho (talk) 20:10, 18 February 2024 (UTC)[reply]

@Theknightwho All this sounds good to me in general although it would be helpful if you could write out your proposals in more detail as it's sometimes a bit hard for me to work out what your thoughts are when presented abstractly. Benwing2 (talk) 20:31, 18 February 2024 (UTC)[reply]

@Benwing2 Will do. I’ll also have a think about how we should handle this in the family tree display, since a lot of the confusion stems from that displaying descendants and variants/types in exactly the same way. Theknightwho (talk) 20:52, 18 February 2024 (UTC)[reply]

One problem that needs to be addressed is that language change doesn't always follow a tidy tree model. Macrolanguages are messy. A macrolanguage always has a standard lect that the other lects identify with- but there can be more than one, and which lect is the standard can change over time. Even some of the more complex ordinary languages have similar phenomena. This can end up being reflected in the history of languages both within and deriving from the (macro)language.

With English, you have the same language changing its prestige/standard dialect several times in Old English due to the rise and fall from prominence of specific kingdoms: Anglia, Mercia, Northumbria, and finally Wessex (this is off the top of my head- I'm sure I missed something). With the transition to Middle English it all moved to London. Middle English borrowed heavily from Old Northern French, but since then the source has been Parisian French. Scots split off from the northern dialects that descended primarily from Northumbrian. I'm sure there were changes in the Old Norse dialects that Old English and Middle English borrowed from, and then there's the matter of Brythonic Pictish and Goidelic Gaelic in Scotland and their influence on Scots and northern English.

China had several changes in which were the prestige lects, and these are reflected in the various named yomi in Japanese, as well as the borrowings into other neighboring languages. Then there's Mycenaean Greek, which is different from whatever became Ancient Greek, and the fact that older Latin borrowings didn't come from the Attic dialect that became modern Greek, and Tsakonian that came from Doric, etc.

If you look at a regional lect, you can find things descended directly from the same region in the ancestral language, and things that came in from the standard lects of the different historical stages, and other things that were borrowed from various external languages. Sometimes separate languages split off from these regional lects, so they have more in common with the regional varieties of the main language than with the standard lects of any historical period.

To stretch the tree analogy a bit: sometimes a limb that's touching the ground sets root and becomes a tree in its own right, and other times branches or roots from separate trees graft together after prolonged contact.

I seem to have written a book here, but I hope you can see what I'm getting at. It would be a good idea to think about some way of representing the internal structure of macrolanguages and even regular languages, and the way that different descendants can come from different parts of the same language. There's a complex interchange between region and historical period, so the Wessex dialect of today has a completely different status from the Wessex dialect of a thousand years ago, and the geographical identification of what's mainstream and what's dialectal changes over time. It's all secondary to the main concept of parent and daughter language, but it might help us with some exceptional cases like Chinese. Chuck Entz (talk) 23:15, 18 February 2024 (UTC)[reply]

Agreed. Even Anglo-Norman, the main vehicle of 'Gallicisms' in Middle English, began as a chaotic hodge-podge of Old French dialects, certainly in many respects 'northern-flavoured', but not only, and increasingly slanting towards (but never quite attaining) Central French norms as the centuries went by. In this case as well there is no question of a precise dialectal ancestry. Nicodene (talk) 14:34, 19 February 2024 (UTC)[reply]

Italicising synonyms for taxonomic names

Hi Benwing. Could you edit Module:form of, Module:form of/templates, and/or T:synonym of to add the ability to italicise the linked-to term in transclusions of {{synonym of}} (preferably via an |i= parameter), please? Such functionality is needed for taxonomic synonyms. ATM, work-arounds like those seen in Asclepias filiformis var. buchenaviana, Bulbophyllum buchenavianum, Gomphocarpus filiformis var. buchenavianus, Megaclinium buchenavianum, and Tropaeolum buchenavianum are necessary. 0DF (talk) 00:38, 19 February 2024 (UTC)[reply]

@DCDuring who would know how this is handled in other taxonomic entries. Chuck Entz (talk) 01:08, 19 February 2024 (UTC)[reply]

Now, {{syn of}} (and {{alt form of}}, possibly others) suppresses the italic formatting that {{taxlink}} provides, as well as direct or piped wikitext formatting. All we would need is for templates like {{syn of}} and {{alt form of}} to handle embedded wikitext for italics, as is now possible in other templates that incorporate links. Alternatively, something like {{syn of}}, say {{taxsyn}} (also {{taxalt}}), would have all the formatting capabilities of {{taxlink}}, which include not italicizing terms like "var.", "section" ("sect.", "subsect."), "subg.", and "subsp." in taxonomic names. This would probably not involve too much renaming of templates at this point. DCDuring (talk) 13:58, 19 February 2024 (UTC)[reply]

And it would be nice to allow † to appear without requiring pipes. DCDuring (talk) 14:37, 19 February 2024 (UTC)[reply]

@DCDuring: I assume it would be possible to include the non-italicising functionality of {{taxlink}} in {{synonym of}} by making it contingent upon both |1=mul and |i=1 being true. I can't imagine a case in which one would want to define a term as a synonym of something translingual that contains any of the strings sect., subg., subsect., subsp., or var.; italicise it; and for that term not to be a taxonomic name. 0DF (talk) 14:38, 19 February 2024 (UTC)[reply]

The italicization rules of the various taxonomic bodies include that all taxonomic names (i.e., any rank) of viruses, bacteria, and archaebacteria be italicized. It is probably simpler to use passed-through wikitext italics than to duplicate {{taxlink}} functionality. DCDuring (talk) 14:47, 19 February 2024 (UTC)[reply]

@DCDuring: I only meant {{taxlink}}'s functionality of automatically de-italicising those few abbreviations. Italicising dependent on parsing the taxon (as a species, genus, phylum, or whatever) seems superfluous and unnecessarily complicated for {{synonym of}}; |i=1 should be all that's necessary. 0DF (talk) 14:59, 19 February 2024 (UTC)[reply]

It seems too complicated to me too, but I've often been surprised with what our techno-mavens are willing to do, for reasons that remain mysterious to me. Simply passing through wikiformatting (and, possibly, "†") would be fine with me. It would be easy enough to find the relatively few instances we would have of improper handling of those not-to-be-italicized terms in {{syn of}}, {{alt of}}, and the various etymology templates, too. DCDuring (talk) 19:06, 19 February 2024 (UTC)[reply]

@DCDuring: How would you want the obelus to be treated? 0DF (talk) 22:42, 19 February 2024 (UTC)[reply]

Directly in front of taxon, ignored for linking, but displayed without being italicized. DCDuring (talk) 12:42, 20 February 2024 (UTC)[reply]

@DCDuring: De-italicising would be handled in the same way as it's handled for sect., subg., subsect., subsp., and var., I expect. Stripping † from the link text would be easy (handled in the same way Latin ā, ē, ī, ō, ū, ȳ link to Latin a, e, i, o, u, y), but it may end up being enacted in undesirable circumstances. Do we need a new (mul-tax?) language code for taxonomic names, perhaps? 0DF (talk) 18:06, 20 February 2024 (UTC)[reply]

I'd prefer a shorter one, of course, like 'mult' or 'mul-t'. DCDuring (talk) 18:27, 20 February 2024 (UTC)[reply]

@Mahagaja: How much freedom do we have in devising language codes? 0DF (talk) 18:30, 20 February 2024 (UTC)[reply]

@0DF: You'd have to get consensus at WT:RFM for it. I wouldn't hold my breath. —Mahāgaja · talk 18:48, 20 February 2024 (UTC)[reply]

@Mahagaja: Thanks for the response. I mean, rather, what restrictions are there on the form that language codes take? I know we use ISO 639-3 codes where they're available, but what about custom, in-house codes? 0DF (talk) 20:17, 20 February 2024 (UTC)[reply]

@0DF @Mahagaja @DCDuring We actually already have mul-tax as a variant of Translingual (no idea when it got added, but see Module:etymology languages/data). I don't think it's used for anything at the moment, but it would make sense to use it for this. Theknightwho (talk) 20:25, 20 February 2024 (UTC)[reply]

@Theknightwho: Thank you.
@DCDuring: How 'bout it?
0DF (talk) 20:29, 20 February 2024 (UTC)[reply]

I always fear that the cure will turn out worse than the disease. Can it all be done automagically or will there be a few hundred exceptions? It is true that mul in Latin script is hard to confuse with mul in CJKV. DCDuring (talk) 20:37, 20 February 2024 (UTC)[reply]

Daniel Carrero added Tax. "for test purposes" back in November 2016; -sche then standardized it to mul-tax. I don't know what he was testing, but the code is there for anyone who wants to use it. —Mahāgaja · talk 20:38, 20 February 2024 (UTC)[reply]

BTW, why are discussions like this conducted on a userpage rather than, say, BP or GP? Does that just reflect where the power is? DCDuring (talk) 20:42, 20 February 2024 (UTC)[reply]

@DCDuring: I looked at the histories of Module:form of, Module:form of/templates, and {{synonym of}}. They showed me that Benwing had done a lot of editing on all three, so I figured he/she would be sufficiently familiar with those pages to make the changes I requested. There's nothing suspicious about that and I hardly see how I can be said to have "power" here. 0DF (talk) 00:33, 21 February 2024 (UTC)[reply]

It's a habit of exclusion, not an intent of exclusion. Specific folks can always be pinged. DCDuring (talk) 14:31, 21 February 2024 (UTC)[reply]

@DCDuring: I guess so. Not that I intended the request to turn into a prolonged discussion. 0DF (talk) 15:12, 21 February 2024 (UTC)[reply]

Error handling with Module:parameters and Module:languages

Hiya - just a heads up (and you've probably noticed already), but I've recently updated Module:parameters to allow languages, scripts, families (etc.) as data types, as well as a few other things. This means that the argument table which is returned contains the relevant object(s), and invalid codes will throw an error (which automatically highlights the incorrect parameter). This avoids having to manually handle invalid codes, since the only way to do proper error-handling previously was to pass the ready-baked parameter into Module:languages using getByCode's paramForError parameter, which was tricky when dealing with lists etc. Having converted a number of template modules, I've found it also cuts down code length by quite a bit.

Ideally, we should be able to remove error handling from Module:languages and Module:scripts altogether at some point, since it doesn't really belong there, and it's annoying having to work around it when requesting etymology langs and families, too. Theknightwho (talk) 15:21, 27 February 2024 (UTC)[reply]
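For anyone following along who doesn't read Lua, the behaviour described above can be sketched in Python. This is a minimal illustration, not Module:parameters itself: `process`, `ParamError` and the `LANGUAGES` table are hypothetical stand-ins, and a "language object" is reduced to its name.

```python
# Hypothetical stand-in for the language data, which on Wiktionary lives in Lua modules.
LANGUAGES = {"ca": "Catalan", "en": "English", "la": "Latin"}


class ParamError(ValueError):
    """Raised for an invalid argument; the message names the offending parameter."""


def process(args, spec):
    """Convert raw template arguments according to spec.

    spec maps a parameter name to a type, here either "language" or "string".
    A "language" argument is replaced by the corresponding object (just a name
    string in this sketch), and an invalid code raises ParamError naming the
    parameter, so the calling module needs no error handling of its own.
    """
    out = {}
    for name, kind in spec.items():
        if name not in args:
            continue  # optional parameter that wasn't supplied
        value = args[name]
        if kind == "language":
            if value not in LANGUAGES:
                raise ParamError(f"Parameter {name}: invalid language code {value!r}")
            out[name] = LANGUAGES[value]
        else:
            out[name] = value
    return out


print(process({"1": "la", "2": "vehō"}, {"1": "language", "2": "string"}))
```

The point of the design is that validation happens once, at argument-processing time, so every caller gets the same error message pointing at the bad parameter.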

@Theknightwho Yup I did notice it, thanks. I haven't had a chance to use the new functionality but it sounds good to me. BTW if you haven't already done this you might consider adding support for comma-separated lists of lang codes and for a term with a preceding language code (see parse_term_with_lang in Module:parse utilities, which implements this latter functionality currently). Benwing2 (talk) 20:01, 27 February 2024 (UTC)[reply]

@Benwing2 I've already done the comma-separated list actually, but haven't updated the documentation since I want to make sure the implementation is stable/won't need further expansion. The solution I opted for was sublist=, where sublist=true splits the list using %s*,%s*, but using a string value allows for other splits. The other thing which isn't yet documented is set=, which is for parameters that take an (ideally small) closed set of values, where inputs with other values would be nonoperative anyway.

I'll have a think about how to handle preceding langcodes. Theknightwho (talk) 20:07, 27 February 2024 (UTC)[reply]
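Roughly, in Python terms, the sublist= splitting described above looks like this. It's a sketch only: `split_sublist` is a hypothetical name, and treating a string value as a separator regex is an assumption about how the "other splits" work.

```python
import re


def split_sublist(value, sublist=True):
    r"""Split a raw parameter value into a list of items.

    sublist=True splits on commas with any surrounding whitespace; the Lua
    pattern %s*,%s* corresponds to the Python regex \s*,\s*. Any other
    string is used as the separator regex (an assumption about how a
    custom split might work).
    """
    pattern = r"\s*,\s*" if sublist is True else sublist
    return re.split(pattern, value.strip())


print(split_sublist("en, fr ,de"))
```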

@Theknightwho The |set= support is definitely useful. Note that the corresponding parameter in Python's argparse module is called |choices=, which might possibly be a clearer name (although I can see the argument for using set as well). Benwing2 (talk) 20:16, 27 February 2024 (UTC)[reply]

@Benwing2 That makes sense. The reason I opted for set= is because it uses the {a = true, b = true, c = true} format, since that makes lookup much faster/simpler. Theknightwho (talk) 20:26, 27 February 2024 (UTC)[reply]

@Theknightwho Hmm, I wonder if that isn't false economy since it requires more typing, and I imagine a lot of people will call listToSet on a list to handle this format. Benwing2 (talk) 20:28, 27 February 2024 (UTC)[reply]

@Benwing2 That's a good point, but checking a list is the same amount of work as doing listToSet, so changing Module:parameters to accept a list would simply guarantee the worst-case scenario, instead of leaving it up to the calling module. Theknightwho (talk) 20:34, 27 February 2024 (UTC)[reply]

@Theknightwho I suppose but the actual difference in memory and speed is completely negligible, so IMO you might as well make it easier for the callers. Benwing2 (talk) 20:54, 27 February 2024 (UTC)[reply]

And also you don't have the overhead of loading a new module. Benwing2 (talk) 20:54, 27 February 2024 (UTC)[reply]

@Benwing2 If I have time, I might do some profiling on Module:parameters, since I have a feeling it's contributing a significant chunk to page loading time. E.g. the page a loads about a second faster since I made the changes, and there are still quite a few other optimisations that could be made. Theknightwho (talk) 21:02, 27 February 2024 (UTC)[reply]

@Theknightwho OK but I still think requiring the use of a set rather than (also) allowing a list is a micro-optimization since the number of items should be small. Benwing2 (talk) 21:10, 27 February 2024 (UTC)[reply]

@Benwing2 Alright - I can change it. Theknightwho (talk) 21:16, 27 February 2024 (UTC)[reply]
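A minimal sketch of the compromise agreed above, in Python rather than Lua: accept either a plain list or the {a = true, b = true} set form, and normalise once before checking. `normalize_choices` and `check_choice` are hypothetical names, not part of Module:parameters.

```python
def normalize_choices(choices):
    """Accept either a list of allowed values or a set-style table {value: True}.

    A list is converted once, so every later membership test is a
    constant-time set lookup whichever form the caller used.
    """
    if isinstance(choices, dict):
        return {key for key, flag in choices.items() if flag}
    return set(choices)


def check_choice(name, value, choices):
    """Return value if it is allowed, else raise an error naming the parameter."""
    allowed = normalize_choices(choices)
    if value not in allowed:
        raise ValueError(f"Parameter {name}: {value!r} is not one of {sorted(allowed)}")
    return value


print(check_choice("g", "m", ["m", "f", "n"]))
print(check_choice("g", "f", {"m": True, "f": True}))
```

Since the allowed sets are small, the conversion cost is negligible, which is the argument made above for letting callers pass a list.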

@Theknightwho & Ben: pardon the partial threadjacking, but I've been wanting to ask you two about the practicality of adding parameter checking to existing, non-Lua templates, and this seems like an opportune moment while you're both already thinking about Module:parameters. I'm envisioning something like an unobtrusive template {{allowparams|1,2,3,foo,bar,baz}} that could be added to existing templates to generate errors/warnings when the template is invoked with any params besides those listed. On the backend, it could just call Module:parameters.process() with the list of supplied params and then do nothing with the result. Ignoring the difficulty of identifying the valid parameters and cleaning up all the existing calls with invalid parameters, would adding param checking to every template add an unacceptable overhead to page processing? JeffDoozan (talk) 01:45, 28 February 2024 (UTC)[reply]
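The check that the proposed {{allowparams}} would perform is essentially a set difference. A hypothetical Python sketch (the template doesn't exist yet, and `unknown_params` is an illustrative name), assuming the allowed list is passed as a single comma-separated string:

```python
def unknown_params(allowed_spec, supplied):
    """Return the supplied parameter names not listed in allowed_spec.

    allowed_spec is a comma-separated list such as "1,2,3,foo,bar,baz",
    i.e. the argument the proposed {{allowparams}} template would take.
    """
    allowed = {p.strip() for p in allowed_spec.split(",") if p.strip()}
    return sorted(name for name in supplied if name not in allowed)


print(unknown_params("1,2,3,foo,bar,baz", {"1": "en", "foo": "x", "qux": "y"}))
```

The comparison itself is cheap; as the replies note, the real cost on wiki is the extra Lua invocation per template use.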

@JeffDoozan I think User:Theknightwho can best answer the question about efficiency as he's done a lot more investigations of this sort. Benwing2 (talk) 01:48, 28 February 2024 (UTC)[reply]

@JeffDoozan That's certainly doable, but it would add an extra Lua burden to those templates, and in many cases it would be more straightforward to do the whole thing in Lua anyway.

The reason why it concerns me is that a lot of these mixed templates already make multiple calls into Lua to retrieve things like language names, and there is an inherent cost every time a module is invoked; this is the reason why {{multitrans}} is so effective, because it removes that inherent cost from each template. Aside from memory costs, each invocation is quite time-consuming (relatively speaking), since a ton of things are done by the back-end to create each new Lua environment. Theknightwho (talk) 01:48, 28 February 2024 (UTC)[reply]

@Theknightwho: Thank you for the explanation. I had naively assumed that if a page calls Lua once, then subsequent calls would be relatively cheap. I'm still assuming that most pages include few enough templates that the benefit of having parameter checking outweighs the cost of invoking the checks, but as pages get bigger and closer to memory/speed limits, the calculus may change. Do you have any guess where that tipping point might be? (100 additional calls? 1,000? 10,000?) For pages that exceed that threshold, maybe {{allowparams}} could check the pagename against a fixed "denylist" of problematic pages before invoking Lua. I'm assuming the denylist would be < 100 pages and could be programmatically generated from an XML dump by counting the number of templates that would call {{allowparams}}. What do you think? JeffDoozan (talk) 17:39, 28 February 2024 (UTC)[reply]

@JeffDoozan So conventional wikicode would probably preclude that being workable, because there's the post-expand include size limit of 2MB, which is calculated by adding up the size of every page accessed, multiplied by the number of times it's accessed. On top of that, parser functions like {{#if:}} actually apply a multiplier to anything that goes through them (which compounds, though I think it's capped at something like x12). This was a big problem we ran into with the lite templates, where the bottom 10% of the page a simply wasn't loading templates anymore. Even now, it's using about 1.8MB of the limit. Obviously I'm being really pessimistic when I say these things, but the irony of it is that adding these kinds of checks to aid large pages can end up having the opposite of the intended effect!

The things that help are:

Reducing the number of calls into Lua. If it can be done in one invoke that's ideal, but really it should be no more than 5. This includes uses of any templates which themselves are Lua based (like {{l}}), since they each result in independent calls into Lua. The Coptic conjugation templates are a great example of why this matters, since they're way slower than water/translations despite having nowhere near as many links.
Not creating complex wikicode logic with the parser functions (like we do with the citation templates, for example). They're really slow, a pain in the neck to maintain, and inevitably result in lots of separate Lua invocations for basic information like language names.

In terms of the parameter checking, let me know if there are any templates which are on your priority list, because it may be that we can score some quick-wins by converting some of them into pure-Lua, whereas with others the manual parameter checking may be workable. Theknightwho (talk) 17:51, 28 February 2024 (UTC)[reply]

@Theknightwho That kind of deep information is exactly why I wanted to run this by you. Since I'm hoping to do this programmatically and en masse, it would be limited to templates where I can parse the code to find all of the parameters used. That eliminates anything already calling #invoke, since the invoked module can make its own use of the parameters and I'm not sure how practical it is to try to determine the parameters used by a module. I think this means that every modified template would add 1 call to Lua for every use, and also that there's likely little or no benefit to converting them to Lua. How many total Lua calls on a page is too many?

I would probably start with the templates that don't already have calls with bad parameters, which probably means the lesser-used templates that might not even be included on our biggest pages. I can check which templates are used on pages with more than X template calls and exclude those templates from the mass conversion, to ensure we're not adding additional stress to our biggest pages. I understand that not all template calls are equal, but is there some reasonable number of template calls I could use for detecting "big" pages? 100? 500? 1000? JeffDoozan (talk) 20:34, 28 February 2024 (UTC)[reply]

"terms spelled with"

Hi, I would like to bring your attention to categories such as Category:Hindi terms spelled with ॉ. We seem to have decided that ◌ (U+25CC) should not be used for the Hindi combining characters, but Translingual doesn't seem to know about that, which is why Category:Translingual terms spelled with ◌ॉ exists. What should we do about that? --kc_kennylau (talk) 16:47, 28 February 2024 (UTC)[reply]

@Kc kennylau Can you explain further about U+25CC? What is its replacement? As for the "terms spelled with" categories, AFAIK these categories are suppressed for one-character entries but this entry seems to involve two Unicode chars. Maybe User:Theknightwho can comment more as he reworked the code to generate these categories. Benwing2 (talk) 02:40, 29 February 2024 (UTC)[reply]

U+25CC is usually used with combining characters (see Category:Translingual terms spelled with ◌̺, which is U+25CC followed by U+033A) in order to display the character. However, for some unknown reason, at least in my browser the Hindi combining characters in "isolation" already come with a dotted circle when they are rendered, so using U+25CC would create two dotted circles when displayed. I tried to look at The Unicode Standard, but so far it seems to me that this is not really specified one way or another, at least not specifically for Devanagari. This is why I don't really know if we should include U+25CC or not. --kc_kennylau (talk) 02:48, 29 February 2024 (UTC)[reply]
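To see exactly what the two category names contain, Python's standard unicodedata module can list each code point with its name and general category. This is only an inspection aid; it doesn't settle which form the category should use.

```python
import unicodedata


def describe(text):
    """List each code point in text as 'U+XXXX NAME (general category)'."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')} ({unicodedata.category(ch)})"
        for ch in text
    ]


for label in ("\u25cc\u0949", "\u25cc\u033a"):  # ◌ॉ and ◌̺
    print(describe(label))
```

The general category is printed too, since the two marks differ there: the Devanagari vowel sign is a spacing mark (Mc), while U+033A is a nonspacing mark (Mn).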

(moved to Wiktionary:Beer parlour/2024/April#"terms spelled with") --kc_kennylau (talk) 00:59, 4 April 2024 (UTC)[reply]

Latin macronization change: veho, vē̆xī, vectum

Hello, I was just looking into the vowel length of Latin vē̆xī (perfect of vehō) and it looks like most recent sources think there's a good chance that it had a long vowel like Sanskrit ávākṣam (although there is some uncertainty). I edited the entry for vehō with notes on this and to mark the vowel in the perfect stem as ē̆, but of course, that doesn't affect all the inflected forms and derived compounds (e.g. advehō, convehō, invehō, prōvehō, subvehō, trānsvehō, ēvehō). Could you have Wingerbot update those? (The long vowel seems to only be reconstructed for the perfect stem vē̆x-, not the supine stem vect-). I hope it's not too much trouble. I have also been wondering how I might set up a bot account of my own to make changes like this after editing the length of a vowel in Latin entries; if that's feasible for me to do, any tips would be welcome! Urszag (talk) 20:46, 1 March 2024 (UTC)[reply]

@Urszag Hi. I'll go ahead and fix these. As for setting up a bot account, in order to do that (a) you need to be able to write Python scripts, (b) you do some small test runs using your own account and verify that everything works, (c) you set up a vote to create an account for your bot using the link in WT:Votes. I recommend using a combination of pywikibot to interface to Wiktionary and mwparserfromhell to parse the template invocations on a given page. Note that there's also AutoWikiBrowser which lets you make semi-automated changes based on regular expressions and takes less work to set up than a bot account; I used this several years ago before I set up a bot account. (It is only supported on Windows but it seems to work OK through Wine on MacOS, and there's also a JavaScript browser variant called JWB.)

BTW, are there any other macron changes you need done? I think there's an outstanding request somewhere in my archives that I never got to; possibly it was from you. Benwing2 (talk) 01:49, 2 March 2024 (UTC)[reply]

Done. Benwing2 (talk) 05:18, 2 March 2024 (UTC)[reply]

OK, I found the previous request. It was from you in April 2023: User talk:Benwing2/2023#More Latin vowel length changes. You mentioned hirtus, hirsutus, luxus, luctor. The relevant part of the input to my script has this:

###
### hīrtus
### 
a1 hīrtus
pn2 Hīrtius
a1 hīrsūtus
a1 hīrtellus
a3 hīrtipēs hīrtiped
###
### lūctor
###
v1+ lūctor
n1 lūcta
n3 lūctātiō
n3 lūctātor
v1+ adlūctor
v1+ allūctor
v1+ collūctor
n3 collūctātiō
v1+ conlūctor
n3 conlūctātiō
v1+ ēlūctor
a3 ēlūctābilis
a3 inēlūctābilis
v1+ relūctor
n3 relūctātiō
###
### lūxus "dislocated"
###
a1 lūxus
n4 lūxus
v1+ lūxō

Do all these need to change to ī̆ ū̆? Are there any words missed here? Also can you give me the appropriate changelog comment(s) to have the bot add when making the changes? The default is "if before two cons, per Bennett corrected by Allen and Michelson" but that's obviously wrong for these cases. Benwing2 (talk) 05:30, 2 March 2024 (UTC)[reply]
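Since neither the script nor its input format is documented here, this is only a hypothetical Python sketch of the kind of change being requested. It assumes each non-comment line is a code followed by one or more lemma forms, and it marks the listed stems as length-uncertain by inserting U+0306 COMBINING BREVE after the macron vowel (so ī becomes ī̆). `STEM_CHANGES`, `mark_uncertain` and `process_lines` are illustrative names.

```python
import unicodedata

BREVE = "\u0306"  # COMBINING BREVE; written after a macron vowel, e.g. ī + breve = ī̆

# Hypothetical table of stem changes (long vowel -> length uncertain).
STEM_CHANGES = {
    "h\u012brt": "h\u012b" + BREVE + "rt",  # hīrt- -> hī̆rt-
    "H\u012brt": "H\u012b" + BREVE + "rt",  # Hīrt- (capitalized, e.g. Hīrtius)
    "l\u016bct": "l\u016b" + BREVE + "ct",  # lūct- -> lū̆ct-
    "l\u016bx": "l\u016b" + BREVE + "x",  # lūx- -> lū̆x-
}


def mark_uncertain(word):
    """Apply every matching stem change to an NFC-normalized word."""
    word = unicodedata.normalize("NFC", word)
    for old, new in STEM_CHANGES.items():
        word = word.replace(old, new)
    return word


def process_lines(lines):
    """Yield (code, updated lemmas) for lines like 'a1 hīrtus'; skip blanks and ### comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        code, *lemmas = line.split()
        yield code, [mark_uncertain(lemma) for lemma in lemmas]


for code, lemmas in process_lines(["###", "a1 hīrtus", "a3 hīrtipēs hīrtiped"]):
    print(code, lemmas)
```

Plain string replacement is only safe here because it is applied to a curated list of lemmas: run over all Latin entries, the lūx- rule would also hit lūx "light", whose ū is genuinely long.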

Thanks! Those all look correct with ī̆ ū̆. I would add lūxuria, lūxuriō, lūxuriōsus, lūxuriēs, obluctor.

In addition, it looks like I missed some inflected forms of derivatives of nūbō, nūpsī, nū̆ptum when I made that change (e.g. nūptum, nūptiāle). Specifically, there's innūbō, inflected forms of innū̆ptus, nū̆ptia, nū̆ptiae, nū̆ptiālis, nū̆ptus (It seems I just edited the main entry for these), and connūbium and its inflected forms.

I just made a new change to the perfects of alliciō, allē̆xī (formerly marked as just long) and illicio, illexī and pellicio, pellexī (formerly marked as just short) to mark them as uncertain (it seems likely all three had the same quality, probably short). These just need the inflected verb forms updated.

The references I'm basing these on are cited at the pages for hī̆rtus, lū̆xus, lū̆ctor, alliciō, nūbō, cōnū̆bium, so I think one option is to add notes of the format "Vowel length marked as uncertain based on references cited at hī̆rtus", and so on. Or the specific references could be listed as follows. Hirt- and lux-: uncertain based on Bennett (long) vs. De Vaan (short). Luct-: uncertain based on Bennett (long) vs. De Vaan, Wartburg, Buchi and Schweickard (short, with complications). Allex-: uncertain based on Bennett, Buck and Allen. Nupt-: uncertain based on Lewis and Bennett (long) vs. De Vaan, Ernout and Meillet, Wartburg and Bienvenu (short). -nubium: uncertain per Kennedy. -licio, -lē̆xī: uncertain per Bennett and Buck, "probably short" per Allen.--Urszag (talk) 15:13, 2 March 2024 (UTC)[reply]

@Urszag Done. Note that there also exists conubialis, which is currently indicated with long ū. Not sure if this needs ū̆. Benwing2 (talk) 06:18, 4 March 2024 (UTC)[reply]

Thank you! Yes, conubialis seems to be like conubium.--Urszag (talk) 06:36, 4 March 2024 (UTC)[reply]

Category:Hijazi Arabic terms with IPA pronunciation - Alphabet order

How can you change the alphabet order of the Hijazi Arabic letters from

آ أ إ ا ب پ ت ث ج ح خ د ذ ر ز س ش ص
ض ط ظ ع غ ف ڤ ق ك ل م ن ه و ي

to

آ أ إ ا ب ت ث ج ح خ د ذ ر ز س ش ص
ض ط ظ ع غ ف ق ك ل م ن ه و ي . پ ڤ

since پ and ڤ are additional letters and not part of the alphabetical order? عربي-٣١ (talk) 12:39, 2 March 2024 (UTC)[reply]
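Wiktionary's actual sort keys are configured in the Lua language data, so the following is only an illustration of the requested order: a hypothetical Python sort key that ranks each letter by its position in the list above, with پ and ڤ appended after ي. The "." in the requested chart is treated as a separator and dropped (an assumption).

```python
# Requested order: the standard letters, then the additional letters پ and ڤ at the end.
ORDER = "آ أ إ ا ب ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ق ك ل م ن ه و ي پ ڤ".split()
RANK = {letter: i for i, letter in enumerate(ORDER)}


def sort_key(word):
    """Map each letter to its position in ORDER; characters not in ORDER
    (e.g. short-vowel diacritics) are skipped in this sketch."""
    return [RANK[ch] for ch in word if ch in RANK]


print(sorted(["ڤيلا", "بيت", "پاپا", "يوم"], key=sort_key))
```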

@عربي-٣١ Are you referring to the sort order as it appears on category pages? The thing is, those additional letters are letters even if they aren't part of the standard Hijazi alphabet, and they need to be sorted *somewhere*. The "to chart" you gave doesn't include them anywhere. Benwing2 (talk) 22:47, 3 March 2024 (UTC)[reply]

Oh NVM, you want them placed at the end. Benwing2 (talk) 22:48, 3 March 2024 (UTC)[reply]

@Theknightwho @Fenakhay What do you think about this? It looks to me like there is no explicit sort key currently specified for Hijazi Arabic (nor for Egyptian and Gulf Arabic). Standard Arabic has a sort key but only for one Judeo-Arabic character, and Moroccan Arabic has a sort key of some sort that has no comments so I'm not sure what it's doing. IMO we should strive to treat all varieties of Arabic the same as much as possible, e.g. in using the same sort order everywhere as much as feasible; the additional letters correspond to /p/ and /v/, which are marginal phonemes in most varieties of Arabic (with the possible exception of /p/ in Iraqi varieties?). (Also per Wikipedia's Varieties of Arabic article, there are two different ways of writing /v/ in the Arabic script, corresponding to an East-West split.) Benwing2 (talk) 03:06, 4 March 2024 (UTC)[reply]

@Benwing2 Well, they are additional variants of letters (foreign letters) and should be placed at the end of this list, since they already come last on the sorting pages of the other Arabic dialects. Also, the Arabic sorting order should run from right to left, as with the rest of the Arabic dialects (not from left to right, as it does now in Category:Arabic terms). عربي-٣١ (talk) 16:01, 13 March 2024 (UTC)[reply]

@عربي-٣١ Sounds good to me but can you post about this in the Beer parlour (WT:BP) to make sure no one objects? Benwing2 (talk) 18:05, 21 March 2024 (UTC)[reply]

Replacement of quotation templates

Hi, when you have time could you please do the following quotation template replacements?

{{RQ:Ayliffe PJCA}} → {{RQ:Ayliffe Juris Canonici}}
{{RQ:Bancroft USA}} → {{RQ:Bancroft United States}}
{{RQ:Fairfax Godfrey of Bulloigne}} → {{RQ:Tasso Fairfax Godfrey of Bulloigne}}
{{RQ:Milton Eikonoklastes|1|Chapter Name}} → {{RQ:Milton Eikonoklastes|chapter=Chapter Name|page=1}}

Thank you! — Sgconlaw (talk) 13:45, 3 March 2024 (UTC)[reply]

@Sgconlaw Done. Benwing2 (talk) 22:47, 3 March 2024 (UTC)[reply]

Thanks! — Sgconlaw (talk) 11:28, 4 March 2024 (UTC)[reply]

By the way, was the {{RQ:Milton Eikonoklastes}} replacement also done? I couldn’t tell; maybe none of the entries it’s used in are on my watchlist. If so I’m changing the template to swap around the |1= and |2= parameters so that the template is in line with other templates. — Sgconlaw (talk) 11:37, 4 March 2024 (UTC)[reply]

@Sgconlaw Yes. There were only a few pages using those params though. Benwing2 (talk) 21:08, 4 March 2024 (UTC)[reply]

OK, great. — Sgconlaw (talk) 22:04, 4 March 2024 (UTC)[reply]

Bugs in ar-conj/module:ar-verb

Hi. I want to let you know about a couple of problems in ar-conj/Module:ar-verb. I've already told Fenakhay about them, but I'm informing you too just in case; perhaps you can sort them out. Apologies in advance for the length of this post:

When I was looking for entries on حَيَّ/حَيِيَ (root ح ي و), I saw that only the long present tense (يَحْيَا) is generated, even for the short form, while the short present (يَحَيُّ), which exists per Lisan al-Arab: [3], is not generated. This needs to be fixed so that the short present tense is generated.

A related problem concerns عَيَّ/عَيِيَ (root ع ي ي): the conjugation table for the long form عَيِيَ is generated with the specified paradigm i/a and the long present يَعْيَا, but unlike with حَيَّ, the conjugation table for عَيَّ isn't generated at all. By the way, it also has a short present: يَعْيُّ: [4]

Also notice how participles aren't generated at all for حَيَّ/حَيِيَ (should be short and long versions: حَيّ and حَيِيّ). Fixmaster (talk) 20:45, 5 March 2024 (UTC)[reply]

Bugs in ar-conj/module:ar-verb (part 2)

Also notice how participles aren't generated at all in conjugation tables for حَيَّ/حَيِيَ (should be short and long versions of active participles: حَيّ and حَيِيّ). Same goes for عَيَّ/عَيِيَ (should be عَيّ/عَيِيّ per dictionaries).

And if you generate the conjugation table with عَيِيَ (don't forget, the table for عَيَّ won't generate at all), there will be participles, but with the wrong forms: عَايٍ for the active and مَعْيُوّ for the passive.

Btw, speaking of passive participles, what should they be? In the almaany online dictionary, I found مَحْيىّ and مَعِيّ respectively. Notice how the patterns don't match? In any case, they could probably be ignored; those passives are mostly theoretical and impersonal anyway. I just thought it was worth mentioning.

What matters is: the ability to generate the conjugation table at all for the short verb عَيَّ, as we already can for حَيَّ; the short present tense for these two (يَحَيُّ and يَعَيُّ), which currently isn't generated; and generation of the short/long active participles (حَيّ/حَيِيّ and عَيّ/عَيِيّ).

Just as a side note: maybe there should be parameters in the template to override the active/passive participles (like the parameter we have for verbal nouns)? Just an idea. Fixmaster (talk) 20:41, 5 March 2024 (UTC)[reply]

About categories

Feedback on categories from a not-so-clever reader, if you allow me. I find the categories at en.wikt very complex and unpatrolled (many were started by someone and then left untouched). Some of them are broken into such specialised subcategories that one cannot find a wanted word, e.g. dog in Cat:en:Animals. Is there an index=1 kind of category index (all members a...z)? We have done this at @el.wikt.Animals, plants, medicine with a different colour. Just 3 or 4 Cats. The little ««« links to the overall Cat for all languages. Also, the code indicator for topics makes alphabetisation and comprehension impossible: why should a reader know the codes? If a leading word is to be avoided, why not the style Cat:Animals (English)? Thank you for listening. ‑‑Sarri.greek ^♫ I 03:31, 6 March 2024 (UTC)[reply]

@Sarri.greek You've brought up several points and this is a big topic. Can you bring this up in the Beer Parlour? Most of the basic decisions concerning category structure predate me and we'd need consensus to institute any significant changes. Benwing2 (talk) 03:35, 6 March 2024 (UTC)[reply]

@Benwing2, here I am not an admin; it is not my place to bring such things up for discussion (my understanding of en.wikt's structure and modules is not adequate). Sir, I have been thinking about this at el.wikt (where my admin colleagues, mostly Wikipedians, demanded that I stop for being too autocratic... True: I cannot stand sloppiness, lack of refs, loose CFI etc. :) ). But the same is perhaps valid for all wiktionaries: 20 years have passed. The basics (plus details too) are covered. What now? I think a general workpage for: a. feedback on the current state; b. future plans for forming crews on each subject: cleanup, reviewing, and unifying cats, params, templates. Leadership: plans voted on by Xadmin, by Zadmin, and people responsible for carrying out the plan and supervising the crews. If you organised a room /wikt.Future or something... with subpages for Cats, for Temps etc., we could all bring ideas? Plus, a very important thing: en.wikt is now the leader of all wiktionaries, the one every little wikt. copies from. IF you had to design a wiktionary from scratch, how would you go about it? Because now it is a patchwork procedure: adding and correcting in a maze of things... Hhhhh, I talk too much too! Sorry ‑‑Sarri.greek ^♫ I 04:01, 6 March 2024 (UTC)[reply]

@Sarri.greek I think in a wiki it's impossible to do everything top-down. It has to be done through consensus. Also I don't think we need a separate wikt.Future discussion forum or anything; that's what the Beer Parlour is for. There's no need to be an admin to initiate a discussion for change, just go ahead. Benwing2 (talk) 04:32, 6 March 2024 (UTC)[reply]

Adding a category with multiple subcategories

Hi, I'd like to add categories to track calls to templates with bad parameters, but I haven't touched categories before, so I wanted to double-check that this is a reasonable idea and that I'm going about it the right way. I think I need to create a parent category and then use a handler for the per-template categories. Since these would be maintenance categories, I would edit Module:category tree/poscatboiler/data/wiktionary maintenance and insert:

-- add the variable handlers at the top of the page (the file doesn't currently use any handlers)
local handlers = {}

--- snip ---

raw_categories["Pages using bad params when calling a template"] = {
	description = "Pages that use unrecognized parameters when calling a template.",
	additional = "These template calls should be reviewed and corrected or removed",
	breadcrumb = "Bad template params",
	parents = {"Wiktionary maintenance"},
	can_be_empty = true,
	umbrella = false,
	hidden = true,
}

table.insert(handlers, function(data)
	local template = data.label:match("^Pages using bad params when calling (.+)$")
	if template then
		return {
			description = "Pages that use unrecognized parameters when calling " .. template .. ".",
			additional = "These template calls should be reviewed and corrected or removed",
			breadcrumb = template,
			umbrella = false,
			parents = {{
				name = "Pages using bad params when calling a template",
				sort = template,
			}},
		}
	end
end)

-- add HANDLERS to the existing return table
return {RAW_CATEGORIES = raw_categories, HANDLERS = handlers}

I know I can do something similar using template tracking, but I'm trying to make this a little more "user friendly" with the hope that it won't just be me cleaning up these categories. Is there an overhead cost to using categories like this or anything else I should take into consideration? Thanks! JeffDoozan (talk) 21:04, 8 March 2024 (UTC)[reply]

@JeffDoozan Yup, this approach will work, although you need a few changes: (1) use a raw handler instead of a regular handler (because the category in question doesn't begin with a language name), and the first line of the handler should use `data.category` instead of `data.label`; (2) you don't need the `umbrella` settings because raw categories don't have corresponding umbrella categories. Other than that everything looks good. Benwing2 (talk) 21:18, 8 March 2024 (UTC)[reply]
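To illustrate the two suggested changes, here is a minimal, self-contained sketch. The `lookup_raw_category` function is a hypothetical, simplified model of how raw categories and raw handlers might be consulted; it is not the actual code in Module:category tree/poscatboiler. The handler matches against `data.category` and leaves out the `umbrella` settings.

```lua
-- Hypothetical, simplified model of raw-category lookup; the real logic in
-- Module:category tree/poscatboiler is more involved.
local raw_categories = {}
local raw_handlers = {}

raw_categories["Pages using bad params when calling a template"] = {
	description = "Pages that use unrecognized parameters when calling a template.",
	breadcrumb = "Bad template params",
	parents = {"Wiktionary maintenance"},
	can_be_empty = true,
	hidden = true,
}

-- A raw handler sees the full category name in data.category, since this
-- category does not begin with a language name that could be split off.
table.insert(raw_handlers, function(data)
	local template = data.category:match("^Pages using bad params when calling (.+)$")
	if template then
		return {
			description = "Pages that use unrecognized parameters when calling " .. template .. ".",
			breadcrumb = template,
			parents = {{name = "Pages using bad params when calling a template", sort = template}},
			hidden = true,
		}
	end
end)

-- In this model, explicit raw categories are checked first; otherwise the
-- first handler returning a non-nil spec wins.
local function lookup_raw_category(category)
	if raw_categories[category] then
		return raw_categories[category]
	end
	for _, handler in ipairs(raw_handlers) do
		local spec = handler({category = category})
		if spec then
			return spec
		end
	end
	return nil
end
```

With this model, `lookup_raw_category("Pages using bad params when calling Template:cite-av")` returns a spec whose breadcrumb is `Template:cite-av`, while the parent category itself is served from the explicit `raw_categories` entry.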

After adding categorization to ~300 templates that are used fewer than 5 times and called at least once with invalid parameters, I think cleanup would be easier if the templates were categorized into "language" templates and "general use" templates, like this:

Category:Pages using bad params when calling a template
- Category:Pages using bad params when calling Finnish templates
  - Category:Pages using bad params when calling Template:fi-decl-hame-dot
- Category:Pages using bad params when calling general use templates
  - Category:Pages using bad params when calling Template:cite-av

To do that, I came up with the following code:

raw_categories["Pages using bad params when calling a template"] = {
	description = "Pages that use unrecognized parameters when calling a template.",
	breadcrumb = "Bad template params",
	parents = {"Wiktionary maintenance"},
	can_be_empty = true,
	hidden = true,
}

table.insert(raw_handlers, function(data)
	local template_type = data.category:match("^Pages using bad params when calling (.+) templates$")
	if template_type then
		return {
			description = "Pages that use unrecognized parameters when calling " .. template_type .. " templates.",
			breadcrumb = template_type,
			parents = {{
				name = "Pages using bad params when calling a template",
			}},
			hidden = true,
		}
	end
end)

table.insert(raw_handlers, function(data)
	local template = data.category:match("^Pages using bad params when calling (.+)$")

	if template then
		-- gsub returns two values; the parentheses keep only the new string
		local template_name_without_namespace = (template:gsub("^Template:", ""))
		local languages = require("Module:languages")

		-- Check if the template name starts with a hyphenated language code
		local lang
		local possible_language_code = template_name_without_namespace:match("^([a-z][a-z][a-z]?%-[a-z][a-z][a-z])%-")
		if possible_language_code ~= nil then
			lang = languages.getByCode(possible_language_code)
		end

		-- Check if the template name starts with a two or three character language code
		if lang == nil then
			possible_language_code = template_name_without_namespace:match("^([a-z][a-z][a-z]?)%-")
			-- Only look the code up if the name actually has such a prefix
			if possible_language_code ~= nil then
				lang = languages.getByCode(possible_language_code)
			end
		end

		local template_type
		if lang == nil then
			template_type = "general use"
		else
			template_type = lang:getCanonicalName()
		end

		return {
			description = "Pages that use unrecognized parameters when calling " .. template .. ".",
			additional = "These template calls should be reviewed and the bad parameter should be corrected or removed.",
			breadcrumb = template,
			parents = {{
				name = "Pages using bad params when calling " .. template_type .. " templates",
				sort = template_name_without_namespace,
			}},
			hidden = true,
		}
	end
end)

Am I just re-inventing umbrella categories? Is there a better way to do this? Would this add unnecessary overhead to the categorization system? JeffDoozan (talk) 22:28, 15 March 2024 (UTC)[reply]
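The prefix-detection step of the handler can be isolated into a pure function and tried outside the wiki. This is a minimal sketch: `KNOWN_LANGS` is a hypothetical stand-in for `require("Module:languages").getByCode`, holding just three example codes, and `classify_template` is an illustrative name, not an existing function. It escapes the hyphens as `%-`, since `-` is a magic character (lazy repetition) in Lua patterns.

```lua
-- Hypothetical stand-in for Module:languages: code -> canonical name.
local KNOWN_LANGS = {
	en = "English",
	fi = "Finnish",
	["nan-hbl"] = "Hokkien",
}

-- Returns the template type ("general use" or a language name) plus the
-- template name without its namespace, for use as a sort key.
local function classify_template(template)
	local name = (template:gsub("^Template:", ""))
	-- Try a hyphenated code such as "nan-hbl-" first...
	local compound = name:match("^([a-z][a-z][a-z]?%-[a-z][a-z][a-z])%-")
	if compound and KNOWN_LANGS[compound] then
		return KNOWN_LANGS[compound], name
	end
	-- ...then a plain two- or three-letter code such as "fi-".
	local simple = name:match("^([a-z][a-z][a-z]?)%-")
	if simple and KNOWN_LANGS[simple] then
		return KNOWN_LANGS[simple], name
	end
	return "general use", name
end
```

For example, `classify_template("Template:fi-decl-hame-dot")` yields `"Finnish"` and `"fi-decl-hame-dot"`, while `Template:cite-av` falls through to `"general use"` because `cite` is four letters long.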

A couple of code replacements

Hi, as part of the Min Nan split, would it please be possible for you to bot replace a couple of the codes which are being deprecated? The only places these are now used should be links, which should make the switch straightforward.

Hokkien: nan-hok → nan-hbl (etym-only to full language conversion)
Teochew: zhx-teo → nan-tws (code standardisation within the nan family)

Thanks. Theknightwho (talk) 01:12, 11 March 2024 (UTC)[reply]

@Theknightwho Sure, will do. Benwing2 (talk) 01:30, 11 March 2024 (UTC)[reply]

Thanks. Theknightwho (talk) 01:34, 11 March 2024 (UTC)[reply]

@Theknightwho Does the code zhx-teo still exist? I can't find any references to it in the language data. Benwing2 (talk) 01:34, 11 March 2024 (UTC)[reply]

@Benwing2 It's currently set up as an alias, but that's just a temporary thing. I recently changed the way aliases are handled so that they're no longer directly integrated into the data, because (a) that added overhead we don't need most of the time, (b) it makes keeping track of aliases easier by collating them all in one place, (c) it means we can use them for situations like this, where a code is being changed for whatever reason, and (d) we can now use them for full languages without having to complicate the language data (see point c). They're now stored in Module:languages/data at the bottom. Theknightwho (talk) 01:37, 11 March 2024 (UTC)[reply]

@Theknightwho Ahh, thanks. Benwing2 (talk) 01:41, 11 March 2024 (UTC)[reply]

@Benwing2 Btw, it does mean the integration isn't quite as smooth as before, since you now can't use aliases for anything that accesses the language data directly as the alias is only looked up during the creation of a language object. In practical terms, that just means they can't be used anywhere in the language data itself (e.g. the ancestors field). That was semi-intentional, though, since we don't really want aliases in the first place. Theknightwho (talk) 01:45, 11 March 2024 (UTC)[reply]
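A minimal sketch of the alias arrangement described here, assuming a simplified data layout: the `language_data` and `aliases` tables and the `get_by_code` function are hypothetical stand-ins, not the real contents of Module:languages/data. Aliases are resolved only when a language object is built, so code that indexes the data table directly (as an `ancestors` field would) does not see them.

```lua
-- Hypothetical, simplified language data keyed by canonical code.
local language_data = {
	["nan-hbl"] = {name = "Hokkien"},
	["nan-tws"] = {name = "Teochew"},
}

-- Aliases kept together in one separate table instead of as duplicate entries.
local aliases = {
	["nan-hok"] = "nan-hbl",
	["zhx-teo"] = "nan-tws",
}

-- Builds a simple "language object", resolving an alias first if there is one.
local function get_by_code(code)
	local canonical = aliases[code] or code
	local data = language_data[canonical]
	if not data then
		return nil
	end
	return {code = canonical, name = data.name}
end
```

Here `get_by_code("nan-hok")` returns an object with code `nan-hbl`, but `language_data["nan-hok"]` is still `nil`, which is the limitation mentioned for the data itself.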

@Theknightwho Yeah that is fine. I agree we should eliminate aliases as much as possible, and in fact I did that previously with a bunch of random etym-only aliases. Benwing2 (talk) 01:47, 11 March 2024 (UTC)[reply]

@Benwing2 I've just added a check to Module:data consistency check for alias codes, which covers the data for languages, etym-only languages, families and scripts: all it does is check that none of the subtables has multiple keys (e.g. due to someone adding m["abc"] = m["xyz"], which is the old way aliases were handled).

The only ones it's found at the moment are for various Arabic script codes, where I consolidated all the ones that had identical tables a while back. Working out what to do with them will need a proper discussion, though. Theknightwho (talk) 02:43, 11 March 2024 (UTC)[reply]
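The alias check described here can be sketched as follows, assuming the data module returns a flat table with string keys. `find_shared_entries` is a hypothetical name, not the function in Module:data consistency check. It groups keys by the table they point to, so an old-style alias such as `m["abc"] = m["xyz"]` shows up as one table with two keys.

```lua
-- Returns a sorted list of "key1, key2, ..." strings, one per table that is
-- reachable under more than one key (i.e. an old-style alias).
local function find_shared_entries(m)
	local keys_by_table = {}
	for key, value in pairs(m) do
		if type(value) == "table" then
			local keys = keys_by_table[value]
			if not keys then
				keys = {}
				keys_by_table[value] = keys
			end
			table.insert(keys, key)
		end
	end
	local problems = {}
	for _, keys in pairs(keys_by_table) do
		if #keys > 1 then
			table.sort(keys) -- assumes string keys
			table.insert(problems, table.concat(keys, ", "))
		end
	end
	table.sort(problems)
	return problems
end
```

Because identical-looking but separate tables are distinct Lua values, only genuine aliasing (the same table stored twice) is reported.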

@Theknightwho Yeah I've never been very happy with having a bunch of language-specific script codes for Arabic and certain other scripts. However, I'm not sure whether it's possible to eliminate them (or some of them) using things like language selectors in CSS. Maybe User:This, that and the other and/or User:Erutuon can comment more. Benwing2 (talk) 02:48, 11 March 2024 (UTC)[reply]

@Theknightwho I did a replacement run for both codes but as the tracking categories were only added yesterday, it will take longer to flush out all the old usages (indeed I now see 8 new pages in the nan-hok category and 3 in the zhx-teo category). Benwing2 (talk) 03:22, 11 March 2024 (UTC)[reply]

Thanks. Theknightwho (talk) 05:38, 11 March 2024 (UTC)[reply]

@Theknightwho I'll do another run tomorrow. Benwing2 (talk) 05:41, 11 March 2024 (UTC)[reply]

@Theknightwho Did another run. Going to bed now but will do another one tomorrow evening; hopefully that will catch any stragglers. Benwing2 (talk) 08:42, 11 March 2024 (UTC)[reply]

Sounds good - thanks. Theknightwho (talk) 08:43, 11 March 2024 (UTC)[reply]

@Theknightwho I did two runs, one just now and one about 10 hours ago, and already more have appeared, so it may be a few days before everything catches up and there are no more additions to the tracking categories. Benwing2 (talk) 07:50, 12 March 2024 (UTC)[reply]

@Theknightwho I went through CAT:Terms derived from Hokkien and CAT:Terms derived from Teochew recursively and changed all the terms in them as well as remaining tracked terms (including uses in {{rfp}} and {{cog}} and such). I *THINK* this is done now; probably close enough that you can delete the old codes and handle any remaining errors as they occur. Benwing2 (talk) 22:40, 12 March 2024 (UTC)[reply]
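A minimal sketch of this kind of code replacement, assuming codes only need changing where they appear as a whole template parameter (after `|` or `=`). `replace_codes` is a hypothetical helper, not the bot script that was actually run. The `%f[|}]` frontier pattern means a code is only replaced when it is immediately followed by `|` or `}`, so a longer code that merely starts with `nan-hok` is left alone.

```lua
-- Deprecated code -> replacement code.
local REPLACEMENTS = {
	["nan-hok"] = "nan-hbl",
	["zhx-teo"] = "nan-tws",
}

-- Escape every non-alphanumeric character so it is matched literally.
local function escape_pattern(s)
	return (s:gsub("(%W)", "%%%1"))
end

-- Returns the updated wikitext and the number of replacements made.
local function replace_codes(wikitext)
	local total = 0
	for old, new in pairs(REPLACEMENTS) do
		local count
		wikitext, count = wikitext:gsub("([|=])" .. escape_pattern(old) .. "%f[|}]", "%1" .. new)
		total = total + count
	end
	return wikitext, total
end
```

Since no replacement code is itself a deprecated code, the order in which `pairs` visits the table does not affect the result.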

@Benwing2 Thanks - I caught one, but that looks to be it. Theknightwho (talk) 18:49, 13 March 2024 (UTC)[reply]

I have also wondered why we use those special lang+script codes for the Arab and Beng scripts. Perhaps they date from a time when no other solution was well-supported enough to deliver different fonts for different languages. I note that Syrc and Xsux specify different fonts for different languages with CSS alone, so it is clearly possible to do it that way. (Not too sure what is going on with Mong...) This, that and the other (talk) 03:50, 11 March 2024 (UTC)[reply]

@-sche, Surjection Maybe either of you could comment. If we can replace things like fa-Arab with just the appropriate language selectors in MediaWiki:Gadget-LanguagesAndScripts.css I would rather do it that way and not expose what is essentially an implementation detail into the wikicode. Benwing2 (talk) 03:57, 11 March 2024 (UTC)[reply]

@This, that and the other, Benwing2 In the case of Mong, it's been split because the code actually covers four closely related scripts: Mongolian (proper), [Oirat] Clear Script, Manchu and Xibe. It's a situation where the split exists to get more accurate language data, rather than because we need different CSS classes (though that may be something we want in the future; Manchu and Oirat-specific fonts exist, and I suspect Xibe as well). In each case, the character ranges only cover the characters used by those scripts; there's some overlap, but most are only used in a subset of the four. See [5] for a breakdown (note: Todo = Clear Script; Sibe = Xibe). (Edit: this distinction does matter in some cases, e.g. Sanskrit, which has Mong, mnc-Mong and xwo-Mong.) Theknightwho (talk) 05:38, 11 March 2024 (UTC)[reply]

@Theknightwho could you update the Chinese entry at WT:LT, such as it is? This, that and the other (talk) 03:51, 11 March 2024 (UTC)[reply]

Done. Theknightwho (talk) 05:38, 11 March 2024 (UTC)[reply]

Module editing tutorials

Hi, would you be able to point me to some places where I can learn more about module creation and editing?

I'm self-taught in HTML which has served me fine for entries and templates, but there are quite a lot of things I would like to see done at the module level in Welsh (ways of presenting collective-singulative nouns, accounting for literary and colloquial forms in adjectives, a template for phrasal verbs, a template for generating IPA transcriptions) that at the moment are well beyond my abilities.

I'd also prefer not to bother other users by constantly asking them to do tasks for me when I could just learn to do it myself. Cheers, Arafsymudwr (talk) 16:45, 13 March 2024 (UTC)[reply]

@Arafsymudwr Sorry for the very belated response! The documentation on how modules work, as well as links to tutorials (under the "Getting started" section), is found in WT:LUA. The first thing you will need to do is learn something about Lua. If you are at all familiar with JavaScript, you will find Lua rather similar. When you make a change to a module, you should always test it before saving. The way to do that is to use the "Preview page with this template" functionality (a box near the bottom left) to preview one of the Welsh pages that uses the module. Start by making a small change and gradually make more extensive changes as you get more comfortable. Let me know if I can be of more assistance. Benwing2 (talk) 23:59, 19 March 2024 (UTC)[reply]

Min translations

Hi - following the renaming of various Min lects, could you please do the following name replacements in translation sections?

Min Bei → Northern Min (mnp)
Min Dong → Eastern Min (cdo)
Min Zhong → Central Min (czo)
Puxian → Puxian Min (cpx)

They should all be nested under Chinese.

I'm not including Min Nan, since all the translations have to be converted manually due to the split anyway, so changing them to Southern Min would just create confusion. Thanks. Theknightwho (talk) 21:26, 13 March 2024 (UTC)[reply]

@Theknightwho OK I have an existing script to sort translations that I was able to modify to handle this. I will run it shortly. As for Min Nan in translation sections, I checked and there are 2,637 pages with Min Nan translations in them, so it will take a while to do this totally manually. I had hoped they would have a qualifier by them indicating the particular Min Nan lect, but usually that doesn't seem to be the case. The first two examples, from dictionary and rain cats and dogs, are typical:

*: Min Nan: {{tt+|nan|字典|tr=lī-tián / jī-tián}}, {{tt|nan|詞典|tr=sû-tián}}, {{tt|nan|辭林|tr=sû-lîm}}
*: Min Nan: {{t+|nan|㴙㴙落|tr=tsa̍p-tsa̍p-lo̍h}}

I know little about Min Nan but from what I've heard, I suspect the vast majority of them are Hokkien. It may be possible in any case to speed this up by looking up the terms in question to see whether the lect can be identified. For example, the four terms given above all have Pronunciation sections indicating that the transliterations in question are Hokkien (and some of them also have Taiwanese Hokkien qualifiers). Some translations don't have transliterations given, but in that case as long as there is a Hokkien pronunciation given, I think it's fine to tag it as Hokkien. (Also I looked for Teochew translations and several of them are tagged as nan or even mn, presumably because someone thought mn stood for Min Nan.) Benwing2 (talk) 23:53, 13 March 2024 (UTC)[reply]
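The renaming and re-sorting step can be sketched as below. This assumes the nested lines under `* Chinese:` each have the form `*: Name: ...` and are ordered by a plain byte comparison of the lect name, which may not match the ordering Wiktionary's own translation tools use; `rename_and_sort` is a hypothetical helper, not the script mentioned here.

```lua
local RENAMES = {
	["Min Bei"] = "Northern Min",
	["Min Dong"] = "Eastern Min",
	["Min Zhong"] = "Central Min",
	["Puxian"] = "Puxian Min",
}

-- lines: the nested "*: Name: ..." lines under "* Chinese:"
local function rename_and_sort(lines)
	local entries = {}
	for i, line in ipairs(lines) do
		local name, rest = line:match("^%*:%s*([^:]+):(.*)$")
		local text = line
		if name then
			name = RENAMES[name] or name
			text = "*: " .. name .. ":" .. rest
		else
			name = line -- unrecognised line: sort on its raw text
		end
		entries[#entries + 1] = {key = name, text = text, index = i}
	end
	-- table.sort is not stable, so break ties on the original position
	table.sort(entries, function(a, b)
		if a.key ~= b.key then
			return a.key < b.key
		end
		return a.index < b.index
	end)
	local result = {}
	for i, entry in ipairs(entries) do
		result[i] = entry.text
	end
	return result
end
```

Sorting happens after renaming, so a renamed lect such as Eastern Min moves to its new alphabetical position.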

@Benwing2 Thanks - I've spent a couple of hours going over them so far, and I've already dealt with all the ones that were marked Teochew (including the one labelled mn, yeah). Out of the ones simply marked "Min Nan", I've only found one which was definitely Teochew, with the others all being Hokkien.

In terms of automating it, the safest thing to do would be to convert any which don't have numbered tones to Hokkien, leaving the rest for manual review (which will probably be <20).

There could plausibly be a handful which are in fact Teochew but have POJ-style (i.e. Hokkien-style) transliterations, but I don't think it's feasible to determine those, since it would be way too time-consuming to convert it to the correct romanisation and check against the entry for every single translation.

Theknightwho (talk) 00:02, 14 March 2024 (UTC)[reply]
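The proposed heuristic (Min Nan translations without numbered tones become Hokkien; the rest go to manual review) might look like this. It is a sketch, not the script actually run: any ASCII digit inside a `{{t}}`, `{{t+}}`, `{{tt}}` or `{{tt+}}` call with code `nan` is taken as a sign of numbered tones, which means a term containing a digit (such as `K2 Hong`) is also flagged. `classify_min_nan_line` is a hypothetical name, and re-sorting the renamed line is not handled.

```lua
-- Matches {{t|nan|...}}, {{t+|nan|...}}, {{tt|nan|...}} and {{tt+|nan|...}}.
local NAN_TEMPLATE = "{{tt?%+?|nan|[^}]*}}"

-- Returns "manual" and the untouched line if any nan template contains a
-- digit; otherwise "auto" and the line converted to Hokkien / nan-hbl.
local function classify_min_nan_line(line)
	for template in line:gmatch(NAN_TEMPLATE) do
		if template:find("%d") then
			return "manual", line
		end
	end
	local converted = line:gsub("({{tt?%+?|)nan|", "%1nan-hbl|")
	converted = converted:gsub("^(%*:%s*)Min Nan:", "%1Hokkien:")
	return "auto", converted
end
```

Non-ASCII letters such as `ī` are never treated as digits by `%d`, so POJ transliterations with diacritics pass through as Hokkien.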

@Theknightwho: Sounds good. For reference here is the complete list of Min Nan translations as of the Mar 1 dump that have numbered tones in them:

Page 872 four: Found match for regex: *: Min Nan: {{qualifier|Xiamen}} {{tt+|nan|四|tr=sì, sù}}, {{qualifier|Teochew}} {{tt+|nan|四|tr=si3}}
Page 873 five: Found match for regex: *: Min Nan: {{qualifier|Xiamen}} {{tt+|nan|五|tr=go, ngò}}, {{qualifier|Teochew}} {{tt+|nan|五|tr=ngou6}}
Page 1054 eight: Found match for regex: *: Min Nan: {{qualifier|Xiamen}} {{tt+|nan|八|tr=peh, poeh, pat}}, {{qualifier|Teochew}} {{tt+|nan|八|tr=boih4}}
Page 2107 percent: Found match for regex: *: Min Nan: {{t|nan|百分之|tr=pah-hun-chi...|alt=百分之……}} {{qualifier|the number follows it, e.g. 30%: 百分之三十 pah-hun-chi saⁿ-cha̍p}}
Page 2462 cousin: Found match for regex: *: Min Nan: {{t|nan|叔伯兄|tr=chek-peh-hiaⁿ}} {{qualifier|{{tooltip|older, father’s brother’s son|[[oFBS]]|und=1}}}}, {{t|nan|叔伯阿兄|tr=chek-peh-a-hiaⁿ}} {{qualifier|{{tooltip|older, father’s brother’s son|[[oFBS]]|und=1}}}}, {{t|nan|叔伯小弟|tr=chek-peh-sió-tī}} {{qualifier|{{tooltip|younger, father’s brother’s son|[[yFBS]]|und=1}}}}, {{t|nan|叔伯阿姊|tr=chek-peh-a-chí}} {{qualifier|{{tooltip|older, father’s brother’s daughter|[[oFBD]]|und=1}}}}, {{t|nan|叔伯小妹|tr=chek-peh-sió-mōe, chek-peh-sió-bē}} {{qualifier|{{tooltip|younger, father’s brother’s daughter|[[yFBD]]|und=1}}}}, {{t|nan|表兄|tr=piáu-hiaⁿ}} {{qualifier|{{tooltip|older, mother’s sibling’s or father’s sister’s son|o[[MSiS]] or [[oFZS]]|und=1}}}}, {{t|nan|表小弟|tr=piáu-sió-tī}} {{qualifier|{{tooltip|younger, mother’s sibling’s or father’s sister’s son|y[[MSiS]] or [[yFZS]]|und=1}}}}, {{t|nan|表姊|tr=piáu-ché, piáu-chí}} {{qualifier|{{tooltip|older, mother’s sibling’s or father’s sister’s daughter|o[[MSiD]] or [[oFZD]]|und=1}}}}, {{t|nan|表小妹|tr=piáu-sió-mōe, piáu-sió-bē}} {{qualifier|{{tooltip|younger, mother’s sibling’s or father’s sister’s daughter|y[[MSiD]] or [[yFZD]]|und=1}}}}
Page 2809 handmaid: Found match for regex: *: Min Nan: {{t|nan|女婢|tr=lu2-pi7}}, {{t|nan|tsa1-boo2-kan2}}
Page 4233 eyelash: Found match for regex: *: Min Nan: {{t|nan|目睭毛//目珠毛|tr=ba̍k-chiu-mn̂g / ba̍k-chiu-mô͘}}, {{t+|nan|目睫毛|tr=ba̍k-chiah-mn̂g / ba̍k-cheeh-mô͘ / ba̍k-chiah-mô͘ / ba̍k-chia̍p-mn̂g / ba̍k-chiap-mn̂g}}, {{t|nan|目毛|tr=ba̍k-mn̂g / ba̍k-mô͘}}; {{t|nan|目眥毛|tr=mag8 ci3 mo5}} {{q|Teochew}}
Page 4352 flesh: Found match for regex: *: Min Nan: {{t+|nan|肉|tr=bah4}}
Page 5089 stiff: Found match for regex: *: Min Nan: {{t|nan|liau1}}
Page 16449 aircraft: Found match for regex: *: Min Nan: {{t|nan|飞行器|tr=hui1-hing5-khi3}}
Page 30166 gnash: Found match for regex: *: Min Nan: {{t|nan|咬牙切齒|tr=ga6 ghê5 ciag4 ki2}}, {{t|nan|咬牙|tr=kā-gê}}, {{t|nan|切齒|tr=chhiat-khí / chhiat-chhí}}
Page 31973 farmer: Found match for regex: *: Min Nan: {{t+|nan|農民|tr=lông-bîn}}, {{t+|nan|作穡人|tr=chò-sit-lâng}}, {{t+|nan|作田人|tr=chó-chhân-lâng}}, {{t|nan|农夫|tr=long5-hu1}}
Page 35994 cabbage: Found match for regex: *: Min Nan: {{t|nan|植物人|tr=sêg4 muêh8 nang5}}
Page 38088 arsehole: Found match for regex: *: Min Nan: {{t|nan|lan7-tsiau2-bin7}}, {{t|nan|臭面人|tr=tshau2-bin7-lang5}}
Page 43201 glove: Found match for regex: *: Min Nan: {{t+|nan|手囊|tr=tshiu2-long5}}, {{t+|nan|手套|tr=tshiu2-tho3}}
Page 45493 reunion: Found match for regex: *: Min Nan: {{t|nan|ui5-loo5 围炉}}
Page 45800 dung beetle: Found match for regex: *: Min Nan: {{t|nan|蜣螂|tr=khiong-lông}}, {{qualifier|Quanzhou Hokkien}} {{t|nan|屎龜|tr=sái-ku}}, {{t+|nan|牛屎龜|tr=gû-sái-ku}}, {{qualifier|Teochew}} {{t|nan|牛屎核|tr=ghu5 sai2 hug8}}
Page 48510 loess: Found match for regex: *: Min Nan: {{t+|nan|黃色}}, {{t|nan|黄砂|tr=hong2 sê1}}
Page 50690 troublesome: Found match for regex: *: Min Nan: {{t|nan|lo1so1}}, {{t|nan|lui1-lui1-tui1-tui1}}, {{t|nan|啰嗦|tr=lo1-so1}}
Page 50799 feud: Found match for regex: *: Min Nan: {{t|nan|se3-siu5}}
Page 54507 sashimi: Found match for regex: *: Min Nan: {{t|nan|刺身|tr=chhiah-sin; sa33 si55 mih3}}
Page 64034 shove: Found match for regex: *: Min Nan: {{t|nan|long1}}, {{t|nan|lang1}}, {{t|nan|nng1}}
Page 67068 vulva: Found match for regex: *: Min Nan: {{t|nan|陰門|tr=im-mn̂g}}, {{t|nan|外阴|tr=gue7-im1}}
Page 76097 shirk: Found match for regex: *: Min Nan: {{t|nan|liu1-kiang1}}
Page 104634 halfway: Found match for regex: *: Min Nan: {{t|nan|半路|tr=puann3-loo7}}
Page 106660 thimble: Found match for regex: *: Min Nan: {{t|nan|鍼黹}}, {{t+|nan|針黹|tr=cham-chí, chiam-chí}} {{qualifier|Mainland China}}, {{t|nan|指套|tr=chí-thò}} {{qualifier|Quanzhou and Xiamen}}, {{t|nan|頂針|tr=dêng2 zam1}}, {{t|nan|銅指|tr=tâng-cháiⁿ}}
Page 106811 spacious: Found match for regex: *: Min Nan: {{t|nan|阔|tr=khuah4}}, {{t|nan|khuann3-long1-long1}}
Page 125580 K2: Found match for regex: *: Min Nan: {{t|nan|K2 Hong}}
Page 179602 disadvantageous: Found match for regex: *: Min Nan: {{t|nan|put4-li7}}
Page 335793 Wiktionary:Beer parlour/2007/April: Found match for regex: :::*:Min Nan: (''Amoy'') [[囡仔]] ([[gín-á]]); (''Teochew'') [[孥囝]] ([[nou5gian2]])
Page 1357199 Wiktionary:Beer parlour/2009/May: Found match for regex: :: That's '''1''' then. The [[child]] has 3 levels. Is it really necessary? Can we keep to 2 levels? For example, ** Min Nan: 囡仔 (gín-á), 孥囝 (nou5gian2) (''Teochew'')? [[User:Atitarev|Anatoli]] 22:39, 10 May 2009 (UTC)

Benwing2 (talk) 00:58, 14 March 2024 (UTC)[reply]

Thanks. Theknightwho (talk) 00:59, 14 March 2024 (UTC)[reply]

@Theknightwho I am running my script now to change Min Dong -> Eastern Min and Min Bei -> Northern Min and re-sort appropriately (there were no translations involving Min Zhong or Puxian). A couple of questions:

Are you finished fixing up the pages with numbered tones in them that I mentioned above? If so once the script finishes I'll do a run to change Min Nan -> Hokkien in translations along with nan -> nan-hbl, and re-sort.
What about occurrences of Min Dong etc. in {{lb}}, {{tlb}}, {{zh-forms}}, {{q}} (occurring mostly in Synonyms sections), etc.? Do these need to be renamed? On rough count, there are 1,318 occurrences of Min Dong in {{lb}}, 279 in {{q}}, 48 in {{zh-forms}} and 20 in {{tlb}}. Counts for Min Bei are roughly similar, while there are only a few instances of Min Zhong and Puxian (without "Min").

Benwing2 (talk) 02:50, 14 March 2024 (UTC)[reply]

@Benwing2 Thanks.

Yes.
Yes. For things like labels etc., "Min Nan" should be changed to "Southern Min".

Theknightwho (talk) 04:12, 14 March 2024 (UTC)[reply]
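A sketch of the label renaming, assuming only whole parameters of `{{lb}}` and `{{tlb}}` need changing and that the translation name mapping (plus Min Nan → Southern Min) applies to labels. `LABEL_RENAMES` and `rename_labels` are hypothetical. Calls containing nested templates are skipped, and split labels such as `coastal|_|Min` are deliberately left for manual editing.

```lua
local LABEL_RENAMES = {
	["Min Nan"] = "Southern Min",
	["Min Dong"] = "Eastern Min",
	["Min Bei"] = "Northern Min",
	["Min Zhong"] = "Central Min",
	["Puxian"] = "Puxian Min",
}

-- Rename whole parameters inside {{lb|...}} and {{tlb|...}} calls.
local function rename_labels(wikitext)
	return (wikitext:gsub("{{(t?lb)|([^{}]*)}}", function(name, params)
		local parts = {}
		-- Append a trailing "|" so every parameter, including the last, is captured.
		for param in (params .. "|"):gmatch("([^|]*)|") do
			parts[#parts + 1] = LABEL_RENAMES[param] or param
		end
		return "{{" .. name .. "|" .. table.concat(parts, "|") .. "}}"
	end))
end
```

The same approach could be extended to `{{q}}` or `{{zh-forms}}`, though their parameter layouts differ and would need checking first.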

@Theknightwho OK sounds good. #1 is running now. Benwing2 (talk) 04:14, 14 March 2024 (UTC)[reply]

@Theknightwho What about things like "[Cc]oastal Min" as occurs in {{zh-forms}} in 唐人 and in {{lb}} in 牛母? (I guess these need manual editing, as it appears Coastal Min can be any of Eastern, Southern or Puxian.) Benwing2 (talk) 04:19, 14 March 2024 (UTC)[reply]

See also coastal|_|Min in 儂. Benwing2 (talk) 04:20, 14 March 2024 (UTC)[reply]

@Theknightwho: Not sure if this is useful but there are 203 occurrences of from=Min in the Mar 1 dump, which generally occur in {{surname}}:

Page 27803 Cu: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien Chinese origin
Page 31307 Lao: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 54700 Dee: Found match for regex: # {{surname|tl|Chinese Filipino|from=Min Nan}}, most notably borne by:
Page 68861 Kong: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 71443 Juan: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 80226 Chan: Found match for regex: # {{surname|tl|from=Min Nan}} (Hokkien) of Chinese origin
Page 80245 Chi: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 80288 Co: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 80532 Du: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin, mostly around [[Cebu]]
Page 80539 Dy: Found match for regex: # {{surname|tl|Chinese Filipino|from=Min Nan}}
Page 80915 Go: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 81022 Haw: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 81061 Ho: Found match for regex: # {{surname|tl|Filipino-Chinese|from=Min Nan}} of Chinese origin, most notably borne by:
Page 81318 King: Found match for regex: # {{surname|tl|from=Min Nan}}
Page 81334 Ko: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 81420 Lee: Found match for regex: # {{surname|tl|Chinese Filipino|from=Min Nan}}
Page 81515 Lu: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 81516 Lua: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien Chinese origin
Page 81890 Ng: Found match for regex: # {{surname|tl|{{w|Chinese Filipino}}|from=Min Nan}}
Page 82214 Po: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 82353 Que: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 82618 See: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 82665 Shaw: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 82674 Sia: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 82690 Sin: Found match for regex: # {{surname|tl|from=Min Nan}}, most associated with former Archbishop of Manila, {{w|Jaime Sin}}
Page 82735 So: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin, most notably borne by:
Page 82750 Son: Found match for regex: # {{surname|tl|from=Min Nan|Filipino-Chinese}} of Hokkien origin
Page 82890 Sy: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 82930 Tan: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 82931 Tang: Found match for regex: # {{surname|en|Chinese|from=Min Nan}}.
Page 82949 Te: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 82956 Tee: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 83037 To: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 83141 Ty: Found match for regex: # {{surname|ceb|from=Min Nan}} of Chinese origin
Page 83141 Ty: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 83396 Yap: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 83409 Young: Found match for regex: # {{surname|ceb|from=Min Nan}} of Chinese origin
Page 83409 Young: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 121853 Tiu: Found match for regex: # {{surname|tl|Filipino-Chinese|from=Min Nan}}, most notably borne by:
Page 196098 Samson: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry
Page 766971 Lew: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 825754 Anson: Found match for regex: # {{given name|tl|male|from=Min Nan}} common on Filipinos of Chinese ancestry
Page 825754 Anson: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry
Page 1066196 Chu: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 1178407 Yao: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 1265062 Lim: Found match for regex: # {{surname|ilo|from=Min Nan}}
Page 1265062 Lim: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 1265654 Cheng: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 1265730 Ang: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 1265732 Ong: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 1265733 Suan: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 1265734 Cua: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 1266900 Pua: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 1266901 Uy: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 1266918 Chua: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 1266924 Khoo: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien Chinese origin
Page 1266970 Ching: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien Chinese origin or {{surname|tl|from=Cantonese}} of Cantonese Chinese origin, notably borne by:
Page 1277675 Gan: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 1284142 Koa: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 1443955 Nga: Found match for regex: # {{surname|en|from=Min Dong}}.
Page 1579807 Kang: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 2178085 Deang: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 2625641 Wee: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 2700666 Tin: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 2845428 Henson: Found match for regex: # {{surname|tl|from=Min Nan}} common among Filipinos of Chinese ancestry
Page 3305014 Yang: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 3750292 Lo: Found match for regex: # {{surname|tl|from=Min Nan|Filipino-Chinese}} of Hokkien origin
Page 4170429 Chung: Found match for regex: # {{surname|tl|from=Cantonese}}, or {{surname|tl|from=Min Nan}} (Hokkien) of Chinese origin.
Page 4713793 Coo: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien Chinese origin, or {{surname|tl|from=Cantonese}} of Cantonese Chinese origin.
Page 5112069 Sanson: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry
Page 5152613 Kho: Found match for regex: # {{surname|tl|Malaysia, Singapore, Indonesia, Philippines, Thailand, Vietnam-Chinese|from=Min Nan}}, most notably borne by:
Page 5152613 Kho: Found match for regex: # {{surname|tl|Filipino-Chinese|from=Min Nan}}, most notably borne by:
Page 5159150 Kua: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5171997 Yee: Found match for regex: # {{surname|ceb|from=Min Nan}} of Chinese origin
Page 5375208 Yu: Found match for regex: # {{surname|ceb|Filipino-Chinese|from=Min Nan}}, the 26th most common in the Philippines
Page 5375208 Yu: Found match for regex: # {{surname|tl|Filipino-Chinese|from=Min Nan}}, the 26th most common in the Philippines
Page 5404772 Ngo: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5406204 Chong: Found match for regex: # {{surname|tl|from=Cantonese}} of Cantonese Chinese origin, or {{surname|tl|from=Min Nan}} of Hokkien Chinese origin.
Page 5406528 Tong: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5406530 Chiu: Found match for regex: # {{surname|tl|Filipino-Chinese|from=Min Nan}}, most notably borne by:
Page 5410833 Leong: Found match for regex: # {{surname|tl|from=Cantonese}} of Cantonese Chinese origin or {{surname|tl|from=Min Nan}} of Hokkien Chinese origin.
Page 5411779 Pang: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5413143 Ison: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry
Page 5415076 Dizon: Found match for regex: # {{surname|pam|from=Min Nan}} of Chinese origin, notably borne by:
Page 5415076 Dizon: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin, notably borne by:
Page 5435565 Yung: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5437599 Shao: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5437924 Loo: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5438022 Sison: Found match for regex: # {{surname|en|from=Min Nan}}.
Page 5438022 Sison: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry, notably borne by:
Page 5438104 Hau: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5438288 Tian: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5439278 Teng: Found match for regex: # {{surname|tl|Filipino-Chinese|from=Min Nan}} of [[Hokkien]] origin
Page 5442404 Ting: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5453194 Tien: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5512124 Tuazon: Found match for regex: # {{surname|en|from=Min Nan}}.
Page 5512124 Tuazon: Found match for regex: # {{surname|pam|from=Min Nan}}
Page 5512124 Tuazon: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry
Page 5514761 Goh: Found match for regex: # {{cln|en|surnames from Chinese}} {{surname|en|Chinese|from=Min Nan}}.
Page 5514761 Goh: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5538352 Niu: Found match for regex: # {{surname|en|from=Min Nan}}.
Page 5538352 Niu: Found match for regex: # {{surname|tl|from=Min Nan}}
Page 5543775 Quiambao: Found match for regex: # {{surname|en|from=Min Nan}}.
Page 5543775 Quiambao: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin, most notably borne by:
Page 5558677 Lacson: Found match for regex: # {{surname|en|from=Min Nan}}.
Page 5558677 Lacson: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry, most notably borne by:
Page 5582383 Tecson: Found match for regex: # {{surname|en|from=Min Nan}} ''[[Hokkien]] Chinese'', common among Filipinos of Chinese descent.
Page 5582383 Tecson: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien Chinese origin, most notably descendants of ‘Tek Sun’ brothers from Guangzhou (Canton), China
Page 5584134 Layson: Found match for regex: # {{surname|tl|from=Min Nan}} common among Filipinos of Chinese ancestry
Page 5586737 Cinco: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry
Page 5586737 Cinco: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry
Page 5614689 Soon: Found match for regex: # {{surname|tl|from=Min Nan|Filipino-Chinese}} of Hokkien origin
Page 5618852 Singson: Found match for regex: # {{surname|tl|from=Min Nan}} common among Filipinos of Chinese ancestry
Page 5636472 Gozon: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry
Page 5646811 Gotamco: Found match for regex: # {{surname|tl|from=Min Nan}}
Page 5652715 Cayco: Found match for regex: # {{surname|tl|from=Min Nan|Filipino-Chinese}}
Page 5652718 Syson: Found match for regex: # {{surname|tl|Filipino-Chinese|from=Min Nan}}
Page 5652722 Layco: Found match for regex: # {{surname|tl|Tagalog|from=Min Nan}}
Page 5653661 Tengco: Found match for regex: # {{surname|tl|Filipino-Chinese|from=Min Nan}} of Hokkien origin
Page 5655949 Yuzon: Found match for regex: # {{surname|ceb|Filipino-Chinese|from=Min Nan}}
Page 5655949 Yuzon: Found match for regex: # {{surname|tl|Filipino-Chinese|from=Min Nan}}
Page 5656631 Tiongson: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry
Page 5656647 Cojuangco: Found match for regex: # {{surname|tl|Filipino-Chinese|from=Min Nan}}, borne by a known political and business clan in the Philippines
Page 5671242 Jocson: Found match for regex: # {{surname|tl|Filipino-Chinese|from=Min Nan}}
Page 5673469 Tiangco: Found match for regex: # {{surname|tl|Filipino-Chinese|from=Min Nan}}
Page 5674047 Quisumbing: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5674054 Lichauco: Found match for regex: # {{surname|tl|Filipino-Chinese|from=Min Nan}}
Page 5676213 Locsin: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin, most notably borne by:
Page 5677430 Quizon: Found match for regex: # {{surname|pam|from=Min Nan}}
Page 5677430 Quizon: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin, most associated with [[w:Dolphy|Dolphy]], which bears the real name of Rudolf Quizon
Page 5677431 Quimpo: Found match for regex: # {{surname|ceb|from=Min Nan}} of Chinese origin
Page 5677431 Quimpo: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5678951 Tangco: Found match for regex: # {{surname|tl|from=Min Nan}} or Hokkien origin
Page 5678980 Tiongco: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5678984 Guanzon: Found match for regex: # {{surname|tl|from=Min Nan}} common among Filipinos of Chinese ancestry
Page 5678991 Hizon: Found match for regex: # {{surname|tl|from=Min Nan}} common among Filipinos of Chinese ancestry, most notably descendants of migrants from [[Macau]] to {{w|Parián}}, {{w|Mexico, Pampanga|Mexico}}, {{w|Pampanga}}
Page 5684485 Tiamson: Found match for regex: # {{surname|tl|from=Min Nan}} or Hokkien origin
Page 5686268 Tuason: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien Chinese origin, {{alt form|tl|Tuazon|nocap=1}}
Page 5686671 Tio: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5687329 Ganzon: Found match for regex: # {{surname|tl|from=Min Nan}} or Hokkien origin
Page 5689830 Pecson: Found match for regex: # {{surname|pam|from=Min Nan}}
Page 5689830 Pecson: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5690622 Siason: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5690623 Tiozon: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5691453 Unson: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin common among Filipinos of Chinese ancestry
Page 5692143 Cuizon: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5692145 Suico: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5693840 Quimson: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5694341 Tancinco: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5696938 Ongkiko: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5696941 Sioson: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5700562 Bauzon: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5700580 Yatco: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5700589 Gancayco: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5700604 Limjoco: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5700656 Coquia: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5700659 Dijamco: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5700712 Ticzon: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5700939 Cosico: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5701342 Yuvienco: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5701354 Sangco: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5738755 Ayson: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 5740882 Songco: Found match for regex: # {{surname|tl|from=Min Nan}}
Page 5764989 Leyson: Found match for regex: # {{surname|tl|from=Min Nan}} common among Filipinos of Chinese ancestry
Page 5769732 Kiamzon: Found match for regex: # {{surname|tl|from=Min Nan}}
Page 5769773 Sayson: Found match for regex: # {{surname|tl|from=Min Nan}} common among Filipinos of Chinese ancestry
Page 5773490 Sanciangko: Found match for regex: # {{surname|ceb|Filipino-Chinese|from=Min Nan}}
Page 5773649 Guico: Found match for regex: # {{surname|tl|from=Min Nan}}
Page 5773673 Tanchoco: Found match for regex: # {{surname|tl|from=Min Nan}}
Page 5773685 Siongco: Found match for regex: # {{surname|tl|from=Min Nan}}
Page 5788737 Tayson: Found match for regex: # {{surname|tl|from=Min Nan}}
Page 5788738 Limcaoco: Found match for regex: # {{surname|tl|from=Min Nan}}
Page 5885208 Joson: Found match for regex: # {{surname|tl|from=Min Nan}}
Page 5889986 Tanseco: Found match for regex: # {{surname|tl|from=Min Nan}}
Page 5906982 Siao: Found match for regex: # {{surname|ceb|from=Min Nan}} of Chinese origin
Page 5906982 Siao: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 5983082 Yongco: Found match for regex: # {{surname|ceb|from=Min Nan}} of Chinese origin
Page 5983082 Yongco: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 6060762 Pacquiao: Found match for regex: # {{surname|ceb|from=Min Nan|xlit=Pacquiao}}
Page 6601914 Caw: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 6601919 Pueson: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry
Page 6601923 Causon: Found match for regex: # {{surname|tl|from=Min Nan}} common with Filipinos with Chinese ancestry
Page 6601938 Quitson: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry
Page 6601988 Auyong: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 6601989 Awyoung: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 6603830 Syaw: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 6603831 Shau: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 6603884 Hwan: Found match for regex: # {{surname|tl|from=Min Nan}} of Chinese origin
Page 6603960 Liong: Found match for regex: # {{surname|tl|from=Cantonese}} of Cantonese Chinese origin, or {{surname|tl|from=Min Nan}} of Hokkien Chinese origin.
Page 6603976 Mapua: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien Chinese origin, notably borne by:
Page 6638858 Banzon: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien Chinese origin
Page 7439359 Teh: Found match for regex: # {{surname|en|from=Min Nan}}.
Page 7782052 Sitchon: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry
Page 7782063 Itchon: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry
Page 7849686 Tiong: Found match for regex: # {{surname|en|from=Min Dong}}.
Page 7849688 Diong: Found match for regex: # {{surname|en|from=Min Dong}}.
Page 7924413 Ngeh: Found match for regex: # {{surname|en|from=Min Dong}}.
Page 8003694 Canoy: Found match for regex: # {{surname|tl|from=Min Nan}} common among Filipinos of Chinese ancestry
Page 8060607 Gueco: Found match for regex: # {{surname|pam|from=Min Nan 慧哥}}
Page 8343774 Siocson: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 8343781 Bengzon: Found match for regex: # {{surname|tl|from=Min Nan}} of Hokkien origin
Page 9058034 Quiason: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry
Page 9058035 Quiazon: Found match for regex: # {{surname|tl|from=Min Nan}} common on Filipinos of Chinese ancestry

As can be seen, these are almost all Min Nan and almost all Tagalog, and some of them explicitly say "of Hokkien origin". Are these all Hokkien? If so I'll change them accordingly. Benwing2 (talk) 04:29, 14 March 2024 (UTC)

@Benwing2 Thanks for this re the surnames. The whole "of X origin" thing is totally superfluous imo, so should be deleted. If it explicitly says Hokkien somewhere then change it to that; it might also be possible to infer it from the etymology section, too. Any remaining ones should be left to manual review. Theknightwho (talk) 04:33, 14 March 2024 (UTC)
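A minimal Python sketch of the per-line rule described above, assuming the wikitext is processed one definition line at a time: the trailing "of ... origin" text is dropped, from=Min Nan becomes from=Hokkien only when the line itself mentions Hokkien, and anything else is flagged for manual review. The helper name clean_surname_line and both regexes are hypothetical illustrations, not the script that was actually run.

```python
import re

# A {{surname}} call whose from= value is exactly "Min Nan".
SURNAME_MIN_NAN = re.compile(r"(\{\{surname\|[^{}]*?from=)Min Nan([|}])")
# The redundant trailing text, e.g. " of Chinese origin",
# " of Hokkien Chinese origin", " of [[Hokkien]] origin".
ORIGIN_TAIL = re.compile(r" of (?:\[\[)?(?:Hokkien|Chinese)(?:\]\])?(?: Chinese)? origin")


def clean_surname_line(line):
    """Return (new_line, needs_manual_review) for one definition line."""
    says_hokkien = "Hokkien" in line
    new_line = ORIGIN_TAIL.sub("", line)
    if says_hokkien:
        # The line explicitly says Hokkien, so use that as the source lect.
        return SURNAME_MIN_NAN.sub(r"\1Hokkien\2", new_line), False
    # Otherwise leave from=Min Nan alone and flag the line if it still has it.
    return new_line, bool(SURNAME_MIN_NAN.search(new_line))
```

Lines flagged for review are the ones where the lect would have to be inferred from the etymology section or checked by hand.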

@Theknightwho All right, I'll do this. BTW some of them are already fixed; I randomly picked Siocson and User:Mlgc1998 fixed it 3 days ago. Benwing2 (talk) 04:36, 14 March 2024 (UTC)

@Benwing2 It's probably fine to keep Coastal Min in {{zh-forms}}. We should probably have proper categories set up for it, which categories like Category:Southern Min Chinese would be part of.

There's a whole issue with labels in Chinese entries causing a ton of duplication between the label categories and the lemma categories, but we've not come up with a satisfactory solution to it yet. Theknightwho (talk) 04:30, 14 March 2024 (UTC)

@Theknightwho Yeah, IMO things like Category:Hokkien Chinese should go away in favor of Category:Hokkien lemmas now that we have the latter. {{lb}} could be made to generate the latter category in place of the former but it doesn't seem like such a good idea as it wouldn't categorize correctly into the other categories. Benwing2 (talk) 04:34, 14 March 2024 (UTC)

Also IMO all label categories that refer to specific lects should have corresponding lang codes, either full or etym-only, and the etym-only categories should probably be added by the Pronunciation section instead of by {{lb}}. Note also that User:-sche proposed a while ago renaming "etym-only language" to something else, which IMO is a good idea, since such languages are now used far beyond etymologies. Benwing2 (talk) 04:39, 14 March 2024 (UTC)

Yeah, agreed. It's probably worth starting a thread on the BP about renaming etym-only languages, as the current name is really misleading. Theknightwho (talk) 04:50, 14 March 2024 (UTC)

Done. BTW it looks like "Min Nan" was already removed from all Tagalog etc. surnames; the only remaining instances of "from=Min" occurred in a few English surnames of Min Dong origin. I cleaned them up and removed the text "of Chinese origin" etc. following various {{surname}} invocations. The script to implement #2 above (correct "Min Dong", "Min Bei" etc. in labels/qualifiers/etc.) is running. Benwing2 (talk) 07:05, 14 March 2024 (UTC)

Task #2 is close to done; going to sleep now. There are still 6,406 occurrences of "Min Nan" in qualifiers, which my script didn't touch. The occurrences can be found here: User:Benwing2/qualifier-min-nan-1 and User:Benwing2/qualifier-min-nan-2 (split over two files because otherwise the file would supposedly exceed the 2 MB size limit; in fact the total file size is 1.2 MB but there's that stupid doubler effect). Some of the qualifiers occur in Reference sections but the vast majority seem to occur in Synonyms and Antonyms sections. I am guessing again that the majority are Hokkien but I'm not sure, and generally the transliterations aren't attached. Here we might have to fall back on looking up the terms in question to see which lects they are listed as occurring in (which should be bottable, if you provide appropriate instructions). Benwing2 (talk) 08:31, 14 March 2024 (UTC)

@Theknightwho Let me know if you need help with any other renaming tasks that can be done or sped up by bot. I notice you're going through and renaming instances of "Min *" in comments, {{rfp}} params and other random places but there may be too many to do by hand. There were 17,750 pages satisfying the regex (Min Bei|Min Dong|Min Zhong|Puxian|Min Nan) as of the Mar 1 dump, and 12,222 remaining when I re-downloaded the same pages last night before running task #2. Task #2 changed 6,245 pages, meaning there might be on the order of 6,000 pages left, although I can check for sure by re-downloading the same pages. As I mentioned above, most of the occurrences are probably Min Nan occurring in qualifiers because my script didn't change them. Benwing2 (talk) 22:51, 14 March 2024 (UTC)

@Benwing2 Thanks. Yeah, I was just going through and renaming the various "Min Bei" and "Min Dong" labels, but noticed that "Min Nan" is used on thousands of pages. It's annoying, as it's the one where "Hokkien" is sometimes a more appropriate label. That being said, it's not wrong to put "Southern Min", so it would probably be helpful to change those automatically. Theknightwho (talk) 23:04, 14 March 2024 (UTC)

@Theknightwho See my comment above from last night. It's probably possible to figure out how to change Min Nan automatically to the right label by looking up the page in question to see what lects are listed on the page. If you want me to work on that I can, although I'd need some instructions as to what lects to look out for. Benwing2 (talk) 23:10, 14 March 2024 (UTC)

@Benwing2 Yes please - @Justinrleung might be able to give better pointers than me. Theknightwho (talk) 23:12, 14 March 2024 (UTC)

@Theknightwho OK, I re-downloaded the relevant pages. There are 7,396 pages remaining satisfying the regex (Min Bei|Min Dong|Min Zhong|Puxian|Min Nan). Of these, 7,128 mention Min Nan; 39 mention Min Bei; 59 mention Min Dong; 22 mention Min Zhong; and 211 mention Puxian, but only 15 of those match the regex Puxian($|.$|[^ ]| [^M]), which excludes "Puxian Min". There are 8,195 total lines mentioning Min Nan (since some pages mention Min Nan more than once). Of these lines, 6,761 contain a qualifier and 6,593 specifically satisfy the regex {q.*{zh-l, i.e. a qualifier followed by a Chinese-style link. Of the 1,605 lines not satisfying {q.*{zh-l, 45 match {q.*{l (a qualifier with a generic link); 111 contain {{thcwd}} or a variant ({{thcwda}}, {{thcwdq}}), almost all preceded by a Min Nan qualifier; 227 contain Min Nan inside of {{zh-forms}}; 21 contain Min Nan inside of {{zh-see}}; 105 contain Min Nan inside of {{zh-der}}, {{col3}} or a variant; and 24 contain an occurrence of {{desc}}. Excluding all of these leaves 1,063 occurrences over 412 pages, of which 260 are outside of mainspace. So I think it should be possible to create a script to handle the {q.*{zh-l occurrences, and handle the remainder type-by-type in a semi-manual fashion. Benwing2 (talk) 00:02, 15 March 2024 (UTC)
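The bucketing above can be approximated with the quoted regexes, checked in order so that each line lands in the first bucket it matches. This is a rough sketch: the bucket names, classify_line and tally are made up for illustration, the exact patterns for {{thcwd}}, {{zh-forms}}, {{desc}} etc. are assumptions, and the template checks only test whether the template name appears on the line, not whether "Min Nan" actually sits inside it.

```python
import re
from collections import Counter

# Checked in order; a line goes into the first bucket whose pattern matches.
BUCKETS = [
    ("qualifier + zh-l", re.compile(r"\{q.*\{zh-l")),
    ("qualifier + l", re.compile(r"\{q.*\{l")),
    ("thcwd", re.compile(r"\{\{thcwd[aq]?[|}]")),
    ("zh-forms", re.compile(r"\{\{zh-forms")),
    ("zh-see", re.compile(r"\{\{zh-see")),
    ("zh-der/col3", re.compile(r"\{\{(?:zh-der|col3)")),
    ("desc", re.compile(r"\{\{desc[|}]")),
]
# "Puxian" not followed by " Min", i.e. the regex quoted above.
PUXIAN_ALONE = re.compile(r"Puxian($|.$|[^ ]| [^M])")


def classify_line(line):
    """Return the name of the first bucket matching the line, else "other"."""
    for name, pattern in BUCKETS:
        if pattern.search(line):
            return name
    return "other"


def tally(lines):
    """Count lines mentioning Min Nan, per bucket."""
    return Counter(classify_line(line) for line in lines if "Min Nan" in line)
```

Lines landing in "other" correspond to the leftover occurrences that would be handled semi-manually.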

@Benwing2 Sounds like a good plan. Thanks for doing this. Theknightwho (talk) 00:03, 15 March 2024 (UTC)

@Theknightwho FYI I also did a download run of those same pages checking for those now containing "Southern Min". There are 5,119 lines over 4,377 pages mentioning Southern Min, mostly in labels (as expected) but occasionally in other places that could stand to be reviewed. Benwing2 (talk) 00:05, 15 March 2024 (UTC)

@Theknightwho OK. Can you help me sketch out a general idea of what the qualifiers should be transformed into? For example, I randomly picked page 4445 天涯海角, which contains a synonym 天邊海角 labeled "Min Nan". This latter page has the label Hokkien and it also has {{zh-pron|mn=ml,jj,tw:thiⁿ-piⁿ-hái-kak|cat=cy}}. According to the documentation of {{zh-pron}}, mn means Hokkien and the codes inside indicate the varieties for which a pronunciation is given: ml="Mainland China (Xiamen, Quanzhou, Zhangzhou)", jj="Jinjiang", tw="mainstream Taiwan". How much info do we want in the qualifiers? Is just "Hokkien" enough in this situation? In general, what lects should be specified in the qualifiers? Maybe just Hokkien, Teochew, Leizhou? Possibly also the Quanzhou and/or Zhangzhou dialects if pronunciations are given for them? This is where I need a bit of guidance from someone like you who knows the languages in question. Benwing2 (talk) 00:24, 15 March 2024 (UTC)

@Benwing2 I'd wait for Justin to comment, as I think you're really overestimating my knowledge. I've got a very broad understanding of what needs to be done, but my understanding of Module:nan-pron is relatively low, so I won't be much help in interpreting the input. Theknightwho (talk) 00:30, 15 March 2024 (UTC)

@Theknightwho OK. I had assumed you know the languages because you seem able to correctly split the lects; maybe you're just a fast learner ;) ... Benwing2 (talk) 00:40, 15 March 2024 (UTC)

@Benwing2: I think for qualifiers of synonyms, etc., it can just be "Hokkien" when there's only a Hokkien pronunciation, "Teochew" when there's only a Teochew pronunciation, etc., and we don't need to worry about the finer distinctions, which we will get with {{lb}} at the entry. If it's more than one Southern Min variety, we could either use the Southern Min label or list all the relevant Southern Min languages; I don't have a strong feeling about either way. — justin(r)leung _{{ (t...) | c=› }} 01:38, 15 March 2024 (UTC)
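One way to apply this rule, sketched in Python: collect the Southern Min pronunciation parameters present in the target entry's {{zh-pron}} call and turn them into a {{q}} qualifier listing every lect found. That mn= means Hokkien comes from the {{zh-pron}} documentation cited above; treating mn-t= as Teochew and mn-l= as Leizhou is an assumption here, as are the helper names. The parameter scan is a simple regex, not a full template parser.

```python
import re

# mn= is Hokkien per the {{zh-pron}} docs; mn-t= (Teochew) and mn-l= (Leizhou)
# are assumed parameter names for this sketch.
PRON_PARAM_TO_LECT = {"mn": "Hokkien", "mn-t": "Teochew", "mn-l": "Leizhou"}
LECT_ORDER = ["Hokkien", "Teochew", "Leizhou"]
# A named parameter with a non-empty value, e.g. "|mn=ml,jj,tw:...".
PARAM_RE = re.compile(r"\|\s*([a-z-]+)\s*=\s*[^|}\s]")


def southern_min_lects(zh_pron_call):
    """Return the Southern Min lects that have a pronunciation in the call."""
    names = PARAM_RE.findall(zh_pron_call)
    found = {PRON_PARAM_TO_LECT[n] for n in names if n in PRON_PARAM_TO_LECT}
    return [lect for lect in LECT_ORDER if lect in found]


def qualifier_for(lects):
    """One lect gives just that lect; several lects are all listed."""
    if not lects:
        # e.g. red link or no Southern Min pronunciation: leave for manual review.
        return None
    return "{{q|" + "|".join(lects) + "}}"
```

Listing all lects rather than collapsing them to "Southern Min" is one of the two options Justin mentions; swapping in the Southern Min label would only change qualifier_for.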

@Justinrleung All right. What is the complete list of Southern Min varieties? Benwing2 (talk) 01:39, 15 March 2024 (UTC)

The currently supported varieties in {{zh-pron}} are Hokkien, Teochew and Leizhou Min. Other than these, there's Hainanese as well as other varieties that haven't been dealt with (WT:RFM#Additional Southern Min languages). — justin(r)leung _{{ (t...) | c=› }} 01:46, 15 March 2024 (UTC)

@Justinrleung, Theknightwho: I finished the script to convert Min Nan and Southern Min in qualifiers in Synonym/Antonym sections (and similar sections, whenever the qualifier is followed by a {{zh-l}} link). Out of 6,283 pages where it tried to do something, it was able to process 5,938, which is a pretty good record (94.5%). The breakdown of lects generated is as follows:

5418 Hokkien
 485 Hokkien|Teochew
  16 Hokkien|Teochew|Leizhou
  10 Hokkien|Leizhou
   9 Teochew

The script issued 663 warnings. They are here: User:Benwing2/min-nan-qualifier-conversion-warnings. One of you two might want to go through them. Note that 268 "may be ignorable" (meaning that the script was able to continue on and ultimately do something, despite the warning). Of the remaining 395, 276 are due to the link referring to a nonexistent page; you'd need domain knowledge to know which lect(s) are appropriate. This leaves 119, of which 50 are "Couldn't parse" errors (the line wasn't formatted in a standard fashion); 35 are "Couldn't find 'Min Nan' or 'Southern Min' qualifier" errors (the qualifier template says something like {{q|literary or Min Nan, Hakka}} or {{q|Cantonese, Min Nan}} rather than just "Min Nan"); 22 are "Saw multiple Etymology/Pronunciation sections" (in such a case, the code tries hard to figure out the correct lects, including using the gloss in the {{zh-l}} link and making sure there is more than one Etymology/Pronunciation section that refers to Min Nan and that the two sections have different lects in them); 5 are "Can't find Chinese section"; and 7 are some random misc stuff. I am going to run the script in save mode either tonight or tomorrow. Benwing2 (talk) 07:52, 15 March 2024 (UTC)

@Theknightwho This is running; maybe 1 to 1.5 hours and it will finish. Benwing2 (talk) 20:43, 15 March 2024 (UTC)

Cool - thanks. Theknightwho (talk) 20:47, 15 March 2024 (UTC)

BTW can {{zh-l}} be replaced by {{l|zh}}? I'm not sure any more what the Chinese-specific behavior in {{zh-l}} is. Maybe it's just automatic handling of traditional vs. simplified forms? Benwing2 (talk) 20:47, 15 March 2024 (UTC)

@Theknightwho Also maybe we can have the lect be specified using a lang code prefix instead of having it as a separate qualifier. Benwing2 (talk) 20:48, 15 March 2024 (UTC)

@Benwing2 On that point, would it be possible to do a similar analysis for all uses of the nan code in the Thesaurus namespace? There are 483 uses at the moment, but conversion is slow as it requires a bunch of manual analysis. Some of them also have "Min Nan" in qualifiers, which will need revising as well. Theknightwho (talk) 20:52, 15 March 2024 (UTC)

@Theknightwho OK, I'll take a look. Benwing2 (talk) 20:54, 15 March 2024 (UTC)

@Theknightwho @Justinrleung For this purpose I think we (a) need to add the missing etym-only codes for Min Nan lects, and (b) should include the specific lect and not just "Hokkien" in the lang prefix or qualifier. For example, I took a look at Thesaurus:打耳光 meaning "to slap someone in the face"; there are three synonyms labeled nan as well as two more explicitly labeled Zhangzhou Hokkien and Tainan Hokkien respectively. Of the three labeled nan, one is a red link, one is labeled Xiamen Hokkien and one is labeled "Quanzhou, Zhangzhou and Taiwanese Hokkien". Labeling the latter two just "Hokkien" would seem incomplete. Benwing2 (talk) 21:09, 15 March 2024 (UTC)

@Benwing2 The principle I've followed so far has been to use the most specific label which adequately covers everything at the target, where that's possible. So anything that's labelled (e.g.) "Xiamen Hokkien" would get the langcode nan-xmn, but something labelled "Quanzhou, Zhangzhou and Taiwanese Hokkien" would just get nan-hbl. I agree with Justin that the labels for links aren't as important as those on the entries themselves, so incompleteness isn't the end of the world. When multiple lects are mentioned (e.g. Hokkien and Teochew), I've ditched the langcode altogether and put (e.g.) "Southern Min" as a qualifier. Theknightwho (talk) 21:12, 15 March 2024 (UTC)
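The principle can be written as a small decision function. This sketch assumes the lects and Hokkien varieties at the target entry have already been extracted; the codes are the ones in use at this point in the thread (nan-xmn, nan-zzh, nan-qzh, nan-jnj, with nan-hbl as generic Hokkien), and link_prefix is a hypothetical name. A single non-Hokkien lect is just returned as a plain qualifier, since its code isn't given here.

```python
# Codes as used at this point in the thread (before the renaming discussed later).
HOKKIEN_VARIETY_CODES = {
    "Xiamen": "nan-xmn",
    "Zhangzhou": "nan-zzh",
    "Quanzhou": "nan-qzh",
    "Jinjiang": "nan-jnj",
}
GENERIC_HOKKIEN = "nan-hbl"


def link_prefix(lects, hokkien_varieties=()):
    """Return (langcode, qualifier) for a synonym link.

    lects: Southern Min lects at the target, e.g. ["Hokkien"].
    hokkien_varieties: labelled varieties, e.g. ["Xiamen"] or
    ["Quanzhou", "Zhangzhou", "Taiwanese"].
    """
    if len(lects) > 1:
        # Several lects (e.g. Hokkien and Teochew): no langcode, generic qualifier.
        return None, "Southern Min"
    if lects == ["Hokkien"]:
        varieties = list(hokkien_varieties)
        if len(varieties) == 1 and varieties[0] in HOKKIEN_VARIETY_CODES:
            return HOKKIEN_VARIETY_CODES[varieties[0]], None
        # Mixed or code-less varieties (e.g. Taiwanese) fall back to generic Hokkien.
        return GENERIC_HOKKIEN, None
    if lects:
        # Single non-Hokkien lect: no code assumed here, so just qualify it.
        return None, lects[0]
    return None, None
```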

Also, as an aside, we don't currently have an etym-only langcode for Taiwanese Hokkien, because it's not a well-defined lect in the way varieties like Xiamen, Zhangzhou and Quanzhou are; all three are spoken on Taiwan, but (for historical reasons) the Hokkien-speaking communities on Taiwan have undergone a lot more influence from Japanese and English than their equivalents on the mainland, so it makes sense to use that label sometimes. In those cases, just labelling them "Hokkien" isn't really a problem if it's just in the thesaurus entry. Theknightwho (talk) 21:20, 15 March 2024 (UTC)

@Theknightwho All right, let me look at a few more examples. While we're at it, what do you think of replacing the etym-only codes for the Hokkien varieties with ones conforming to the principles I laid out in WT:RFM? Since these codes are newly added I suspect they're barely used. This would mean nan-jnj -> nan-jin (Jinjiang Hokkien), nan-qzh -> nan-qua (Quanzhou Hokkien), nan-xmn -> nan-xia (Xiamen Hokkien), nan-zzh -> nan-zha (Zhangzhou Hokkien), nan-plp -> nan-qua-PH (probably) or nan-PH (possibly) or nan-phi (perhaps) (Philippine Hokkien). Benwing2 (talk) 21:44, 15 March 2024 (UTC)
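If the renaming goes ahead, existing uses could be updated mechanically. A minimal sketch covering only the four one-to-one renames listed above (the Philippine Hokkien code is still undecided at this point, so it is left out); rename_codes is a hypothetical helper that only rewrites a code when it stands alone as a token, so other codes such as nan-hbl are untouched.

```python
import re

# The four one-to-one renames proposed above; Philippine Hokkien is left out
# because its new code is still undecided at this point.
RENAMES = {
    "nan-jnj": "nan-jin",
    "nan-qzh": "nan-qua",
    "nan-xmn": "nan-xia",
    "nan-zzh": "nan-zha",
}
# Match a code only as a whole token, not inside a longer code or word.
CODE_RE = re.compile(
    r"(?<![\w-])(" + "|".join(map(re.escape, RENAMES)) + r")(?![\w-])"
)


def rename_codes(text):
    """Replace old etym-only codes with their proposed replacements."""
    return CODE_RE.sub(lambda m: RENAMES[m.group(1)], text)
```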

@Benwing2 I don't mind too much. I have a small preference for doing it syllabically rather than by the first letters of the name, but I don't mind if you want to use a standardised format for them.

There are sometimes instances where we won't be able to follow it, though (e.g. Category:South Dravidian I languages and Category:South Dravidian II languages, where I opted for dra-sdo and dra-sdt, respectively). Theknightwho (talk) 21:48, 15 March 2024 (UTC)

@Theknightwho Yes, understood. BTW I wouldn't have an issue with something more syllabic than using the first three letters; it's just that it's not so easy to guess automatically which letters to use in that case. (Actually the principle you followed for South Dravidian I/II *is* consistent with the principles I laid out, which call for using the initials of the lect when using the first three letters isn't practical.) Benwing2 (talk) 21:53, 15 March 2024 (UTC)

@Theknightwho I changed the language codes. I used nan-hbl-PH for Philippine Hokkien. I think we can go ahead and use nan-hbl-TW for Taiwanese Hokkien, and create subvariety codes for the specific dialects that are derived respectively from Xiamen, Zhangzhou and Quanzhou (e.g. nan-xia-TW etc.). I also modified Module:columns so that it can take a comma-separated list of prefixed lang codes, e.g. nan-hbl,hak:[[毋]][[知]], and handle them appropriately (i.e. using the first one to create the term link but displaying all of them as qualifiers). I'm going to work on fixing up the Thesaurus entries now. Benwing2 (talk) 23:29, 16 March 2024 (UTC)
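To illustrate the comma-separated prefix handling described above, here is a minimal Lua sketch of how an item such as nan-hbl,hak:[[毋]][[知]] might be split into the code used for the term link and the codes displayed as qualifiers. The function name parse_prefixed_item and the rule that a prefix consists only of ASCII letters, digits, hyphens and commas are assumptions; this is not the actual Module:columns implementation.

```lua
-- Hypothetical helper (not the real Module:columns code): split an item such as
-- "nan-hbl,hak:[[毋]][[知]]" into the code used to build the term link, the full
-- list of codes to display as qualifiers, and the term itself.
local function parse_prefixed_item(item)
  -- Assumption: a prefix is made of ASCII letters, digits, hyphens and commas,
  -- followed by a colon. Items starting with e.g. "[[" therefore have no prefix.
  local prefix, term = item:match("^([%w%-,]+):(.*)$")
  if not prefix then
    return nil, {}, item
  end
  local codes = {}
  for code in prefix:gmatch("[^,]+") do
    codes[#codes + 1] = code
  end
  -- The first code determines the link; all codes are shown as qualifiers.
  return codes[1], codes, term
end
```

For an item without a prefix, such as [[網絡]], the sketch returns no link code and leaves the term untouched.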

@Benwing2: I think in most cases, specific dialects of Taiwanese Hokkien should not be tied back to the source varieties of Quanzhou and Zhangzhou (and maybe Xiamen, which is itself generally thought of as a Quanzhou-Zhangzhou mixed variety). These kinds of labels are generally not helpful lexicographically; they are only well-defined phonologically and have little bearing on vocabulary, where much more convergence has occurred in Taiwan due to dialect levelling. The locales in Taiwan (e.g., Lukang, Yilan, etc.) for subdialects of Taiwanese that are less mixed may be more helpful in cases where we want to highlight them. — justin(r)leung _{{ (t...) | c=› }} 02:55, 17 March 2024 (UTC)

@Justinrleung OK, this is fine, and it jibes well with the nan-hbl-TW label. I was just responding to User:Theknightwho's assertion that Taiwanese Hokkien isn't a well-defined lect. Benwing2 (talk) 03:21, 17 March 2024 (UTC)

@Theknightwho I've written code to process Thesaurus entries and convert nan as appropriate. I will finish the analysis tomorrow and run it. Benwing2 (talk) 09:32, 17 March 2024 (UTC)

@Theknightwho I expanded the script I wrote so that it also attempts to convert lects mentioned in <qq:...> qualifiers into lect code prefixes. (This is the origin of that "part 1" section in WT:RFM.) These should not change the qualifier output much (at most rearranging the order in some cases) but will help with transliteration and such. Some stats on what I have so far:

I ran it on the 2,013 pages in CAT:Chinese thesaurus entries. It would change 620 pages.
It issues 328 warnings. Of these:
1. 255 of these are due to unrecognized lects in qualifiers. All of these are already discussed in the "part 1" WT:RFM section.
2. Of the remaining 73, 40 are due to looking up a page tagged as nan: and finding it doesn't exist.
3. Of the remaining 33, 14 are "informational" warnings that can be ignored.
4. Of the remaining 19, 15 are due to finding multiple etymologies with different sets of Southern Min varieties in the different etymologies.

Benwing2 (talk) 05:04, 18 March 2024 (UTC)

@Theknightwho Scratch the above stats. My script needs some changes to avoid overgenerating in the presence of multiple definitions (it already handles multiple etymology/pronunciation sections but needs to be extended to multiple definitions, because sometimes specific labels apply only to specific definitions). Benwing2 (talk) 05:24, 18 March 2024 (UTC)

OK, I rewrote the script to take into account the presence of multiple definitions and try to use the glosses present in Thesaurus pages to whittle down the set of possible definitions to use. The first pass doing that increased warnings from 328 to 1,344 (!) and reduced the number of pages changed from 620 to 490, but I think I can do a whole lot better than that. Stay tuned. Benwing2 (talk) 07:07, 18 March 2024 (UTC)

@Theknightwho With some changes I brought the warnings down to 498 and increased the number of pages changed to 624. I just ran the script. There are now only 52 pages remaining in the Thesaurus namespace with nan links. The warnings generated are here: User:Benwing2/zh-thesaurus-conversion-warnings, minus the warnings about "Saw unhandled lect qualifier", which aren't very important. (For reference, the first four such warnings are as follows:

Page 4 Thesaurus:一會: WARNING: Saw unhandled lect qualifier Anxi Hokkien (term [[一孔久]]): <qq:Anxi Hokkien>
Page 46 Thesaurus:中飽: WARNING: Saw unhandled lect qualifier Taiwan (term [[歪哥]]): <qq:Taiwan>
Page 56 Thesaurus:亂說: WARNING: Saw unhandled lect qualifier Internet slang (term [[口胡]]): <qq:neologism, Internet slang>
Page 61 Thesaurus:互聯網: WARNING: Saw unhandled lect qualifier Mainland China (term [[網絡]]): <qq:Mainland China, Hong Kong, Macau>

) Of the 245 warnings in that file (covering 144 pages), only 67 of them actually concern being unable to convert the nan code (or occasionally the Min Nan qualifier) to something more specific. I'd focus on those. A couple of such warnings are given here for reference:

Page 26 Thesaurus:不料: WARNING: Unable to convert 'nan' to correct lang code (reason: Found synonym/antonym [[無疑悟]] (template {{col3|zh|不料|孰料|詎料|豈料|yue:點知|想不到|不意<qq:formal>|不虞<qq:literary>|nan:無疑|怎知|哪知|未料|怎料|nan:無疑悟|沒想到|不謂<qq:literary>|不圖<qq:literary>}}, glossed as 'unexpectedly') but page doesn't exist)
Page 53 Thesaurus:乞討: WARNING: Unable to convert 'nan' to correct lang code (reason: Saw multiple definitions with different Southern Min types for synonym/antonym [[分]] (template {{col3|zh|乞討|討乞|行乞|zhx-tai:乞米|nan:討食<t:to beg for food>|nan:分}}, glossed as 'to beg (ask for food or money as charity)'): defn '# to [[divide]]; to [[separate]]' has Hokkien,Teochew while defn '# {{lb|zh|Hakka|Teochew|Hainanese}} to [[give]]' has Hokkien,Teochew,Hainanese; skipping)

Benwing2 (talk) 04:36, 19 March 2024 (UTC)

Generally, {{zh-l}} should be replaced (especially if it's giving a Hokkien pronunciation), but that's probably something to do en masse at another time: there are tens of thousands of uses, so we'll probably want to hash out a proper conversion method. Theknightwho (talk) 20:55, 15 March 2024 (UTC)

@Theknightwho Yes, agreed; just something to keep in mind. Benwing2 (talk) 20:56, 15 March 2024 (UTC)

Module:columns and Module:sa-verb, Module:sa-verb/data

There are 3 Sanskrit entries in CAT:E because of an error in {{sa-conj}}. I checked the entire transclusion list for ततान, and your edit to Module:columns is the only recent change to executable code for anything in the list. Indeed, there are comments in Module:sa-verb saying that code was copied from Module:columns and would need to be updated if that module were changed. Chuck Entz (talk) 00:19, 17 March 2024 (UTC)

@Chuck Entz Thank you, I'll fix it. I looked for modules using Module:columns but forgot about the display_from entry point. Benwing2 (talk) 00:21, 17 March 2024 (UTC)

@Chuck Entz I don't think my change to Module:columns has anything to do with this error. User:Exarchus is actively working on Module:sa-verb/data and made the last change only an hour ago. User:Exarchus, can you take a look at these errors? They are due to a buggy Lua pattern. Benwing2 (talk) 00:35, 17 March 2024 (UTC)

I somehow read the dates wrong on those edits; I could have sworn they were from the same date as the ones to Module:sa-verb. You're no doubt right. Sorry! Chuck Entz (talk) 00:47, 17 March 2024 (UTC)

Replacement of quotation templates

Hi, I'd appreciate it if you could do the following bot replacements:

{{RQ:Fuller Bertram Cope}} → {{RQ:H. B. Fuller Bertram Cope}}
{{RQ:Fuller On the Stairs}} → {{RQ:H. B. Fuller On the Stairs}}
{{RQ:Livy Holland Romane Historie}} → {{RQ:Livy Holland Romane Historie|year=1659}} (the template has been updated to add the 1st edition, which is now the default).

Thank you. — Sgconlaw (talk) 16:00, 27 March 2024 (UTC)

@Sgconlaw Done. Benwing2 (talk) 02:08, 28 March 2024 (UTC)

Thanks! — Sgconlaw (talk) 04:18, 28 March 2024 (UTC)

{{quote-song}}

Subsequent to our discussion at Wiktionary:Grease_pit/2024/February regarding {{quote-song}}, would you mind making the appropriate edits to the module? RcAlex36 (talk) 07:05, 28 March 2024 (UTC)

Wu information origin

The recent update to Module:labels/data/lang/zh is very much appreciated. However, a lot of the information included seems poorly researched, with a lot of unnecessary or false information seemingly lifted straight from some enwiki entries, which may be problematic. What sources did you consult? If you need pointers on reading the zh primary sources, or an explanation of them, feel free to let me know. Thanks — nd381 (talk) 12:31, 29 March 2024 (UTC)

@ND381 Apologies. I was using a combination of Chinese and English Wikipedia entries and Glottolog, and had assumed they were reliable since they generally agree with each other. I can't read primary sources in Chinese, though, except using Google Translate. Let me know what specifically seems wrong, and feel free to correct and/or delete stuff. Benwing2 (talk) 18:46, 29 March 2024 (UTC)

I should add that I generally only created a label when there was a page in the Chinese or English Wikipedia (and hence a Wikidata item) for the particular lect, except in some cases of higher-level groupings. Benwing2 (talk) 19:23, 29 March 2024 (UTC)

Glottolog's family tree is known to have some mistakes, and Wikipedia is, well, Wikipedia. A few notes from a cursory look at Glottolog's family tree:

It uses Li Rong's (1987) classifications, which are famously unreliable not just for Wu but nationwide
It makes some pretty unorthodox naming choices and forgets to put Jinhua under Wuzhou
Its Northern (Taihu) Wu is a pretty big mess, and although I agree with some of their choices, it is important to note that not everything there is accepted
- In particular, "Northern Zhejiang" (which I've seen more as "Southern N Wu", tautological as it may be) is one that is highly likely to be a valid branch, which contrasts with Glottolog's "Northwestern", "Su-Hu-Jia", and "Tiaoxi" branches, themselves also forming a Northern N Wu branch
- The "Northwestern Wu" branch is completely disproven, as Piling has Southern Mandarinic (i.e. Huai) influence whereas Hangzhounese has Northern Mandarinic influence

I (and wpi for Yue) have reported or fixed a lot of the mistakes already present; however, please consult us next time before making large-scale, academically controversial changes to Chinese templates. Do you have Discord or other means of instant messaging? I can send some English-language sources about Wu diachronics and classification so that you can make a more informed decision next time. Thanks — nd381 (talk) 00:17, 30 March 2024 (UTC)

@ND381 I am not on Discord currently; feel free to post the sources here. I did notice that Jinhua was not under Wuzhou in Glottolog, and in general I followed the names used in Wikipedia (English and/or Chinese), except for the reclassification of Wuzhou Wu as Jinqu Wu, which does not seem to be widely accepted. As for intermediate branches, if there is controversy about them, one fairly easy way to handle that is to flatten the trees so that, e.g., the Tiaoxi branch and similar ones go away. Overall, what I have been trying to do for all primary branches is fill in the main missing labels, esp. those corresponding to labels already present in various entries, although I did add more labels for Wu than for other branches. Before my changes, things seemed to be in a pretty haphazard state. The idea is that from labels we can create categories and then add the more important ones as etymology-only languages. Keep in mind that, in general, information in a place like Module:labels/data/lang/zh can easily be changed, as it's in a single location and not propagated across several entries; but I will indeed try to consult you guys in the future. Note also that because of these label changes there are now some uncreated categories in Special:WantedCategories, such as Category:Jinhua Wu, Category:Chuqu Wu, Category:Quzhou Wu and Category:Shaoxing Wu (and possibly more; the data in Special:WantedCategories is from a couple of days ago). I will be creating these categories using {{auto cat|dialect=1}} if that's OK with you; again, the information here is easily fixable if needed. Benwing2 (talk) 00:36, 30 March 2024 (UTC)

Please also note that I just made a change to the code that handles the language variety categories using {{auto cat|dialect=1}} so that the Wikipedia articles are automatically pulled out of the labels in places like Module:labels/data/lang/zh if not explicitly specified. See Category:Jiaoliao Mandarin for an example of where this does its thing. In general I am trying to consolidate the information on lects in fewer places, as it's currently scattered in at least five locations (labels data modules; language data modules such as Module:etymology languages/data; "dialect data" modules used for {{alt}}; category pages that use {{auto cat|dialect=1}}; dialect synonyms data modules such as Module:dialect_synonyms/zh). Benwing2 (talk) 00:47, 30 March 2024 (UTC)

I think what I'm also going to do is add a parent field to the label data so that the tree of lects can be indicated properly; this info is already specifiable in Module:etymology languages/data and {{auto cat|dialect=1}}. Benwing2 (talk) 01:02, 30 March 2024 (UTC)
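A minimal Lua sketch of what a parent field makes possible: walking up the tree of lects from any label, e.g. to build breadcrumbs or parent categories. The table layout, the field name parent and the exact label names are assumptions for illustration (the nesting of Urban Shanghainese under Shanghainese under Sujiahu follows the suggestion further down this thread), not the real format of Module:labels/data/lang/zh.

```lua
-- Hypothetical label data: structure and names are illustrative only.
local labels = {
  ["Wu"] = {},
  ["Northern Wu"] = { parent = "Wu" },
  ["Sujiahu Wu"] = { parent = "Northern Wu" },
  ["Shanghainese"] = { parent = "Sujiahu Wu" },
  ["Urban Shanghainese"] = { parent = "Shanghainese" },
}

-- Walk the parent fields upward, returning the ancestors nearest-first.
-- A visited set guards against accidental cycles in the data.
local function ancestors(label)
  local chain, seen = {}, { [label] = true }
  local current = labels[label] and labels[label].parent
  while current do
    if seen[current] then
      error("cycle in label parents at " .. current)
    end
    seen[current] = true
    chain[#chain + 1] = current
    current = labels[current] and labels[current].parent
  end
  return chain
end
```

Because each label stores only its immediate parent, moving a lect to a different branch means editing one field rather than every category page that mentions it.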

I see, thank you. I am not against the creation of the category pages.

What we have on the labels page mostly works now, though I would like to make the following changes:

Lishui and Pucheng (Fujian) Wu are to be separated; the original Fujian ∈ Lishui notation was only done because of the lack of Chuzhou.
"Northern Zhejiang Wu" and "Northwestern Wu" are to be removed, as very few sources even mention them, let alone include them
Jinhuanese is to be included as a Jinqu variety
Taizhouic is to be renamed to Taizhou; "Taizhounese" in itself doesn't really exist, as the urban centre of Taizhou is home to several varieties. There is nothing that is conflated with Taizhou in reality other than maybe Tàizhounese (Mandarin > Huai) in Jiangsu
Not implemented yet, but if necessary, Urban Shanghainese should be a subcategory of Shanghainese, which should in turn be a subcategory of Sujiahu

If I have any additional thoughts later, I can let you know or edit the labels page when the time comes.

My sources document is here [6] (a lot of the books are pirated) and is maintained and handled by many other users, including several here on Wiktionary. The Wu section is Section 1.6. Thanks — nd381 (talk) 03:12, 30 March 2024 (UTC)

@ND381 Wow that is a lot of information in that link! Thank you. I will make the suggested changes. A couple of questions, though:

If we rename "Taizhouic Wu" -> "Taizhou Wu", what should happen to the current "Taizhou Wu"? I chose "Taizhouic" as the name because both the Taizhou Wu and Taizhou dialect articles exist, and I based it on the name "Beijingic Mandarin" (found in Glottolog), which corresponds to the primary Mandarin branch Beijing Mandarin (division of Mandarin), as opposed to "Beijing Mandarin", which corresponds to the dialect of Beijing itself, i.e. Beijing dialect. Should we use something like "Urban Taizhou Wu" corresponding to Wikipedia's Taizhou dialect?
"Jinqu" seems to be the more recent name of Wuzhou Wu. Should we get rid of Wuzhou Wu in favor of Jinqu Wu?

Benwing2 (talk) 03:25, 30 March 2024 (UTC)

Also, do you have any knowledge of Northern Min? According to Chinese Wikipedia, there are two primary branches called Dongxi (東溪) and Xixi (西溪) (Glottolog does not have them, but instead groups all Northern Min varieties other than Shaojiang, which seems not to be Northern Min at all, under "Northwestern Min Bei"). If they are real, I am thinking maybe it would be better to call them something like "East Northern Min" and "West Northern Min". Does this make sense? I have similar notes in Module:labels/data/lang/zh about Eastern Min, where the primary branches Funing and Houguan should maybe instead be called North Eastern Min and South Eastern Min. Benwing2 (talk) 03:43, 30 March 2024 (UTC)

Regarding this, I don't study Min, so I don't know. Ask one of the Minguists here instead. I would direct you to one, but I'm not sure which of the people I know are here and which are on Discord, sorry — nd381 (talk) 09:01, 30 March 2024 (UTC)

1. Delete it. I saw your comment regarding "Beijingic Mandarin" already. The main problem here is that "Taizhounese" doesn't actually refer to something that "corresponds to the dialect of [urban] Taizhou itself", as that would usually be called Linhainese (臨海話), Huangyanese (黃巖話), Jiaojiangese (椒江話), etc., cf. Wugniu. What I have as "Urban Taizhounese" in the dump™ is just a helpful label for searching and is not meant to be used as an authoritative source in classification. I would recommend removing (Urban) Taizhou Wu and renaming Taizhouic Wu to just Taizhou Wu. If an "urban Taizhou" label is desired, use Jiaojiang.

2. It's not so much that it's more recent as that the revised edition of Li's atlas (Li 2012; the one in the dump™ called 中國語言地圖集, i.e. Language Atlas of China), which is still filled with blatantly false information, uses it. I personally use a Wuzhou-Chuzhou-Xinqu (Xin referring to Shangrao) split, but you can really do whatever you want or leave it be, since classifying these lects is still pretty contentious.

3. Change "Fujian Wu" to Pucheng Wu. There is no conflict with Pucheng Min, since Wu is specified; the lects in Ningde and the Jinxiang isolate are Auish (i.e. Wenzhou-related) and Northern respectively, and would lead to more ambiguity.

Thanks — nd381 (talk) 08:59, 30 March 2024 (UTC)

@ND381 Hi. You caught me just as I'm going to bed, but please take a look at the current state of the module. I already renamed Taizhouic to Taizhou and Taizhou to Urban Taizhou; I'll delete the latter. I removed Northern Zhejiang Wu and Northwestern Wu but left Wuzhou as-is, and added a Pucheng Wu node, removing Fujian Wu. BTW I am now going through and adding textual descriptions and parent label properties, which will be useful in centralizing the information currently found in the individual category pages; but none of this is in the production module yet, just on my own machine. Also see the Grease Pit post I made about centralizing/consolidating lect info. Benwing2 (talk) 09:06, 30 March 2024 (UTC)

Thank you. 谷拿脫 — nd381 (talk) 09:35, 30 March 2024 (UTC)

@ND381 I have added support for including descriptions and parent labels in the label data in Module:labels/data/lang/zh. I have converted the Lua comments to parent labels in most cases and added descriptions (using the |region= or sometimes |def= fields) for Mandarin and Northern Wu lects and some others. I am working on Southern Wu now but I may need a bit of help. In particular, I added labels for all the subgroups called 小片 (xiǎopiàn) in Chinese, which we define as "cluster" (see the box near the bottom of the w:zh:吴语 page for a diagram of all these clusters), but increasingly I think they shouldn't be defined. Enwiki generally doesn't include such intermediate divisions in its descriptions of individual dialects, and most of these "clusters" are red links, redirects or stubs in zhwiki. Tentatively I'm thinking of keeping the ones for Northern Wu (Piling, Tiaoxi, Sujiahu, etc.) and probably the ones for Chuqu Wu (Longqu, Chuzhou, Shangshan), and discarding the remainder. Thoughts?

Also, on a related subject, why is it that there is such extraordinary diversity in the Wu lects (esp. the southern ones) in such a small area, when Mandarin lects seem to vary only a little over vast areas? Is the terrain in Zhejiang such that movement is very difficult? Or was there some sort of recent calamity in Northern China that caused migration all over the place (and resulting dialect mixing)? Benwing2 (talk) 06:06, 1 April 2024 (UTC)

mountain = dividing + no wide-scale areal effects (Mandarinic is not a phylogenetic group)

As for xiaopian, honestly you don't need to add any if you don't want to, since they're very contentious — nd381 (talk) 10:51, 1 April 2024 (UTC)

@ND381 OK thanks. I have left the xiaopians that I mentioned above (for Northern Wu and Chuqu Wu) and removed the others. I finished adding parents and region descriptions for Southern and Northern Wu lects to Module:labels/data/lang/zh, added all Wu categories that had at least one entry in them and fixed the existing Wu categories to read just {{auto cat|dialect=1}} (instead of having additional parameters to specify the parent, region, etc.), so that the parent and region description get picked up from the label data. There shouldn't be anything very controversial that I added; the descriptions are mostly just listing the area(s) where each lect is spoken per English and Chinese Wikipedia, although in some cases (e.g. Fuyang Wu, Hangzhounese, Jinxiang Wu, Old Guangde Wu, Sujiahu Wu, Urban Shanghainese Wu, Baizhang Wu, Changbei Wu, Jujiang Wu, Old Xuanzhou Wu, Pucheng Ou Wu, Pucheng Wu, and in general all the primary branches) I added text under the |addl= field describing the notable characteristics of the lect.

It occurs to me we will eventually probably need to split Wu into different languages, at least on the primary branch lines (Northern Wu, plus some number of Southern Wu branches); but I think we probably should wait to tackle that until we finish the Southern Min and Yue splits. Benwing2 (talk) 05:13, 2 April 2024 (UTC)

Gender-neutral adjectives in Module:es-headword

I noticed you added the option gneut for gender-neutral nouns in Spanish. Could you add the same option for adjectives?

For example, the headword line for the adjective latine currently displays as "m or f", which is wrong; it should look the same as the headword line for the noun. 26agcp (talk) 19:20, 30 March 2024 (UTC)

@26agcp I added this. You can use |gneut=1 on an adjective to indicate that it's gender-neutral, which I have done for latine. I'm not sure whether this will work correctly on adjectives not ending in -e, such as latinx or latin@ (if it's possible to use these as adjectives). If these are adjectives and |gneut=1 doesn't work right, let me know and I'll fix it. Benwing2 (talk) 06:15, 1 April 2024 (UTC)

Category:Chinese terms written in foreign scripts

Hi, I noticed that you've added functionality in Module:zh-pron to automatically add pages that do not contain any Chinese characters to the category. However, this has caused the category to be flooded with POJ entries (which are romanisation entries and therefore shouldn't be there) and Zhuyin entries (which are not "foreign"). Can you see how this can be fixed? Or perhaps revert the changes for the time being. Many thanks.

PS: The POJ entries are there because of Module:zh-see which tries to call Module:zh-pron with |only_cat=. It's a total mess there which I don't want to talk about.

– wpi (talk) 13:56, 2 April 2024 (UTC)

@Wpi Thanks for letting me know. The Zhuyin entries should be fixable by changing the regex to exclude Zhuyin/Bopomofo characters. If the POJ entries are only there because they are calling Module:zh-pron with a specific flag, I can check for that flag. Let me see what I can do. Benwing2 (talk) 19:41, 2 April 2024 (UTC)

OK, this should be fixed. Benwing2 (talk) 20:07, 2 April 2024 (UTC)
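A rough Lua sketch of the kind of check discussed above: a page is only put into the foreign-scripts category if its title contains no Han characters, is not written entirely in Zhuyin (Bopomofo, including its tone marks), and the call did not come in with the only_cat flag. The function names and the partial code point ranges are assumptions, and treating every only_cat call as non-categorizing is a simplification; the real fix in Module:zh-pron may differ. UTF-8 is decoded with plain byte arithmetic so the sketch runs in standard Lua 5.1 without mw.ustring.

```lua
-- Hypothetical sketch, not the actual Module:zh-pron code. The code point
-- ranges are a partial selection chosen for illustration.

-- Decode a valid UTF-8 string into a list of code points (no error handling).
local function codepoints(s)
  local cps, i = {}, 1
  while i <= #s do
    local c = s:byte(i)
    local cp, len
    if c < 0x80 then
      cp, len = c, 1
    elseif c < 0xE0 then
      cp, len = c % 0x20, 2
    elseif c < 0xF0 then
      cp, len = c % 0x10, 3
    else
      cp, len = c % 0x08, 4
    end
    for j = i + 1, i + len - 1 do
      cp = cp * 64 + (s:byte(j) % 64)  -- append 6 payload bits per continuation byte
    end
    cps[#cps + 1] = cp
    i = i + len
  end
  return cps
end

local HAN = { { 0x3400, 0x4DBF }, { 0x4E00, 0x9FFF }, { 0x20000, 0x2A6DF } }
local ZHUYIN = {
  { 0x3100, 0x312F }, { 0x31A0, 0x31BF },                       -- Bopomofo blocks
  { 0x02C7, 0x02C7 }, { 0x02CA, 0x02CB }, { 0x02D9, 0x02D9 },  -- tone marks
}

local function in_ranges(cp, ranges)
  for _, r in ipairs(ranges) do
    if cp >= r[1] and cp <= r[2] then
      return true
    end
  end
  return false
end

-- Returns true if the page should go into the foreign-scripts category.
-- only_cat stands in for the flag passed when Module:zh-see calls zh-pron.
local function in_foreign_script_category(title, only_cat)
  if only_cat then
    return false
  end
  local all_zhuyin = true
  for _, cp in ipairs(codepoints(title)) do
    if in_ranges(cp, HAN) then
      return false
    end
    if cp ~= 0x20 and not in_ranges(cp, ZHUYIN) then
      all_zhuyin = false
    end
  end
  return not all_zhuyin
end
```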

Replacement of quotation template

Hi, when you are free could you please do the following bot runs?

{{RQ:Dryden Iliad}} → {{RQ:Homer Dryden Iliad}}
{{RQ:Mandela Long Walk to Freedom}} → {{RQ:Mandela Long Walk to Freedom|year=2010}} (the quotation template has been updated to add the 1st edition (1994), so current uses need to be updated)
{{RQ:Pope Iliad}} → {{RQ:Homer Pope Iliad}}
{{RQ:Selver RUR}} → {{RQ:Capek Selver RUR}}

Thank you. — Sgconlaw (talk) 22:52, 2 April 2024 (UTC)

@Sgconlaw Done. Benwing2 (talk) 04:13, 5 April 2024 (UTC)

Thanks! — Sgconlaw (talk) 04:36, 5 April 2024 (UTC)

Time-outs from change to Module:headword

Hi - I think your latest change to Module:headword is causing time-outs at some Written Oirat entries, like ᠨᡇᡇ᠍ᠷ. Theknightwho (talk) 03:38, 5 April 2024 (UTC)

@Theknightwho Yup, I just added a check to make sure this doesn't happen. I don't quite know why it's happening (something weird about the script being returned), but I now limit the iterations to 10 no matter what. Benwing2 (talk) 03:41, 5 April 2024 (UTC)
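The exact loop in Module:headword isn't shown here, but the general safeguard described above can be sketched in Lua as a bounded fixed-point iteration: keep applying a step until the result stops changing, and give up after a fixed number of rounds. The names refine_until_stable and MAX_ITERATIONS are hypothetical.

```lua
-- Hypothetical sketch of an iteration cap, not the actual Module:headword code.
local MAX_ITERATIONS = 10

-- Apply `step` repeatedly until its output equals its input (a fixed point),
-- or until MAX_ITERATIONS rounds have run, whichever comes first.
local function refine_until_stable(value, step)
  for _ = 1, MAX_ITERATIONS do
    local new_value = step(value)
    if new_value == value then
      return value
    end
    value = new_value
  end
  -- Gave up: return the last value instead of looping forever.
  return value
end
```

With a step that oscillates between two values, the loop still returns after ten rounds instead of hanging the page.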

By the way

It seems that one thing that really slows things down is declaring functions inside other functions. Sometimes it's unavoidable, but there are plenty of instances where it's straightforward to move them out of the parent function, sometimes with extra parameters if they need to access any upvalues. This happens a lot with anonymous functions declared inside gsub, but Module:languages is currently a big offender, since all the methods get redeclared every time a language object is requested. Theknightwho (talk) 11:52, 5 April 2024 (UTC)
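A small Lua illustration of the hoisting described above. It only shows where closures get created and makes no claim about the size of any speedup; all function names are hypothetical.

```lua
-- Before: a new anonymous function is created on every call.
local function title_case_closure(text)
  return (text:gsub("%a+", function(word)
    return word:sub(1, 1):upper() .. word:sub(2)
  end))
end

-- After: the replacement function is created once, when the module loads.
local function capitalize(word)
  return word:sub(1, 1):upper() .. word:sub(2)
end

local function title_case(text)
  -- Parentheses drop gsub's second return value (the match count).
  return (text:gsub("%a+", capitalize))
end

-- A closure that only existed to capture an upvalue (here `sep`) can become
-- a top-level function that takes the value as an extra parameter.
local function join_pair(a, b, sep)
  return a .. sep .. b
end

local function join_pairs(pairs_list, sep)
  local out = {}
  for i, p in ipairs(pairs_list) do
    out[i] = join_pair(p[1], p[2], sep)
  end
  return out
end
```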

@Theknightwho Hmmm, interesting. Do you know if it's related to the size of the function (in which case we could move the contents of the larger functions in Module:languages outside of the object) or just the presence of the function? Benwing2 (talk) 18:29, 5 April 2024 (UTC)

@Benwing2 There seems to be an inherent cost for each closure, just as there is for objects. The literal length of the function in bytes is a (very small) factor, but since it's only parsed once it makes no difference whether it's inside another function or not. Theknightwho (talk) 18:35, 5 April 2024 (UTC)

This is basically the memory-speed trade-off with Lua. Local objects/functions are cleared very quickly by the garbage collector (especially anonymous ones), but you need to spend extra time generating each one, and often it's just not worth it. Theknightwho (talk) 18:39, 5 April 2024 (UTC)

@Theknightwho Got it, in which case the only way to speed up Module:languages is to redo it without the use of an object, which would entail (AFAIK) a huge amount of rewriting of code that uses it. Possibly there is an in-between way, e.g. create an object-less version of Module:languages and then create an object that wraps it, and rewrite only the core modules (Module:links, Module:headword, Module:translations, ...?) to use the object-less version. But this is still a fair amount of work. Benwing2 (talk) 19:26, 5 April 2024 (UTC)

@Benwing2 I don’t think it’s as bad as that - it should be possible to move the function declarations out of the language-generating function, since they’re inherited via metamethods anyway. If I remember correctly, the reason for the current set-up is because I wanted to make it possible to grab language objects that use require instead of mw.loadData in contexts where speed is more important than memory-use (since mw.loadData adds a lot of overhead to data access times). I think the only module which uses that option is Module:family tree, since everything’s done in a single invocation via {{auto cat}}. Theknightwho (talk) 15:12, 6 April 2024 (UTC)
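A minimal Lua sketch of the difference being discussed: methods defined inside the constructor are new closures for every object, whereas methods stored once on a metatable are shared through __index. The method names getCode and getCanonicalName mirror the language-object API, but the internal fields _code and _name and both constructors are assumptions, not the actual Module:languages code.

```lua
-- Hypothetical sketch; internals are illustrative only.

-- Before: every object gets its own freshly created closures
-- (each one captures `code`/`name` as upvalues).
local function make_lang_with_closures(code, name)
  local lang = {}
  function lang:getCode() return code end
  function lang:getCanonicalName() return name end
  return lang
end

-- After: the methods are created once and inherited through __index.
local Language = {}
Language.__index = Language

function Language:getCode()
  return self._code
end

function Language:getCanonicalName()
  return self._name
end

local function make_lang(code, name)
  return setmetatable({ _code = code, _name = name }, Language)
end
```

With the metatable version, requesting a language object only allocates one small table; the method lookup falls through to the shared Language table.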

@Benwing2 I just implemented this, and this change alone sped things up by about 5% on very large pages. useRequire is now specified using a key and a dedicated lang:loadData(modname) method, but in all honesty it might not be necessary anymore: it was only ever implemented because mw.loadData makes memory usage worse with {{auto cat}}, because everything's contained within one invoke, and some proto-language pages were pushing the old 50MB limit due to the descendants trees. Theknightwho (talk) 23:16, 21 April 2024 (UTC)

@Theknightwho Hmm, are you saying the useRequire functionality might not be needed because it's only used in Module:family tree, and the 50MB limit no longer applies? If so, it might be reasonable to consider removing it at some point, but I would say leave it for the moment because it might be needed elsewhere. I notice for example that some pages that use {{auto cat|dialect=1}} are hitting 65MB or so of memory, and maybe could benefit from this. I think the high memory usage is because of the implementation that searches through all labels to find those that categorize into a particular category, since I've been consolidating the lect info into the labels modules instead of having them scattered in the {{auto cat}} calls themselves and duplicated in several other places. I added memoization of the calls to dialect_handler (which ends up being called multiple times due to the way that the poscatboiler code retrieves information on all parent categories in order to determine the breadcrumbs and the parents' parents etc.), which reduced the memory a bit and sped things up a lot. Maybe adding further memoization of the fetched labels data would reduce memory usage significantly and/or use mw.loadData (the labels data itself is already loaded using mw.loadData but it is then converted into containing structures by Module:labels/utilities, which maybe could be cached since it's all happening in a single {{auto cat}} invocation). Benwing2 (talk) 23:31, 21 April 2024 (UTC)
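A minimal sketch of per-invocation memoization in Lua, of the sort described for dialect_handler. The wrapper name memoize and the sentinel used to cache nil results are assumptions, not the actual poscatboiler code. The key must not be nil, and the cache lives only as long as the wrapped function, i.e. within a single {{auto cat}} invocation.

```lua
-- Hypothetical memoization wrapper for a one-argument function.
local function memoize(fn)
  local cache = {}
  local NIL = {} -- sentinel so that nil results are cached as well
  return function(key)
    local hit = cache[key]
    if hit == nil then
      local result = fn(key)
      if result == nil then
        hit = NIL
      else
        hit = result
      end
      cache[key] = hit
    end
    if hit == NIL then
      return nil
    end
    return hit
  end
end
```

Caching nil results matters here because a handler that returns nothing for a category would otherwise be re-run on every lookup.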

@Theknightwho BTW thanks for all the profiling work you're doing. This sort of work isn't really my strong point and something I don't really enjoy doing that much, so I am glad you are putting the time into doing it as it's quite necessary. Benwing2 (talk) 23:33, 21 April 2024 (UTC)

pcall and accessing nonexistent pages

I think I've worked out the reason why pcall(require, ...) is so slow when used with nonexistent modules: it's because nothing gets cached in package.loaded, so every time the module's requested it's forced to run the full loader, whereas retrieved modules simply use the cache on subsequent accesses. We could get around the issue by adding false to package.loaded after the first failure, which should speed things up. After doing various profiling tests, I'm pretty sure the issue isn't down to pcall itself. Theknightwho (talk) 06:23, 8 April 2024 (UTC)

@Theknightwho Interesting. This does make sense, and I wondered why I was seeing such slow pcalls (loading nonexistent modules) when you reported no issues with them. Benwing2 (talk) 06:36, 8 April 2024 (UTC)

@Benwing2 I've added safe_require to Module:utilities, which (1) checks if there's a cached value for the module in package.loaded and returns it if so, (2) runs pcall(require, modname), and (3) if the module doesn't exist, caches it as false. Two things of interest:

It's still about twice as fast as require even when handling already-cached modules, since it doesn't bother with all the assert safety checks and so on (close to 1 million iterations per second).
Nonexistent modules still don't work with require even after they've been cached, since require checks if p then instead of if p ~= nil then. I'll put in a Phabricator ticket about that, but they'll probably ignore it.

Theknightwho (talk) 07:43, 8 April 2024 (UTC)
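The three steps of safe_require listed above can be sketched in Lua roughly as follows. This is a hypothetical reconstruction, not the actual Module:utilities code. In particular, it treats any loading error as "module does not exist", whereas a real implementation would probably check the error message so that errors raised inside an existing module are not silently swallowed.

```lua
-- Hypothetical sketch of safe_require (not the real Module:utilities code).
local function safe_require(modname)
  -- (1) Return a cached result if there is one; false means "known missing".
  local cached = package.loaded[modname]
  if cached ~= nil then
    return cached
  end
  -- (2) Try to load the module.
  local ok, result = pcall(require, modname)
  if ok then
    return result
  end
  -- (3) Cache the failure so later calls skip the loaders entirely.
  -- Simplification: any error is treated as "module does not exist".
  package.loaded[modname] = false
  return false
end
```

As noted above, plain require still ignores the cached false, so only callers that go through the helper benefit from step (3).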

Related to this, I've discovered that if you require a module whose return value is false, you get true back haha. require seems to use true as a placeholder so that modules with no return value still get cached, but the falsy existence check causes this bug. Theknightwho (talk) 08:58, 8 April 2024 (UTC)
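Both quirks (a cached false being treated as a miss, and false coming back as true) can be reproduced in a small Python model. This is a hypothetical reconstruction based only on the behaviour reported in this thread, not Scribunto's source. lua_truthy mimics Lua, where only nil (None here) and false are falsy; Python's own truthiness differs, since an empty dict is falsy in Python but an empty table is truthy in Lua.

```python
def lua_truthy(value):
    """Lua truthiness: only nil (None here) and false are falsy."""
    return value is not None and value is not False


class ModelRequire:
    """Hypothetical model of require() as reported in this thread."""

    def __init__(self, pages):
        self.pages = pages    # module name -> function standing in for the page
        self.loaded = {}      # analogue of package.loaded
        self.loader_runs = 0  # how often the full loader runs

    def _load(self, name):
        self.loader_runs += 1
        if name not in self.pages:
            raise LookupError(f"module '{name}' not found")
        return self.pages[name]()

    def require(self, name):
        cached = self.loaded.get(name)
        if lua_truthy(cached):  # like `if p then`: a cached false is a miss
            return cached
        result = self._load(name)
        # Placeholder so that modules returning nothing get cached; as a
        # side effect, a module that returns false also comes back as true.
        self.loaded[name] = result if lua_truthy(result) else True
        return self.loaded[name]

    def require_nil_check(self, name):
        """Hypothetical variant using the equivalent of `if p ~= nil then`."""
        cached = self.loaded.get(name)
        if cached is not None:  # a cached false is returned as-is
            return cached
        result = self._load(name)
        # Only a missing return value (nil) is replaced by the placeholder.
        self.loaded[name] = True if result is None else result
        return self.loaded[name]


pages = {
    "Module:returns-false": lambda: False,
    "Module:returns-nothing": lambda: None,
    "Module:returns-table": lambda: {},
}
model = ModelRequire(pages)
print(model.require("Module:returns-false"))  # True, not False
```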

Consolidating into Module:string utilities

Hiya - I've rewritten most of Module:string utilities, and I'll be introducing the changes over the next few days (so that I don't run into issues by changing everything at once). I've decided to reverse course on splitting functions out into their own modules, as I'm not convinced that it's actually very helpful, and it makes organising everything much more confusing.

At the same time, we've got a bunch of duplicate functions floating around (I think there are 4 versions of pattern_escape), so it makes sense to consolidate everything. To that end, I think we should merge in most of the single-function modules, as well as some of the smaller satellite modules that are integral to string manipulation, like Module:pattern utilities, because so many of them depend on each other anyway. Theknightwho (talk) 18:22, 8 April 2024 (UTC)
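For anyone unfamiliar with it, a pattern_escape-style helper escapes Lua pattern magic characters so that a literal string can be used safely as a pattern. A minimal Python sketch, assuming the standard Lua magic-character set ^ $ ( ) % . [ ] * + - ? and prefixing each with %, as Lua patterns require; it is not any of the existing Lua versions.

```python
import re

# Lua pattern magic characters: ^ $ ( ) % . [ ] * + - ?
_LUA_MAGIC = re.compile(r"([\^$()%.\[\]*+\-?])")


def pattern_escape(text):
    """Escape Lua pattern magic characters by prefixing each with %."""
    return _LUA_MAGIC.sub(r"%\1", text)


print(pattern_escape("a.b"))
print(pattern_escape("[x]-y?"))
```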

@Theknightwho OK, makes sense! Benwing2 (talk) 18:25, 8 April 2024 (UTC)

@Benwing2 I've rolled out pretty much everything new - there are still a few single-function modules I want to merge in, but at least the new code seems to be holding up well. By the way - I've renamed capturing_split to split, since it's faster than mw.text.split for everything except the default charset (the only case where mw.text.split uses the standard string library), so it makes sense to use it both with and without capturing groups. Theknightwho (talk) 21:04, 8 April 2024 (UTC)
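What "with and without capturing groups" means for a split function can be seen with Python's re.split, which likewise keeps captured separators in its result. This uses Python regular expressions in place of Lua patterns and is only an analogy for the behaviour of capturing_split/split, not its implementation.

```python
import re

text = "a-b+c"

# No capturing group: the separators are dropped.
without_captures = re.split(r"[-+]", text)

# With a capturing group: each captured separator is kept between the pieces.
with_captures = re.split(r"([-+])", text)

print(without_captures)
print(with_captures)
```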

@Theknightwho Interesting ... I wrote capturing_split() long ago with no particular intention of making it fast; the capturing functionality was just needed by Module:ru-common. Benwing2 (talk) 21:10, 8 April 2024 (UTC)

@Benwing2 I've reworked it pretty heavily, but it essentially still works the same way. The big thing is finding fast ways to detect whether you can use the standard string library, since any ustring function call involving magic characters is extremely slow. Theknightwho (talk) 21:17, 8 April 2024 (UTC)
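One way to decide when byte-oriented matching is safe, sketched in Python under stated assumptions: byte-level matching agrees with Unicode-aware matching when the pattern has no magic characters (a complete UTF-8 encoded substring can only match at character boundaries) or when both the text and the pattern are pure ASCII. This is a conservative, hypothetical heuristic, not the check used in Module:string utilities. split_plain shows the literal-separator case, with an empty separator splitting into individual characters.

```python
LUA_MAGIC = set("^$()%.[]*+-?")


def can_use_byte_functions(text, pattern):
    """Hypothetical, conservative check for when byte-level matching is safe."""
    if not any(ch in LUA_MAGIC for ch in pattern):
        return True  # plain substring search on UTF-8 bytes is safe
    # With magic characters, e.g. "." would match a single byte, which
    # splits multi-byte characters unless everything is ASCII.
    return text.isascii() and pattern.isascii()


def split_plain(text, sep):
    """Split on a literal separator by working directly on UTF-8 bytes."""
    if sep == "":
        return list(text)  # one piece per character (code point)
    parts = text.encode("utf-8").split(sep.encode("utf-8"))
    return [part.decode("utf-8") for part in parts]


print(split_plain("α,β,γ", ","))
print(can_use_byte_functions("αβγ", "α.γ"))
```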

@Theknightwho Cool, thanks for all this work. After you finish this, I do think you should revisit the pattern change in Module:headword made by User:Erutuon, which seems to have slowed down the average processing time of big pages; maybe there's a way to preserve the functionality while avoiding the double Kleene star operators. Benwing2 (talk) 21:21, 8 April 2024 (UTC)

@Benwing2 Yeah, it should be possible to do it in multiple stages. Theknightwho (talk) 21:24, 8 April 2024 (UTC)

Also, just to illustrate the point about speed: the revised split function is over 10 times faster than mw.text.split with the input split("abc", ""), and the gap increases as the string gets longer. Theknightwho (talk) 23:12, 8 April 2024 (UTC)

I'll say again that changing that pattern, or even removing it altogether, and then previewing a page didn't seem to significantly speed up Lua execution, but I gave up pretty quickly, so maybe it's worth testing further. — Eru·tuon 00:20, 12 April 2024 (UTC)

@Erutuon Yeah, there seems to be a whole lot of variability in times, but something definitely slowed down the average time; I just don't know what. Benwing2 (talk) 00:22, 12 April 2024 (UTC)

Module:grc:Dialects

CAT:E is being swamped with Greek pages complaining that this module doesn't exist. I think it may have been deleted prematurely... Ioaxxere (talk) 01:41, 10 April 2024 (UTC)

@Ioaxxere Ahh, fuck me. Thanks to whoever undeleted it. Benwing2 (talk) 01:43, 10 April 2024 (UTC)

Nonfunctional newversion in {{quote-journal}}

See zacusi. Neither |journal2= nor |work2= is accepted. ―⁠Biolongvistul (talk) 13:42, 11 April 2024 (UTC)

Template:tracking/defdate/hyphen

{{defdate}} still seems to be using this and is now displaying a redlink to it; see e.g. sirrah or bḥ. (Searching mainspace for "template:tracking" finds about 600 instances of this, but AFAICT no instances of any template besides defdate doing anything like this, so it seems to be limited to this one template rather than being a more widespread issue.) - -sche (discuss) 18:20, 11 April 2024 (UTC)

@-sche Should be fixed. Benwing2 (talk) 22:23, 11 April 2024 (UTC)

Bot-addition of templates

You've recently told Theknightwho multiple times not to introduce changes to core modules without discussing them with the community beforehand. Then why are you doing the exact same thing with templates? I had not heard anything about this (it's from before my time), I am now finding it all over the place, added by a bot, and I have many objections! You can't just go off an old discussion and start mass-adding templates with a bot without making sure the current editors are still fine with it, especially when only five users commented in the discussion this template was based on, and, I repeat, it is eight years old. Thadh (talk) 21:42, 11 April 2024 (UTC)

@Thadh I did not do that. User:-sche posted in the Beer Parlour about this template (see WT:Beer parlour/2024/April#T:antsense) to finally clarify the use of T:sense on antonyms, and I replied there saying I would do a bot run to introduce this if no one objected. I waited about a week; next time I'll wait a month if that would help. What are your specific objections? This can be undone if necessary, but I want to make sure your objections can't be met in some other way. Benwing2 (talk) 21:46, 11 April 2024 (UTC)

Yes, waiting a bit longer would be a good idea for next time. I'll post my objections in the BP thread, but thanks for pointing me towards it. Thadh (talk) 21:48, 11 April 2024 (UTC)

Why has Wingerbot been made to "canonicalize Sicilian phonemic pronun"?

Can I ask why every Sicilian pronunciation I encounter is being wrongly changed into a phonemic transcription? Hyblaeorum (talk) 09:59, 19 April 2024 (UTC)

@Hyblaeorum What is wrong? User:Nicodene asked me to convert Sicilian pronunciations into their phonemic form, that's all. Benwing2 (talk) 19:24, 19 April 2024 (UTC)

Transcriptions using // are phonemic, and Sicilian only has five vowel phonemes: /i ɛ a ɔ u/. Nicodene (talk) 19:47, 19 April 2024 (UTC)

Actually, the Sicilian language has 5 stressed vowels and 3 unstressed ones. English has 28.

So if a language like Sicilian has a given set of unstressed vowels in its system, are they going to be left out of the slashes?

Just to be clear:

u lupu is not pronounced /uˈlu.pu/;

it's unavoidably /ʊˈlu.pʊ/.

I would like people to learn how to speak my language, not to spread misinformation about it. Hyblaeorum (talk) 08:42, 20 April 2024 (UTC)

[ʊ] is an unstressed allophone of the phoneme /u/ in Sicilian. Unless you can show an example where [u] versus [ʊ] distinguishes word A from word B (a minimal pair), I don’t see any basis for positing a phoneme */ʊ/. It is simply a misuse of basic linguistic notation. Phonetic and phonemic transcription are not the same thing.

Speaking of misinformation, according to The Oxford Guide to the Romance Languages (pages 250–1), Sicilian /i/, /u/ in word-final position are phonetically [i], [u] and not [ɪ], [ʊ]. I’m not sure why you keep doing that. Nicodene (talk) 09:38, 20 April 2024 (UTC)

"Cannot handle template {{synonym of}}."

I thought I would bring this up here rather than throwing red meat to the wolves at a certain other discussion. If we're going to have Module:transclude pulling from a wide variety of entries, we need to make it robust enough that it doesn't get the vapors at the first sight of a template someone didn't think to program into it. I'm really surprised I haven't seen this error before. Chuck Entz (talk) 02:22, 21 April 2024 (UTC)

@Chuck Entz Yeah, this is why I generally only use it for toponyms. Handling things like {{synonym of}}, {{alternative form of}}, etc. is tricky because when you switch to another language, the form-of template is no longer valid. In this case, admiral was changed to say it's a synonym of flagship, but naturally that relationship doesn't apply in Middle Polish or any other language. In some cases like this one, I think this can be fixed by just listing the other term without any form-of qualifiers (so "synonym of flagship; a ship of the line [etc.]" becomes just "flagship; a ship of the line [etc.]"), but that may not work in all cases. The only other alternative I can think of is to just ignore the form-of template in the transclusion. Thoughts? Benwing2 (talk) 02:44, 21 April 2024 (UTC)

My first thought was to ignore, but flag. That way someone could follow up to look for things that could be fixed or that would need to be addressed. Chuck Entz (talk) 02:55, 21 April 2024 (UTC)

@Chuck Entz OK, I will implement something like this: (a) handle all the form-of templates I can think of in some sensible way; (b) handle unrecognized templates by ignoring them but issuing a warning during Preview, adding template tracking and/or a tracking category, and maybe also logging using mw.log(). We could also insert some text into the output itself saying essentially "implement handling for this template"; I don't know if you think this is a good idea. Benwing2 (talk) 03:27, 21 April 2024 (UTC)
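A rough Python sketch of plan (a)/(b): known form-of templates are reduced to their target term, and unrecognized ones are dropped while a warning and a tracking key are recorded. The template set, the tracking-key format and the assumed argument layout ({{synonym of|LANG|TERM}}) are hypothetical, the regex does not handle nested templates, and this is not Module:transclude's code.

```python
import re

# Hypothetical set of form-of templates that can be reduced to their target term.
FORM_OF_TEMPLATES = {"synonym of", "alternative form of"}

# Matches a single non-nested {{name|arg|...}} call.
TEMPLATE_RE = re.compile(r"\{\{([^|{}]+)((?:\|[^{}]*)?)\}\}")


def transclude_gloss(text, warnings, tracking):
    """Rewrite one definition line for transclusion into another language."""

    def replace(match):
        name = match.group(1).strip()
        args = match.group(2).split("|")[1:]
        positional = [arg for arg in args if "=" not in arg]
        if name in FORM_OF_TEMPLATES and len(positional) >= 2:
            # Assumed layout: language code first, then the target term.
            return positional[1]
        # Anything else is ignored, but flagged for follow-up.
        warnings.append("Could not handle template {{" + name + "}}; ignored it")
        tracking.add("transclude/unhandled/" + name)
        return ""

    return TEMPLATE_RE.sub(replace, text).strip()


warnings, tracking = [], set()
print(transclude_gloss("{{synonym of|en|flagship}}; a ship of the line",
                       warnings, tracking))
```

Returning the warnings and tracking keys to the caller, rather than failing, matches the "ignore, but flag" suggestion: the page still renders, and someone can follow up from the tracking list.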

transliteration of Greek to Latin characters

Hi Benwing2, I am usually on el-wikt and only occasionally here. We are looking for a tool in el-wikt to transliterate names in Greek characters to Latin characters according to en:w:ISO 843. I see that Template:t here does something similar (we would only have to change the table of equivalent characters), but my knowledge is not enough to locate and copy the relevant part (I am looking at Module:translations, but as I said, I cannot see which module is invoked). Are you the right person to ask for help? If not, who could possibly give us a hand? FocalPoint (talk) 16:45, 22 April 2024 (UTC)

Forget it, we found it! No need to invest time. Have a nice day. FocalPoint (talk) 05:32, 23 April 2024 (UTC)

@FocalPoint Glad you found it, and sorry for the delay in responding. Benwing2 (talk) 05:50, 23 April 2024 (UTC)
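Since the request hinged on swapping in a different table of equivalent characters, here is a minimal table-driven transliterator in Python. The mapping covers only lowercase, unaccented letters and is intended to match ISO 843's letter-by-letter transliteration for those letters, but it should be checked against the standard before use; capitals, accents and contextual rules for certain letter combinations are deliberately left out, and this is not the code behind Template:t.

```python
# Hypothetical, simplified table: lowercase unaccented letters only.
# Contextual rules from the standard are intentionally not implemented.
GREEK_TO_LATIN = {
    "α": "a", "β": "v", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "ī", "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "x", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "y", "φ": "f", "χ": "ch", "ψ": "ps",
    "ω": "ō",
}


def transliterate(text, table=GREEK_TO_LATIN):
    """Replace each character via the table; unmapped characters pass through."""
    return "".join(table.get(ch, ch) for ch in text)


print(transliterate("αθηνα"))
```

Keeping the table separate from the function is what makes "only change the table" possible: a different scheme is just a different dictionary passed as `table`.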

Duplicate categories

Hi, I'm from ckbwiktionary. While mass-importing subcategories of Category:Languages by country on ckbwiktionary, I noticed that Category:Languages of Republic of the Congo and Category:Languages of the Republic of the Congo are the same category. Which one should stay? Thanks! Aram (talk) 21:56, 26 April 2024 (UTC)

@Aram Thanks for pointing this out. The latter category should stay; it is consistent with our naming policies (which include the word "the" when appropriate) and contains all but one of the languages. Benwing2 (talk) 22:08, 26 April 2024 (UTC)
