Wiktionary:Grease pit

Welcome to the Grease pit!

This is an area to complement the Beer parlour and Tea room. Its purpose is specifically for discussing the future development of the English Wiktionary, both as a dictionary and thesaurus and as a website.

The Grease pit is a place to discuss technical issues such as templates, Lua modules, CSS, JavaScript, the MediaWiki software, extensions to it, Toolforge, etc. It is also the second-best place, after the Beer parlour, to think in non-technical ways about how to make the best, free, open online dictionary of “all words in all languages”.

Others have understood this page to explain the “how” of things, while the Beer parlour addresses the “why”.

Permanent notice

  • Tips and tricks about customization or personalization of CSS and JS files are listed at WT:CUSTOM.
  • Other tips and tricks are at WT:TAT.
  • Find information and helpful links about modules, Lua in general, and the Scribunto extension at WT:LUA.
  • Everyone is encouraged to expand both pages, or to come up with more such stuff. Other known pages with “tips-n-tricks” are to be listed here as well.

March 2024

Japanese kanji appear as orange links

Hello, adjacent to my other post about accelerated editing, this time the orangelink gadget seems to be acting up somehow when linking to some Japanese kanji, a problem I noticed maybe a month or two ago now but am only now reporting; see e.g. 倒す, 不倒, 押掛ける, 怖気づく, 圧し殺す for some examples. I don't know why it affects some kanji and not others; I assume if the gadget's working by detecting headings, somehow there's been an interference with the way the page text is being parsed as of more recently. If I'm reading the gadget source correctly, the content is considered absent if the entry doesn't belong to a category like "Japanese lemmas" or "Japanese non-lemma forms", but no recognition appears to be made if "Japanese" is followed by "kanji" (despite "logograms" and "Han tu" being checked for); would it be possible that this is the culprit? Kiril kovachev (talkcontribs) 20:13, 1 March 2024 (UTC)[reply]

@Kiril kovachev When you say it's "acting up somehow" can you clarify what the issue is? I looked at those examples but I'm not sure what the desired behavior is vs. what you're actually seeing. Benwing2 (talk) 02:04, 2 March 2024 (UTC)[reply]
I believe the problem was that the kanji link in {{ja-kanjitab}} in 倒す#Japanese was orange (or green in my case because of my CSS) because the kanji page doesn't have Category:Japanese lemmas but does have Category:Japanese Han characters, and the gadget doesn't recognize the latter as a lemma-like category. Adding "Han characters" to the regex for lemma-like categories seems to have fixed the problem. — Eru·tuon 03:25, 2 March 2024 (UTC)
@Benwing2 Sorry, I didn't make it clear what I meant, but as Erutuon pointed out, the kanji link in e.g. 倒す should not be an orange link; rather, it should just be blue, because the kanji page does have a Japanese entry. Up until now those kanji were displayed as orange inside the respective kanjitabs despite definitely existing on the linked pages. @Erutuon, thanks a lot for making the change. Looks good now. Kiril kovachev (talkcontribs) 15:21, 2 March 2024 (UTC)
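
For readers following along, here is a minimal sketch of the kind of check discussed in this thread. It is written in Lua to match the module code quoted elsewhere on this page; the gadget itself is JavaScript and its actual regex is not reproduced here. The function name and the suffix list are assumptions made for illustration, based only on the category names mentioned above.

```lua
-- Hypothetical sketch: the real gadget is JavaScript and uses its own regex.
-- Category suffixes treated as "lemma-like"; "Han characters" is the
-- addition described in the thread above.
local LEMMA_LIKE_SUFFIXES = {
    "lemmas",
    "non-lemma forms",
    "logograms",
    "Han tu",
    "Han characters",
}

-- Returns true if any category is exactly "<language> <lemma-like suffix>".
local function has_lemma_like_category(language, categories)
    for _, category in ipairs(categories) do
        for _, suffix in ipairs(LEMMA_LIKE_SUFFIXES) do
            if category == language .. " " .. suffix then
                return true
            end
        end
    end
    return false
end

-- A kanji page that only has the Han-character category counts as an entry.
print(has_lemma_like_category("Japanese", { "Japanese Han characters" }))
-- A page with no Japanese lemma-like category does not.
print(has_lemma_like_category("Japanese", { "Chinese lemmas" }))
```

Without the "Han characters" entry, the first call would return false, which corresponds to the orange links reported above.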

Access to Raw Transliteration

For some languages, the transliterate method fails because of a workaround to the problem that some transliteration modules fail when the text to be transliterated includes mark-up. A formal statement of the problem with the current solution is given in the link above. One solution to the failure of the method is to bypass the workarounds in the method and access the transliteration modules directly. We have been discussing the issue for Sanskrit at Module talk:sa-translit#Getting_Text, where the context for interpreting accentuation marks is usually larger than the scope of mark-up, such as pairs of triple ASCII apostrophes for mandatory emboldening of words.

Should we have a generic template, analogous to {{xlit}}, and a generic Lua method or function to bypass the workarounds, or should we use ad hoc language-specific templates (e.g. sa-tr) to do the bypassing? I believe the major use of such templates would be to generate 'manual' transliteration strings for quotation templates. --RichardW57 (talk) 21:23, 1 March 2024 (UTC)

@RichardW57 IMO neither approach you're suggesting is good. The issue you're running into has been a point of contention between me and User:Theknightwho; the decision he made to chop up transliteration into parts and not pass formatting such as apostrophes through to the translit method seems to be causing problems in several languages. AFAIK this is only needed for certain languages with complex transliteration methods so I would recommend we switch it to be opt-in on a per-language basis, and pass the unmodified source text by default. User:Theknightwho do you have any objections to this approach and can you let me know which languages should opt into the chop-up functionality? Benwing2 (talk) 02:00, 2 March 2024 (UTC)[reply]
@Benwing2 It would potentially take a lot of work to determine that, and the current method is not something I want to keep around much longer, as obviously this is a major shortcoming. For now, I'd really prefer that we simply add languages to the opt-out list as necessary. Theknightwho (talk) 02:03, 2 March 2024 (UTC)[reply]
@Theknightwho Is there an opt-out list? I remember our discussion for Thai concluding that there wasn't a simple way to opt out entirely. Also when you say "not something I want to keep around much longer" are you planning on reworking the code? Benwing2 (talk) 02:05, 2 March 2024 (UTC)[reply]
@Benwing2 For what Richard needs, the opt-out list in Module:languages/data should be sufficient. In terms of replacing it, the wikitext parser should make that possible, since it knows how to work around formatting without needing to split up the string into chunks. Theknightwho (talk) 02:09, 2 March 2024 (UTC)[reply]
@Theknightwho: Is the opt-out list the table contiguous_substitution? --RichardW57 (talk) 02:45, 2 March 2024 (UTC)[reply]
@RichardW57 Yes. Theknightwho (talk) 02:48, 2 March 2024 (UTC)[reply]
@RichardW57 @Theknightwho Then I suggest to add Sanskrit to the list, so I can check if the Vedic accents work fine (and perhaps check for other bugs) Exarchus (talk) 12:04, 2 March 2024 (UTC)[reply]
@Exarchus I have added this. I doubt it will work for your purposes, though, as the opt-out is (IMO) badly designed in that it passes munged versions of the formatting characters rather than the formatting characters themselves. Let me know if you have issues; if so I'll create a second opt-out that does the right thing and opts out entirely of all processing. Benwing2 (talk) 12:19, 2 March 2024 (UTC)[reply]
@Benwing2 It doesn't work perfectly now, although there is a difference. Exarchus (talk) 12:40, 2 March 2024 (UTC)[reply]
if I can somehow know what characters these munged versions consist of, that might work too Exarchus (talk) 12:47, 2 March 2024 (UTC)[reply]
If I rewrite the code somewhat (by including whatever characters that aren't Devanagari vowels etc.), then it seems to work as intended

So a workaround can be found with the current opt-out; the code would just be a bit weird. Exarchus (talk) 12:55, 2 March 2024 (UTC)

One thing that doesn't work is using <br\> (without backslash) to detect the start of a new prosodical unit. So people should use danda । there (which is normal practice). Exarchus (talk) 14:03, 2 March 2024 (UTC)[reply]
I think the module is working fine now. I don't think there's currently a use case for adding different accentuation schemes than the Rigvedic one (I could try adding the Samaveda one, or the Atharvaveda symbol for independent svarita, U+1CE1). Exarchus (talk) 16:53, 2 March 2024 (UTC)[reply]
If there's no danda in the source, the only way to add it is as an explicit emendation of the source, which is supported using |norm= with {{quote-book}}. Or are fraudulent quotations acceptable now? --RichardW57 (talk) 23:13, 2 March 2024 (UTC)[reply]

This is currently in CAT:E with the message 'The language or etymology language name "Hindustani languages" is not valid.' Looking at entries in this category, I can find at least one where {{translit|en|inc-hnd}} has been in place since October, but this category was only created today. That leads me to think that something has changed in the modules recently. The code "inc-hnd" must have already been in existence since October, or its absence would have resulted in module errors. I can only conclude that one or more of the following has changed:

  1. The behavior of {{translit}}
  2. The settings for the code "inc-hnd"
  3. The behavior of one or more of the modules called by {{auto cat}}

It seems reasonable to me to have a category for transliterations where the exact language within a group isn't specified, so how do we fix this? @Theknightwho. Chuck Entz (talk) 21:49, 2 March 2024 (UTC)[reply]

@Chuck Entz I can't find anything that would explain why this category would've suddenly appeared recently, so it may be that no-one got round to creating it until now. I've certainly refrained from creating categories in the past if I notice the preview throws an error.
I agree that this should be allowed, though, so it's worth updating the category tree to permit families for this type of category.
Theknightwho (talk) 22:11, 2 March 2024 (UTC)[reply]
@ChuckEntz: Unless we are publishing lies again, Hindustani is not a *group* of languages, but another name for Hindi and Urdu. Again, always explaining a language designation by its Wikipedia entry is a bad idea, especially for Indian languages. --RichardW57 (talk) 23:27, 2 March 2024 (UTC)[reply]
@RichardW57: There's reality, and then there's the way the modules work. This is the Grease Pit, so I was talking about the latter. We have Hindi and Urdu as separate language codes, each with their own infrastructure. Combining them into one language code would cause massive disruption and require a huge amount of work. You would have to ask at the Beer parlour about whether there's a consensus to make that change. I'm just trying to fix something that's broken. Chuck Entz (talk) 00:00, 3 March 2024 (UTC)[reply]

5,705 errors

What is going on here? Whatever caused the error has been fixed but not before completely trashing CAT:E. This happened yesterday, too, just not as extreme. Whoever is editing core modules here needs to be more careful. Benwing2 (talk) 02:10, 3 March 2024 (UTC)[reply]

@Benwing2 It's really odd - the number keeps climbing, but I can't find any pages which are actually throwing the error. Theknightwho (talk) 02:41, 3 March 2024 (UTC)[reply]
@Theknightwho: that's not uncommon. More often than not, I never see more than a page or two still displaying the error. I believe that's because the category updates are a separate process from the page updates. The display goes back to normal fairly quickly, while the propagation of the category updates can take as much as a week. When I'm around, I do my best to clear everything using the API Sandbox link. In this case I had it going in two tabs for what seemed like an hour, starting with well over 8,000 down to the current 3. I'm sure I wasn't the only one working on it, though. I expect there to be a few every few minutes or so for an hour or more. Chuck Entz (talk) 04:19, 3 March 2024 (UTC)
@Chuck Entz Yeah, I also spent about 30 minutes trying to clear it before I gave up and went to bed. Thanks for sorting it. Theknightwho (talk) 10:10, 3 March 2024 (UTC)[reply]

Gothic script

The Wikipedia link on the page Category:Gothic script is to w:Gothic script which is a disambiguation page. It should link to w:Gothic alphabet instead. 212.179.254.67 08:14, 4 March 2024 (UTC)[reply]

Fixed. Theknightwho (talk) 20:27, 5 March 2024 (UTC)[reply]

der2

On the page cut off the der2 template is used to make a show more / show less list of derived terms. But there is only one line revealed by "show more". So the "show more" doesn't save any space... you may as well just show the one hidden line (and it will save the user the click). 212.179.254.67 12:08, 4 March 2024 (UTC)[reply]

FYI: Major romanization change coming in Japan

May impact modules, etc. if this is a change we want to adopt as well: https://languagelog.ldc.upenn.edu/nll/?p=62827

Tangentially related: Wiktionary:Grease_pit/2024/February#Japanese_accel_module —Justin (koavf)TCM 19:08, 4 March 2024 (UTC)

@Koavf The change in question is a switch from Kunrei to Hepburn romanization. It looks like we already use Hepburn romanization, e.g. the page 松下 is transliterated Matsushita not Matusita. BTW there's some category breakage on that page; User:Theknightwho it looks like something to do with sort key generation, do you have any idea what's going on? Benwing2 (talk) 23:23, 5 March 2024 (UTC)
@Benwing2 I think it's related to this diff @Erutuon. Theknightwho (talk) 23:39, 5 March 2024 (UTC)[reply]
Sorry about that! I should have checked a few more cases in {{tracking category}}. Fixed, I think. — Eru·tuon 23:49, 5 March 2024 (UTC)[reply]
@Erutuon Thanks. What does this template do? Can you add a bit of documentation? Benwing2 (talk) 23:51, 5 March 2024 (UTC)[reply]
@Benwing2: Okay, done. — Eru·tuon 00:02, 6 March 2024 (UTC)[reply]
Nice. Thanks. I'm glad it at least surfaced this issue that got swiftly fixed. —Justin (koavf)TCM 00:27, 6 March 2024 (UTC)[reply]

Time range with time ranges

This is about {{quote-book}}. If, for example, a work of literature was started somewhere between 1900 and 1905 (something to indicate using |startyear=), and finished somewhere between 1915 and 1920 (something to indicate using |year=), considering date ranges use an en dash (–), I would think to simply type:
{{quote-book|[LANGUAGE]|startyear=1900–1905|year=1915–1920|[...]}}
which would produce:
1900–19051915–1920 []
I'm wondering: is this too confusing? or is it a good enough way of rendering it? —— GianWiki (talk) 16:16, 5 March 2024 (UTC)[reply]

@GianWiki Definitely a very edgy edge case. Does this ever actually happen? I think the display form 1900–19051915–1920 is not going to be interpreted correctly. Maybe we could add some code to change the display if there are en dashes or em dashes in the |startyear= or |year= parameters but I think it's probably better to just put the appropriate explanatory text in the |year= param, something like |year=from '''1900–1905''' to '''1915–1920''' (exact years unknown). The code should not boldface the year if there's already boldface in the param value. Benwing2 (talk) 23:18, 5 March 2024 (UTC)[reply]
@GianWiki why not just put c. 1900–1920? Ioaxxere (talk) 22:20, 6 March 2024 (UTC)[reply]
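
The rule suggested above (do not boldface the year if the parameter value already contains boldface) could be sketched roughly as follows. This is only an illustration: format_year is a hypothetical helper, not the code behind {{quote-book}}, and it assumes that "boldface" means the wikitext triple apostrophe.

```lua
-- Hypothetical helper, not the code behind {{quote-book}}.
-- Wraps the year in ''' (wikitext bold) unless the value already contains it.
local function format_year(value)
    -- Plain-text find (fourth argument true), so no pattern matching is involved.
    if value:find("'''", 1, true) then
        return value
    end
    return "'''" .. value .. "'''"
end

print(format_year("1920"))
print(format_year("from '''1900–1905''' to '''1915–1920''' (exact years unknown)"))
```

With a rule like this, an explanatory value such as the one in the second call is passed through unchanged, while a bare year is still emboldened.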

Fix Module:place and Module:place/data for bugs

The function unpack is wrongly used in Module:place and Module:place/data.

Example code :

local prev_qualifier, this_qualifier, bare_placetype = unpack(split)

In cases where split is a table in which indices 1 and 2 are nil (i.e. a sparse table, e.g. { [3] = 'continent' }), this will not work as expected (all 3 variables will be nil). The code should be corrected to:

local prev_qualifier, this_qualifier, bare_placetype = unpack(split, 1, 3)

I do not have right to fix it myself, but this should be fixed. Dodecaplex (talk) 19:17, 5 March 2024 (UTC)[reply]

@Dodecaplex You are right about that; it's unfortunate the unpack function was implemented in a broken fashion. Whether this correction needs to happen depends on whether the values can ever be nil; I'll need to take a look at the code in question. Benwing2 (talk) 23:13, 5 March 2024 (UTC)[reply]
For instance in page Mexico, the template is called for continent holonym (e.g. first definition), which has no qualifier. So, yes, it is nil in many places. As I extract all pages using an alternate Lua environment, I got at least 151849 errors of this kind. Dodecaplex (talk) 08:22, 6 March 2024 (UTC)[reply]
I think it will be better to specify the start and end indices. What happens when we don't specify the start and end indices, and split[1] or split[2] is nil, is that unpack sets the end index to basically #split, and the length operator (ultimate implementation found here) gives oddball results based on undocumented implementation details of tables. unpack and the length operator are only designed to work properly for sequence tables, because they don't traverse all keys in the table to find the actual maximum integer key. So the only way to ensure that this_qualifier and bare_placetype are always set to the values of split[2] and split[3] is to set the start and end indices ourselves. — Eru·tuon 22:16, 6 March 2024 (UTC)[reply]
I've gone ahead and added the start and end indices to unpack in Module:place and Module:place/data. — Eru·tuon 22:37, 7 March 2024 (UTC)[reply]
Thanks ! Dodecaplex (talk) 17:39, 8 March 2024 (UTC)[reply]
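
For anyone who wants to try the fix outside the wiki, here is a small self-contained sketch of the explicit-index form discussed in this thread. The table contents are the example given above; the compatibility line for unpack is an assumption added so that the snippet runs both on Lua 5.1 (as used by Scribunto) and on later versions.

```lua
-- Works on Lua 5.1 (global unpack) and on 5.2+ (table.unpack).
local unpack = table.unpack or unpack

-- Sparse table of the kind described above: only index 3 is set.
local split = { [3] = "continent" }

-- Explicit start and end indices: this always reads split[1], split[2]
-- and split[3], regardless of what the length operator reports.
local prev_qualifier, this_qualifier, bare_placetype = unpack(split, 1, 3)

print(prev_qualifier, this_qualifier, bare_placetype)
```

Here the first two variables are nil and bare_placetype is "continent". The form without indices is deliberately not shown with an expected result, since, as noted above, it depends on implementation details of the length operator.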

Disappearing text

My talk history page on Βικιλεξικό here shows edits which do not appear on the talk page itself here. There is obviously an explanation — I hope that it isn't me!!   — Saltmarsh🢃 19:25, 5 March 2024 (UTC)[reply]

@Saltmarsh, the user was blocked and the edit at your Talkpage reversed. It was a text by a blocked user (at en.wikt, now also at el.wikt), Shāntián Tàiláng, who asked these questions: Request for English Wiktionary. Hello, I have noticed that όρισμα (modern Greek) may be derived from modern Greek ορισμός (from ancient Greek ὁρισμός). I do know that English orismology needs an etymology section added; that section should state that it derives from ancient Greek ὁρισμός and {{suffix|en||logy}}.
Also, Category:grc:Woodworking should be created, because πρίσμα needs that same category added to it.
Incidentally, tenpenny nail really needs "w:" placed just before "The Old Curiosity Shop" in its first quotation. Shāntián Tàiláng (συζήτηση) 20:18, 27 Φεβρουαρίου 2024 (UTC)
1) After that, he was blocked by me 2024.02.28.@el.wikt#Block for continuing to annoy admins with questions.
2) After that, repeated the text, as an IP, and another admin reversed and changed visibility. HistoryOfYourTalk
3) He tries to apply again at en.wikt for unblocking and asked me @meta how he could apply for unblocking. ‑‑Sarri.greek  I 20:05, 5 March 2024 (UTC)[reply]
Dear @Saltmarsh, tell me, if you wish me to unblock ‑‑Sarri.greek  I 20:12, 5 March 2024 (UTC)[reply]
@Sarri.greek I strongly advise not unblocking ST on el.wikt. Pinging @Surjection, who is most familiar with them. Theknightwho (talk) 20:23, 5 March 2024 (UTC)[reply]
ST should not be unblocked under any circumstance. — SURJECTION / T / C / L / 20:53, 5 March 2024 (UTC)[reply]

Deleting and moving of public sandbox submodules

User:Theknightwho and User:Benwing2 have been getting rid of /sandbox submodules by moving them to Module:User:Erutuon/ and deleting them. I'm uneasy about this idea, but haven't cared enough to complain before today, when my module sandbox subpages are filling up with various sandbox modules I've created in the past. (At least they're more discoverable in my module sandbox subpages than when they're deleted.)

I recall some sort of discussion (Wiktionary:Grease pit/2022/July#Sandboxes in CAT:E I guess) a while ago, but I'm not aware of a vote that says that these public sandbox modules are banned.

I think it's counterproductive to remove the sandbox modules. Ideally we'd have a whole set of sandbox modules and lots of testcases in the main modules and sandbox modules, so casual users could just test a change in the sandbox and see what happens, without causing thousands of module errors. IP users don't have a place (Module:User:IPAddress/) to put sandbox modules in, and casual users who notice an error are also probably not going to know or bother to copy over modules to Module:User:whatever/modulename and test changes. User sandbox modules are hard to find and it's tedious to ask if User:whatever will mind you editing them, if you do find them. So I think it's good to have "public" sandbox modules.

Granted also that sandbox modules are not very useful when they are not in sync with the main module, which is very likely to be the case when production modules are being edited often. And we don't have very extensive testcases for main modules, much less sandbox modules, so it's currently hard for editors of sandbox modules to see what their edit actually does. It takes valuable time to add testcases for new changes. So I don't know how realistic my reasoning actually is.

To solve Wiktionary:Grease pit/2022/July#Sandboxes in CAT:E, I've expanded Template:tracking category so that it identifies all the types of sandbox modules listed in Template talk:tracking category#Identifying sandbox modules and templates. I did a bunch of regex on the list of titles in the dump to figure out all the formats of titles of sandbox modules, and then I ran some JavaScript code to make sure that the new version of the template identifies all the sandbox modules I listed. Now MediaWiki:Scribunto-common-error-category should put all the usual types of sandbox modules out of sight in Category:Pages with module errors/hidden rather than in CAT:E. — Eru·tuon 21:58, 5 March 2024 (UTC)[reply]

@Erutuon Hi. I moved some of them yesterday. The ones I moved were almost exclusively years old, almost exclusively yours, and usually not worked on by anyone else. My logic is that sandbox modules should not be cluttering the mainspace. In practice I have never found a need to use mainspace sandbox modules and I definitely believe that such modules should be in userspace. Mainspace sandbox modules by their nature don't support more than one person working on them at a time and there's no mechanism provided for multiple people to synchronize their edits to a given sandbox module. In general, all sorts of problems can potentially arise with mainspace sandbox modules. In addition they get out of date quickly since production modules do get edited fairly often. In practice, anyone working on sandbox modules has to copy over the latest production modules anyway, so I don't see how there's any benefit to having the sandbox modules in mainspace vs. in your own userspace. I understand that theoretically they could help IP users but I'm not sure how commonly this ever actually happens. Also, given the reality that testcases take effort to maintain that most people don't want to spend, I think it's unlikely we'll ever have a reasonable sandbox testcase infrastructure. That said, I won't move any more modules for the time being but I do hope you'll consider switching to userspace sandbox modules. Benwing2 (talk) 22:28, 5 March 2024 (UTC)[reply]
Just to chime in to say the same thing: the ones I deleted were in all cases hopelessly out of date, and none had been edited within the last year; many hadn't been edited since before 2020. Theknightwho (talk) 23:49, 5 March 2024 (UTC)[reply]

change to module categorization

FYI I made a change to Module:documentation so that modules are categorized even when documentation is present, as long as there is no <includeonly> section present on the page. I also made the module categorization smarter. Benwing2 (talk) 02:17, 6 March 2024 (UTC)[reply]

CJK Compatibility Ideographs in ranges for Hani script

I haven't run the bot that converts between {{t}} and {{t+}} since December, because when I tried, I ran into a problem: the entry-name rules for Korean (ko) contain a pattern whose Perl analogue is invalid, causing my code to blow up with Invalid [] range "豈-舘" in regex.

I took some time to investigate this yesterday, and I believe I now understand how to fix it (so no real action is required), but I figured (some) people might be interested in what I found, because it involves some MediaWiki tech stuff that we don't usually think about but does have user-facing effects.

Some background:

So anyway, the issue turns out to be with the character range from U+F900 to U+FA6D, which ends up as a character range in a Lua pattern in the Korean entry-name rules [link · link].

The problem is that U+F900 and U+FA6D are CJK Compatibility Ideographs, and MediaWiki applies Unicode Normalization Form C (NFC) to inputs and outputs, so by the time my bot sees the range, it's become the range from U+8C48 to U+8218, which Perl rejects because the greatest character in the range would be less than the least character. And that's actually kind of good luck; the range immediately below it, from U+FA70 to U+FAD9, gets normalized to the range from U+4E26 to U+9F8E, which includes a whole bunch of characters that it's not intended to, but is valid so far as Perl can tell, so I would never have noticed it.

For purposes of the translation-bot, I plan to fix this by just changing its server-side component to escape non-ASCII characters in some way, and the bot proper to de-escape them. That should completely circumvent MediaWiki's application of NFC.

More broadly, it may be worth asking if we really want ranges of characters that MediaWiki literally won't even let be saved; I can see arguments either way. Feel free to discuss. :-)     (FYI @Theknightwho.)

RuakhTALK 08:27, 7 March 2024 (UTC)
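
To make the range problem above concrete, here is a minimal sketch in Lua. It does not perform real Unicode normalization: NFC_OF is a hand-written table containing only the four endpoint mappings quoted in the post, and normalized_range is a hypothetical helper that reports what a range looks like once both endpoints have been replaced.

```lua
-- NFC mappings for the four range endpoints, taken from the post above.
-- This is a hand-written table for illustration, not a real normalizer.
local NFC_OF = {
    [0xF900] = 0x8C48,
    [0xFA6D] = 0x8218,
    [0xFA70] = 0x4E26,
    [0xFAD9] = 0x9F8E,
}

-- Describes a code point range after both endpoints have been normalized.
local function normalized_range(first, last)
    local lo, hi = NFC_OF[first] or first, NFC_OF[last] or last
    return {
        lo = lo,
        hi = hi,
        inverted = lo > hi,              -- an inverted range is rejected by Perl
        size_before = last - first + 1,  -- code points in the intended range
        size_after = hi - lo + 1,        -- only meaningful when not inverted
    }
end

local a = normalized_range(0xF900, 0xFA6D)
local b = normalized_range(0xFA70, 0xFAD9)
print(a.inverted, b.inverted)
print(b.size_before, b.size_after)
```

The first range becomes inverted, which is the case that produced the Perl error. The second stays valid but grows from about a hundred code points to many thousands, which is the silent failure mode described above. Keeping the endpoints as numbers and building the range at run time, as in this sketch, avoids storing the literal characters in page text.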

@Ruakh Thanks for doing the investigation! I know about the conversion to NFC form but I didn't suspect it would affect CJK chars in this fashion. The current code is probably OK since it doesn't store the characters literally but rather as numbers, and constructs the ranges on the fly (hence they never get saved and converted to NFC form). Whether the ranges are OK depends on whether there are any characters in the middle of the range that aren't canonicalized out of existence during the NFC conversion, and that I don't know. User:Theknightwho will hopefully comment on this. Benwing2 (talk) 22:23, 7 March 2024 (UTC)[reply]
@Benwing2 I did this for a couple of reasons:
  1. I wanted to cover any edge-cases which involved these compatibility ideographs, since I didn't know if they were used anywhere (e.g. in the Unicode modules).
  2. There are actually 12 CJK characters in the "compatibility ideographs" range which aren't actually compatibility ideographs, and don't get normalised to other characters in NFC (which I assume got added to that range by mistake many years ago, or have since been disunified for some reason): 﨎, 﨏, 﨑, 﨓, 﨔, 﨟, 﨡, 﨣, 﨤, 﨧, 﨨, 﨩. They don't form a continuous range, so it was slightly more efficient to simply include the whole block.
Theknightwho (talk) 22:32, 7 March 2024 (UTC)[reply]
@Erutuon Just letting you know about this approach of Ruakh's, since we were discussing something similar a while ago (which I have not had time to revisit). This, that and the other (talk) 11:47, 8 March 2024 (UTC)[reply]
I should say that if I had to do the bot over from scratch, given the current state of Wiktionary and given what I know now, I probably would not implement it this way. I think a better approach would involve some degree of asking the server-side to do transformations, plus aggressive client-side caching (storing previously-computed transformations in timestamped files and reusing them for an extended period, e.g. six months), a bunch of client-side special-casing for high-volume cases (e.g. "if the language code is [foo] and the translation matches [simple pattern] then compute the entry-name by [simple function] and don't bother querying the server"), and various other such optimizations. In fact, even though I have all the code/etc. for my current approach, I'm still considering migrating to an approach like that at some point.
So if you're planning on writing something from scratch, I think that's what I'd recommend. (But if you're comfortable with Perl, and would rather just reuse my code than write something from scratch, let me know and I can try to get it into a shareable state.)
RuakhTALK 10:01, 9 March 2024 (UTC)[reply]
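
A rough sketch of the timestamped-cache idea described above, for anyone considering that approach. It is written in Lua only for consistency with the rest of this page; the bot discussed here is written in Perl. Everything in it is hypothetical: the cache is an in-memory table rather than timestamped files, the key format is made up, and fake_server_query stands in for the real server-side transformation.

```lua
-- Approximate six-month lifetime, in seconds.
local SIX_MONTHS = 183 * 24 * 60 * 60

-- Returns the cached value if it is still fresh, otherwise recomputes it.
-- cache maps a key to { value = ..., stored_at = ... }.
local function cached_lookup(cache, key, now, max_age, compute)
    local entry = cache[key]
    if entry and now - entry.stored_at <= max_age then
        return entry.value
    end
    local value = compute(key)
    cache[key] = { value = value, stored_at = now }
    return value
end

-- Stand-in for the expensive server query; counts how often it is called.
local cache, calls = {}, 0
local function fake_server_query(key)
    calls = calls + 1
    return key:upper()
end

local now = os.time()
cached_lookup(cache, "ko:foo", now, SIX_MONTHS, fake_server_query)
cached_lookup(cache, "ko:foo", now + 60, SIX_MONTHS, fake_server_query)
print(calls)
```

The second lookup is served from the cache, so the stand-in query runs only once. A lookup made after the lifetime has elapsed would call it again and refresh the stored timestamp.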

T:km-xi got worse

At ថៃ (thay), the template is linking words to #English, rather than Khmer.

E.g. ព្រះរាជាណាចក្រថៃ  ―  prĕəh riəciənaacak thay  ―  Kingdom of Thailand

Not asking for any improvement, just fixes. Anatoli T. (обсудить/вклад) 05:21, 8 March 2024 (UTC)[reply]

I suspect a general problem. The same symptoms are showing with {{th-x}}, which invokes {{#invoke:th|usex}}, and earlier this week I found the same problem with plain double square brackets that link to translingual words rather than English words in glosses. Experimentation suggests that it only shows up on lines formatted by '#', so afflicting quotations and glosses. --RichardW57m (talk) 10:07, 8 March 2024 (UTC)
It occurred to me that Module:th should be corrected to specify that the linked-to words are Thai, because Thai entries are usually the last on their page, and got as far as changing Line 223 of Module:th from
table.insert(exSet, "[[" .. thaiWord .. "]]")
to
table.insert(exSet, "[[" .. thaiWord .. "#Thai|"..thaiWord.."]]")
, but then I realised that that wouldn't handle normal numbers or even idiomatic ones - Thai 555 bears no relation to Translingual 555, so I abandoned the edit. Test cases were อมฤต (à-má-rít) and โควิด-19 (with '14' as a normal number). More thought is needed on that one - (Notifying Alifshinobi, Octahedron80, YURi, Judexvivorum, หมวดซาโต้, Atitarev, GinGlaep, RichardW57, Noktonissian): . --RichardW57m (talk) 12:15, 8 March 2024 (UTC)[reply]
The cause of this is a JavaScript change mentioned in Wiktionary:Beer parlour/2024/February § Use of T:lang. I can fix this, but I think the templates should link to the correct language section. If ASCII numbers shouldn't be linked to the Thai section, that would be easy to fix in the module with if thaiWord:match("^%d+$") .... Granted I suppose that won't be the only thing that you don't want linked to the Thai section. Generally bare links to no section should be avoided when you know the probably correct language section to link to (which is wrong in the case of 14 here). {{th-x}} probably needs some way to link to 14#Translingual, and to disable linking. [[14#Translingual|1{สิบ} 4{สี่}]] doesn't work.
However, I've prevented the link-changing code in MediaWiki:Common.js from running within lang="..." text other than lang="en(-...)". — Eru·tuon 16:06, 8 March 2024 (UTC)
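
As a sketch of how the digit check mentioned above might be combined with the #Thai link from earlier in this thread: link_thai_word is a hypothetical helper, not the code in Module:th, and leaving ASCII digit strings as bare links is an assumption, since the thread does not settle which section they should point to.

```lua
-- Hypothetical helper, not the code in Module:th.
-- Pure ASCII digit strings get a bare link; everything else links to #Thai.
local function link_thai_word(thaiWord)
    if thaiWord:match("^%d+$") then
        return "[[" .. thaiWord .. "]]"
    end
    return "[[" .. thaiWord .. "#Thai|" .. thaiWord .. "]]"
end

print(link_thai_word("14"))
print(link_thai_word("อมฤต"))
```

Note that a term such as โควิด-19 is not purely digits, so it would still be linked to the Thai section under this rule.
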
@Erutuon Thank you, that change seems to have removed most of the problems. However, I'm still confused how taxonomic names as definitions should be linked to. @DCDuring. It seems that {{l|mul}} isn't the recommended way.
The Thai ASCII numbers are now acting tolerably again, though they obviously can't all be translingual - we hit a limit at 101, though I would expect that one to have specific semantics in Thailand as a place name. I'm fairly happy with treating them mostly as semantically digit sequences, though I think there may be lurking chauvinistic problems, and possibly trouble with line-breaking. Roman script acronyms (CD, DVD, VDO and OT come to mind, though the last one may be overseas Thai and it's a word I've heard, rather than seen) and taxonomic names may cause problems for {{th-usex}}, though I've mostly seen the latter as definitions in Thai dictionaries. Again, nationalism may have stored up problems. --RichardW57m (talk) 17:46, 8 March 2024 (UTC)[reply]
@RichardW57m. I agree. Thai usex templates have the same problem now but the formatting colours don't reveal the problem. With Khmer, I am sure colours were right before but I can't say when exactly this problem occurred.
Pinging @Theknightwho, @Benwing2: Are you able to fix the language in the links? Anatoli T. (обсудить/вклад) 05:41, 12 March 2024 (UTC)[reply]
@Atitarev What's the problem you're seeing now? Erutuon's fix of 8 March seems to have removed the problem you were talking about. The outstanding issue with the {{th-usex}} is that there doesn't seem to be a mechanism to specify the language of the elements in the quotation, which causes at the least a colouring problem with translingual elements if we try tagging the elements for language. (I noted this problem nearly 4 years ago.) --RichardW57m (talk) 09:52, 14 March 2024 (UTC)[reply]
@RichardW57m:
I think the displaying colour is now fixed. When I posted, the linked terms showed in orange for the Khmer template. Which edit on which module was it? Can you ping me the {{diff|}}, please?
However, please compare the output by hovering over the word components. Only the last line shows expected links, like [[王國#Chinese]], the first two just show [[រាជាណាចក្រ]] without the language. So, if any of the words were shared by multiple languages, the links wouldn't connect to the correct ones.
  1. ព្រះរាជាណាចក្រថៃ  ―  prĕəh riəciənaacak thay  ―  Kingdom of Thailand
  2. ราชอาณาจักรไทย  ―  râat-chá-aa-naa-jàk tai  ―  Kingdom of Thailand
  3. 王國王国  ―  Tài wángguó  ―  Kingdom of Thailand
Anatoli T. (обсудить/вклад) 23:34, 14 March 2024 (UTC)[reply]
@Atitarev: I believe the fixing diff is Special:diff/78355928. The problem, as I said above, is tagging the entries in the quotation correctly - a quotation in Thai is not always composed only of Thai elements. The Chinese quote template seems to make the assumption that all the entries are in the same variety of Chinese; I don't know how well it handles translingual words within the quotation. For Thai and Khmer, the links actually connect to the page, rather than the first entry, which is correct, but not very helpful if the Thai or Khmer entry is not the first entry. (At least Khmer occurs before Pali.) --RichardW57m (talk) 10:17, 15 March 2024 (UTC)[reply]
@RichardW57m: Thanks. The Chinese template works like other language templates when the words are wikified (linked), in case you're not familiar, e.g.
В чужо́й монасты́рь со свои́м уста́вом не хо́дят (proverb)V čužój monastýrʹ so svoím ustávom ne xódjatwhen in Rome, do as the Romans do (literally, “You don't go to another monastery with your own charter”)
All the words above link to Russian entries.
You can also unlink foreign words in a Chinese usex:
  1. X什麼意思X什么意思  ―  X shì shénme yìsī?  ―  What does X mean?
As for varieties, of course, it's linking to "Chinese", since the varieties are merged under "Chinese" L2 header. Defaults to Mandarin transliterations. It's working with other varieties too with parameters, e.g |C= for Cantonese:
  1. X乜嘢意思 [Cantonese, trad. and simp.]
    X mat1 je5 ji3 si1 aa3? [Jyutping]
    What does X mean?
Delinking should work in the Thai and Khmer usexes as well, the trouble is, nobody seems to be able to make sense, let alone fix or enhance these language-specific modules, since Wyang left. Anatoli T. (обсудить/вклад) 08:15, 16 March 2024 (UTC)[reply]
@Atitarev: Can you make a list of things that are broken and what the correct behavior should be, with an example for each issue? I am going to sleep now but when I get up I will take a look and see about fixing them. Benwing2 (talk) 08:49, 16 March 2024 (UTC)[reply]
@Benwing2: Thanks.
If User:Atitarev/Khmer translit test cases and User:Atitarev/Thai translit test cases are still on your watchlist, you can start there. I will start with simple fix requests, since I don't know if you guys still plan to make it work like the Chinese counterparts.
  1. I made a comment about ។ symbol problem (and other punctuation symbols, foreign symbols) on the Khmer page.
  2. Khmer is behind Thai in handling ៗ (repetition symbol). Thai ๆ can, at least repeat the last full word.
  3. Unlike the Thai template, the Khmer template demands an English translation parameter. It should be optional; the template can still ask for it, like regular templates.
  4. As above, it's desirable to delink certain words with @ without making the output fail.
  5. Delinked foreign words (e.g. English words, numerals) should transliterate as they are, without trying to "transliterate" from Thai/Khmer. E.g. โควิด-19
  6. A harder fix. Please see @Erutuon's example above regarding numerals for re-spelling of numerals. I pinged you on re-spelling numerals but I have to find that topic. It's a harder fix. Remind me if you still have the motivation later.
(You can move/split this discussion, if you wish.) Anatoli T. (обсудить/вклад) 09:20, 16 March 2024 (UTC)[reply]
@Benwing2:
Here's the numeral respelling topic: Wiktionary:Grease_pit/2024/January#Transliterating_foreign_language_usage_examples_with_numerals
Chinese templates can respell numerals. Thai or Khmer can't.
หนองคายอยู่ห่างจากกรุงเทพฯ ๖๑๔ กิโลเมตร  ―  nɔ̌ɔng-kaai yùu hàang jàak grung-têep 614 · gì-loo-méet  ―  Nong Khai is 614 kilometers from Bangkok.
Delinking @๖๑๔ doesn't work either. ๖๑๔ (614) doesn't need to be linked in the usex.
๖๑๔ (614) is pronounced (hòk rɔ́ɔi sìp sìi)
respelling "6{หก ร้อย} 1{สิบ} 4{สี่}" doesn't work.
In words: หกร้อยสิบสี่  ―  hòk rɔ́ɔi sìp sìi  ―  six hundred fourteen. Anatoli T. (обсудить/вклад) 10:13, 16 March 2024 (UTC)[reply]
@Benwing2, @Atitarev: But
{{th-xi|หนองคาย อยู่ ห่าง จาก กรุงเทพฯ  6{หก-ร้อย} 1{สิบ} 4{สี่}  กิโลเมตร|Nong Khai is 614 kilometers from Bangkok.}}
หนองคายอยู่ห่างจากกรุงเทพฯ 614 กิโลเมตร  ―  nɔ̌ɔng-kaai yùu hàang jàak grung-têep hòk-rɔ́ɔi sìp sìi · gì-loo-méet  ―  Nong Khai is 614 kilometers from Bangkok.
does work.
Apart from irrelevantly fixing the punctuation errors - the number in the parameter should be flanked by double spaces so as to give visible spaces - the trick is not to have a space in the Thai phonetic spelling. Join the components with hyphens. The original example, which is an odd form of Thai, can be achieved by using Thai digits.
Of course, the documentation needs improvement. --RichardW57m (talk) 09:09, 18 March 2024 (UTC)[reply]
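To make the parameter syntax described above a little more concrete, here is a small Python sketch of how shown{spoken} tokens could be read. It is not the Lua module's code. The names split_respellings and displayed_text are hypothetical, and the spacing rule (a single space is only a word boundary, a double space gives a visible space) is taken from the description in this thread rather than from the module source.

```python
import re

# One token: displayed text immediately followed by {respelling}.
# The respelling itself must not contain spaces, hence the hyphen trick.
TOKEN = re.compile(r"([^\s{}]+)\{([^{}]*)\}")


def split_respellings(param: str) -> list[tuple[str, str]]:
    """Return (displayed, respelled) pairs for every shown{spoken} token."""
    return TOKEN.findall(param)


def displayed_text(param: str) -> str:
    """Approximate the displayed line: drop respellings, then apply spacing."""
    shown = TOKEN.sub(r"\1", param)
    # Assumed rule: two or more spaces become one visible space,
    # a single space is only a word boundary and is removed.
    return re.sub(r" +", lambda m: " " if len(m.group()) >= 2 else "", shown)
```

Under these assumptions, a parameter containing 6{หก-ร้อย} 1{สิบ} 4{สี่} flanked by double spaces would display as 614 with a visible space on each side, while the respellings stay available for transliteration.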
@Atitarev: Actually, the second error may matter. If one omits all spaces before the last word, it disappears. That's a problem with lax parsing. --RichardW57m (talk) 09:29, 18 March 2024 (UTC)[reply]
@RichardW57m, thanks. I see.
We need to have the ability to use both Arabic and Thai numerals (the example I provided earlier used Thai numerals, even if it's less common, not sure).
They need to be simply displayed, transliterated (if no respelling is provided) or transliterated with respellings - both Thai and Arabic numerals.
Does your example or Thai orthography require any VISIBLE space with numerals?
The example you gave also works with the Thai numerals!:
{{th-xi|หนองคาย อยู่ ห่าง จาก กรุงเทพฯ  '''๖{หก-ร้อย} ๑{สิบ} ๔{สี่}'''  กิโลเมตร|Nong Khai is '''614 '''kilometers from Bangkok.}}
หนองคายอยู่ห่างจากกรุงเทพฯ ๖๑๔ กิโลเมตร  ―  nɔ̌ɔng-kaai yùu hàang jàak grung-têep hòk-rɔ́ɔi sìp sìi · gì-loo-méet  ―  Nong Khai is 614 kilometers from Bangkok.
In my book the text appears exactly with this spacing, including the Bangkok spelling : หนองคายอยู่ห่างจากกรุงเทพ ฯ ๖๑๔ กิโลเมตร
Hope it all makes sense, @Benwing2, at least we know there is a way to work with numerals. Anatoli T. (обсудить/вклад) 23:52, 18 March 2024 (UTC)[reply]
Both Thai and Khmer modules need fixes and enhancements but Khmer modules are in a worse state than Thai.
This is failing with an error: {{demo|{{km-xi|វា ជា ភាសា មួយ ដ៏ ចំណាស់ ដែល ប្រហែល ជា មាន ដើម កំណើត តាំង តែ ពី '''២០០០''' ឆ្នាំ មុន មក ម្ល៉េះ '''។'''|It is an ancient language that probably dates back to 2000 years ago.}}}}
{{km-xi|វា ជា ភាសា មួយ ដ៏ ចំណាស់ ដែល ប្រហែល ជា មាន ដើម កំណើត តាំង តែ ពី  '''2000'''  ឆ្នាំ មុន មក ម្ល៉េះ|It is an ancient language that probably dates back to 2000 years ago.}}
វាជាភាសាមួយដ៏ចំណាស់ដែលប្រហែលជាមានដើមកំណើតតាំងតែពី 2000 ឆ្នាំមុនមកម្ល៉េះ  ―  viə ciə phiəsaa muəy dɑɑ cɑmnah dael prɑhael ciə miən daəm kɑmnaət tang tae pii · chnam mun mɔɔk mleh  ―  It is an ancient language that probably dates back to 2000 years ago. Anatoli T. (обсудить/вклад) 00:26, 19 March 2024 (UTC)[reply]
@Atitarev: I've seen the statement that numbers need to be separated from words by white space, and turning to a Thai newspaper web site, e.g. https://www.thairath.co.th/home, that's what I see. On the other hand, at least on price tags, the baht symbol (฿‎) tended to be written without any separation from the digits. The space after "๖๑๔" was missing from your statement, but you've shown that your source had it. --RichardW57m (talk) 09:48, 19 March 2024 (UTC)[reply]
@RichardW57m: Thanks for pointing out and explaining the common usage. The correct examples would have spaces on both sides (I will correct later).
With Khmer only Arabic numerals work, as you can see in the failure above. The punctuation, especially the important ។ (used to mark sentence ending), fails all the time.
In my test cases in User:Atitarev/Khmer translit test cases I had to remove ។ but the original text is on top.
@Benwing2. Anatoli T. (обсудить/вклад) 04:12, 20 March 2024 (UTC)[reply]

──────────────────────────────────────────────────────────────────────────────────────────────────── Thanks Anatoli. I will take a look. I am still planning on fixing the scraping of Thai and Khmer, it's just that it requires some non-trivial changes and I have some other things I'm also working on :) ... but let me see if I can make number handling work better. Benwing2 (talk) 22:00, 16 March 2024 (UTC)[reply]

How are we supposed to link to a page rather than an entry in lines formatted with '#'? --RichardW57m (talk) 12:15, 8 March 2024 (UTC)[reply]
@RichardW57m: What do you mean? What page? Anatoli T. (обсудить/вклад) 00:27, 19 March 2024 (UTC)[reply]
@Benwing2: Hi. Any luck? Let me know if you need any clarifications. Anatoli T. (обсудить/вклад) 07:39, 18 March 2024 (UTC)[reply]
@Atitarev Apologies, I was dealing with Chinese stuff today. Heading to bed now but I'll definitely take a look when I wake up. Benwing2 (talk) 07:43, 18 March 2024 (UTC)[reply]
@Atitarev: please fix your edit above so it doesn't have a module error. CAT:E is for emergencies. @Benwing2: Any progress? Chuck Entz (talk) 21:43, 22 March 2024 (UTC)[reply]
@Chuck Entz: All right. I converted it to use {{tl}}, so it doesn't add to CAT:E. --Anatoli T. (обсудить/вклад) 22:31, 22 March 2024 (UTC)[reply]
@Atitarev @Chuck Entz Let me take a look. Benwing2 (talk) 23:20, 22 March 2024 (UTC)[reply]
@Benwing2: Hi. It looks like you lost motivation to try and fix this issue. I am almost sure you can add transliterations for Khmer number and critical punctuation symbols without some major efforts. You developed much more complex modules than this. It's OK if you did, just say so, don't promise, if you won't do it. :) Unfortunately, I am clueless there. I have tried but failed miserably.
Also calling @Octahedron80 who's got some interest in Khmer and some knowledge of Lua. Hi. Are you able to check, if it's possible to fix the Khmer transliteration module for numbers and ។ symbol without breaking it? Anatoli T. (обсудить/вклад) 23:54, 25 March 2024 (UTC)[reply]
@Atitarev My apologies, I have not lost motivation but I was traveling in Puerto Rico up through yesterday and had difficulty finding a contiguous chunk of time long enough to look into this. I haven't forgotten about this, though. I should be able to look into this in the next couple of days, as soon as I finish the current effort I'm doing cleaning up Chinese lect categories, but like you noted, I won't make any promises because I don't want to end up making a promise I can't follow through on. Benwing2 (talk) 00:02, 26 March 2024 (UTC)[reply]

Taxon linking[edit]

split from "#T:km-xi got worse"
For taxonomic-name linking there are now two distinct templates: {{taxlink}} (Now with more Lua!!!), which is for taxonomic names for which enwikt DOES NOT have an entry, used as before, eg, {{taxlink|Rosa noentry|species}}, and {{taxfmt}} (New!!!, with Lua!!!), to be used for taxonomic names for which enwikt DOES have an entry, used just as {{taxlink}}, eg, {{taxfmt|Rosa multiflora|species}}. I hope that "we" (@User:JeffDoozan, @User:AutoDooz) will soon (months) have applied {{taxfmt}} automagically to all taxonomic names that currently have some link and eventually (many months) even to all now-unlinked taxonomic names. At present this just addresses formatting (various configurations of italics) and makes searches easier. In the more distant future it may make other changes (improvements???) easier. The formatting should be the same for both templates, but categorization will be different, mostly affecting only me or someone else with an active interest in taxonomic names. DCDuring (talk) 19:10, 8 March 2024 (UTC)[reply]
@DCDuring: Thank you for the clarification. Are there any plans to document the user interface of {{taxfmt}}? --RichardW57 (talk) 13:54, 9 March 2024 (UTC)[reply]
On the input side {{taxfmt}} is identical to {{taxlink}}. I have always accepted that contributors may have trouble determining taxonomic rank (as taxonomists also seem to), especially at generic and suprageneric ranks (eg, homonyms, uncertain and changing placement, changes in nomenclature rules and fashions). The purpose of having two templates is that it be easy to count instances of missing taxonomic names ({{taxlink|Taxon name|rank}}) and that it be easy to rename the instances to {{taxfmt|Taxon name|rank}}. Further, each instance of {{taxfmt}} should not necessarily have to test for existence of an entry at each loading of the page it is on. Finally, categorization needs for taxa in {{taxfmt}} should be more modest than for those in {{taxlink}}. Not all of this is fully settled. DCDuring (talk) 15:30, 9 March 2024 (UTC)[reply]
I have added 'temporary' documentation for {{taxfmt}}. DCDuring (talk) 15:42, 9 March 2024 (UTC)[reply]
@DCDuring: While an improvement, it implies that {{taxfmt}} should not be used! Is the only thing most editors need to know whether an appropriate multilingual entry exists? --RichardW57 (talk) 21:46, 9 March 2024 (UTC)[reply]
Should there be |id= for linking to taxonomic names with homonyms in the {{senseid}} and {{etymid}} systems? That might apply to clades, and will apply to generic names used in different kingdoms, and also for some taxons that have changed greatly, e.g. Hominidae and Reptilia. --RichardW57 (talk) 21:46, 9 March 2024 (UTC)[reply]
Quite likely, at least for homonyms from different kingdoms (or, rather, different current taxonomic codes). We now have some 300 taxonomic entries with distinct homonyms, but a good number of them include an archaic or obsolete definition, many being synonyms of current taxa. For now, most readers would get one of the appropriate definitions without the help an id parameter would offer. Trying to follow the twists and turns of taxonomic history in terms of circumscription and placement is not something I have seen any taxonomic database do. They just leave breadcrumbs. Their breadcrumbs are more complete than ours, which is why I believe we need links to multiple other taxonomic databases. When WP articles try to follow twists and turns, it is limited in scope to 'recent' (< or <<20 years) changes and can be quite confusing, often because article contributors don't seem to understand how ambiguous English can be. Wikispecies just lays out 'systems' (with dates and authors) of higher taxa on the same page (See species:Holozoa for a short example of a recent (2002) name.). I always try to update to the latest accepted term, circumscription, and placement to be found in the better current databases, and retain any older taxon in our entry as a synonym.
Our coverage will probably always be limited compared to the comprehensive taxonomic databases. (Would we want to have a million taxonomic entries?) Our value added is in etymology (at least potentially), gender, vernacular names/translations, linkage to multiple taxonomic databases, images, and (potentially) definitions that address relevance (location, economic value, use for food, medicine, etc). I doubt that imprecise linking to definitions is our biggest deficiency, though it should and, I'm sure, will be addressed. DCDuring (talk) 23:01, 9 March 2024 (UTC)[reply]
Given the massive instability in taxonomic names, it would be very useful to record older meanings, especially those of or as polyphyletic taxa. There are also dictionaries that have tried to anchor themselves in the sand of taxonomic names. Even now, I'm not sure that usages of 'crustacean' are usually intended to include butterflies, let alone in works from the 1980's. --RichardW57 (talk) 12:02, 10 March 2024 (UTC)[reply]
We can give it a try. Century 1911, MW 1913, MW Intl. 2d would be reasonable sources for relatively common, older names. Beyond those, we can leave breadcrumbs. DCDuring (talk) 14:33, 10 March 2024 (UTC)[reply]
@DCDuring: I think you are confusing names and meanings. To quote from the equivalent vernacular, when I was a young man, one would not say that a chimpanzee was a hominid, but would say that a mammal-like reptile (such as Dimetrodon) was a reptile. These changes don't reflect a change in knowledge, but a rejection of the notion that we are not fish. (And objectively, a dimetrodon was closer kin to a Jurassic allosaur than to us.) I don't see how 'breadcrumbs' help with such shifts in meaning. --RichardW57m (talk) 09:54, 11 March 2024 (UTC)[reply]
Perhaps you would like to take a run at multiple definitions for a taxon so I could see what you mean? It would be interesting to keep track of the degree of acceptance of names and their circumscription and placement by date. DCDuring (talk) 13:52, 11 March 2024 (UTC)[reply]
@DCDuring: I've got to do some work on taxonomic examples, but to get an idea before then, you might find it helpful to look at velociraptor. --RichardW57m (talk) 09:21, 18 March 2024 (UTC)[reply]
@User:RichardW57 Generally, I don't think the taxonomic part of any etymology of a 'vernacular' word derived from a taxon, like velociraptor, belongs at the vernacular name, rather than at the taxon, eg, Velociraptor. A definition like the second one would seem hard to justify in an English vernacular-name entry, but this may be an exceptional case.
The definition at velociraptor seems encyclopedic. As we have an encyclopedia as a sister project just a link away, there is little justification for encyclopedic material here. Therefore, stylistically, a definition shouldn't need more than one phrase, possibly with a subordinate clause or absolute if there is particularly relevant information. For a taxon or a vernacular name of an organism, such information might be location, use to humans, disease, scientific importance, or other cultural significance (like use in Jurassic Park), etc. DCDuring (talk) 12:26, 18 March 2024 (UTC)[reply]
The relevance is that there are two different meanings of velociraptor. The first one, with, as you complain, a rather encyclopaedic definition, is the one that is a popular synonym of Velociraptor, and is the meaning normally found in documentaries. The second one is actually Deinonychus, and is the one found in the context of Jurassic World, and probably toy shops.
In this particular case, I am not confident that the meaning of Velociraptor having Deinonychus as a hyponym actually meets CFI. Perhaps I am setting too high a bar for independence, but I have little confidence of finding two independent usages of the second sense of Velociraptor. This is not typical of evolving meanings of taxonomic names; G.S. Paul's proposed merger of the genera has not been accepted.
I think we should make it clear that 'velociraptor' may actually refer to Deinonychus. Likewise, we should not hide the fact that 'hominid' may be used to exclude Sivapithecus. RichardW57m (talk) 14:23, 18 March 2024 (UTC)[reply]
I'm skeptical that there is such a meaning in actual English usage. In any event, defining velociraptor as "a member of the genus Velociraptor" addresses the matter adequately, IMHO. DCDuring (talk) 14:54, 18 March 2024 (UTC)[reply]
It addresses the first meaning. It doesn't address the meaning used in association with Jurassic Park. --RichardW57m (talk) 16:14, 18 March 2024 (UTC)[reply]
You are so right and I so wrong. I am interested in how you would address the problem of multiple referents (or placements or circumscriptions) of a taxon, especially how they change over time. Taxonomic databases just leave breadcrumbs, of various kinds. DCDuring (talk) 00:28, 19 March 2024 (UTC)[reply]
@DCDuring: I've added an archaic meaning for 'Hominidae' as an example of the sort of meaning shift I had in mind. Sometimes there are redefinitions, but I don't know how well they are recorded in the databases, and I don't know that there is one for Hominidae. Strictly, they're not adequate on Wiktionary for well-documented languages, as they're mere mentions, so I'm inclined to treat them like any other shift in meaning.
It looks as though we Wiktionarians need to do some research on the meaning of 'pongid' - w:Ape implies that it once included gibbons.
For interpreting Felis, it looks as though w:Felidae#Classification does the work for extant species of felid. The usage note at Felis is quite helpful.
There may be unexpected problems sorting out carnosaur - clade definitions seem to have been used lately, and membership of a clade can be a difficult question. There may be a lot of hard work for botany - updating a translation as the name of a species of flowering plant can itself be non-trivial. --RichardW57m (talk) 15:32, 19 March 2024 (UTC)[reply]
@DCDuring: I've now created two extra senses for Pongidae, with quotations. I've added one of them as a synonym of Ponginae using {{syn}}. I couldn't use {{taxfmt}} for the link as it does not support sense-specific fragments. --RichardW57m (talk) 17:11, 20 March 2024 (UTC)[reply]
I have added links to external databases using {{R:PaleoDB}} and {{R:Mammals}} to sow further ambiguity or food for more definitions. DCDuring (talk) 02:08, 21 March 2024 (UTC)[reply]
Re: "hard work". I find it hard just to make basic entries for taxa that we are linking to, assuming that the most sought-after definitions are for the currently accepted names. I'm not doing much for fossil species either. DCDuring (talk) 02:13, 21 March 2024 (UTC)[reply]

aWa not working[edit]

Our archiving gadget, aWa, is broken. It is getting confused by the "[subscribe]" link which is now present on discussions, which makes it try to archive on the wrong page.

I disabled the gadget for now until it can be fixed (I haven't tried to debug the issue yet). Ping @Erutuon who last edited the gadget. This, that and the other (talk) 23:08, 8 March 2024 (UTC)[reply]

It's because of the changes to headers. The gadget was interpreting the "[subscribe]" link at the beginning of the header (in the HTML, though it displays as if it's after the header) as the link to the page to archive at. Also, the gadget wasn't going to the next HTML elements after the header correctly because they've added another layer of HTML elements in the header. I haven't fixed the fact that the "[subscribe]" link is interpreted as part of the header, which is a bug apparently tracked in phab:T13555#9592945 and due to be fixed soon. Not sure if the gadget works (because I don't really know where to test it), but give it a try and let me know. — Eru·tuon 01:18, 9 March 2024 (UTC)[reply]
@Erutuon it looks like you've fixed it. I just tested it and, although it displayed the [subscribe] text in its UI as part of the header, it didn't actually make a difference to the archival itself. See [1]. This, that and the other (talk) 03:25, 9 March 2024 (UTC)[reply]

Attempted to create a legitimate entry for "chmobik", tripping vague anti-spam measures[edit]

The specific abuse rule that was tripped was 'various specific spammer habits'. I'm not sure what that means, and the entry I wrote up has nothing I can find wrong with it.

Ishiura (talk) 10:27, 9 March 2024 (UTC)[reply]

I'm not sure exactly what it is, but my first instinct is that it's the Reddit/Twitter links. Those aren't considered durably archived sources for quotations either way. — SURJECTION / T / C / L / 10:33, 9 March 2024 (UTC)[reply]
OK. I actually modelled the "chmobik" entry on the "mobik" one, which uses pretty extensive Twitter quotations.
Ishiura (talk) 10:36, 9 March 2024 (UTC)[reply]

Derived terms tool[edit]

As I'm useless with programming, I asked AI to make a tool to quickly add Derived terms. It is stored at User:Denazz/Derived Terms Tool. Is it complete crap, as I suspect? Denazz (talk) 15:58, 9 March 2024 (UTC)[reply]

Lol, it looks very incomplete. Equinox 19:17, 9 March 2024 (UTC)[reply]

husband's[edit]

How should I resolve the red link on husband's stitches? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:50, 9 March 2024 (UTC)[reply]

What red link? Vininn126 (talk) 19:21, 9 March 2024 (UTC)[reply]
That fixes the immediate problem, but doesn't address the difference between the lemma and the plural entries in the way linking is handled in the headword. Chuck Entz (talk) 19:40, 9 March 2024 (UTC)[reply]

Template/deletion/inclusion error[edit]

Wiktionary:Beer parlour/2006/October and Wiktionary:Beer parlour/2006/August are showing up in pages for speedy deletion. Equinox 20:37, 9 March 2024 (UTC)[reply]

 resolved by deleting Template:zh-hanzi-box. This, that and the other (talk) 02:40, 10 March 2024 (UTC)[reply]

removing cruft from Module:labels/data/regional[edit]

Heads up, I am planning on moving quite a bit of stuff from Module:labels/data/regional to language-specific modules. There are over 4,000 lines of stuff in this module and 662 entries. Most of the entries are limited to one or two languages, but having them in the lang-independent data means that any use of the labels for any language will add the corresponding category. Hence we get CAT:French Translingual (with 5 current entries), CAT:Austrian English (with one current entry, which does not belong), CAT:Finland English (with one current entry, probably likewise), CAT:French Chinese (with one current entry, debatable), CAT:French Catalan (with one current entry that belongs rather in CAT:Northern Catalan), etc. I wrote a script to find all the existing per-language categories for each label in Module:labels/data/regional, which I am planning on using as a basis to move most entries out. There is a slight disadvantage to doing this in the case of a regional label that corresponds to several languages, in that the aliases and Wikipedia fields will get duplicated. For example, the current entry for France defines French as an alias with a link to the Wikipedia entry for France, and corresponds to six lang-specific categories: CAT:French French, CAT:French Ladino, CAT:French Latin, CAT:French Norman, CAT:French Vietnamese, CAT:French Yiddish. If we care enough about this, one way to minimize duplication is to support a field containing a list of allowed languages; I may do this. (OTOH the Wikipedia links should maybe be customized on a per-language basis. For example, rather than just linking to the Wikipedia entry on France, which is of questionable usefulness here, we could imagine linking to the Western Yiddish article for French Yiddish, the European French article for French French, etc.) Benwing2 (talk) 02:30, 11 March 2024 (UTC)[reply]

FYI I have written a function in Module:alternative forms to convert lang-specific labels data modules to {{alt}} data modules. I will eventually be merging the two sets of data modules so that all the info is found in the labels data and the separate dialectal information in Module:CODE:Dialects disappears. For now I have done Maltese and Albanian. @Catonif, Fenakhay Benwing2 (talk) 04:21, 12 March 2024 (UTC)[reply]
Nice, it's good to see this is gaining traction. Catonif (talk) 05:06, 12 March 2024 (UTC)[reply]
OK, I have written most of the necessary code. Tomorrow I will run some of the code. The plan is as follows:
  1. Allow a list of language restrictions to be added to lang-independent labels, esp. those in Module:labels/data/regional (DONE).
  2. Move most regional labels to lang-specific modules. The current criterion is as follows: A label remains in Module:labels/data/regional if either (a) it concerns more than 3 languages, or (b) it has more than 1 alias and concerns more than 1 language. This means something like Congo with 8 aliases (Democratic Republic of the Congo, Democratic Republic of Congo, DR Congo, Congo-Kinshasa, Republic of the Congo, Republic of Congo, Congo-Brazzaville, Congolese) and 3 languages (yom, fr, avu) remains, as does Nigeria with 6 languages and 1 alias, as does Erzincan with 3 aliases (Yerznka, Erznka, Erzinjan) and 2 languages (tr, hy). OTOH, Lānaʻi with 4 aliases (Lanaʻi, Lanai, Lāna'i, Lana'i) gets moved because it concerns only one language haw (Hawaiian). Overall this moves 591 out of 662 entries out, spreading them over 108 lang-specific modules, of which 34 are new. Sometimes there are clashes between a lang-independent and lang-specific label; in that case the code adds the moved lang-independent version in a Lua comment, for later manual fixing.
  3. Fix up the clashes noted in the previous step; needs to be done manually.
  4. Convert the existing {{alt}} dialectal data modules to label modules (I have a script to do this), and integrate them into existing label modules (I have another script to do this). I wanted to do this step after step (2) because there may be clashes between labels in a lang-specific {{alt}} data module and a lang-independent label data module (specifically Module:labels/data/regional), and I'd like to have as few of those as possible as they need manual handling. The integration/merging of the two modules may introduce clashes when there are conflicting specs; as in step 2, the code generates comments for later manual fixing.
  5. Fix up the clash comments generated in the previous step.
  6. Convert all the {{alt}} dialectal data modules to auto-convert from the corresponding {{lb}} data modules; or better, just use the latter directly in {{alt}}. Benwing2 (talk) 06:46, 13 March 2024 (UTC)[reply]
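The criterion in step 2 can be written out as a one-line predicate. This is only a Python sketch for checking the examples given in the plan; the function name stays_in_regional is hypothetical and the actual migration script is not shown in this thread.

```python
def stays_in_regional(num_aliases: int, num_langs: int) -> bool:
    """Step 2 criterion: keep a label in the lang-independent module if
    (a) it concerns more than 3 languages, or
    (b) it has more than 1 alias and concerns more than 1 language."""
    return num_langs > 3 or (num_aliases > 1 and num_langs > 1)


# The examples from the plan, as (aliases, languages):
EXAMPLES = {
    "Congo": (8, 3),     # stays via (b)
    "Nigeria": (1, 6),   # stays via (a)
    "Erzincan": (3, 2),  # stays via (b)
    "Lānaʻi": (4, 1),    # moved: only one language
}
```

Applying the predicate to each pair in EXAMPLES gives the outcomes stated in the plan: the first three labels stay and Lānaʻi is moved to the Hawaiian module.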
Great work @Benwing2!. ... French French? haaa ha ha ha. I cannot stop laughing. Why not placenames. France French. Belgium French. USA English. British English (a! an exception, ok say Britain...). I haven't seen them all, but the repetition is ... ‑‑Sarri.greek  I 07:04, 13 March 2024 (UTC)[reply]
FYI I have carried out steps 1, 2 and 3. Step 6 won't be necessary because {{alt}} now directly reads the label data modules. Currently working on steps 4 and 5. Benwing2 (talk) 07:08, 6 April 2024 (UTC)[reply]
Finished steps 4 and 5. After a bit of time to verify that nothing is amiss, I will delete all the dialectal data modules. Benwing2 (talk) 03:49, 9 April 2024 (UTC)[reply]
Thanks. This was a monumental task. Seems to work correctly. —Justin (koavf)TCM 06:21, 9 April 2024 (UTC)[reply]
@Koavf Thank you. Yes, it took a lot of coding plus several days of manual effort fixing up merge conflicts. Benwing2 (talk) 06:44, 9 April 2024 (UTC)[reply]

Missing category[edit]

@Benwing2: Category:Places in Baja California seems to have been overlooked; there is one for Category:Places in Baja California Sur. A bit confusing, I know - anyway it's needed for English and Spanish. DonnanZ (talk) 17:41, 12 March 2024 (UTC)[reply]

@Donnanz Baja California Norte is in the place data (the erstwhile state of Baja California was split into two states some time ago). Benwing2 (talk) 20:20, 12 March 2024 (UTC)[reply]
@Donnanz NVM, I see that Baja California is the official name. I changed the place data and set up Baja California Norte as an alias. Benwing2 (talk) 20:25, 12 March 2024 (UTC)[reply]
@Benwing2: More confusing than I realised. I see it's coming up as a red link for now. Thanks. DonnanZ (talk) 20:37, 12 March 2024 (UTC)[reply]

Sicilian vowels[edit]

The last time this came up Cato and I found ourselves bogged down in consonant difficulties. No need to let the perfect be the enemy of the good, however.

It is uncontroversial that Sicilian has five, and only five, phonemic monophthongs: /i ɛ a ɔ u/. So let's simply start there. Could we run a bot to identify, and hopefully fix some of, the phonemic transcriptions featuring nonsense like /ɪ i̞ ɨ ɛ̃ ɐ̠ ɐ ʊ Vː/? I can clean up the rest manually. Might as well remove the various full-stops while we're at it, as there is no /./ phoneme in Sicilian. Nicodene (talk) 00:48, 14 March 2024 (UTC)[reply]

Keeping the vowel transcriptions to those 5 phonemes sounds good. I don’t find “there is no /./ phoneme” a compelling rationale for omitting syllable divisions in phonemic transcriptions.
Many languages have phonological contrasts that are not normally analyzed as reducible to the presence or absence of a specific phoneme in a sequence; e.g. there is a contrast in Spanish between the pronunciation of ame and amé, which we transcribe (properly, I think) as /ˈame/ vs. /aˈme/. I have never seen it argued, nor would I argue, that /ˈ/ in this transcription is a phoneme, but I don't think these should both be transcribed as /ame/.
There might be other good reasons to omit syllable divisions. E.g. in the case of English, a lot of the time there isn’t even consensus between phoneticians about where syllable divisions fall. In some languages, syllable divisions might be completely predictable just from the sequence of phonemes in a word. (In other languages, this is mostly but not entirely the case, with morphology also affecting syllabification in some circumstances: e.g. in Latin, Catalan, and Spanish, heteromorphemic /bl/ can only be syllabified as heterosyllabic /b.l/ (as in Latin sublātus) but morpheme-initial or internal /bl/ can or must be syllabified as an onset.)--Urszag (talk) 01:27, 14 March 2024 (UTC)[reply]
It was a tongue-in-cheek way of saying that syllabification is not phonemic in Sicilian (unlike stress in Spanish).
Perhaps I should state my concern more plainly. At the moment our transcriptions claim syllabification as a phonemic feature of Sicilian. That is an extremely bold claim, and one made accidentally by editors unaware of what phonemes are. It should be removed on those grounds alone. In the event that a groundbreaking paper surfaces to prove that Sicilian syllabification happens to be phonemic after all, then we will apply its findings carefully, and systematically, to our transcriptions. The chances that our current transcriptions would have got all the details right are nil.
As for morphology affecting pronunciation - that is a concern that applies to many if not most of the languages we have here. That is, morpheme (or word) boundaries often have consequences on the phonetic level. The solution would be adding morphophonemic transcriptions, if we reach such a level. Nicodene (talk) 02:38, 14 March 2024 (UTC)[reply]
@Nicodene I'm not sure that just the presence of periods/full stops between slashes asserts anything about their phonemicity. It is fairly common to include syllable dividers in phonemic representations, esp. if the phonemic representation is all we have (yet another reason, I think, to prefer broad phonetic representations between brackets; it lets you include relevant info without having to worry about whether such-and-such a distinction is phonemic). In any case I can easily do a bot run to find occurrences of the non-phonemic vowels you mention above; correcting them automatically is a bit trickier as it depends both on having rules to do the conversion (which might not be so hard to work out) and making sure the rules are correct in all cases (which might be harder, as people might be doing surprising things with these non-phonemes). Benwing2 (talk) 03:49, 14 March 2024 (UTC)[reply]
@Nicodene Here: User:Benwing2/bad-sicilian-vowels. There are about 600 instances. If you give me a list of replacement rules I'll see about implementing them. Benwing2 (talk) 04:03, 14 March 2024 (UTC)[reply]
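For anyone curious what such a scan might look like, here is a minimal sketch in Python. It is not the bot code that produced the list; the function name, the symbol list and the slash-matching pattern are all assumptions made for illustration, based only on the symbols named earlier in this thread.

```python
import re
import unicodedata

# Symbols flagged in this thread as non-phonemic for Sicilian (assumed list).
BAD_SYMBOLS = (
    "\u026a",  # ɪ
    "\u031e",  # combining down tack, as in i̞
    "\u0268",  # ɨ
    "\u0303",  # combining tilde, as in ɛ̃
    "\u0250",  # ɐ (also covers ɐ̠)
    "\u028a",  # ʊ
    "\u02d0",  # ː length mark
)

# A slash-delimited span with no whitespace or template syntax inside.
PHONEMIC = re.compile(r"/[^/\s|{}]+/")


def find_bad_transcriptions(wikitext):
    """Return the phonemic transcriptions that contain a flagged symbol."""
    hits = []
    for match in PHONEMIC.finditer(wikitext):
        # Decompose so that combining marks can be tested on their own.
        decomposed = unicodedata.normalize("NFD", match.group(0))
        if any(symbol in decomposed for symbol in BAD_SYMBOLS):
            hits.append(match.group(0))
    return hits
```

Real page text would need stricter matching (for example, restricting the search to the Sicilian section and to pronunciation templates), since slashes also occur in URLs and ordinary prose.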
I've gone through the list. (/fʷ/ was really quite something.) These are all straightforward:
/ɪ i̞ ɨ/ → /i/
/ɛ̃/ → /ɛ/
/ɐ̠ ɐ ä aː/ → /a/
/ʊ/ → /u/
As for the long vowels, most of them are spurious, but some are indicated in the spelling with a circumflex and derive from contractions of /VV/. I'll see if I can find a paper discussing these before I do anything with them.
As for the full-stops – well, to place something in a phonemic transcription is to indicate that it is phonemic, inevitably and by definition. This is something that is often not grasped, for instance by (I would estimate) more than half of the contributors here, including otherwise very knowledgeable ones, but that is just how it is. One can either use phonological notation correctly or not use it at all.
I'm in favour of phonetic transcriptions as well, so long as we actually know enough to do them properly for the language in question. From what I have seen perusing the existing transcriptions, that is not the case for Sicilian. Someone has to put together a properly sourced and cited Wiki page on Sicilian phonology. Maybe I'll do it if I can find it in me to. Nicodene (talk) 05:17, 14 March 2024 (UTC)[reply]
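The four replacement rules listed above could be applied along these lines. This is only a sketch under stated assumptions: the function name is made up, the input is taken to be a single transcription string, and long vowels other than /aː/ are deliberately left alone, since they were set aside for later review.

```python
import re
import unicodedata

# Replacement rules as listed above; keys are written in decomposed (NFD) form.
RULES = {
    "\u026a": "i",             # ɪ -> i
    "i\u031e": "i",            # i + down tack -> i
    "\u0268": "i",             # ɨ -> i
    "\u025b\u0303": "\u025b",  # ɛ + tilde -> ɛ
    "\u0250\u0320": "a",       # ɐ + minus sign below -> a
    "\u0250": "a",             # ɐ -> a
    "a\u0308": "a",            # a + diaeresis -> a
    "a\u02d0": "a",            # aː -> a
    "\u028a": "u",             # ʊ -> u
}

# Longest keys first, so that the retracted ɐ is matched before bare ɐ.
_PATTERN = re.compile(
    "|".join(re.escape(key) for key in sorted(RULES, key=len, reverse=True))
)


def fix_vowels(transcription):
    """Apply the replacement rules to one phonemic transcription."""
    decomposed = unicodedata.normalize("NFD", transcription)
    fixed = _PATTERN.sub(lambda m: RULES[m.group(0)], decomposed)
    return unicodedata.normalize("NFC", fixed)
```

Normalizing to NFD first means a precomposed ä and a + combining diaeresis are handled by the same rule.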
@Nicodene OK thanks. Are you sure about converting /aː/ to /ɛ/? That seems odd, while the others look totally fine. Benwing2 (talk) 05:22, 14 March 2024 (UTC)[reply]
OK, I see you changed it. Benwing2 (talk) 05:23, 14 March 2024 (UTC)[reply]
Yes. As for the 'legitimate' long vowels, we have the following (if the information in the entries is accurate):
Very interesting. Nicodene (talk) 05:37, 14 March 2024 (UTC)[reply]
@Nicodene Done. Benwing2 (talk) 06:05, 14 March 2024 (UTC)[reply]
Thank you. Nicodene (talk) 06:07, 14 March 2024 (UTC)[reply]
I don't mean to be a wet blanket but I have to add that your claim that "more than half of the contributors here, including otherwise very knowledgeable ones" are confused about what "phonemic" means (by implication, this includes anyone who disagrees with you, including me and User:Urszag) is a very strong statement. I also see you are going ahead and removing the existing syllable breaks in the phonemic notation despite there being no consensus for this (since the two other people in this discussion both disagree with it). Benwing2 (talk) 07:19, 14 March 2024 (UTC)[reply]
@Benwing2 Sorry I didn't mean at all to imply that. I ultimately disagree but your reasoning has to do, respectively, with morphological complications and acceptance of a common practice, not any kind of basic misunderstanding.
My point, which I could have conveyed better, was that the common practice itself comes ultimately from that misunderstanding. Sicilian transcriptions like /çɪɾɪ(ɨ)ˈv(ʲ)ɛɖːu/, Galician transcriptions like /baˈβuʃa̝/, and Neapolitan ones like /ʃkuŋˈtʃi.ʝʝə/ were all common practice until recently, and this sort of thing is self-reinforcing: the more such transcriptions there are, the more they come off as a legitimate model to emulate, and so they can spread and take on a life of their own. Which is what I think happened with syllable divisions in phonemic transcriptions becoming a sort of Wiktionary canon, across languages.
I've not seen a phonology paper with syllable divisions in phonemic transcriptions unless the author is really proposing that they are phonemic. And (aside from my finding the concept itself unlikely) I've not seen a proposal to that effect gain widespread acceptance, e.g. for English, or seen one made at all for Romance languages or Latin.
As for removing syllable breaks - since I was already there cleaning up the long vowels, I fixed other issues with the same transcriptions, such as /e̞ u̞/. In my view /./ is also incorrect but I didn't intend to edit any entries just for that reason. I'll now simply leave it as-is. Nicodene (talk) 14:02, 14 March 2024 (UTC)[reply]
@Nicodene OK, my apologies as I think the tone of that message was stronger than I intended. What you say makes sense (although I'm pretty sure syllable breaks are in fact phonemic in English, cf. the classic minimal pair nitrate vs. night rate, unless you make morpheme boundaries in compounds be phonemic, which is six of one vs. half a dozen of the other). I still think it's helpful to include syllable breaks. Again this leads to my conclusion that for practical purposes (given that our foreign-language entries are not meant for a linguistics paper but as a learner's dictionary for English speakers) we should abandon a purely "phonemic" transcription in favor of a broad phonetic one, which allows us to pick and choose which level of detail to show. This is already done, for example, in Russian, where e.g. we are choosing to show broad /l/ as [ł] and notate some of the more important vowel allophones such as [æ] between palatalized consonants. A pure phonemic representation is a theoretical construct and sticking with such a thing can often put us in a straitjacket, sometimes leading to bizarre results, e.g. per User:AG202 the Spanish terms fui [fwi] and muy [muj] should be notated phonemically as /fui/ and /mui/, where the very salient vowel differences between the two are considered non-phonemic and lexically determined and hence not displayed. (Now, I don't really believe it makes sense to have lexically determined phoneme -> allophone rules like this, but per AG202, this is the consensus view among linguists working on Spanish.) Benwing2 (talk) 00:39, 15 March 2024 (UTC)[reply]
There is a boundary in night-rate, and that boundary is what causes people to pronounce it differently from nitrate. I agree. But it is surely more economical to explain that as a word boundary, given that night-rate is plainly recognizable to a native speaker as night plus rate — and given also that we know speakers have a mental model that can treat words as fundamentally distinct units (or else language wouldn't be possible I think) — than it is to posit that speakers have a mental model which, in addition to that, treats t.r and .tr as fundamentally distinct units. What does the additional assumption contribute?
I agree about promoting [] over // at any rate so long as the phonetic details are known. It would save a lot of headaches, for more than one reason.
As for the thing about Spanish - it sounds by definition impossible. Has AG202 cited a paper to that effect? Perhaps there is some other factor involved, like regional differences which have been shoved into one phonemic representation. Nicodene (talk) 04:26, 15 March 2024 (UTC)[reply]
See the discussion here: User talk:Benwing2/2023 § Borrowing module es-pronunc for Spanish Wiktionary. Particularly the part citing The Routledge Book of Spanish Phonology when it comes to syllabification. It specifically lists "muy" as an exception and phonemically represents it as /mui/, which is what I've seen for the most part elsewhere too from authors that don't list /j/ & /w/ as separate phonemes (which is the consensus). There's no minimal pair with a hypothetical "mui" [mwi] as well. There's an argument that could be made that it's instead /'mu.i/ though. AG202 (talk) 04:51, 15 March 2024 (UTC)[reply]
@AG202 Thanks for the response. Keep in mind there are other terms in -uy. Looking through the lemmas we have produces the following: ababuy, Chuy, cocuy, cuy, espumuy, Esteguy, huy, Jujuy, Luy, muy, pijuy, Ruy, tepuy, uy, Yaracuy. If there are no minimal pairs with words in -ui, it seems a random gap not an inherent feature of language. (And in fact cf. huy and hui, both Spanish words.) Benwing2 (talk) 05:21, 15 March 2024 (UTC)[reply]
Why is it not simply /kui'dado/, /ku'iko/, /fu'i/, /'mui/? I don't follow any of this. Nicodene (talk) 06:00, 15 March 2024 (UTC)[reply]
Because, as Urszag stated below, /fu'i/ implies a disyllabic word, when in fact it's one syllable. AG202 (talk) 06:27, 15 March 2024 (UTC)[reply]
So you're saying /fuí/ can only be [fwí], and not [fuí], while /kuíko/ can be both [kwíko] and [kuíko]? Nicodene (talk) 06:29, 15 March 2024 (UTC)[reply]

The contrast between Spanish fui and muy can be analyzed as a matter of the position of the stress (like the contrast between ame vs. amé). The problem with the standard IPA stress notation (aside from the fact that the stress mark is not a phoneme) is that the IPA stress symbol is supposed to go at the start of the stressed syllable, which calls for /'fui/, /'kuiko/, etc. Some phonologists use the acute instead (/fuí/ vs. /múi/) to avoid that issue. "Quasi-Phonemic Contrasts in Spanish", by José Ignacio Hualde (2004:5), cites Quilis and Fernández 1985 as giving transcriptions like "[bjénto] /biéNto/; [porfiában] /poRfiábaN/; [kwál] /kuál/;[fwérte]". Ralph Penny, in A History of the Spanish Language, also makes use of the acute to mark stress in phonemic transcriptions e.g. /kantáis/. I agree with Benwing that broad phonetic transcriptions can often be preferable to phonemic transcriptions. Linguists discussing Spanish glides and syllabification seem to usually use broad phonetic transcriptions, but I've also seen a few uses of slashed transcriptions that the authors don't seem to have obsessed over getting perfectly theoretically accurate. E.g. "The Syllable", Alfonso Morales-Front (The Cambridge Handbook of Spanish Linguistics, 2018, pp. 190-210) gives a number of phonetic transcriptions such as [suβ.li.mi.ˈnal], [su.ˈβli.me], [ˈpje.ðɾa], [gwe.βo], but also gives in slashes the transcriptions /uebo/, /-ecito/, /ˈaman/. There's no explanation of why the stress symbol is included in the last but not the first two, or why the symbol /c/ was used in /-ecito/.--Urszag (talk) 06:10, 15 March 2024 (UTC)[reply]

Thanks! You explained it better than I could, and I agree that it looks to be a matter of stress like Routledge also posits. I'm a bit wary of using the acute accent though as it's usually used for tone. I'm not sure how else we can show it though. AG202 (talk) 06:28, 15 March 2024 (UTC)[reply]
The "calls for /'fui/, /'kuiko/" part doesn't follow for me. I understand specific languages can have some 'home-brew' IPA practices, to an extent, but this just seems misleading. To anyone else this reads as if /u/ is stressed, then stress migrates rightwards in every surface realization. And it causes a clash with the actually stressed /u/ in muy. Nicodene (talk) 06:42, 15 March 2024 (UTC)[reply]
To be clear, I wasn't recommending the transcriptions "/'fui/, /'kuiko/". My point was that these (also /'fiesta/, /'fuerte/, etc.) would fall as a natural but undesirable consequence of the convention of placing IPA stress marks before the onset of the stressed syllable. Then again, I can't find that principle explicitly stated anywhere in the online IPA chart or in the 1999 handbook (just implicitly conveyed by the examples), so maybe it doesn't even technically have official status anyway--I know some phoneticians have violated it and instead adopted the convention of placing the stress marker directly before the stressed vowel, but we don't generally do that on Wiktionary (e.g. we don't transcribe floro as /flˈoɾo/).--Urszag (talk) 07:41, 15 March 2024 (UTC)[reply]
Is it true that Spanish phonologists agree on phonemic representations like /'kuiko/ in a phonology that contains the vowels /i/ and /u/ and no phonemic diphthongs? I ask because it isn't clear to me how that would work. Given the phonology as described, if I've not missed something, that transcription could only stand for a phonemically stressed /u/.
Also, does the pronunciation [uˈi] occur? The linked discussion suggests so, at least for cuico, whereas the comments here seem to suggest otherwise. Nicodene (talk) 08:52, 15 March 2024 (UTC)[reply]
I also am not saying that Spanish phonologists generally recommend using the transcription /'kuiko/. But it is a possible phonemic transcription of the disyllabic pronunciation ['ku̯i.ko]. Stress is analyzed as a suprasegmental feature, so the placement of /'/ relative to other symbols in a phonemic transcription is a matter of convention. One convention is to put it directly before the stressed syllable. If you think that convention doesn't seem to work very well in this context, you're not alone, but as a convention, it isn't something that can be true or false: it isn't a fact about Spanish phonology or the position of Spanish phonemes (since /'/ is not a phoneme and doesn't actually come before or after any phoneme in the phoneme sequence). Here are some relevant transcriptions and commentary from José Ignacio Hualde's chapter "Spanish", in Gabriel, Christoph; Gess, Randall; Meisenburg, Trudel (eds.), Manual of Romance Phonetics and Phonology, 2022:790: "/ˈbiaxe/ [ˈbi̯a.xe]", "/ˈbaile/ [ˈbai̯.le]", "/liˈana/ [liˈa.na]", "/ˈioɡa/ [ˈʝo.ɣa]", "/iˈato/ [iˈa.to]" "/ˈkon.iuxe/ [ˈkonʲ.ʝu.xe]" (these are given in the context of explaining the analysis where [ʝ] is treated as a positional allophone of /i/). In footnote 1, Hualde notes: "In yoga the vowel /o/ is the phonologically stressed element, not the initial /i/, which becomes a consonant as it does not receive the stress on itself in this context, although the initial syllable is stressed. The lack of clarity introduced by the IPA convention of marking the stress at the beginning of the syllable in sequences like /io/ without a preceding consonant is the reason why in Hualde (2005) stress is indicated directly on the stressed vowel instead." I can't confirm whether cuico is potentially trisyllabic, but I have no reason to doubt it.--Urszag (talk) 10:21, 15 March 2024 (UTC)[reply]
There isn't anything in the phonemic representation /ˈfui/ to convey that it is one syllable as opposed to two like /ˈmio/ and /'tea/. If we're to assume that /u/ here is inherently non-syllabic, then what we are really saying is that it is the phoneme /u̯/ or /w/ and the transcription has to be revised.
If we attempt an allophonic rule turning /u/ in that context to [w], we'll have to find a way to make sure it doesn't affect the /u/ in /ˈmui/ or any of the other -uy words mentioned by Benwing earlier. Most difficult of all, we would have to differentiate /ˈui/ from /ˈui/, namely the pair huy/hui. Nicodene (talk) 11:48, 15 March 2024 (UTC)[reply]

Autocloseable.close[edit]

All use of the {{tl}} template is now rendering as Autocloseable.close for some unknown reason. That's all I know. Thanks, Soap 01:59, 14 March 2024 (UTC)[reply]

It probably began when an IP editor changed it from a redirect to {{temp}} into a standalone page with the error. The error was actually appearing as plain text in the diff, so maybe this was just an odd form of vandalism? Either way, if {{tl}} is supposed to be a redirect to {{temp}}, it should be fine now. If not, we need to work out what the IP was trying to do. Soap 02:09, 14 March 2024 (UTC)[reply]
@Soap Thanks. Yes, {{tl}} is just supposed to redirect to {{temp}}. Benwing2 (talk) 03:42, 14 March 2024 (UTC)[reply]

Interesting failure (8,782,141,951 IDs)[edit]

Sure, this edit is just vandalism, but I'm intrigued by the effect it had: instead of just making the one use of {{af}} fail, it made every instance of a Lua-using template on the page fail, saying "The time allocated for running scripts has expired." Why? Was the module thinking that "id8782141951=agent noun" meant it should keep looking through the other parameters trying to find "id8782141950=", "id8782141949=", etc? (If I change the parameter to e.g. "testbadparameter=agent noun", only that one instance of {{af}} fails.) Do I gather the module supports arbitrarily many id= parameters, even 8,782,141,951 of them, and times out when it thinks there are that many? Would it make sense to set any kind of sanity-check/sanity-limit, like more than 50 id= per template makes it spit out an error so that only the one instance of {{af}}, and not the whole page, breaks? - -sche (discuss) 06:53, 15 March 2024 (UTC)[reply]

@-sche Yes, that's more or less what's going on. More specifically, I think what's happening is that it checks the maximum index of all numbered parameters and iterates from 1 up to that index, processing arguments. The reason for doing this is that potentially e.g. the tr could be supplied but not the term or display, etc. Yes, it probably should have some sanity checks in it, although it's not especially high priority because (a) it only breaks one page, (b) properly for this to be fairly robust we'd have to add sanity checks in lots of places, which is both a big undertaking and could backfire if we set the limits too low. Generally when I add sanity checks it's to prevent errors from swamping CAT:E, e.g. things like an alias loop in a label module used to cause all sorts of pages to get errors; now (if I remember aright) it only causes errors on certain pages (so we do get a few pages in CAT:E to alert us of the problem) and has some sort of fallback behavior on the rest (so they don't swamp the category). Benwing2 (talk) 07:07, 15 March 2024 (UTC)[reply]
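To illustrate the failure mode and the kind of sanity check being discussed, here is a rough Python sketch. It is not the module's actual code (which is Lua); the function name and the limit of 50 (the figure floated in the question above) are assumptions.

```python
import re

MAX_ID_PARAMS = 50  # hypothetical limit, taken from the suggestion above


def collect_ids(args, limit=MAX_ID_PARAMS):
    """Collect id1=, id2=, ... into a list, with None for unsupplied slots."""
    indices = []
    for name in args:
        m = re.fullmatch(r"id([1-9][0-9]*)", name)
        if m:
            indices.append(int(m.group(1)))
    highest = max(indices, default=0)
    # Without this check, a stray id8782141951= would make the loop below
    # run billions of times, which is the timeout described in this thread.
    if highest > limit:
        raise ValueError(f"id{highest}= exceeds the limit of {limit} id parameters")
    return [args.get(f"id{i}") for i in range(1, highest + 1)]
```

With a check like this, only the one template call would fail with an error rather than the whole page timing out; the trade-off, as noted, is choosing a limit that is not too low.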
That makes sense (re why not to bother implementing sanity checks for this). - -sche (discuss) 14:51, 19 March 2024 (UTC)[reply]

Mohawk stems[edit]

Many nouns in Mohawk contain noun stems that are useful for things like noun incorporation and historical linguistics (kéntsion is much more easily seen to be from Proto-Iroquoian *-tsjõɁt- when you can see that the stem is -itsion-), so I wanted to create a template moh-stem, but I'm not sure how to do that or whether I should. If anyone can help: I've been trying to change the etymology on the page for Mohawk onón:tsi to say "Noun stem -nontsist- from Proto-Iroquoian *-nõːtsiː-", but I'm not sure how to do that. ChromeBones (talk) 07:52, 15 March 2024 (UTC)[reply]

@ChromeBones do you need a template for this? Can't you just write "Noun stem {{m|moh||-nontsist-}}"? This, that and the other (talk) 09:53, 20 March 2024 (UTC)[reply]

uh oh, script timeouts[edit]

@Theknightwho semen, laven and kennen are now running out of time halfway through the page. This has only happened in the last hour or two. Could you have made a recent change (e.g. your bug fix to Module:parameters or some other change) that inadvertently slowed things down? If not, any ideas? Benwing2 (talk) 08:15, 15 March 2024 (UTC)[reply]

@Theknightwho It is indeed this change, because when you preview the pages in question without it, you don't get timeout errors. Interestingly they happen only with Middle English verb conjugations; Module:enm-conj must be doing something strange with parameters that is triggering an edge-case bug in Module:parameters. Benwing2 (talk) 08:39, 15 March 2024 (UTC)[reply]
@Benwing2 I've fixed this. In essence, {{enm-conj}} was relying on the old way that defaults were handled for list parameters, where if item 1 of a list was empty then the default value would get used as the first item. This applied even if the list contained higher values, such as (in this case) class2= etc. This is only relevant when lists are allowed to contain holes, as in this case, so the solution was twofold:
  1. Revert to the old method of handling defaults, so that they're always added if item 1 of a list parameter is empty.
  2. Move the handling of default values so that it comes after the handling of holes in lists. This therefore means that item 1 of a list can only be empty at that point if allow_holes = true.
There might still be some other module which relies on lists not having holes while also relying on the old default handling, so it might be worth tracking any instances where allow_holes hasn't been specified and an input list contains a hole at item 1, since that should hopefully flush out the possible instances where it could occur.
Going forward, we might want to change the spec so that defaults can either be (a) inserted only if the list has 0 items, or (b) inserted if item 1 is empty. Theknightwho (talk) 16:45, 15 March 2024 (UTC)[reply]
@Theknightwho Great, thank you for looking into this and fixing it! I think ideally we should have disallow_holes = true as the default but that might require a lot of work. Benwing2 (talk) 19:35, 15 March 2024 (UTC)[reply]
@Benwing2 That should be the default at the moment (or rather, allow_holes = true has to be set manually), but the issue is if a template relies on holes being removed automatically except for item 1, which is set as the default if empty. Theknightwho (talk) 20:28, 15 March 2024 (UTC)[reply]
@Theknightwho There are actually three states with regard to holes: allow holes (allow_holes = true), compress holes (the default) and disallow holes (disallow_holes = true). What I mean is probably "disallow holes" should be the default and the "compress holes" state should have to be requested explicitly using compress_holes = true. I think the behavior where holes can be present and are compressed away is surprising, esp. with named parameters. Benwing2 (talk) 20:42, 15 March 2024 (UTC)[reply]
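The three hole-handling states, together with the ordering described above (holes first, then defaults), can be sketched roughly as follows. This is a Python illustration, not the Lua in Module:parameters; the function name and the dict-based input are assumptions, and only the option names allow_holes and disallow_holes come from this thread.

```python
def process_list(values, *, default=None, allow_holes=False, disallow_holes=False):
    """Turn {index: value} into a list, handling holes and then the default."""
    if allow_holes and disallow_holes:
        raise ValueError("allow_holes and disallow_holes are mutually exclusive")
    highest = max(values, default=0)
    items = [values.get(i) for i in range(1, highest + 1)]
    # Step 1: handle holes.
    if disallow_holes and None in items:
        raise ValueError("list parameter contains a hole")
    if not allow_holes:
        # "Compress holes" is the current default state.
        items = [v for v in items if v is not None]
    # Step 2: handle the default. Item 1 can only still be empty here
    # if holes were allowed or the list has no items at all.
    if default is not None:
        if not items:
            items = [default]
        elif items[0] is None:
            items[0] = default
    return items
```

For instance, with holes allowed, supplying only a second item (as with class2= in the Middle English case) yields the default in slot 1 followed by the supplied value, whereas in the compressing state the supplied value simply moves into slot 1.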
@Benwing2 You're right - I'd forgotten (and it's not in the documentation, so I should update that). Theknightwho (talk) 20:46, 15 March 2024 (UTC)[reply]

Unicode 15.1 update for Appendix:Unicode[edit]

I just updated the Indonesian Wiktionary's version of Appendix:Unicode to Unicode 15.1 at Lampiran:Unicode. Here's the list of the relevant changes if anyone wants to update the Appendix:Unicode to Unicode 15.1 since I don't have permission to edit the modules:

Also slightly unrelated, I created a name rule for Lampiran:Unicode/Variation_Selectors so it doesn't need a name module anymore:

Thank you! Ekirahardian (talk) 20:30, 17 March 2024 (UTC)[reply]

Pinging @Erutuon who has edit-access to said modules. Ekirahardian (talk) 23:52, 18 March 2024 (UTC)[reply]

Remove users with foo-0 from foo's Babel cats[edit]

I noticed (by looking at VGPaleontologist, who has apparently tried to indicate every language he doesn't speak) that users who declare "egy-0" nonetheless get put into Category:User egy, and likewise for other languages. Can we change this so they're not, so that Category:User egy (etc) only contains users who've indicated knowledge of egy? (Also, is anyone working on bot-removing / re-sorting inactive users, or am I just doing that manually when I think about it?) - -sche (discuss) 15:37, 19 March 2024 (UTC)[reply]

@-sche Should be fixed in [2]. This, that and the other (talk) 09:32, 20 March 2024 (UTC)[reply]
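The requested behaviour amounts to something like the following Python sketch. It is not the change linked above; the function name and the assumption that proficiency levels are written as a trailing -0 to -5 or -N are illustrative only.

```python
import re

# A Babel code with an explicit level suffix, e.g. "egy-0" or "fr-2" (assumed format).
LEVEL = re.compile(r"(.+)-([0-5N])$")


def base_categories(babel_codes):
    """Return the 'Category:User xx' categories, skipping level-0 declarations."""
    categories = []
    for code in babel_codes:
        m = LEVEL.match(code)
        lang, level = (m.group(1), m.group(2)) if m else (code, None)
        if level == "0":
            continue  # no knowledge declared: keep out of the base category
        name = f"Category:User {lang}"
        if name not in categories:
            categories.append(name)
    return categories
```

So a user declaring en, egy-0 and fr-2 would land in the base categories for en and fr only.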

typo, twice[edit]

Appendix:Glossary is edit-protected. Appendix:Glossary#ablaut and Appendix:Glossary#voice have 2 full stops instead of 1 after they mention Wikipedia. (Looks like someone put a period at the end of the sentence/fragment, unaware that the {{pedia|template}} automatically adds a period.)

Happy editing!

--173.67.42.107 15:56, 19 March 2024 (UTC)[reply]

Fixed. Vininn126 (talk) 16:08, 19 March 2024 (UTC)[reply]

Can anyone help with this problem I'm having in creating categories pls?[edit]

I'm trying to add a category in Maltese regarding word stems. For context, Maltese has words derived from Arabic in the form of roots (e.g. k-s-r, related to breaking --> kiser, 'he broke') and other words derived mainly from Italian in the form of stems or 'morphemic stems' (e.g. -komunika-, related to communicating --> with added suffix, komunikat, 'communicated').

So, since "Maltese terms by root" already exists in the "Maltese Categories", i wish to create a "Maltese terms by stem", however I am having an increasingly hard time with trying to do so... firstly i can't seem to create a page with a word in the format '-***-' which is needed as to indicate affixes, and i just can't find a way to create a category or template or any link of some sort even with auto cat... can anyone suggest any solutions to this please?? Melithius (talk) 22:08, 19 March 2024 (UTC)[reply]

@Melithius I can help with the first part of your question. To create a page that starts with "-", you either need to edit a page to include a red link to the desired entry (as in [[-something-]]) and click the link, or manually go to the page by typing it in your web browser's address bar, like https://en.wiktionary.org/wiki/-something-
As for the categorisation issue, judging by the absence of Category:Terms by stem by language, it seems like categorisation by stem is not something we currently do at Wiktionary. It would need to be implemented via Lua modules, probably after a community discussion at the Beer Parlour. This, that and the other (talk) 09:03, 20 March 2024 (UTC)[reply]
@Melithius @This, that and the other I should add, the difference between roots and stems only makes sense in certain languages. Potentially we could categorize by stem for Maltese only; however, I'd be concerned about the number of stems involved (a ton), and the resulting likely sparsity of the coverage. Also I suspect that many of these "stems" only exist in a single or a limited number of words; since Maltese is a Semitic language and Semitic languages don't usually have "stems" per se, all of the stems in question (including the one you cited) are borrowed, usually in a single word. Benwing2 (talk) 03:28, 21 March 2024 (UTC)[reply]
@Melithius In addition, something like -komunika- is not the normal way we do things at Wiktionary. Things with hyphens on both sides are infixes or interfixes; roots and stems of the sort you're referring to would only have a hyphen at the end (or at least this is how Proto-Indo-European roots are handled). Benwing2 (talk) 03:30, 21 March 2024 (UTC)[reply]
No no, you are wrong actually if i'm understanding your confusion correctly... Maltese is a language of semitic origin but greatly influenced by italian and even english... The statistics being 40% arabic 40% italian and the rest english and other possible languages, last time i checked. Hence, yes, there are many borrowed terms from italian that are greatly used in everyday speech. So i think it's only fair such a system for showing the many possible formations of these roots exists. For example from -komunika- you can add -tur 'communicator', -zzjoni 'communication', -r 'communicating (noun)' and so on for many other stems. However, i do see the specialty of such system only being used by a handful of languages including Maltese... So i don't have many hopes of such a template being introduced.
Also yes I understand how it may be interpreted as an infix, but many maltese sites display them this way to show how you can add both prefixes and suffixes. But something like KOMUNIKA is enough, just something that doesn't show it as an actual word. Melithius (talk) 08:12, 21 March 2024 (UTC)[reply]
My point is that words like komunikatur and komunikazzjoni were not formed in Maltese by adding a suffix -tur or -zzjoni to a stem komunika, but were borrowed as whole words from Italian. Benwing2 (talk) 02:04, 28 March 2024 (UTC)[reply]
Ah okay, yes, that is indeed the case most of the time, but we shouldn't completely disregard such a system of 'adding' affixes to stems simply because of that, no? Yes, the words exist in their own right, but they still need to be learnt, and the way to do that in our own language is by classifying such a system (as such a system is what we had in the first place, before Italian). We do it with English as well in education, to teach what certain affixes mean and form. You are being very technical: by your logic we could go further and say Italian doesn't do it either, since everything is preserved from Latin, and the same goes for English and any other Latin-derived languages, just to have at least a few rules to go by. It's just analysis and reinterpretation for better use and learning within the language itself, which was officialized in Maltese education long ago, and I want to extend that part of the Maltese education system, and the language as a whole, onto here. Melithius (talk) 17:00, 28 March 2024 (UTC)[reply]

Requests for verification/​Reconstruction[edit]

@This, that and the other, why again was it made so {{rfv}} no longer works for reconstructions? -- Sokkjō 05:57, 20 March 2024 (UTC)[reply]

The RFV process is based on the goal of attesting words by finding the required number of usage examples. By definition, a reconstructed form won't have any usage examples.--Urszag (talk) 08:26, 20 March 2024 (UTC)[reply]
Exactly. For relevant discussions, see this one from January 2023 and this one from February 2023. This, that and the other (talk) 08:51, 20 March 2024 (UTC)[reply]
I suppose that makes sense, but what if I want to put in a verification request for a declension table, or a word sense? -- Sokkjō 19:58, 20 March 2024 (UTC)[reply]
As far as I can make out from the links, {{rfd-sense}} will do for a word sense. --RichardW57m (talk) 13:42, 21 March 2024 (UTC)[reply]
For specific declension tables, I've been experimenting with {{rfv}} next to the table and explaining the scope of the challenge in the discussion section. --RichardW57m (talk) 13:42, 21 March 2024 (UTC)[reply]
@RichardW57m: {{rfv}} on reconstructions forwards to {{rfd}} at the moment, as does {{rfd-sense}}, which is not what I want to do. Another example is Reconstruction talk:Proto-West Germanic/hą̄han, which I started on a talk page, but it's not going to get the eyes like an RfV would. -- Sokkjō 19:31, 21 March 2024 (UTC)[reply]
I would recommend using WT:ES or maybe WT:TR for discussing specific details of reconstruction entries. The right people hang out at ES, even if the question is not specifically about etymology. This, that and the other (talk) 22:38, 21 March 2024 (UTC)[reply]

Languages with entries in fr.Wikt but not en.Wikt[edit]

I'd like a list of languages for which fr.Wikt has entries for terms in the language (see fr:Wiktionnaire:Statistiques and fr:Wiktionnaire:Statistiquesb) but en.Wikt does not (Wiktionary:Statistics). The difficulty is that both sites' stats pages index by language name (which obviously differ between English and French) rather than code; for English I suspect I could reasonably easily isolate all the names from our table and plug them into {{#invoke:languages/templates|getByCanonicalName|English}} and get a list of the language codes we have entries in, but I don't know what the equivalent function for fr.Wikt is: fr:Module:langues seems to only mention a function for getting the language name from the code but not vice versa. - -sche (discuss) 14:20, 20 March 2024 (UTC)[reply]

@-sche An additional issue is that the codes used on fr.wikt and en.wikt may differ, esp. in lesser-used languages without ISO 639-3 codes (which are the ones you would be interested in). I don't know anything about fr.wikt though. Maybe @Noé would have some idea. Benwing2 (talk) 03:22, 21 March 2024 (UTC)[reply]
Hello, French Wiktionary Stats are provided by Unsui, based on the dumps. He may be able to generate a list with language codes instead of language names? As Benwing2 said, some languages have a local-made code when ISO doesn't provide any, and it is often the name of the language itself in French. Pamputt and Otourly may also be interested in this conversation, and may like to do the reverse operation to have a list of entries to create in French Wiktionary 🙂 Noé 09:42, 21 March 2024 (UTC)[reply]
Hello, with Cognate Dashboard and some manipulations I was able to do something like that, manually using #ifexist magic word. But Cognate is broken, and I don’t know if it will be repaired. Otourly (talk) 19:16, 21 March 2024 (UTC)[reply]
You can also try to use the kaikki JSON dumps for the French Wiktionary, they have the language codes: https://kaikki.org/dictionary/rawdata.html . The only remaining problem is likely a lack of standardization. MrBeef12 (talk) 08:39, 24 March 2024 (UTC)[reply]
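
A minimal sketch of the comparison requested above, assuming both wikis are available as JSON Lines dumps (one JSON object per line) and that each entry carries its language code in a field named "lang_code". The field name and the dump layout are assumptions about the kaikki data, not something confirmed in this thread. As noted above, the two sites may use different codes for the same language, so the result would still need manual review.

```python
import json

def language_codes(lines):
    """Collect the set of language codes found in an iterable of JSON lines.

    Assumes each non-empty line is a JSON object with a "lang_code" field
    (an assumption about the dump format).
    """
    codes = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        code = entry.get("lang_code")
        if code:
            codes.add(code)
    return codes

def codes_only_in_first(first_lines, second_lines):
    """Return the sorted codes that occur in the first dump but not the second."""
    return sorted(language_codes(first_lines) - language_codes(second_lines))
```

In practice the two arguments could be open file handles for the fr.Wikt and en.Wikt dumps, read line by line so the whole file never has to fit in memory.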

uh oh, timeouts once more[edit]

@Theknightwho Suddenly we have several pages in CAT:E that are running out of time. Cf. Milton, which exists only for a few languages but nonetheless has timeout errors. I see that User:Chuck Entz already pinged you about this, and you said there was a bug in the template parser that you fixed, but there still seem to be issues. What are the changes you've been making lately to the template parser module? Benwing2 (talk) 03:35, 21 March 2024 (UTC)[reply]

@Benwing2 They've cleared up on their own with no intervention from me. I have no idea what the issue was. Theknightwho (talk) 03:50, 21 March 2024 (UTC)[reply]
I think it may be something to do with this diff which restored the old punctuation-removing pattern in Module:languages. If I preview water/translations with the old version, it takes about 8.5 seconds, whereas the new version times out. That may just be down to random chance, though, since I don't see any obvious issues with the new pattern. @Erutuon, Benwing2. Theknightwho (talk) 04:04, 21 March 2024 (UTC)[reply]
@Theknightwho @Erutuon Interesting. I suspect this is not chance. There is clearly variation on how long page saves take but we've never before this had a bunch of pages periodically appearing in CAT:E due to timeouts. If we've seen them once, they'll be back, and the fact that you were able to pinpoint the likely cause means we should focus on optimizing the pattern in question. The obvious thing that pops out is the two .- operators; depending on the implementation esp. given the need to work with Unicode, this could easily turn into an N^2 operation, whereas the previous operation, with only star operator (not counting the %s* operator, which shouldn't in most cases match anything, so should be O(C)), could be O(N). You might consider trying to split up the operation into various operations, e.g. separating the punctuation splitting and trimming in their own operations, so that at the end all you need to do is check for [^%p%s] in the punctuation-stripped and whitespace-trimmed string; this is guaranteed to be O(N). Benwing2 (talk) 04:24, 21 March 2024 (UTC)[reply]
@Benwing2 As a side point, I've just noticed a bug in Module:translations where every translation is being tracked as having "no term", which is adding about 4 seconds to water/translations. Obviously that won't fix the other pages, but it may explain why certain big translation pages have been sluggish. Theknightwho (talk) 04:29, 21 March 2024 (UTC)[reply]
The quantifier - in the pattern could certainly take some time. However, I tried replacing it with *, and then removing the line entirely, and didn't see Lua take significantly less time in Milton, and there's so much variation that I have no idea how to even time it so I've given up. Ultimately there's a translation to PHP regex and I don't know what the difference is between - and * in the PHP translation, but if the translation of * is faster in general, it would be fine to use it here. — Eru·tuon 19:29, 21 March 2024 (UTC)[reply]
@Benwing2 @Erutuon Yeah, the variation in times is enormous. I've been doing quite a lot of profiling with the template parser today, as I was concerned that it was a major contributor to the time-outs on . After a lot of work, it now contributes about 0.5 seconds to the page load time (remembering that that's the time taken to parse several hundred pages; not just the raw content of ). That's totally dwarfed by the variation in some of the MediaWiki functions, and I really don't know what we can do about it: the profile shows mw.ustring.gsub varying between 0.8 to 1.5 seconds, and getContent (which is necessary for page scraping) takes anywhere between 1.5 to 3.5(!) seconds. I assume it's down to whichever machine happens to do the processing server-side. Theknightwho (talk) 23:03, 21 March 2024 (UTC)[reply]
Just to add: the massive variation only seems to affect anything that calls back into PHP. The template parser is mostly built with Lua's native libraries, and I've noticed the times are pretty consistent between page loads. Theknightwho (talk) 23:06, 21 March 2024 (UTC)[reply]
@Theknightwho: Entries keep popping in and out of CAT:E all the time. This used to happen with an entry or two every week. Now it's several at a time, every few minutes to an hour or two. It makes it harder to spot the real errors. I managed to completely clear CAT:E, but in the time it's taken to write this, another one has popped up. I cleared it again- we'll see how long that lasts. Chuck Entz (talk) 15:16, 27 March 2024 (UTC)[reply]
@Chuck Entz I'm not completely sure, but it seems that module load times are longer just after they've been recently changed, but drop back down again after a short while. Presumably it's something to do with caching. Theknightwho (talk) 22:00, 27 March 2024 (UTC)[reply]
@Theknightwho It may well be related to the change mentioned just above by User:Erutuon. I think we should consider reverting it. I don't think it has anything to do with caching. Something has definitely raised the average time that large pages take, which is why they're timing out a lot more often. Benwing2 (talk) 22:06, 27 March 2024 (UTC)[reply]
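
As an illustration of the splitting suggested earlier in this thread (trim whitespace and strip punctuation in their own passes, then make one check for any remaining non-punctuation, non-space character), here is a rough sketch. It is written in Python rather than Lua, it does not reproduce the actual pattern in Module:languages (which is not quoted in this thread), and it treats "punctuation" as any character in a Unicode P* category, which is an assumption about what %p should cover. Each function makes a single pass, so the cost stays linear in the length of the string.

```python
import unicodedata

def is_punct(ch):
    """True if the character is in a Unicode punctuation category (P*)."""
    return unicodedata.category(ch).startswith("P")

def strip_edges(text):
    """Remove leading and trailing whitespace and punctuation in one pass per side."""
    start, end = 0, len(text)
    while start < end and (text[start].isspace() or is_punct(text[start])):
        start += 1
    while end > start and (text[end - 1].isspace() or is_punct(text[end - 1])):
        end -= 1
    return text[start:end]

def has_content(text):
    """Rough equivalent of checking for [^%p%s]: any character that is
    neither punctuation nor whitespace."""
    return any(not ch.isspace() and not is_punct(ch) for ch in text)
```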

Template requiring date or year[edit]

For some reason, the template here at the bottom is requesting a date or year, even though two dates are given. The same template does not request anything when used in template namespace. Can anybody tell me what’s going on? MuDavid 栘𩿠 (talk) 09:52, 21 March 2024 (UTC)[reply]

@MuDavid: it's a strange template that uses reference templates inside the quotation template. I'll have to take a closer look at it later. — Sgconlaw (talk) 11:39, 21 March 2024 (UTC)[reply]
@MuDavid: the issue is that you aren't supposed to squeeze a citation template inside |2ndauthor=. If you use |newversion= then the module requires you to provide a value for |date2= or |year2=. Thus, you can't just use {{cite-book}}, etc., inside |2ndauthor= but have to split up all the parameters using |title2=, |location2=, |publisher2=, etc. You can do this for the Allen and Stigand sources, but it won't work if you allow editors to insert a reference in the form of a {{cite-*}} template using |trans_from=. — Sgconlaw (talk) 13:25, 21 March 2024 (UTC)[reply]
However, a possible workaround is to avoid using |newversion= and to put the citation templates into |section= instead. — Sgconlaw (talk) 13:29, 21 March 2024 (UTC)[reply]
Okay, thanks for the hint. I edited the template and it seems to work as desired. MuDavid 栘𩿠 (talk) 02:05, 22 March 2024 (UTC)[reply]

Ancient Greek conjugation template labels contracted forms as uncontracted[edit]

Template:grc-conj, when used to show a contract verb, is supposed to give two tables, one uncontracted and the other contracted. But when I use {{grc-conj|fut-con-a|...}}, the contracted table is also labeled as "Uncontracted". I noticed this on ἐλαύνω, where {{grc-conj|fut-con-a|ἐλ|dial=att}} produces

LaetusStudiis (talk) 15:19, 21 March 2024 (UTC)[reply]

It's best to post problems with {{grc-conj}} in Module talk:grc-conj. They may not be fixed anytime soon, but at least they'll be in a central location. — Eru·tuon 18:36, 21 March 2024 (UTC)[reply]
Actually, there appears to be a post about this already at Module talk:grc-conj § {{grc-conj|fut-ln|ἀγγελ|ἀγγελθ|dial=att}}, deux conjuguaisons non contractées. — Eru·tuon 18:38, 21 March 2024 (UTC)[reply]
Finally fixed it. Just a single-character mistake but took a long time to find. — Eru·tuon 19:45, 21 March 2024 (UTC)[reply]

Percent-encoded pipe and square brackets in T:rfv-sense (AE)[edit]

Look at AE, where the {{rfv-sense|de|regarding gender}} tag displays (Can we verify(gender%7cAE%5D%5D +) this sense?), and {{rfv-sense|de|also regarding gender}} displays (Can we verify(regarding gender%7cAE%5D%5D +) this sense). Why is it eating (not displaying) the first word, and why is it displaying the % stuff? - -sche (discuss) 04:23, 22 March 2024 (UTC)[reply]

This can be fixed by adding additional URL encoding to the template code, but note that this template parameter is not intended to accept a reason. It is meant to take a unique "topic" identifier to distinguish RFVs for different senses under the same language on the same entry. I'm actually reluctant to fix the URL encoding issue as it would paper over the real problem (confusing parameters). I think we should retire unnamed parameter 2 and force the explicit use of |topic=. This, that and the other (talk) 06:41, 22 March 2024 (UTC)[reply]
@This, that and the other What does the documentation sentence "If given, the specified text will be included at the end of a CSS span id contained in the request message." mean? This is confusing to me. Also it looks like you repurposed the old "topic" parameter, which was more open-ended. Was this intentional? Benwing2 (talk) 03:16, 23 March 2024 (UTC)[reply]
@Benwing2 I didn't write that wording. Either way, it means that the text of |topic= or |2= will be appended to the "anchor" (id parameter) generated by the template. For example, {{rfv-sense|en}} will generate an anchor #rfv-sense-notice-en-, while {{rfv-sense|en|1 2 3}} or {{rfv-sense|en|topic=1 2 3}} will generate #rfv-sense-notice-en-1 2 3.
Yeah I don't know why I added support for {{{2}}} [3]. Moment of madness I guess. Probably {{{2}}} should simply contribute a pre-filled reason to the RFV section creation link. This, that and the other (talk) 07:15, 23 March 2024 (UTC)[reply]
Fixed that problem, but {{rfv-sense|de|probably that's Translingual {{m|mul|AE}}}} still is breaking the template output. — Eru·tuon 04:12, 23 March 2024 (UTC)[reply]
Thanks, all. I actually initially thought the issue was that this was not input the template was intended to accept, and was going to post here just asking out of curiosity why it failed in this odd way (why was it eating the first word? what is the percent encoding coming from?), but the documentation seemed to suggest this use of 2 was OK. I note that T:rfv accepts a reason as 2, so it seems understandable that people would expect to be able to give a reason in T:rfv-sense too, but I think having that reason only be present in the wikicode (not displayed), and then auto-loaded when adding the section to WT:RFV (as suggested above), would be a reasonable solution. If it's easy to make it also handle {{rfv-sense|de|probably that's Translingual {{m|mul|AE}}}} at that point, great, but if not I see no problem with just updating the documentation to tell people not to do that. - -sche (discuss) 14:27, 23 March 2024 (UTC)[reply]
@Erutuon @-sche It's fixable with {{ANCHORENCODE:string}}, which is specifically designed to generate anchor text from inputs containing links etc. Theknightwho (talk) 14:02, 27 March 2024 (UTC)[reply]
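
To make the two points above concrete, here is a small sketch of the anchor format described in this thread (the prefix "rfv-sense-notice-", then the language code, a hyphen and the topic text) and of where sequences like %7C and %5D come from when a pipe or closing square brackets get percent-encoded. The function names are hypothetical, and the sketch uses Python's generic URL quoting, which is not the same thing as MediaWiki's own anchor encoding.

```python
from urllib.parse import quote

def rfv_sense_anchor(lang, topic=""):
    """Build the anchor text as described above: fixed prefix, language code, topic."""
    return f"rfv-sense-notice-{lang}-{topic}"

def percent_encode(text):
    """Percent-encode every reserved character, as a generic URL encoder would."""
    return quote(text, safe="")

print(rfv_sense_anchor("en"))           # anchor with an empty topic
print(rfv_sense_anchor("en", "1 2 3"))  # anchor with a topic
print(percent_encode("|AE]]"))          # leaked link syntax, percent-encoded
```

The last call shows that the stray "%7c...%5D%5D" in the displayed message is just the encoded form of a pipe and two closing brackets, apart from the letter case of the hex digits.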

Proto-Brythonic template[edit]

Instead of linking to Brittonic languages, it might make more sense to more specifically link to Common Brittonic. I don't know how to tweak this. Shoshin000 (talk) 13:30, 22 March 2024 (UTC)[reply]

@Shoshin000 Which template are you referring to? Benwing2 (talk) 03:12, 23 March 2024 (UTC)[reply]
Click on the "Proto-Brythonic" link on aneval for instance. Shoshin000 (talk) 09:34, 23 March 2024 (UTC)[reply]
@Shoshin000 Fixed. Benwing2 (talk) 22:16, 27 March 2024 (UTC)[reply]

{{desctree|non|ok|id=yoke}} throws a module error

Lua error in Module:descendants_tree at line 39: Could not find the correct senseid template in the entry ok (with language non and id 'yoke')

but {{desc|non|ok|id=yoke}} links to the correct sense with no error:

Old Norse: ok

@Theknightwho. Chuck Entz (talk) 22:34, 22 March 2024 (UTC)[reply]

@Chuck Entz Someone had accidentally copied a <noinclude> tag onto the page in the Elfdalian section, which the template parser was dutifully respecting by ignoring everything after it (since it had no closing tag). Theknightwho (talk) 23:21, 22 March 2024 (UTC)[reply]
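
The behaviour described above can be pictured with a toy model: properly closed noinclude spans are dropped, and an opening tag with no closing tag hides everything after it. This is only an illustration of that description, not the code of the actual template parser, and the function name is hypothetical.

```python
import re

def text_seen_by_parser(wikitext):
    """Toy model: remove closed <noinclude>...</noinclude> spans, then cut the
    text at any remaining (unclosed) <noinclude> tag."""
    text = re.sub(r"<noinclude>.*?</noinclude>", "", wikitext, flags=re.DOTALL)
    unclosed = text.find("<noinclude>")
    return text if unclosed == -1 else text[:unclosed]
```

Under this model, a senseid template placed anywhere after a stray opening tag would never be found, which matches the error message quoted above.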

This has been throwing a module error since the beginning of the week. It's pretty tricky because it only occurs on the template page itself, and is in a module invocation in a parameter that's not displayed. The only way to tell on the page that there's an error is by the Category:Pages with module errors link at the bottom of the page. Since it occurred at the same time as some edits to {{cite-book}} that JeffDoozan had just posted about on Theknightwho's talk page, I posted a reply there:

@JeffDoozan: {{R:cu:ESJS}} started throwing an invisible module error at about the same time you did this, and I suspect these changes are somehow involved. I don't really understand what's going on, but tinkering with html comments has narrowed it down to {{interval}} in the |entryurl= code throwing an error when |2= for the main template is missing, and |entryurl= not being displayed when |1= is missing. I have no clue why the module error didn't show up until now, since neither {{R:cu:ESJS}} nor {{interval}}/Module:interval have been edited recently and I didn't see anything about your edits that should have affected the |entryurl= parameter in {{cite book}} as used in this template. I'm obviously missing something. Chuck Entz (talk) 22:15, 17 March 2024 (UTC)[reply]

No one seems to have read it who had the time and/or expertise to fix this, so I'm bringing it here. It only happens when the first two positional parameters are empty, which is true on the template page itself. The template has no provision for doing things differently there and has the parameter references scattered throughout the template, so it's not something I could fix easily with "noinclude" or "includeonly" tags. It's true that the template has had this deficiency all along, but I brought it to JeffDoozan's attention because it's only after the recent edits that {{cite-book}} responded to it with a module error. I don't really care whose fault this is, but we can't have it stay in CAT:E forever. Thanks! Chuck Entz (talk) 23:14, 22 March 2024 (UTC)[reply]

@Chuck Entz I wrapped the whole thing in <includeonly> ... </includeonly> tags. This should always work in cases like this. One issue for sure is things like {{#ifexpr:{{{2|}}}>45|+4}}; in template space, |2= is undefined and the expression {{{2|}}} evaluates to a blank string, making the thing inside of #ifexpr: expression look like {{#ifexpr:>45|+4}}, which is malformed and results in Expression error: Unexpected > operator. I'm not sure if this counts as a module error (probably not) but it's certainly not good. The module error may come from the recent parameter checking added to {{cite-book}}. Benwing2 (talk) 03:10, 23 March 2024 (UTC)[reply]
Those usually show up in CAT:PFE, which I also patrol. Chuck Entz (talk) 03:18, 23 March 2024 (UTC)[reply]
I looked at this when Chuck first posted it but didn't see any way the param checking in cite-book could have caused the error unless it was exposing some sort of deep, weird interaction between the templates so I left it as-is in case someone else wanted to take a deeper look. JeffDoozan (talk) 18:41, 23 March 2024 (UTC)[reply]
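
The #ifexpr problem mentioned in this thread can be reproduced with a toy substitution model: when |2= is undefined, {{{2|}}} falls back to its empty default and the condition loses its left operand. The sketch below is not MediaWiki's parser, the helper names are hypothetical, and the well-formedness check only understands simple "number operator number" comparisons. Giving the parameter a numeric default such as {{{2|0}}} is shown as one possible guard; the fix actually applied above was the includeonly wrapper.

```python
import re

# Matches {{{name|default}}} where the default contains no braces.
PARAM = re.compile(r"\{\{\{([^{}|]+)\|([^{}]*)\}\}\}")

def substitute(wikitext, args):
    """Replace {{{name|default}}} with args[name] if defined, else with the default."""
    return PARAM.sub(lambda m: args.get(m.group(1), m.group(2)), wikitext)

def ifexpr_condition(wikitext):
    """Return the text between '{{#ifexpr:' and the first '|' or '}', or None."""
    m = re.search(r"\{\{#ifexpr:([^|}]*)", wikitext)
    return m.group(1) if m else None

def looks_wellformed(condition):
    """Crude check: a comparison needs a number on both sides of the operator."""
    pattern = r"\s*-?\d+(\.\d+)?\s*(>=|<=|>|<|=)\s*-?\d+(\.\d+)?\s*"
    return re.fullmatch(pattern, condition) is not None

template = "{{#ifexpr:{{{2|}}}>45|+4}}"
print(substitute(template, {}))            # on the template page: |2= undefined
print(substitute(template, {"2": "50"}))   # in an entry: |2=50
```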

This template does not produce the correct rhyme categories if the stress is anything other than penultimate (such as at matematyka and Jujuy). İʟᴀᴡᴀ–Kᴀᴛᴀᴋᴀ (talk) (edits) 16:39, 23 March 2024 (UTC)[reply]

@Ilawa-Kataka There is going to be a new module where this is handled. Vininn126 (talk) 17:03, 23 March 2024 (UTC)[reply]
@Ilawa-Kataka @Vininn126 Yes, my apologies, I have this partly finished. Benwing2 (talk) 18:57, 23 March 2024 (UTC)[reply]
BTW to give credit where it's due, the module in question was originally created by User:Catonif. Benwing2 (talk) 18:58, 23 March 2024 (UTC)[reply]

Latin entries incorrectly containing M&A template[edit]

Some Latin entries contain the template {{R:M&A}} even though they are not actually in the phrasebook (eg Hesperus, occiduus, valentulus), and so are wrongly in Category:Latin words in Meissner and Auden's phrasebook. The template is able to detect this using Module:R:M&A and displays "[0 phrases]". Can the template be removed from these pages by a bot? Weylaway (talk) 19:27, 23 March 2024 (UTC)[reply]

I now have a full list of the words and there are only about 30, but I would appreciate it if I could be given AutoWikiBrowser permission so I can use JavaScript Wiki Browser to fix this and other things in the future. For instance I would also like to fix pages that use the {{Q}} template but don't use the "thru" parameter to properly specify a range of lines. Weylaway (talk) 00:00, 25 March 2024 (UTC)[reply]
@Weylaway: Done. Benwing2 (talk) 19:53, 25 March 2024 (UTC)[reply]
Thank you. Weylaway (talk) 21:24, 26 March 2024 (UTC)[reply]

names template requires grammar fix[edit]

e.g. at Lída: {{given name|cs|female|dim=Lidmila|dim2=Ludmila}}, it says "a diminutive of the female given names Lidmila or Ludmila". It should be either "the name X or Y", or "the names X and Y". Equinox 05:15, 26 March 2024 (UTC)[reply]

Honestly this sounds fine to me as written but we could change the conjunction in this case to "and" ("the name X or Y" sounds strange to me). However, the conjunction "or" is used in a lot of places, e.g. in the masculine and feminine equivalents (|m=, |f=) and it's not clear to me it could be switched in those cases to "and" without sounding strange. Another possibility is changing the text of diminutives to read more like "a female given name, diminutive of Lidmila or Ludmila", avoiding the singular/plural issue entirely. Benwing2 (talk) 02:01, 28 March 2024 (UTC)[reply]

Could this be changed so that it allows separators to be specified on a work level as well as an author level? Quotes from the Vulgate are currently displayed like "Genesis.1.1". Weylaway (talk) 21:26, 26 March 2024 (UTC)[reply]

Insertion of undocumented {{auto doc}}[edit]

Why has seemingly ineffective invocation of undocumented template {{auto doc}} been added to documentation page Module:RQ:pi:Sai Kam Mong/testcases/documentation? All it seems to achieve is the addition of the red text, "Unable to auto-generate documentation for this module page.", which is just confusing when displayed on Module:RQ:pi:Sai Kam Mong/testcases. I am minded to undo this addition. It was added on 9 March 2024, so it doesn't look like a temporary feature. --RichardW57m (talk) 10:41, 27 March 2024 (UTC)[reply]

More techno-imperialism. Probably preparing for AI-generated entries to dispense with pesky manual contributors. DCDuring (talk) 12:54, 27 March 2024 (UTC)[reply]
@RichardW57m Hi. Please ping me in the future when you see my bot has made a change you question, so I will make sure to see it. I forgot to document this template but it's used on pages where Module:documentation will autogenerate the documentation if no documentation is present; it explicitly requests Module:documentation to autogenerate the documentation. The idea is that if you want to put manual text on a doc page but you also want the autogenerated documentation, you use {{auto doc}} to explicitly request the latter. The way it's set up, it normally works when you view the module page itself but not when you directly view the documentation page. I should fix the message to make this clearer. I'm not sure why I put it on the page in question because I don't think there's any autogenerated module documentation available for that page; you can go ahead and take it out on that page. Benwing2 (talk) 18:33, 27 March 2024 (UTC)[reply]
@Benwing2: I asked here because I thought this instance might be part of a general pattern. It seems that I need to add some categorisation. How in general should test case modules be categorised? As parent plus cat:Testcase modules and, for example, where applicable, cat:Pali testcase modules? Copying the parent categories just looks like clutter to me, but seems to be happening with automatic categorisation - but this could just be happening by oversight. --RichardW57 (talk) 08:10, 28 March 2024 (UTC)[reply]
@RichardW57 You should probably use {{module cat}} (which should have good documentation) and follow the example of another test page. They do currently copy the parent categories but I'm not sure that is the best; I could be persuaded to change this to work in some other way. The advantage of using {{module cat}} is changes like this can easily be made in one place and propagate everywhere. Benwing2 (talk) 08:15, 28 March 2024 (UTC)[reply]
@Benwing2: OK, I'll use that. The automation could be enhanced by recognising standard (TBC) prefixes such as 'RQ:pi:' as indicating language Pali and type 'Quotation and usage example' (which is a misnomer - 'and' should be replaced by 'or' to make the description true), though the yield may be fairly small. --RichardW57m (talk) 10:09, 28 March 2024 (UTC)[reply]
@Benwing2: Done. --RichardW57 (talk) 11:36, 29 March 2024 (UTC)[reply]

the template for adding Set-not-Topic categories is T:topic[edit]

I notice that e.g. CAT:en:Pinks advises (correctly) that it's a Set and not a Topic cat: "NOTE: This is a set category. It should contain terms for pinks, not merely terms related to [the topic of] pinks." I also notice the template which adds e.g. Mexican pink to it is T:topic. This seems like it could be confusing.
I wonder if we should change the main name of T:topics to something more indicative of its function, like "catlangcode" with a shortcut like "clc" mirroring catlangname and cln? (Could keep "topic"/"topics" as redirects too so as not to disrupt people who are used to them.) Or if we split the naming systems of topic vs set categories so that the scope of a category is discernible from its name and doesn't require users to click through to read the category description (or did we decide against that?), then maybe we'd just need a separate T:setcat (or something) at that point. - -sche (discuss) 14:44, 28 March 2024 (UTC)[reply]

@-sche So the way I have handled this so far is to rename what formerly were "topic" categories to be "related-to" categories, preserving the name "topic" for the union of "related-to" and "set" categories. (Actually there are other types beyond just related-to and set categories; see Module:category tree/topic cat/data/documentation#Category types.) The term "related-to" is a bit awkward but I think it conveys pretty well what the purpose is, more than "topic" does. An alternative is to use a template like {{group}} or {{groups}} or {{groupcat}} or similar. As for splitting related-to and set categories, I don't think we decided against it but the discussion didn't come to a conclusion; there were some issues that we haven't yet resolved. Benwing2 (talk) 01:10, 30 March 2024 (UTC)[reply]

Alphabeticisation of subcategories of Lithuanian terms suffixed with -mas[edit]

cat:Lithuanian terms suffixed with -mas has three subcategories, those suffixed by -imas, -umas and -ymas. These are sorted in English order, under alphabetic heads 'I', 'U' and 'Y'. Shouldn't they be sorted by Lithuanian alphabetic order, so in the order -imas, -ymas and -umas, and probably under initial letters 'I' and 'U'? (I don't think we can thoroughly 'Y' anyway as a header for Lithuanian ordering.) Pinging @Benwing2, Fay Freak. --RichardW57m (talk) 15:07, 28 March 2024 (UTC)[reply]

Yeah, this is interesting, as in the superordinate category Category:Lithuanian terms by suffix y is sorted at the place customary for Lithuanian again. Fay Freak (talk) 15:34, 28 March 2024 (UTC)[reply]
@Fay Freak: You mean by the Wiktionary sort order for Lithuanian. In the customary Lithuanian sort order, -yba comes before -izmas because 'b' comes before 'z' and 'y' only orders after 'i' as a tie-break. --RichardW57 (talk) 01:47, 29 March 2024 (UTC)[reply]
@RichardW57 They are sorted that way because the parent categories are manually specified on the child category pages using raw Wikicode. I think if they used a template to do the categorization, things would work better; the sort order for Lithuanian in Module:languages/data/2 does indeed sort y with i. Pinging User:Theknightwho who may have thoughts about this. Benwing2 (talk) 21:47, 29 March 2024 (UTC)[reply]
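
The ordering rule described in this thread (y sorts together with i, and y only comes after i as a tie-break) can be sketched as a two-level sort key. This is a simplified illustration that models only the i/y rule; it ignores the other Lithuanian letters and is not the sort key defined in Module:languages/data/2.

```python
def lt_sort_key(term):
    """Two-level key: first compare with y treated as i, then fall back to the
    original spelling so that i precedes y when everything else is equal."""
    word = term.lstrip("-").lower()
    primary = word.replace("y", "i")
    return (primary, word)

print(sorted(["-imas", "-umas", "-ymas"], key=lt_sort_key))
print(sorted(["-izmas", "-yba"], key=lt_sort_key))
```

With this key the three -mas subcategories come out as -imas, -ymas, -umas, and -yba sorts before -izmas because b precedes z once y is treated as i.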

the label "color"[edit]

I was cleaning up entries which were in both the top-level "Colors" cat and the relevant subcat, e.g. Yale blue (already in "Blues"), and I notice it's {{lb|en|color}} that adds the redundant "Colors" category. IMO this label is not useful, it just tells you that "A dark azure colour" is a color, which the definition already tells you. (Should we label saleswoman {{lb|en|woman}}?) So I am inclined to remove the label altogether. (I removed it from a few entries following this, but upon realizing the scope of the issue, am coming here.) But if we don't remove the label, I am inclined to at least remove the categorization, because double-categorizing things into both parent and child categories is generally undesirable.
Thoughts? (It'd be good to ensure any entries the label is removed from are already in a subcat of "Colors" or, in rare cases where something is a non-visible color like octarine, the top-level "Colors" category.) - -sche (discuss) 19:13, 28 March 2024 (UTC)[reply]

Yes, this doesn't seem to be a correct use of a label. — Sgconlaw (talk) 19:20, 28 March 2024 (UTC)[reply]
Agreed. So long as it is clear from the definition that it is a color, the label adds nothing. It's worse than the "(anatomy) elbow" stuff you see sometimes, because at least that doesn't involve needless repetition of a word on a sense line. This, that and the other (talk) 21:40, 28 March 2024 (UTC)[reply]
Agreed, feel free to correct. Benwing2 (talk) 21:49, 29 March 2024 (UTC)[reply]

Template:table:colors[edit]

I notice that Template:table:colors also categorizes any entries it's on into CAT:foo:Colors, e.g. lime green. I am inclined to change it to not apply that category, thus requiring the relevant subcategories to be applied manually (like "CAT:en:Greens" in the case of lime green). Alternatively, if the template is supposed to be used on all and only those pages which are "top-level" primary colors we feel are 'worthy' of double-categorization into both the relevant subcats and the top-level Colors cat, then many things like "lime green" and "mint green" are clearly not regarded as fundamentally different colors from "green" in English and would need to be removed from the table, no? Thoughts? - -sche (discuss)

Agreed on removing the top-level category. —Justin (koavf)TCM 22:21, 31 March 2024 (UTC)[reply]
@-sche Also agreed. I think maybe theoretically this table is supposed to be used only on basic colors, but I checked the usage of the Spanish variant and it's also used for the equivalent of cream, fuchsia, cobalt blue and several other random colors. Benwing2 (talk) 22:32, 31 March 2024 (UTC)[reply]
I'm definitely not a fan of this template. The colors weren't well chosen to begin with ("lime green?", "mint green?", "magenta"?), and different languages divide up the color space differently. For a language that uses the same word for blue and green, what color do you show? The idea of choosing a single hue to show what a given color name depicts is particularly bad for proto-languages and dead languages. I helped to get the Proto-Indo-European one deleted by pointing out that Latin flavus (yellow) and English blue are from the same PIE root, but there are no doubt others that deserve the same fate. Chuck Entz (talk) 23:37, 31 March 2024 (UTC)[reply]
I think the template could be useful if set up and used correctly. I'm under the impression that different languages dividing the color space differently is intended to be handled by modifying the template for that language, the way Template:table:colors/egy does, and certainly, I think Template:table:colors/egy is useful. But this probably does make the top-level, language-nonspecific template a bad idea, because having it encourages people to just use its values. The way the template is currently set up, people's desire to have a table with no empty cells results in many things being separated as fundamentally different colors which should not be, which (I agree) severely reduces the value of the template. It's impossible to discern that the reason the Russian template separates light and dark blue is that those are regarded as different colors in Russian (like pink vs red in English), when the English template turns around and separates two "blue" fields with virtually the same colors, and separates out three green fields, even though the only fundamental colors are one "green" category with various shades and one "blue" category with various shades. Maybe I'll try and clean up the English table later to have its own values like the Egyptian table does. - -sche (discuss) 19:12, 1 April 2024 (UTC)[reply]
I revamped the English table. It could use more swatches to illustrate more shades of brown, grey, etc, but I tried to remove things that weren't separate 'core' colours, or subsume them under the relevant core colour. But as Chuck says, a lot of other tables also need revising, and maybe the idea of having a base Template:table:colors is bad because people just use that, and include everything it includes, rather than creating a table based on actually-recognized colours... e.g. the /ja table seems to just have translated the base table... - -sche (discuss) 01:28, 2 April 2024 (UTC)[reply]
@-sche Thanks a million, I actually raised a very similar concern at the tea room in January. A Westman talk stalk

Phrasal verbs in Welsh and Irish[edit]

English phrasal verbs are subcategorised by the particle that the verb occurs with, e.g. Category:English phrasal verbs with particle (aback). Phrasal verbs are extremely common in Welsh, so it would be a good idea to have similar subcategories such as Category:Welsh phrasal verbs with particle (allan) - but many phrasal verbs are actually formed with multiword "particles", such as i fyny and i lawr.

I can see that Irish does indeed have subcategories like Category:Irish phrasal verbs with particle (faoi iamh) but it doesn't seem quite right to call these "particles". Any suggestions on a better term to use? Arafsymudwr (talk) 01:27, 29 March 2024 (UTC)[reply]

@Arafsymudwr Ideally any such term would apply cross-linguistically, because currently these categories are all handled at the cross-linguistic level. Benwing2 (talk) 21:50, 29 March 2024 (UTC)[reply]
@Benwing2 I'd be tempted to suggest simply saying English phrasal verbs formed with aback or Welsh phrasal verbs formed with i fyny in that case. But I realise that must be a lot of editing work, which is why I was hoping for a term that might cover multiword "particles" so the existing categories other than a few Irish ones can be left alone. Arafsymudwr (talk) 00:25, 30 March 2024 (UTC)[reply]
@Arafsymudwr It's not actually so hard to make such a change as it can be done by bot; I've done similar category renames before. Benwing2 (talk) 00:39, 30 March 2024 (UTC)[reply]
I actually think this suggestion made by @Arafsymudwr is a good one. Anyone else want to weigh in? Benwing2 (talk) 01:11, 30 March 2024 (UTC)[reply]
Support from me. This, that and the other (talk) 09:49, 30 March 2024 (UTC)[reply]
Support from me. Arafsymudwr (talk) 10:25, 30 March 2024 (UTC)[reply]

merging lect info[edit]

Pinging a few people who might be interested: @Theknightwho, -sche, Surjection, Vininn126 It occurs to me we have info on different language lects/varieties in a whole shitload of places:

  1. Label data, e.g. for English: Module:labels/data/lang/en; for Chinese: Module:labels/data/lang/zh;
  2. {{alt}} "dialect" data, e.g. for English: Module:en:Dialects; for Chinese: Module:zh:Dialects (not currently defined);
  3. Module:etymology languages/data;
  4. the "varieties" and "aliases" fields in language data, e.g. Module:languages/data/2/extra;
  5. dialect synonyms data, e.g. for Chinese: Module:dialect synonyms/zh;
  6. category pages for individual lects, e.g. Category:Polari and Category:Jiaoliao Mandarin, which have parameters specified to {{auto cat|dialect=1}}.

This scattering and duplication of info is a real problem because inevitably the different sources get out of sync. I have been thinking of how to merge some of this data. My thoughts:

  1. I already proposed eliminating #2 (the {{alt}} "dialect" data) in WT:Grease pit/2024/March#removing cruft from Module:labels/data/regional, and I have written the code to do this, so that the dialect data modules can read and convert label data modules.
  2. I just added support to Module:category tree/poscatboiler/data/language varieties (which implements #6, the language variety category pages), so that Wikipedia and Wikidata information can be pulled out of label data modules automatically, and the support was already there to automatically pull this info out of Module:etymology languages/data when present. An example of this in action is Category:Jiaoliao Mandarin, where the links to the English and Chinese Wikipedia articles in the upper right-hand corner come from the Wikidata item listed for label Jiaoliao Mandarin in Module:labels/data/lang/zh.
  3. I am thinking of further moving info currently specified on individual category pages into the label data modules, perhaps into "extra data" modules similar to Module:languages/data/2/extra (e.g. Module:labels/data/lang/zh/extra) so they don't bloat the label data modules themselves.

I am soliciting thoughts for how to centralize lect information. Either we can continue augmenting the label data information, as I've been doing, or we can create a separate set of language-variety modules that contain all the info needed for the various applications mentioned above. Benwing2 (talk) 03:03, 30 March 2024 (UTC)[reply]

I agree that most of the same labels need to be used in different places and that scattering them is problematic. As I am not a programmer, I am unsure which option would be best. I have an inkling that having a separate language module might be preferred by some. Vininn126 (talk) 07:41, 30 March 2024 (UTC)[reply]
I agree with trying to consolidate as much of this as possible. I don't think I even knew/remembered Module:en:Dialects even existed (and now that I do, I'm unsure why it does exist as something different from the others). I'm unsure whether Module:labels/data is the best place for it to get consolidated to, though.
On one hand, I understand that because many (most?) of these will occur as {{label}}s, putting them in Module:labels avoids that module (and its human users) having to look somewhere else to process some labels. OTOH, (A) the label module seems like a less expected place to look for "language/lect data as such", compared to a language/lect module, particularly for any lect data that isn't used in a {{label}}, because (B) putting something in Module:labels/data strongly suggests that we think it is (or will/should be) used in {{label}}s, but AFAIK at least a few "etymology-only languages" really are "etymology-only" (the substrate codes for sure; are there others?), so does it make sense to have some things which aren't used in labels, or where we've added them to the module without regard for whether / intention that they be used in labels, be in the labels-data module?
But I understand that if we decide to consolidate things to a separate Module:subsumed language varieties (or whatever name) instead, the question is then, is that confusing to users, for certain labels to be in Module:labels while lect-y ones are in another module? So I'm unsure what's best.
BTW, it occurs to me that e.g. "en-CA" "Canadian English" and "fr-CA" "Canadian French" exist as etymology languages that can be used in etymologies, but you can also deploy either of them (or at least, their categories) as {{label}}s via {{lb|en|Canada}} and {{lb|fr|Canada}}, so if we centralize lect info to Module:labels, I guess it needs to be able to account for "Canada" being a lang=fr-specific alias of (or at least, adder of the category of) "Canadian French" while also being a lang=en-specific alias of "Canadian English"...? - -sche (discuss) 14:04, 2 April 2024 (UTC)[reply]
@-sche Thanks for your thoughts. Yeah there are indeed some issues with centralizing into the labels modules, as you point out, although there are also issues with not doing this (as you also point out). So far what I've been doing is putting descriptions and parent label info in Module:labels/data/lang/zh and pulling it out in Module:category tree/poscatboiler/data/language varieties (which implements language variety categories such as Category:Wuhan Mandarin). This avoids the need to put this information in the call to {{auto cat}} itself (although this can still be done). Note also that the way the above module distinguishes "lect" labels from "non-lect" labels is by the presence of the parent field; I thought of introducing a specific nolect field to indicate non-lect labels but it seems unnecessary if all lects have a parent field (which is set to true for top-level lects). The basic problem is that many of the different data modules are used for slightly different purposes, so it's difficult to merge them all. The issue with the Canada label in particular is that it's in Module:labels/data/regional and is used by several languages; this can potentially be solved e.g. by moving such labels into the language-specific modules if they have language-specific info attached. Benwing2 (talk) 21:20, 2 April 2024 (UTC)[reply]
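To make the "presence of the parent field" rule above concrete, here is a minimal Python sketch. It is not the actual Lua code in Module:category tree/poscatboiler/data/language varieties; the `labels` table, its entries, and the helper names `is_lect` and `lect_chain` are all made up for illustration, and the real label data modules contain many more fields.

```python
# Hypothetical, heavily simplified stand-in for a label data module.
# Assumption: a label is a lect if and only if it has a "parent" field,
# and top-level lects set that field to True.
labels = {
    "Mandarin": {"parent": True},
    "Jiaoliao Mandarin": {"parent": "Mandarin"},
    "archaic": {},  # an ordinary, non-lect label: no "parent" field
}


def is_lect(data):
    """A label counts as a lect when it carries a parent field."""
    return "parent" in data


def lect_chain(table, name):
    """Walk from a lect up to its top-level ancestor, returning the names."""
    chain = [name]
    while True:
        parent = table[chain[-1]].get("parent")
        if parent is True or parent is None:
            return chain
        if parent in chain:
            raise ValueError("cycle in parent fields at " + parent)
        chain.append(parent)


lects = sorted(name for name, data in labels.items() if is_lect(data))
print(lects)
print(lect_chain(labels, "Jiaoliao Mandarin"))
```

With this convention no separate nolect field is needed: anything without a parent is simply skipped when lect categories are generated.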

Request for list of pages by time usage[edit]

Could someone with the technical knowledge make or teach me how to make a list of the, say, top 1000 pages by their "CPU time usage" or "Real time usage" or "Lua time usage"? Some pages hop in and out of Cat:E sometimes and I think such a list would be helpful to see which pages are in the "gray area". --kc_kennylau (talk) 11:51, 30 March 2024 (UTC)[reply]

@Kc kennylau This is a very good question and I honestly don't know how to do it. It would be great if MediaWiki exported a page showing this but AFAIK they don't. In order to do this, then, you'd first have to figure out how to get the usage stats on a given page, then run this on the pages most likely to be taking up lots of time (which would probably be some combination of pages with lots of template calls and pages that have a lot of Wikitext). To get the usage stats, ideally there would be an API exposed by MediaWiki to get the usage stats but I looked and I can't find one; the alternative is to scrape the page when previewing but that would be rather painful to write, I think. You might want to search Phabricator and/or contact a MediaWiki developer like Tim Starling for this. Benwing2 (talk) 20:46, 30 March 2024 (UTC)[reply]
@Benwing2, Kc kennylau: it's easier than you think. There's a parser profile report embedded as a comment in the HTML source. I use it all the time. I don't know if it's the same for bots or AWB, but it wouldn't take that long to just "view source" in your browser and save it for automated extraction later. Chuck Entz (talk) 21:40, 30 March 2024 (UTC)[reply]
@Chuck Entz Right, that's the option I mentioned of scraping the page when previewing. Maybe not so hard to write but not ideal. Benwing2 (talk) 21:52, 30 March 2024 (UTC)[reply]
@Benwing2 I'm not sure what you mean by "previewing". If you mean clicking "Edit", then "Preview", that's not necessary. All it takes is going to the page (not viewing the diffs, but just going to the page).
For instance, if I click on the link for a, right-click on the page, then select "View Page Source" from the menu, then page up from the bottom a bunch of times, I see:

<!-- NewPP limit report
Parsed by mw‐web.eqiad.main‐78d6c98b98‐tjckl
Cached time: 20240330220709
Cache expiry: 2592000
Reduced expiry: false
Complications: [vary‐revision‐sha1, show‐toc]
CPU time usage: 14.922 seconds
Real time usage: 17.072 seconds
Preprocessor visited node count: 144535/1000000
Post‐expand include size: 1873158/2097152 bytes
Template argument size: 163519/2097152 bytes
Highest expansion depth: 25/100
Expensive parser function count: 72/500
Unstrip recursion depth: 0/20
Unstrip post‐expand size: 36525/5000000 bytes
Lua time usage: 9.702/10.000 seconds
Lua memory usage: 72225617/104857600 bytes
Lua Profile:
recursiveClone <mwInit.lua:41> 1200 ms 12.8%
 ? 920 ms 9.9%
MediaWiki\Extension\Scribunto\Engines\LuaSandbox\LuaSandboxCallback::gsub 700 ms 7.5%
pcall 640 ms 6.9%
MediaWiki\Extension\Scribunto\Engines\LuaSandbox\LuaSandboxCallback::match 380 ms 4.1%
MediaWiki\Extension\Scribunto\Engines\LuaSandbox\LuaSandboxCallback::getAllExpandedArguments 340 ms 3.6%
MediaWiki\Extension\Scribunto\Engines\LuaSandbox\LuaSandboxCallback::redirectTarget 300 ms 3.2%
MediaWiki\Extension\Scribunto\Engines\LuaSandbox\LuaSandboxCallback::toNFD 280 ms 3.0%
<mw.title.lua:50> 280 ms 3.0%
(for generator) 260 ms 2.8%
[others] 4040 ms 43.3%
Number of Wikibase entities loaded: 0/400
-->

<!-- Transclusion expansion time report (%,ms,calls,template)
100.00% 14458.411 1 -total
8.85% 1279.244 400 Template:head
8.37% 1210.145 184 Template:inh
7.39% 1068.371 334 Template:no_deprecated_lang_param_usage
6.55% 946.334 536 Template:l-self
6.01% 868.748 24 Template:audio
5.84% 844.661 27 Template:catlangname
5.74% 829.833 69 Template:cite-book
4.92% 711.522 369 Template:l
4.86% 702.181 87 Template:cite-meta
-->

<!-- Saved in parser cache with key enwiktionary:pcache:idhash:106923-0!dateformat=mdy and timestamp 20240330220709 and revision id 78696257. Rendering was triggered because: page-view
-->

I escaped the comments and added line breaks to make it readable, but otherwise, that's it. If you know how to extract everything between "Lua time usage: " and "/10.000 seconds" you can get the processor time completely painlessly. Chuck Entz (talk) 22:22, 30 March 2024 (UTC)[reply]
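The extraction described above could be sketched in Python roughly as follows. This is only an illustration, assuming the page HTML has already been saved to a string; the helper name `parse_limit_report` and the shortened sample are invented for the example, and the patterns rely on the report format quoted in this thread, which is not a stable interface.

```python
import re

# Patterns for the timing lines of the "NewPP limit report" HTML comment.
CPU_TIME_RE = re.compile(r"CPU time usage: ([\d.]+) seconds")
REAL_TIME_RE = re.compile(r"Real time usage: ([\d.]+) seconds")
LUA_TIME_RE = re.compile(r"Lua time usage: ([\d.]+)/([\d.]+) seconds")


def parse_limit_report(html):
    """Return timing figures (floats, in seconds) found in saved page HTML.

    A value is None when the corresponding line is missing, for example
    on pages that do not invoke any Lua module.
    """
    result = {"cpu": None, "real": None, "lua": None, "lua_limit": None}
    match = CPU_TIME_RE.search(html)
    if match:
        result["cpu"] = float(match.group(1))
    match = REAL_TIME_RE.search(html)
    if match:
        result["real"] = float(match.group(1))
    match = LUA_TIME_RE.search(html)
    if match:
        result["lua"] = float(match.group(1))
        result["lua_limit"] = float(match.group(2))
    return result


# Shortened copy of the report quoted above, used as sample input.
sample = """<!-- NewPP limit report
CPU time usage: 14.922 seconds
Real time usage: 17.072 seconds
Lua time usage: 9.702/10.000 seconds
-->"""

report = parse_limit_report(sample)
print(report)
```

Running this over a folder of saved page sources and sorting by `report["lua"]` would give the kind of ranking asked for, limited to whichever pages were saved.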
@Chuck Entz: Thanks for the insight. However, generating the source code still seems to be an expensive operation. Does en.wikt save the data somewhere in the "page" as returned by a pagegenerator? --kc_kennylau (talk) 22:35, 30 March 2024 (UTC)[reply]
@Kc kennylau I don't think so. As I said above, maybe there's an API you can use to request this info, but if so I don't know it. I would suggest searching Phabricator [4], and if you can't find anything, opening a ticket about how to do this. Benwing2 (talk) 22:39, 30 March 2024 (UTC)[reply]
@Kc kennylau: I doubt it, since it's different for every page load. I don't think there's any automated way to do this, so it would require visiting each page and working with the generated source. It's very easy compared to other things you might do without a bot, but there's no comparison to anything done by bot. The only consolation is that there aren't that many candidates and it could be optimized to less than a minute per page. Look at User:Chuck Entz/Memory and subpages to see what I pieced together using such crude techniques when I first started trying to figure out what was going on with all the Lua errors that were popping up at the time. I used English frequency lists as a crude way of narrowing things down. If I were doing it now, I would start with single-character entries for Latin and Han scripts, and short (c)V(c) sequences with uncommon letters like "q", "x" and "z" excluded. You could alternatively use the number of templates or the number of L2 sections in the wikitext per page as tests that could be determined from the dumps, just to narrow things down. Chuck Entz (talk) 00:12, 31 March 2024 (UTC)[reply]
(@Chuck Entz there was no "reply" button for this because you used ~~~ instead of ~~~~) This method would miss what I was trying to look for in the first place. The Coptic inflection tables had a lot of links and took a lot of parsing time (which I have since optimised). I am trying to find pages in similar situations. --kc_kennylau (talk) 23:52, 30 March 2024 (UTC)[reply]
Oops. I manually added the date and time to fix that. As for other cases like Coptic: Coptic was exceptional. All the pages I've seen in CAT:E with timeouts have been due to large numbers of templates or due to bugs that caused things like partial recursion. That's not to say that there aren't potential cases that just haven't gotten bad enough yet. I'm not discouraging you from pursuing other options like those offered above- I just wanted to add a low-tech option in case high-tech ones aren't available. I only elaborated because I wasn't sure if everyone understood what I was talking about. Chuck Entz (talk) 00:51, 31 March 2024 (UTC)[reply]
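The dump-based narrowing suggested in this thread (counting templates and L2 sections per page) might look something like the Python sketch below. It assumes the wikitext has already been read from a dump into a title-to-text mapping; the names `heaviness` and `rank_pages` and the two sample pages are hypothetical. Counting `{{` is deliberately crude: it does not separate templates from parser functions and says nothing about how expensive each call is.

```python
import re

# A level-2 (language) heading is a line like "==English==":
# exactly two equals signs at the start, so "===Noun===" is not matched.
L2_HEADING_RE = re.compile(r"^==[^=].*?==[ \t]*$", re.MULTILINE)


def heaviness(wikitext):
    """Return (number of '{{' openers, number of L2 sections)."""
    return wikitext.count("{{"), len(L2_HEADING_RE.findall(wikitext))


def rank_pages(pages, top=1000):
    """Sort pages by template count, then L2 count, heaviest first."""
    scored = [(title,) + heaviness(text) for title, text in pages.items()]
    scored.sort(key=lambda row: (row[1], row[2]), reverse=True)
    return scored[:top]


# Tiny made-up sample standing in for dump contents.
pages = {
    "b": "==English==\n===Noun===\n{{en-noun}}",
    "a": "==English==\n{{head|en|noun}}\n==French==\n{{head|fr|noun}}\n{{l|fr|b}}",
}

ranking = rank_pages(pages)
print(ranking)
```

The resulting shortlist would still have to be checked against the real limit reports, since a page with few but expensive template calls (as in the Coptic case) would not rank highly here.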

aggressive GC patch is rolled out[edit]

It appears that the more aggressive garbage collection patch [5] is rolled out. I notice that the memory on a has reduced to 71MB; I'm not sure if it's related but quite possibly. Benwing2 (talk) 02:06, 31 March 2024 (UTC)[reply]

fixing Reply gadget on the Grease pit[edit]

@This, that and the other Whatever you did for the Beer Parlour works in that I can reply to posts directly from WT:BP. However, I can't do that in the Grease Pit. Could you apply the same change here as well? Benwing2 (talk) 05:18, 31 March 2024 (UTC)[reply]

@Benwing2 I'm leaving this reply from WT:GP itself using the Reply tool, without having changed anything. However, I've also noticed that it doesn't work all the time. There seems to be some kind of intermittent fault, whether on our end (depending on something within the wikitext of the discussions?) or on the server end. Not really sure where to start tbh. This, that and the other (talk) 08:29, 31 March 2024 (UTC)[reply]
Having reloaded the page a few times, I can now only use the reply tool on discussions in the top half of the page (February 2024) and not the bottom half (March 2024). This, that and the other (talk) 08:33, 31 March 2024 (UTC)[reply]

Postal Romanization in derivation categories[edit]

Right now, Category:Tagalog terms derived from Postal Romanization is in CAT:E because {{auto cat}} doesn't know what to do with "Postal Romanization". Category:English terms derived from Postal Romanization exists, but it doesn't use {{auto cat}}. We do have romanizations such as "Wade-Giles" in Module:etymology languages/data, but I'm not sure which language code to attach it to. My understanding is that the Postal Romanization may not be strictly or solely Mandarin, being based on a Nanjing dialect- I'm not sure which language. Pinging (Notifying Atitarev, Benwing2, Fish bowl, Frigoris, Justinrleung, kc_kennylau, Mar vin kaiser, Michael Ly, ND381, RcAlex36, The dog2, Theknightwho, Tooironic, Wpi, 沈澄心, 恨国党非蠢即坏): for input from people who would know. Chuck Entz (talk) 15:27, 31 March 2024 (UTC)[reply]

Apart from "Amoy" being from Hokkien, the names seem to be from "southern (Nanjing) Mandarin" as the Wikipedia article Chinese postal romanization suggests. Is there no option to simply attach it to "Chinese"? --kc_kennylau (talk) 15:37, 31 March 2024 (UTC)[reply]
@Chuck Entz Postal Romanization is not a recognized etymology language; we could add it but I'm not sure it's really needed. Benwing2 (talk) 20:31, 31 March 2024 (UTC)[reply]
@Benwing2 look at Category:English terms derived from Postal Romanization. Most of the Wade-Giles spellings for place names like Pep'ing, Ssu-ch'uan and Nan-ching are completely obsolete, but Postal Romanization ones like Peking, Szechuan, and Nanking are still recognizable, even if the Pinyin spellings like Beijing, Sichuan and Nanjing are currently prescribed. Chuck Entz (talk) 00:08, 1 April 2024 (UTC)[reply]
@Chuck Entz OK, I added code zh-postal for this and cleaned up the pages that referred to it. Benwing2 (talk) 01:57, 1 April 2024 (UTC)[reply]

Invalid params in call to Template:lv-decl-noun-1[edit]

Whilst editing the Latvian entry autors, I noticed the following error in the declension template:

Invalid params in call to Template:lv-decl-noun-1: 6={{{6}}}; 7={{{7}}}; 3=1st; drop-v=; 5={{{5}}}.

This error appears to apply to all instances of the template which I've checked, where the general format is {{lv-decl-noun|autor|s|1st|extrawidth=-60}}. I've read the documentation for Template:lv-decl-noun, Template:lv-decl-noun-1, and have tried removing the extrawidth parameter, but nothing stood out to me and I'm assuming that it's a fundamental issue with the template design. From what I can see, none of the original contributors to the templates appear to be still active, so I'm wondering if there's anyone in the general audience who would be able to look into this? Helrasincke (talk) 18:19, 31 March 2024 (UTC)[reply]

This is quite a problematic situation, as the general format in which the template is called is completely incompatible with the parameters that are actually used by the template, and there is no template documentation to tell us the intended way of calling the template... --kc_kennylau (talk) 19:09, 31 March 2024 (UTC)[reply]
Edit: the documentation is located at {{lv-decl-noun}}, and searching through the old versions, I still have completely no idea why |extrawidth=-60 got there in the first place. I can't find it in the old versions at all. --kc_kennylau (talk) 19:20, 31 March 2024 (UTC)[reply]
(I have struck through my previous comments as I have investigated further.) @Helrasincke: Basically {{lv-decl-noun}} is a central hub that calls other inflection table templates depending on the 3rd parameter, and in this case it calls {{lv-decl-noun-1}}, and it also passes on parameters to the sub-templates. However, certain parameters are used only for other declensions, so the parameter checker puts a warning in the preview page (but not the actual page) that there are unused parameters. The extrawidth parameter is used to adjust the width of the table. Since the warning only appears in the preview, I suppose one can just ignore it. --kc_kennylau (talk) 19:33, 31 March 2024 (UTC)[reply]
@Kc kennylau: I think it's less stressful, and possibly less error prone, if similar templates have the same set of parameters, even though some of them may not be used in some cases. The downside is that {{#invoke:checkparams|warn}} then has to be told that the redundant parameters are allowed. If the extra parameter is one that is easy for a human to generate, it makes sense to allow it. If a parameter is one that would frequently be omitted, like the 'alt' parameter in {{link}}, then it does make sense to warn if it is supplied with a non-blank value. @JeffDoozan and I have already had this discussion (Module talk:checkparams#Gaps in Positional Parameters) about Lithuanian inflection templates, where the general pattern is {{name|stem-with-no-accent|stem-with-accent|...}}, which is easier to commit to memory even if the first parameter is then unused for accentuation pattern 1. Greek has a similar template pattern for recessive accents. --RichardW57m (talk) 14:00, 2 April 2024 (UTC)[reply]
This warning was showing up because I added parameter checking to some of the templates called by {{lv-decl-noun}}. I fixed a bug on {{lv-decl-noun}} and adjusted the list of allowed parameters on each of the templates called by {{lv-decl-noun}} so they will no longer display a warning. JeffDoozan (talk) 00:02, 1 April 2024 (UTC)[reply]
@JeffDoozan: I believe we should really be checking that the extra parameters (to the template ..-noun-1, such as {{{5}}}) are empty, and that there are no "real" extra parameters (such as, say, {{{9}}}). Unfortunately the current module doesn't allow for this, but I think I can still do this "manually". --kc_kennylau (talk) 10:21, 1 April 2024 (UTC)[reply]
Actually, it seems that the module does not count empty number parameters. I have changed the main template to not pass {{{3}}} (the declension type) to the sub-templates. I have also used a bit of a hacky method to ensure in the sub-templates that the other named parameters are empty. In the long run we would preferably convert the templates to Lua. --kc_kennylau (talk) 10:55, 1 April 2024 (UTC)[reply]
@Kc kennylau, JeffDoozan, Helrasincke: I've just found a way to easily allow 2 currently unused parameters when another 27 are actually used. Just use the two without effect, e.g. in the condition of a #if test that does nothing either way. There are a lot of Sanskrit declension templates that used to use the 3rd and 4th positional parameters for transliteration when inflected forms were wrapped in {{lang}}. The forms are now wrapped in {{l}} or similar, so became redundant, but are mentioned all over the place, including in templates for adjectives that build on templates for nouns. I think the proper long term way forward is to replace these templates by less specific ones, but even that's a lot of effort for mostly little gain. --RichardW57m (talk) 15:43, 3 April 2024 (UTC)[reply]
@RichardW57m: The problem is that we want to ensure that the parameters that are not used are actually empty. --kc_kennylau (talk) 15:53, 3 April 2024 (UTC)[reply]
@kc_kennylau: Why? Let sleeping dogs lie. --RichardW57m (talk) 16:14, 3 April 2024 (UTC)[reply]
Isn't that the whole point? It's a preview warning and it gets added to a hidden category. --kc_kennylau (talk) 17:02, 3 April 2024 (UTC)[reply]
@kc_kennylau: Then I'd better raise an RfD on the module. I thought the main purpose was to catch typos and mistaken names in calls; it also catches attempts to use, for example, the {{|cat2}} parameter of {{head}} in language-specific headword templates that happen not to support it. I've used it to fix about half a dozen uses of the wrong name such as 1 for tr, g for 1, head v. entry (in dictionary references), and some of the latter type I've mentally noted as requests for enhancement. --RichardW57m (talk) 17:33, 3 April 2024 (UTC)[reply]
@RichardW57m If the module is raising warnings about unused parameters, and you think those parameters should be allowed, then the obvious solution is to add support for those parameters. Deleting the module is like throwing out an alarm instead of doing something about whatever caused it to go off. Theknightwho (talk) 22:35, 3 April 2024 (UTC)[reply]
@Theknightwho: Or using blinkers to stop a horse panicking. Or leaving shrapnel in a wound rather than run severe risks in removing it. In some cases, the solution is to substitute better values invocation by invocation and enable their use - but getting better values takes significant effort. In another case, the better solution is actually to replace the templates by probably no worse templates that already exist - but that is not a higher priority, and I would check in each case that the new templates don't introduce new errors - they have generated erroneous outputs in several instances recently, and they don't have any testcases. The only issue caused by the unused parameters is slightly larger and presumably slower code. --RichardW57m (talk) 15:04, 4 April 2024 (UTC)[reply]
Incidentally, I've not been deleting the module's invocation; I've been telling it what doesn't matter. However, I do remember you saying that it's not the sort of thing we should be using! --RichardW57m (talk) 15:04, 4 April 2024 (UTC)[reply]
@RichardW57m I specifically said that because it's less efficient than rewriting the template in Lua, because it means the template's wikitext has to be parsed. It's certainly doable, but it would be better if we didn't have to. Theknightwho (talk) 15:07, 4 April 2024 (UTC)[reply]
@Theknightwho: This was for a dumber version of the concept, which had to be told all the allowable parameters, just like call_quote_template in Module:quote. No automated template parsing was required. I think you were worried about the potential for parallel processing and all the Lua troubles that's been giving us over the years. --RichardW57m (talk) 15:19, 4 April 2024 (UTC)[reply]
You are aware, are you not, that chopping and changing positional parameters is a recipe for disaster? --RichardW57m (talk) 15:04, 4 April 2024 (UTC)[reply]
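For readers following the argument, the kind of check being debated in this thread can be sketched in Python. This is not how Module:checkparams is implemented; the function `invalid_params` and the sample call are hypothetical. The sketch assumes a simplified rule: a parameter is flagged only if it is outside the allowed set and has a non-blank value, so empty pass-through parameters are tolerated while a stray value or a misspelled name still triggers a warning.

```python
def invalid_params(args, allowed):
    """Return (name, value) pairs that would be flagged, sorted by name.

    Assumption: blank values are treated as if the parameter were absent.
    """
    return sorted(
        (name, value)
        for name, value in args.items()
        if name not in allowed and value.strip() != ""
    )


# Made-up call modelled on {{lv-decl-noun|autor|s|1st|extrawidth=-60}},
# as it might arrive at a sub-template that only uses 1, 2 and extrawidth.
args = {
    "1": "autor",
    "2": "s",
    "3": "1st",
    "5": "",
    "drop-v": "",
    "extrawidth": "-60",
}
allowed = {"1", "2", "extrawidth"}

flagged = invalid_params(args, allowed)
print(flagged)
```

Under this rule only the declension type in parameter 3 is reported, which matches the fix of no longer passing it to the sub-templates; whether blank extras should be silently accepted is exactly the point of disagreement above.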

April 2024

Magic words appearing in WhatLinksHere[edit]

I noticed that WT:Todo/Lists/Entries using nonexistent templates had suddenly filled up with spurious transclusions of MediaWiki-implemented magic words like {{!}} and {{PAGENAME}}. This can also be seen in WhatLinksHere: [6]. These templates don't exist and are known by our Lua code to be magic words (well, at least {{temp}} itself treats them specially), so there should be no reason to attempt to transclude them.

Something has changed in the last week. It's only happening on this wiki, so it's coming from our Lua modules rather than MediaWiki itself. I'm pinging @Theknightwho as a starting point. This, that and the other (talk) 01:26, 1 April 2024 (UTC)[reply]

@This, that and the other. I noticed this last week: if you look at the March 24 revision you'll see there are already some of those, so it had to have happened before then- not as many, so probably not long before. Chuck Entz (talk) 02:38, 1 April 2024 (UTC)[reply]
After doing some spot-checking it appears that all of these have the magic words (as well as things like "!" and "=") wrapped in {{ }}. These were inserted as part or all of parameters in templates, in interwikis, and in categories. The use of {{PAGENAME}} in filenames makes me nervous, since they'll have to be fixed if the page is moved (those should have been subst:ed).
Come to think of it, either subst:ing all the {{PAGENAME}}s or replacing them with the pagenames themselves looks like a perfect job for a bot. Chuck Entz (talk) 03:47, 1 April 2024 (UTC)[reply]
@Chuck Entz Yeah, some people systematically insert {{PAGENAME}} into Wikitext. I think it's a bad idea. Benwing2 (talk) 05:03, 1 April 2024 (UTC)[reply]
@This, that and the other @Chuck Entz @Benwing2 This was down to an older version of the template parser which didn’t handle parser variables (i.e. magic words which don’t take any parameters), so it was still grabbing the title object. This was fixed about a week ago, but clearly hasn’t propagated through everywhere yet. Some parser variables can also act like magic words (e.g. {{PAGENAME}} vs {{PAGENAME:title}}), but many can’t (e.g. {{!}} and {{=}} will default to templates if you try), and some of them are case-sensitive while others aren’t, so I had to make sure it knew how to handle all the various possible inputs. As a side point, it is actually possible to use templates with those names by using (e.g.) {{msg:PAGENAME}}, which {{temp}} is also aware of, and is on my to-do list for the template parser. Theknightwho (talk) 15:15, 1 April 2024 (UTC)[reply]

Der3, Rel3, Col3[edit]

Please replace derx, relx in all Philippine languages (especially Tagalog) to colx templates. Thank you. Ysrael214 (talk) 02:36, 1 April 2024 (UTC)[reply]

Welsh word 'hambon'[edit]

I'm trying to add the Welsh word 'hambon'. I've given various sources for the word used in context, but I'm given a

"This action has been automatically identified as harmful, and therefore disallowed. If you believe your action was constructive, please start a new Grease pit discussion and describe what you were trying to do. A brief description of the abuse rule which your action matched is: various specific spammer habits"

Could someone help resolve this? Wemblydumblediddle (talk) 08:58, 2 April 2024 (UTC)[reply]

Your entry does not follow the correct formatting. Please see WT:EL and existing Welsh noun entries for examples. The "Cultural Significance" should perhaps at best be a "Usage notes" section. The most likely reason for the filter though is the usage of external links to e.g. YouTube. — SURJECTION / T / C / L / 09:03, 2 April 2024 (UTC)[reply]
Thanks for getting back so quickly, I'm new to this. I can't think of a way around this, there isn't much in writing about the word. The only decent written source I could find is that Guardian article. Is there a way around the automatic filtering? Does someone have the authority to verify the entry, modifying it if necessary? Wemblydumblediddle (talk) 09:12, 2 April 2024 (UTC)[reply]
Try publishing the entry again, but without the YouTube links (and ideally also with formatting changes you can gather from the two links I posted). — SURJECTION / T / C / L / 09:48, 2 April 2024 (UTC)[reply]
Thanks, that worked. It's a shame I can't include the video, though. That Hansh video is probably the best example of the word in use; it features hambons explaining what it means to be a hambon, and it's very entertaining for West Wales Welsh speakers. Wemblydumblediddle (talk) 10:41, 2 April 2024 (UTC)[reply]
Wiktionary doesn't seem to block youtube links per se, because I successfully added a youtube link in the third quotation of this revision via Template:quote-av in order to confirm pronunciation and the stressed syllable. Does this go against the WT:CFI#Durably_archived policy? --Ssvb (talk) 20:15, 2 April 2024 (UTC)[reply]
"Durably archived" is only a policy requirement when it comes to proving the existence of the word (as a WT:RFV/WT:ATTEST question); it's OK to add links to (ideally reliable or at least representative and inoffensive) youtube videos to show pronunciation, and we regularly provide References or Further reading links to various reliable online dictionaries. Brand-new users are currently prevented from doing so, because most such users are spammers, but we do get feedback like this maybe once a year(?) from legitimate users whom the filter has stopped... it's a question of whether the (large) amount of spam that gets stopped is worth the (small) amount of valid edits which get stopped. - -sche (discuss) 05:13, 3 April 2024 (UTC)[reply]
@Ssvb: Generally, abuse filters are much stricter on new accounts: vandals, spammers and self-promoters almost always get blocked long before they stop being new. As for YouTube: it shouldn't be used to meet WT:CFI, but it can be used occasionally for other purposes. In general we try to avoid linking to anything commercial or promotional, so it's best to be as judicious and selective as possible. Chuck Entz (talk) 05:17, 3 April 2024 (UTC)[reply]
@Chuck Entz: Thanks, that's good to know. The documentation of the quote-av template says "Do not link to any webpage that has content in breach of copyright" and this is very useful, but other than this, the information is pretty scarce and maybe it could be improved? I think that the new contributors would appreciate that.
In my quotation I provided a link to a fragment of a news report published by a news agency on their own official youtube channel, so it should be okay from the copyright standpoint. As for the "avoid linking to anything commercial or promotional" guideline, I'm afraid that even a quotation from a book of a modern author may be potentially twisted as a commercial promotion of that particular author. I guess, "don't quote the same legit source too often and don't quote any shady sources at all" could be a good plan, though the distinction between legit and shady sources may be subjective in some cases. --Ssvb (talk) 07:23, 3 April 2024 (UTC)[reply]
@Ssvb These are guidelines, and you should use your common sense when it comes to things like "avoid linking to anything commercial or promotional". Copyright infringement could lead to legal consequences for Wikimedia concerning Wiktionary, which is why it says "Do not" rather than "Avoid". Benwing2 (talk) 07:57, 3 April 2024 (UTC)[reply]

alternative forms respect labels

I've fixed {{alt}} so if the tags specified after || can't be found in the "dialect data", they are looked up as labels. This respects omit_preComma and similar flags, so you can say something like

  • {{alt|en|Shi-jia-zhuang||also from|_|Pinyin|rare}}

and it correctly displays as

Here, the tag rare is a recognized label so it automatically links to the glossary; Pinyin is a label that normalizes to Hanyu Pinyin and links to Wikipedia; and the underscore prevents a comma from appearing. Benwing2 (talk) 03:36, 3 April 2024 (UTC)[reply]
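For illustration only, here is a minimal Python sketch of the lookup order described above: a tag is first looked up in the dialect data, then among the labels, and an underscore suppresses the comma before the next tag. The tables `DIALECT_DATA` and `LABELS` and the function `render_tags` are hypothetical stand-ins, not the real Lua modules, and the output format is an assumption.

```python
# Hypothetical stand-ins for the real data modules; entries are illustrative only.
DIALECT_DATA = {"US": "American English"}
LABELS = {"rare": "rare", "Pinyin": "Hanyu Pinyin"}


def render_tags(tags):
    """Join tags with commas, resolving each one via dialect data, then labels."""
    parts = []
    suppress_comma = False
    for tag in tags:
        if tag == "_":
            # An underscore means: no comma before the next tag.
            suppress_comma = True
            continue
        if tag in DIALECT_DATA:
            text = DIALECT_DATA[tag]
        elif tag in LABELS:
            text = LABELS[tag]  # fallback lookup as a label
        else:
            text = tag  # unknown tags pass through unchanged
        if parts:
            parts.append(" " if suppress_comma else ", ")
        parts.append(text)
        suppress_comma = False
    return "".join(parts)


print(render_tags(["also from", "_", "Pinyin", "rare"]))
```

In this sketch, `Pinyin` is normalized to "Hanyu Pinyin" through the label table and the underscore keeps "also from" attached to it without a comma; linking to the glossary or Wikipedia is left out.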

Ethiopic Letter Kurk

I cannot add the Ethiopic Letter Kurk. 2A09:BAC3:378F:D2:0:0:15:1B5 06:37, 3 April 2024 (UTC)[reply]

I don't think that's a thing. See the letter names at w:Geʽez script#Geʽez abugida. --kc_kennylau (talk) 00:44, 4 April 2024 (UTC)[reply]
@Kc kennylau|2A09:BAC3:378F:D2:0:0:15:1B5 That's not a complete list, though. But unless the IP can show us why he thinks it exists, we probably can't help any further. If it hasn't been encoded in Unicode (either as one character or a sequence), it can't be added. --RichardW57m (talk) 16:38, 8 April 2024 (UTC)[reply]
google:"Ethiopic Letter Kurk" turns up exactly one hit: this thread. I suspect that this is not the right name for an Ethiopic letter. ‑‑ Eiríkr Útlendi │Tala við mig 18:19, 8 April 2024 (UTC)[reply]

I notice that entries are categorized into this category manually. It seems like {{head}} et al could detect multiple scripts and add the category automatically, at least in most cases. No? Is the issue that checking would be too 'expensive'? Would it be more expensive than the code that adds the "Terms spelled with..." categories? - -sche (discuss) 14:53, 3 April 2024 (UTC)[reply]

@-sche: I think there may be some complex cases because Wiktionary scripts may overlap, e.g. the Beng and as-Beng scripts for Sanskrit, and I'm not sure that Arabic script variants don't overlap even for some varieties of Arabic. It gets worse if one considers scripts not recorded as being used for the language of the text they're found in. --RichardW57m (talk) 16:04, 3 April 2024 (UTC)[reply]
Sure, some cases could still have to be added manually, but it seems like most cases could be handled automatically. Re "scripts not recorded as being used for the language of the text they're found in": isn't that orthogonal, or what am I missing? A Sanskrit term written (e.g.) partly in Beng and partly in Arab is a "term written in multiple scripts", regardless of whether the language has used both scripts, or neither script, or only one script, and regardless of whether our modules record either, both, or neither script as being used for the language, isn't it? The headword template/module just has to look at the characters in the pagetitle/head, determine if they're from more than one script, and add the category if so. We only need to fall back on manually adding the category if a pair of characters appear to be from the same ISO- or Wiktionary-code-having script, but actually represent different scripts (like might've been the case for subvarieties of Mong until we split Mong and gave them their own sub-codes, and like might be the case for subvarieties of Egyh if Egyh ever becomes computer-encodable and font-supported). - -sche (discuss) 16:20, 3 April 2024 (UTC)[reply]
@-sche: Consider Sanskrit কামো (kāmo). It's correctly recorded as being in both the Assamese and the Bengali scripts. A dumb algorithm could consider it to be written in a mixture. It's also a Pali word. Now, Pali is currently recorded as using the Bengali script but not the Assamese script, so there is no ambiguity.
Now consider Pali ৰরো (varo). We don't have a record of an attestation yet, but I think it's only a matter of time before it turns up. The word's currently treated as being in the Bengali script, but the first letter belongs to the writing system used for Assamese, but not Bengali, while the second letter is in the writing system not used for Bengali. If you don't like this word, look at the last word of Example 20 on page 8, the 20th displayed page at https://archive.org/details/pali-grammar/Ucchatar%20Pali%20Bhasha%20Shikkha%20by%20Karunabangsha%20Bhikkhu/page/n19/mode/2up. That word also has both the letters in it. To keep things clean, we might need to declare a new script (pi-Beng) for Bengali script Pali, and prevent the analysis considering the other scripts. So far I've preferred to avoid the complication of doing that, and put up with the inconveniences occasioned by the word , which is written entirely in a letter from as-Beng, which shows up in Example 4 (the second example on that same page).
Now, we may be able to do a reasonable job if we partition the scripts as Unicode does, and ignore the 'inherited' and 'common' characters. We might miss some interesting examples in Burmese script Pali, where different local groups have rather different sets of characters, and for Pali, I'm not talking about the difference between NGA and MON NGA, which are distinguished only by the encoding in real Pali words. --RichardW57m (talk) 17:18, 3 April 2024 (UTC)[reply]
@RichardW57m @-sche Category:Chinese terms written in multiple scripts is autogenerated by simply looking for terms that have both Hanzi and non-Hanzi characters in them. I don't see why we can't automate this everywhere by simply taking whatever is the autodetermined script (which is based on which script has the most characters in the term) and looking to see whether all characters belong to that script. There's no problem in this approach if two scripts share some characters. Benwing2 (talk) 21:13, 3 April 2024 (UTC)[reply]
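As a rough illustration of the approach just described (pick the script with the most characters, then check whether every character belongs to it), here is a minimal Python sketch. The table `SCRIPT_RANGES` is a tiny, hypothetical approximation of real script data, and the function names and the `ignored` parameter are assumptions; the actual implementation lives in Lua modules and will differ.

```python
from collections import Counter

# Tiny, rough, hypothetical script table (code point ranges); real data covers far more.
SCRIPT_RANGES = {
    "Latn": [(0x41, 0x5A), (0x61, 0x7A), (0xC0, 0x24F)],
    "Grek": [(0x370, 0x3FF)],
    "Cyrl": [(0x400, 0x4FF)],
    "Hira": [(0x3040, 0x309F)],
    "Hani": [(0x4E00, 0x9FFF)],
}


def scripts_of(ch):
    """Return every script code whose ranges contain the character."""
    cp = ord(ch)
    return {code for code, ranges in SCRIPT_RANGES.items()
            if any(lo <= cp <= hi for lo, hi in ranges)}


def has_multiple_scripts(term, ignored=" -"):
    """True if some character does not belong to the term's majority script."""
    chars = [ch for ch in term if ch not in ignored]
    counts = Counter(code for ch in chars for code in scripts_of(ch))
    if not counts:
        return False
    main_script = counts.most_common(1)[0][0]
    # A character shared by several scripts passes as long as the main script is among them.
    return any(main_script not in scripts_of(ch) for ch in chars)


print(has_multiple_scripts("Borel σ-algebra"))  # Latin plus one Greek letter
print(has_multiple_scripts("Area 51"))          # digits belong to no script in this table
```

With this naive table, digits belong to no script at all, so a term such as "Area 51" is flagged; whether that is desirable is a separate policy question.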
And worst-case scenario, if Indian scripts are actually problematic, just exclude those from being auto-categorized (so people still have to add entries in those scripts to the category manually, just like they currently do: they're no worse off). - -sche (discuss) 21:41, 3 April 2024 (UTC)[reply]
@-sche I implemented this. It started having false positives with spaces and hyphens, so I excluded them from consideration. However, there's still an issue with things like Area 51, where numbers aren't considered part of Latn. What do you think we should do here? Should we consider numbers as Latn, so that e.g. a Greek term with numbers in it still gets considered a "term written in multiple scripts", or should we exclude numbers entirely, or do nothing? Benwing2 (talk) 22:26, 3 April 2024 (UTC)[reply]
Also issues with apostrophes (devil's advocate), slashes (K/S), etc. Thoughts? Maybe all ASCII chars should be considered Latn? Benwing2 (talk) 22:28, 3 April 2024 (UTC)[reply]
@RichardW57m There are no terms so far in Category:Pali terms written in multiple scripts, and only one in Category:Sanskrit terms written in multiple scripts, which is उपेक्षिन्द्रिय​. Do you know why that term is there? Benwing2 (talk) 23:12, 3 April 2024 (UTC)[reply]
NVM, the term wrongly contained a U+200B (zero-width space) at the end. Benwing2 (talk) 23:17, 3 April 2024 (UTC)[reply]
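A stray invisible character like this is easy to miss by eye. As a small sketch in Python (not part of any Wiktionary module; the function name is hypothetical), one can list every character of Unicode general category Cf (format), which includes U+200B, U+200C and U+200D:

```python
import unicodedata


def invisible_characters(term):
    """Return (index, code point label, Unicode name) for each format character."""
    found = []
    for i, ch in enumerate(term):
        # Category "Cf" covers zero-width space, joiner, non-joiner and similar characters.
        if unicodedata.category(ch) == "Cf":
            found.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")))
    return found


print(invisible_characters("abc\u200b"))
```

A hit is not automatically an error: zero-width joiners and non-joiners are legitimately needed in some scripts, so the result still needs human judgment.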
@Benwing2, -sche: My first cut solution would be to ignore all characters in the Unicode script Common, aka Zyyy, and Inherited, aka Zinh. See https://www.unicode.org/Public/UCD/latest/ucd/Scripts.txt for definitions. The first includes ASCII non-letters. Note that many Thai abbreviations end in full stops - just look at category Category:Rhymes:Thai/ɔː - and they're being assigned to the category Category:Thai terms written in multiple scripts. --RichardW57 (talk) 23:54, 3 April 2024 (UTC)[reply]
There is at least one term in Category:Pali terms written in multiple scripts, but you have to look at the categories of এৰ to see it. These two categories could conceivably take a week for all the members to be recorded in the category views. --RichardW57 (talk) 23:54, 3 April 2024 (UTC)[reply]
Is Thai โควิด-19 written in multiple scripts? --RichardW57 (talk) 23:54, 3 April 2024 (UTC)[reply]
@RichardW57 I have already excluded periods (full stops) from consideration for all scripts, along with commas, hyphens and spaces. I would argue that โควิด-19 contains multiple scripts; certainly it looks that way on first glance. Benwing2 (talk) 23:56, 3 April 2024 (UTC)[reply]
Can you tell me what's going on with এৰ? Does this legitimately have two scripts? If not, why not? Benwing2 (talk) 23:57, 3 April 2024 (UTC)[reply]
@Benwing2: I didn't design the Wiktionary script concept. As far as Unicode is concerned, it's in a single script, the Bengali script, and from its usage, it would seem that at least some Bengalis think it is. There's a relevant discussion at Template talk:pi-alt#ৰ. For script determination, it's the same as ৰরো (varo) discussed above. Pali in Bengali script is a mixture of Beng (uses ) and as-Beng (uses , but for /v/, not /r/). We could put it in a single script pi-Beng created by adding 'ৰ' to Beng. --RichardW57 (talk) 00:16, 4 April 2024 (UTC)[reply]
@RichardW57 If it's just a single char (or a fixed set of chars), I can add an exclusion for it, just like I've done for things like apostrophes in Cyrillic. Benwing2 (talk) 00:17, 4 April 2024 (UTC)[reply]
And U+200C in fa-Arab. Benwing2 (talk) 00:18, 4 April 2024 (UTC)[reply]
And U+200D in Sinh, and in whatever we use for the Bengali script for Pali. (It's needed in the latter to stop 'vy' being rendered with a repha.) --RichardW57 (talk) 00:28, 4 April 2024 (UTC)[reply]
I think U+200C may be needed in Deva to support pedantic Hindi and also the faking of Sanskrit quotations (though possibly the latter doesn't matter for this application). Also needed in Tham for some Lao words where the ᨶᩣ ligature was deliberately not used. Possibly also for some odd-looking Tham-script Pali. --RichardW57 (talk) 00:34, 4 April 2024 (UTC)[reply]
Also, some cases where the ligature isn't formed in Northern Thai when the consonant and vowel are in different syllables might not be errors --RichardW57 (talk) 01:17, 4 April 2024 (UTC).[reply]
@Benwing2: COVID-19 clearly contains a heathen (Arabic to be precise) number in it - shouldn't that similarly be categorised as mixed script? --RichardW57 (talk) 00:20, 4 April 2024 (UTC)[reply]
@RichardW57 Maybe; but Arabic numerals are the native numeral set for Latin script whereas Thai script has Thai numerals natively. Benwing2 (talk) 00:21, 4 April 2024 (UTC)[reply]
@Benwing2: What? Roman numerals are the native set for the Latin script, not these newfangled (Western) Arabic numerals, which incidentally are the usual set for Maghribi Arabic. And Thais do their arithmetic in European-style Western Arabic numerals, and may convert the results to use Thai digits. --RichardW57 (talk) 00:38, 4 April 2024 (UTC)[reply]
There may be an exception if an abacus is used, but I think that would be by Chinese Thai, and I've never seen a Thai use an abacus. --RichardW57 (talk) 00:44, 4 April 2024 (UTC)[reply]
Serious books in English often start their page numbering using Roman numerals; less commonly, serious books in Thai start with numbering in letters. Contrariwise, magazines in English and in Thai generally use Western Arabic digits for their page numbering throughout. --RichardW57 (talk) 01:07, 4 April 2024 (UTC)[reply]
١٢٣٤٥٦٧٨٩٠ ≠ 1234567890 Chuck Entz (talk) 05:41, 4 April 2024 (UTC)[reply]
The first set are the near eastern digits ('ARABIC-INDIC' digits in Unicode parlance), not the Western Arabic digits nor, in Unicode parlance, the EXTENDED ARABIC-INDIC digits (a slightly dodgy concept). --RichardW57 (talk) 20:36, 4 April 2024 (UTC)[reply]
@Benwing2: Lithuanian is going to have a problem with U+0301 COMBINING ACUTE ACCENT and U+0303 COMBINING TILDE not being included in Latn. For future-proofing, we should also include U+0300 COMBINING GRAVE and U+0307 COMBINING DOT ABOVE. You're probably better off ignoring characters from the combining diacritics block altogether - there are issues with Romanian (combining comma below) and Thai-Script Patani Malay. I'll dig into them on request. --RichardW57 (talk) 01:41, 4 April 2024 (UTC)[reply]
@RichardW57 I agree; done. Benwing2 (talk) 01:50, 4 April 2024 (UTC)[reply]
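For illustration, ignoring the Combining Diacritical Marks block (U+0300 to U+036F) before script detection could look like the following Python sketch. The function names are hypothetical and this is not the actual module code.

```python
def is_combining_diacritic(ch):
    """True for characters in the Combining Diacritical Marks block (U+0300..U+036F)."""
    return 0x0300 <= ord(ch) <= 0x036F


def strip_combining_diacritics(term):
    """Drop combining diacritics so they cannot count as a separate script."""
    return "".join(ch for ch in term if not is_combining_diacritic(ch))


# Lithuanian accented form with U+0303 COMBINING TILDE after the "l".
print(strip_combining_diacritics("vil\u0303kas"))
```

Only the characters that remain after stripping would then be passed to the script check, so an accented Lithuanian form is treated the same as its unaccented spelling.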
I am thoroughly confused by this category. I went to see what could be an example in English other than terms that have numbers in them, which is a pretty suspect inclusion, and I saw Holy Wednesday is in Category:English terms written in multiple scripts. 1.) Why? 2.) How??? That category does not appear when I look at the entry, it is not a hidden category, and I assumed that it must have been in the entry as a manual addition that was recently removed, so there was just a lag time in the MediaWiki software generating the category, but it hasn't been edited in a year! How is "Holy Wednesday" multiple scripts??? Note that this is just a random example but there are many more that seem to have no clear reason for inclusion. —Justin (koavf)TCM 00:33, 4 April 2024 (UTC)[reply]
@Koavf That is because of MediaWiki lag. When I first added the category, I forgot to exclude spaces from consideration, so some terms with spaces got added. They will clear in time. Benwing2 (talk) 00:35, 4 April 2024 (UTC)[reply]
So it seems like most of the legit entries are letters-with-numbers, letters-with-@, and Roman-and-Greek-letters mixes, which is more-or-less sensible. As noted above, Arabic numbers are the native numeral system in English, so it's maybe arguable that this is "multiple scripts", but other typographic characters like "@" are definitely not a standalone "script", but perfectly normal parts of English-language writing. An entry like Borel σ-algebra seems legitimate. —Justin (koavf)TCM 00:40, 4 April 2024 (UTC)[reply]
Letters-with-numbers and letters-with-@ aren't considered multiple scripts; I exclude all non-letter ASCII symbols from consideration when the script is Latin. Any of this nature that you see are due to MediaWiki lag. Benwing2 (talk) 00:53, 4 April 2024 (UTC)[reply]
@Koavf the category is now clear of all stray (lagged) entries.
@Benwing2 we still need to dismiss en rules (Einstein–de Haas effect) from the category. Also not so sure about superscript numerals like I²C. This, that and the other (talk) 04:55, 4 April 2024 (UTC)[reply]
The Unicode rules say they no more count for script determination than do ASCII digits. --RichardW57 (talk) 05:06, 4 April 2024 (UTC)[reply]
FWIW, I would also not consider B♭ to be "multiple scripts". Would it work to (a) only categorize entries if they use multiple code-having scripts (so, using one script like Latn + using characters that are not script-specific won't get categorized, only the use of 2+ scripts like Latn + Arab would get categorized), and (b) also exclude any non-script scripts that need to be excluded, like if ♭ or ' (etc) is in Zsym, then have things in Zsym count as "not script-specific" for this purpose. ? - -sche (discuss) 05:33, 4 April 2024 (UTC)[reply]
@-sche Sort of. I think your idea is a good one but there are still some special cases, e.g. I just had to add a case for Cyrillic ъ and ь used in Proto-Slavic Latin terms, and it is a bit trickier to implement than what I'm doing so far. Benwing2 (talk) 06:04, 4 April 2024 (UTC)[reply]
@Benwing2: A case could be made for exempting the entire Reconstruction namespace, since they're in effect not spelled so much as notated. Chuck Entz (talk) 06:14, 4 April 2024 (UTC)[reply]
@Chuck Entz I agree, and have added this exemption. Benwing2 (talk) 06:43, 4 April 2024 (UTC)[reply]
Well, it's clearing, but it still has (e.g.) in hysterics on my end. It went from 443 to 233, so MediaWiki is doing its magic, so thanks to whomever (BW?) did that. I reckon we will soon have it whittled down to the 60 or so semi-legitimate entries.
I would think that "letters-plus-numbers" terms are actually much more reasonable to put in Category:English terms with numerals or somesuch (note that Category:English terms containing Roman numerals exists), as that could plausibly be something that someone is searching. And I don't think that someone who wants to see "Latin-characters-with-Greek-characters" also wants to see COVID-19 or A♭. Since it seems like a substantial majority are actually entries with Greek characters, I could give a weak support to "Category:English terms with Greek characters" or somesuch. —Justin (koavf)TCM 08:13, 4 April 2024 (UTC)[reply]
@Koavf I manually purged the whole category but some things have crept in afterwards. Benwing2 (talk) 08:23, 4 April 2024 (UTC)[reply]
I wasn't joking when I suggested it might take a week. I've certainly waited the best part of a week for a change to Pali categories to converge, and Pali is only a small part of Wiktionary. --RichardW57m (talk) 15:11, 4 April 2024 (UTC)[reply]
@-sche @RichardW57 I have redone the algorithm and made it simply elide the difference between e.g. Beng and as-Beng (in general ignoring the language-specific component of a script), which should fix the issue with এৰ. A side effect of this is that โควิด-19 no longer is considered to have multiple scripts (and wouldn't even if it mixed Thai characters with e.g. Devanagari numerals, I think). Benwing2 (talk) 03:27, 5 April 2024 (UTC)[reply]
Thanks. I'll defer to people who edit Thai, but my impression is that Thai uses Arabic numerals so normally that a text using them would not strike speakers as mixing scripts the way a mixture of Thai and Arabic letters would; certainly, I see that many languages like Chinese use Arabic numerals regularly enough that they don't seem to be part of a different script. So I think โควิด-19 not being considered to have multiple scripts is appropriate. - -sche (discuss) 04:30, 5 April 2024 (UTC)[reply]
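As a sketch of the revised idea (dropping the language-specific part of a script code, so that Beng and as-Beng count as the same script, and skipping characters that belong to no script), here is a minimal Python illustration. The per-character script sets are supplied by hand and are hypothetical; the real module derives them from its own data.

```python
def base_script(code):
    """Strip a language-specific prefix: 'as-Beng' -> 'Beng', 'Beng' -> 'Beng'."""
    return code.rsplit("-", 1)[-1]


def is_mixed(per_char_codes):
    """True if no single base script covers every script-specific character.

    per_char_codes: one set of script codes per character; an empty set marks
    a character that is not script-specific (digits, punctuation and so on).
    """
    normalized = [{base_script(c) for c in codes} for codes in per_char_codes if codes]
    if not normalized:
        return False
    return not set.intersection(*normalized)


# Hypothetical data: one character seen as Beng, the other as as-Beng.
print(is_mixed([{"Beng"}, {"as-Beng"}]))
# Hypothetical data: Thai letters followed by a hyphen and two digits.
print(is_mixed([{"Thai"}, {"Thai"}, set(), set(), set()]))
```

Under these assumptions a Beng plus as-Beng term and a Thai term with digits are both treated as single-script, while a Latin plus Greek term would still be flagged.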

Why is .nato in Category:Translingual terms written in multiple scripts?

Equinox 08:46, 4 April 2024 (UTC)[reply]

Because it has "." Note that this will be purged and no longer appear in said category soon. E.g. I do not see it on my end. Justin (koavf)TCM 08:56, 4 April 2024 (UTC)[reply]

This is a good idea but there are still several terms being falsely categorized, including (within the English category) 5′ cap, Ger⁺⁶, H₂O, ni🅱️🅱️a, o͝o, and others. Now I realize that I've been criticized for the same thing, but in this case there really was a severe lack of testing before making a change. I think a much more conservative approach is required, where two scripts (e.g. Latin and Greek) are explicitly set as "different". It might even have to be done on a per-language basis, since Japanese being written using Chinese characters is clearly different from the other way around. By the way, @Koavf, your idea would exclude the entries い-adjective and な-adjective, which are definitely the most interesting of the bunch. Ioaxxere (talk) 19:24, 4 April 2024 (UTC)[reply]

@Ioaxxere I agree in general about testing, but this kind of stuff is difficult to test completely beforehand and the effect of getting things a bit wrong is fairly minor (just a false positive in a category). But I am going to implement User:-sche's approach of excluding all symbols and anything not a proper "script" from consideration; just had to get some sleep :) ... Benwing2 (talk) 19:44, 4 April 2024 (UTC)[reply]

Does anyone know how to do this?

Does anyone know how to check for changes on a language as a whole? So say I wanted to keep an eye on what changes are made to English as a whole, including entries, categories and whatever else, is there a way to easily view them instead of having to see the ‘newest changes’ table of every category? Melithius (talk) 10:07, 4 April 2024 (UTC)[reply]

@Melithius This is kind of possible: Go to Category:English lemmas and click "Related changes" on the left sidebar. For completeness, you would also need to monitor Category:English non-lemma forms' related changes page too. All English entries are in one or other category.
The big drawback, which will become obvious as soon as you attempt this, is that all changes for the entries concerned will be shown, even those relating to other language sections of the entry. But it may still be workable for you depending on what you want to do. It is likely to be very workable for languages written in scripts other than Latin. This, that and the other (talk) 11:41, 4 April 2024 (UTC)[reply]
Ah ok yes it worked, especially with the other languages I wanted to view, as you mentioned. Thanks! Melithius (talk) 13:02, 4 April 2024 (UTC)[reply]
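For anyone who wants to bookmark these views, the "Related changes" pages can also be reached by URL. The following Python sketch builds such links; it assumes that Special:RecentChangesLinked accepts `target`, `days` and `limit` query parameters, which is worth verifying against the MediaWiki documentation. The function name is hypothetical.

```python
from urllib.parse import urlencode


def related_changes_url(category, days=7, limit=500,
                        site="https://en.wiktionary.org"):
    """Build a 'Related changes' link for a category (parameter names are assumed)."""
    query = urlencode({"target": category, "days": days, "limit": limit})
    return f"{site}/wiki/Special:RecentChangesLinked?{query}"


for cat in ("Category:English lemmas", "Category:English non-lemma forms"):
    print(related_changes_url(cat))
```

As noted above, the resulting list still shows every change to the pages concerned, including edits to other language sections.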

Horizontal toclimit2

Would you like to test e.g. at te or something like {{Template:User:Sarri.greek/toc2-hor}}
If you think it looks better than the vertical toclimit, could a real programmer take a look? (my amateurish Module:User:Sarri.greek/toc2-hor, style.css, Template:User:Sarri.greek/toc2-hor alert programmers alert - Please erase your username from this template if you are not available to receive alerts! Thank you, Sarri.greekMM Benwing2, Surjection PS Would editors of 3phased languages like something like wikt:el:Tempalte:test-ol? Thank you ‑‑Sarri.greek  I 05:13, 5 April 2024 (UTC)[reply]

Template:ja-new some changes

Accelerated Japanese entry creation {{subst:ja-new|へん-のう|s|returning|to return}} didn't work on creation 返納. Anatoli T. (обсудить/вклад) 08:20, 5 April 2024 (UTC)[reply]

@Atitarev What went wrong? It looks OK to me, although maybe I missed something. Benwing2 (talk) 08:54, 5 April 2024 (UTC)[reply]
@Benwing2: To reproduce, paste the full code above on an empty line in the same entry and preview.
I didn’t generate the entry, I made it manually. The code above is supposed to create a verbal noun and verb entry simultaneously. Anatoli T. (обсудить/вклад) 09:52, 5 April 2024 (UTC)[reply]
@Atitarev Hmm, I tried it and it seems to work fine for me. What is the error you're seeing? Benwing2 (talk) 20:16, 5 April 2024 (UTC)[reply]
@Benwing2: Thanks for checking. Something happened between yesterday and today, I was getting some string concatenation error. Anyway, it's working now. Anatoli T. (обсудить/вклад) 23:06, 5 April 2024 (UTC)[reply]
@Benwing2: Hi. It happened again on 返納金(へんのうきん) (hennōkin): Lua error in Module:template_parser at line 402: bad argument #1 to 'find' (string expected, got nil)
I used {{subst:ja-n|へん-のう-きん||refund, repayment}}
It fixed itself on the 2nd edit but I saved this revision. Anatoli T. (обсудить/вклад) 05:22, 6 April 2024 (UTC)[reply]
Also calling @Theknightwho. It's your module. Anatoli T. (обсудить/вклад) 05:27, 6 April 2024 (UTC)[reply]
@Atitarev Hmm, I took a look at the error but I'm not sure why it happened. Usually this would mean someone accidentally introduced a bug and then quickly fixed it, but I don't see evidence of this. The error is in Module:template parser, which has been edited recently by User:Theknightwho but not in the last few minutes (and he hasn't contributed anything in a few hours). Benwing2 (talk) 05:27, 6 April 2024 (UTC)[reply]
@Benwing2: I think it's the same as yesterday. It fails on the preview or first edit on a NEW page. Then it can be fixed by a new edit with the same code. Anatoli T. (обсудить/вклад) 05:33, 6 April 2024 (UTC)[reply]
@Atitarev Hmm. Does it always happen on a new page? If so I may be able to fix it. Benwing2 (talk) 05:36, 6 April 2024 (UTC)[reply]
@Benwing2: Yes, on a new page. I don't know when it started to occur but I only noticed yesterday. It may have been a few weeks since I made new Japanese entries. Anatoli T. (обсудить/вклад) 05:39, 6 April 2024 (UTC)[reply]
@Atitarev Yes, I can reproduce this, but I can't figure out how to get a full stack trace due to the substing that's going on. Hopefully User:Theknightwho should be able to fix this; I imagine it is a simple fix. Benwing2 (talk) 05:47, 6 April 2024 (UTC)[reply]
@Atitarev @Benwing2 I’ll need to check when I’m on my laptop, but that error suggests that something is feeding nil into the parser instead of the page content. I know that subst sometimes causes a page to need to be saved twice to fully take effect, so I wonder if that’s a relevant factor here. Theknightwho (talk) 12:49, 6 April 2024 (UTC)[reply]
@Theknightwho, @Benwing2: Thanks, please do check.
I've made a three language (including four Chinese varieties) entry on 再起 with:
{{subst:zh-n|v|to rise again, to make a comeback||resurgence, comeback|k=재기}}
{{subst:ja-new|さい-き|s|resurgence, comeback|to rise again, to make a comeback}}
Only the Japanese entry failed, you can see in the edit history. The error was different this time.
The only sort of strange behaviour with "subst" I observed before was when something relied on the entry existing and it hadn't been created yet; it showed some temporary errors, e.g. Thai readings in a usex or even headword, but that behaviour has since changed for the better.
Please fix. It may discourage users from making new accelerated Japanese entries, they will just think it's not working at all. Anatoli T. (обсудить/вклад) 00:03, 7 April 2024 (UTC)[reply]
For experimenting, you can try creating a new entry on e.g. 才気(さいき) (saiki, wisdom) with this:
{{subst:ja-new|さい-き|n|wisdom}} Anatoli T. (обсудить/вклад) 00:07, 7 April 2024 (UTC)[reply]
@Atitarev This should be fixed. Let me know if you're still having issues. Benwing2 (talk) 07:28, 8 April 2024 (UTC)[reply]

Why do some Wikipedia images not show up when used on Wiktionary?

e.g. the cartoon I just added at Colonel Blimp. Equinox 13:32, 6 April 2024 (UTC)[reply]

@Equinox Non-free images are uploaded to Wikipedia directly rather than Commons (where they’re not allowed). You could do the same, but we don’t really have any infrastructure for it. Theknightwho (talk) 13:39, 6 April 2024 (UTC)[reply]
I see. Had noticed it seemed to happen with commercial-ish stuff like screenshots and comics. Equinox 14:06, 6 April 2024 (UTC)[reply]
@Equinox If you do decide to reupload here, one other thing to be careful of is that permission to use non-free images is sometimes only given to Wikipedia by the copyright-holder. Theknightwho (talk) 15:02, 6 April 2024 (UTC)[reply]
@Equinox, Theknightwho: A related discussion is Wiktionary:Beer_parlour/2023/August#Image_upload_rights, where people seemed to oppose including fair use images. I still think that Wiktionary is being seriously hampered by copyright paranoia. Ioaxxere (talk) 22:07, 6 April 2024 (UTC)[reply]
It's not the culprit in this case, but FWIW another reason I've seen some images not display (anymore) here recently is that we added a bunch of images to our blacklist recently (because vandals started to put a few of them on irrelevant entries), and it turns out we were using at least one of them (to correctly illustrate nipple). (Perhaps someone could check whether any of the other images on MediaWiki:Bad image list are actually being used.) - -sche (discuss) 15:53, 6 April 2024 (UTC)[reply]
There is a protocol for allowing the use of an otherwise banned image on an appropriate page, though I don't know the procedure offhand. bd2412 T 16:23, 6 April 2024 (UTC)[reply]
In the nipple case, I just removed the image from the blacklist (it had been added as part of a mass import of WP's blacklist and not because anyone was specifically misusing it; I think we have abuse filters which stop most bad-image addition anyway). — This unsigned comment was added by -sche (talkcontribs) at 19:36, 6 April 2024 (UTC).[reply]
I am willing to provide a free, tasteful image of my nipple. Equinox 22:09, 6 April 2024 (UTC)[reply]
Only the one? DCDuring (talk) 23:36, 6 April 2024 (UTC)[reply]
@Equinox we currently have a single non-free image at thagomizer. Indeed, we have a policy specifically to allow this file: WT:NFCC. If you want to upload a non-free file in the same vein as the Far Side strip we already have, you would need to ensure that "its presence significantly increases readers' understanding of the topic" (per point 5 of that policy). I'm not sure that a picture of Colonel Blimp would qualify. This, that and the other (talk) 02:54, 7 April 2024 (UTC)[reply]
@Ioaxxere also. This, that and the other (talk) 02:54, 7 April 2024 (UTC)[reply]
As noted above, we have a very restrictive media upload policy and only four pieces of local media, two of which are basically required by MediaWiki software, one as redundant in case there is some vandalism to the item at c:, and a single fair-use file. While these are the only files, there are several discussions of deleted and moved ones as well and those could also be instructive about what the requirements are to upload locally. —Justin (koavf)TCM 07:45, 7 April 2024 (UTC)[reply]

Automatic acute stress addition to Belarusian (and possibly also Russian/Ukrainian) words in book quotations.

A somewhat relevant old discussion: https://en.wiktionary.org/wiki/Wiktionary:Beer_parlour/2014/February#Should_quotations_be_normalized?. Ping @Benwing2, Atitarev, Insaneguy1083.

The current (unwritten?) rule is to add acute diacritics to mark stressed syllables in the quotations taken from the Belarusian, Russian and Ukrainian books (ex. дзот, бревно, завдовжки). However this is an annoying and time consuming chore for the native speakers and possibly a much more challenging and error prone task for the others. Not to mention possible typos. Also touching the original spelling just doesn't feel right.

I think that the stress marks can be added automatically for the majority of words. And I have created two proof-of-concept modules: Module:User:Ssvb/be-autostress-simple and Module:User:Ssvb/be-autostress-bloom-filter. The former is simple and doesn't scale. But the latter makes it possible to squeeze up to ~30-40K lemmas and all their inflected forms (~200-300K words total) into a ~2MB Lua module without becoming a resource hog. It's possible to use data from https://github.com/Belarus/GrammarDB for the Belarusian words. And for the Russian language it's possible to just extract the word inflection and stress information from the Wiktionary dump (~53K lemmas). May I integrate it into the transliteration module? Does anyone see any pitfalls or have objections? --Ssvb (talk) 01:49, 7 April 2024 (UTC)[reply]
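To make the Bloom filter idea concrete, here is a minimal Python sketch; it is not the code of the modules named above, whose internals are not described here. The assumption is that the filter stores stressed word forms, and that an unstressed word is resolved by trying the acute on each vowel and keeping the result only if exactly one candidate is found. The class, the function `autostress`, the hash scheme and the filter size are all illustrative choices.

```python
import hashlib

ACUTE = "\u0301"  # combining acute accent
VOWELS = "аеёіоуыэюя"


class BloomFilter:
    """Compact set membership with possible false positives, never false negatives."""

    def __init__(self, size_bits, num_hashes):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        data = item.encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def autostress(word, known_forms):
    """Return the word with an acute added if exactly one stressed candidate is known."""
    candidates = []
    for i, ch in enumerate(word):
        if ch.lower() in VOWELS:
            candidate = word[:i + 1] + ACUTE + word[i + 1:]
            if candidate in known_forms:
                candidates.append(candidate)
    return candidates[0] if len(candidates) == 1 else word


known = BloomFilter(size_bits=1 << 16, num_hashes=4)
for form in ("дзо\u0301т", "гады\u0301", "га\u0301ды"):
    known.add(form)

print(autostress("дзот", known))  # one known candidate, so the stress is added
print(autostress("гады", known))  # two known candidates, so the word is left alone
```

Because a Bloom filter can return false positives, a real module would need a filter sized so that wrong stress guesses stay acceptably rare; the sizes used here are arbitrary.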

@Ssvb I don't have major conceptual objections to this but there are a large number of considerations and edge cases that should be worked out *BEFORE* you integrate this into any transliteration module. I actually wrote an offline script a while ago [7] to add automatic accents as well as lemma links to Russian terms, and it runs to 1,200 lines and took weeks of development effort to work out the kinks. Benwing2 (talk) 06:40, 7 April 2024 (UTC)
@Benwing2: Thanks for the interesting link. I'm curious, how is this offline script used in practice? For example, К.Артём.1 has been adding some nice Russian quotations recently, but without annotating stressed syllables in them. Do you periodically run a bot to fix such quotations? How is this process organized? --Ssvb (talk) 13:53, 7 April 2024 (UTC)
@Benwing2: As for the stress annotation in my Lua module, I want to keep it very simple and reliable without any extra bells and whistles. Your offline script has a lot more features, which are nice, but don't seem to be strictly necessary. Right now the Belarusian transliteration module already automatically annotates stress for the letter "о" and this doesn't seem to cause any problems. This algorithm guesses correctly in more than 90% of cases. But it isn't perfect and makes mistakes, because compound words like "мовазна́ўства" or "штодня́" don't fit this model. This problem can be addressed by adding a small dictionary of these few problematic compound words. Once this is implemented, we just get a better user experience and no disadvantages at all! And once we have a dictionary framework up and running, nothing stops us from adding even more words to it. Conceptually this is still just an extension of the already existing letter "о" stress auto-guesser functionality.
As for the edge cases, the obvious ones are homographs such as "гады́" vs. "га́ды". Some capitalized proper nouns are also tricky, such as "Та́ні" (genitive form of a girl's name) vs. "тані́" (imperative form of "to drown") or "Я́на" (genitive form of a boy's name) vs. "яна́" ("she"). The module needs test cases with good coverage for such things, but handling them is pretty straightforward. At least that's how I see it right now. --Ssvb (talk) 14:24, 7 April 2024 (UTC)
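Editorial illustration: the edge cases listed above could be covered by test cases along these lines. This is a Python sketch under stated assumptions, not the behaviour of any existing module. It assumes a case-sensitive lookup table, leaves homographs unstressed, and treats a capitalized sentence-initial word as ambiguous between the proper noun and the common word. The table contents come from the examples in this thread; the function name and the `sentence_initial` flag are hypothetical.

```python
ACUTE = "\u0301"  # combining acute accent

# Hypothetical mini-dictionary: unstressed spelling -> all known stressed spellings.
# Keys are case-sensitive so that proper nouns can differ from common words.
STRESS = {
    "гады": {"гады\u0301", "га\u0301ды"},   # homograph: two possible stresses
    "Тані": {"Та\u0301ні"},
    "тані": {"тані\u0301"},
    "Яна": {"Я\u0301на"},
    "яна": {"яна\u0301"},
    "штодня": {"штодня\u0301"},
}


def stress_word(word, sentence_initial=False):
    """Return the stressed spelling if it is unambiguous, else the word unchanged."""
    if ACUTE in word:
        return word  # already annotated by the editor
    options = set(STRESS.get(word, ()))
    if sentence_initial and word[:1].isupper():
        # The capital letter may only mark the sentence start,
        # so the lowercase entry is a candidate too.
        lowered = word[:1].lower() + word[1:]
        for form in STRESS.get(lowered, ()):
            options.add(form[:1].upper() + form[1:])
    if len(options) == 1:
        return next(iter(options))
    return word  # unknown or ambiguous: leave it for a human


print(stress_word("Тані"), stress_word("тані"), stress_word("гады"))
```

Returning the word unchanged in ambiguous cases keeps the automatic annotation from ever asserting a stress it cannot justify; an editor can still add the mark by hand.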
I'm not really a coder, at least when it comes to Wiktionary, so I'm probably not one to answer here. I'm perfectly happy doing the stresses by hand, although as you mentioned, it's error-prone for non-native speakers like myself. Insaneguy1083 (talk) 11:40, 7 April 2024 (UTC)
@Insaneguy1083: Thanks for your response. I can handle Lua coding myself and I'm primarily interested in your feedback as a user. I think that the Belarusian part of English Wiktionary needs a lot more editors to add a lot of the currently missing content, but the learning curve unfortunately seems to be too steep for many potential contributors. --Ssvb (talk) 14:37, 7 April 2024 (UTC)
Adding accent marks to the first form of the quotation is deeply wrong. If you want to add editorial opinion to the line, there are {{quote-book}} options such as |norm= for this. While I understand why we don't do transliteration for Thai, it bothers me that there is no necessary relationship between the apparent transcription and how the original utterer would have intended the sentence to be said. For comparison, imagine transcribing "the ignominy of either economic controversy". --RichardW57 (talk) 17:21, 7 April 2024 (UTC)
@RichardW57: I'm completely ignorant about Thai, do you mean that you would prefer |norm= instead of |transliteration= for Thai word quotations, such as the quotation used for "ระกาศก"?
I agree that it seems natural for |text= to precisely reproduce the original spelling of the quoted book, but these things are rather loosely documented in WT:QUOTE#Spelling_and_typography ("Generally, the original spelling of the word or phrase should be kept in the citation. In practice, however, this doesn't always happen") and new contributors tend to mimic the formatting of the existing entries. The language-specific guidelines in WT:ARU could potentially provide clarifications specifically for the Russian entries, but currently it has no clear explanations for book quotations.
I propose the following:
  • In a Russian quotation like |text=Мама мыла раму|t=Mom was washing a window frame, the Lua module can automatically create its normalization |norm=Ма́ма мы́ла ра́му using a dictionary and then the template can create transliteration |tr=Máma mýla rámu from this normalization. But if |text= already contains acute stress marks like it is done now, then the generation of normalization can be suppressed.
  • In a Belarusian quotation, Cyrillic normalization can be automatically created even from Łacinka and automatically stress annotated using a dictionary: |text=Ulezła ŭ chatu jak sztodnia|norm=Уле́зла ў ха́ту як штодня́|tr=Uljézla ŭ xátu jak štodnjá|t=Sneaked into the house like it was a daily routine.
The downside is that having both |text= and |norm= adds extra visual clutter, so I understand why the existing practice of replacing text with its normalization in Russian quotations has its appeal. --Ssvb (talk) 02:44, 8 April 2024 (UTC)
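Editorial illustration: the proposed flow from |text= to |norm= to |tr= can be sketched as below. This is a Python sketch of the idea only, not how {{quote-book}} or any transliteration module works today. The stress dictionary and the transliteration table are toy data covering just the example sentence, and the function names are hypothetical.

```python
import re
import unicodedata

ACUTE = "\u0301"  # combining acute accent

# Toy data covering only the example sentence (assumption, not real module data).
STRESS = {"мама": "ма\u0301ма", "мыла": "мы\u0301ла", "раму": "ра\u0301му"}
TRANSLIT = {"м": "m", "а": "a", "ы": "y", "л": "l", "р": "r", "у": "u"}


def normalize(text):
    """Add stress marks to every word found in the dictionary, preserving capitalization."""
    def repl(match):
        word = match.group(0)
        stressed = STRESS.get(word.lower())
        if stressed is None:
            return word
        return stressed[:1].upper() + stressed[1:] if word[:1].isupper() else stressed
    return re.sub(r"[^\W\d_]+", repl, text)


def transliterate(text):
    """Map letters through the toy table; accents and unknown characters pass through."""
    out = []
    for ch in text:
        latin = TRANSLIT.get(ch.lower(), ch)
        out.append(latin.upper() if ch.isupper() else latin)
    return unicodedata.normalize("NFC", "".join(out))


def expand_quote(params):
    """Fill in norm and tr unless the editor supplied them or text is already stressed."""
    result = dict(params)
    text = params["text"]
    already_stressed = ACUTE in unicodedata.normalize("NFD", text)
    if "norm" not in result and not already_stressed:
        norm = normalize(text)
        if norm != text:
            result["norm"] = norm
    result.setdefault("tr", transliterate(result.get("norm", text)))
    return result


print(expand_quote({"text": "Мама мыла раму", "t": "Mom was washing a window frame"}))
```

Because a manually supplied |norm= or |tr= is never overwritten, and stressed |text= suppresses the generated normalization, existing entries would keep their current output under this scheme.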
@Benwing2: I just noticed that the |norm= parameter and the discussion about it (https://en.wiktionary.org/wiki/Wiktionary:Beer_parlour/2023/July#Adding_a_normalization_param_to_{{ux}},_{{quote}},_etc.) are relatively new developments. Is there a framework, and some sort of standardized naming convention for Lua modules, planned for hooking in the automated conversion from |text= to |norm=? I mean something similar to the Module:languages#Language:transliterate functionality. --Ssvb (talk) 04:49, 8 April 2024 (UTC)
For example, ภรรยา (pan-yaa, wife) may also be pronounced pan-rá-yaa. When we transliterate it in a quotation, we unintentionally attribute the 2-syllable pronunciation to the author. With โควิด-19, we have no idea whether the number part would have been pronounced as in Thai or (approximately) as in English. This problem is inherent in unfaithful transliteration, and I think it's rampant in Japanese with its multiple readings. With such systems, reason for scepticism increases as one goes from text to normalisation to 'transliteration' to translation. --