Wiktionary:Grease pit/2024/April

Magic words appearing in WhatLinksHere

I noticed that WT:Todo/Lists/Entries using nonexistent templates had suddenly filled up with spurious transclusions of MediaWiki-implemented magic words like {{!}} and {{PAGENAME}}. This can also be seen in WhatLinksHere: [1]. These templates don't exist and are known by our Lua code to be magic words (well, at least {{temp}} itself treats them specially), so there should be no reason to attempt to transclude them.

Something has changed in the last week. It's only happening on this wiki, so it's coming from our Lua modules rather than MediaWiki itself. I'm pinging @Theknightwho as a starting point. This, that and the other (talk) 01:26, 1 April 2024 (UTC)[reply]

@This, that and the other. I noticed this last week: if you look at the March 24 revision you'll see there are already some of those, so it must have started before then; there weren't as many, so probably not long before. Chuck Entz (talk) 02:38, 1 April 2024 (UTC)[reply]

After doing some spot-checking it appears that all of these have the magic words (as well as things like "!" and "=") wrapped in {{ }}. These were inserted as part or all of parameters in templates, in interwikis, and in categories. The use of {{PAGENAME}} in filenames makes me nervous, since they'll have to be fixed if the page is moved (those should have been subst:ed).

Come to think of it, either subst:ing all the {{PAGENAME}}s or replacing them with the pagenames themselves looks like a perfect job for a bot. Chuck Entz (talk) 03:47, 1 April 2024 (UTC)[reply]

@Chuck Entz Yeah, some people systematically insert {{PAGENAME}} into Wikitext. I think it's a bad idea. Benwing2 (talk) 05:03, 1 April 2024 (UTC)[reply]

@This, that and the other @Chuck Entz @Benwing2 This was down to an older version of the template parser which didn’t handle parser variables (i.e. magic words which don’t take any parameters), so it was still grabbing the title object. This was fixed about a week ago, but clearly hasn’t propagated through everywhere yet. Some parser variables can also act like magic words (e.g. {{PAGENAME}} vs {{PAGENAME:title}}), but many can’t (e.g. {{!}} and {{=}} will default to templates if you try), and some of them are case-sensitive while others aren’t, so I had to make sure it knew how to handle all the various possible inputs. As a side point, it is actually possible to use templates with those names by using (e.g.) {{msg:PAGENAME}}, which {{temp}} is also aware of, and is on my to-do list for the template parser. Theknightwho (talk) 15:15, 1 April 2024 (UTC)[reply]
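The distinction described above (parser variables, magic words that take an argument, and names that fall back to templates) can be sketched roughly as follows. This is a minimal Python illustration, not the code in Module:template parser: the function name `classify_invocation` is hypothetical, the name sets are tiny illustrative subsets, and everything is treated as case-sensitive for simplicity even though some magic words are not.

```python
# Illustrative subsets only (assumption); the real lists are much longer.
PARSER_VARIABLES = {"PAGENAME", "!", "="}   # usable with no parameters
ALSO_TAKES_ARGUMENT = {"PAGENAME"}          # e.g. {{PAGENAME:title}}


def classify_invocation(inner: str) -> str:
    """Classify the text between {{ and }} (hypothetical helper)."""
    # Only the part before the first | names the thing being invoked.
    name = inner.split("|", 1)[0].strip()
    # {{msg:PAGENAME}} forces a template lookup even for a magic-word name.
    if name.startswith("msg:"):
        return "template"
    if name in PARSER_VARIABLES:
        return "parser variable"
    head, sep, _ = name.partition(":")
    if sep and head in ALSO_TAKES_ARGUMENT:
        return "magic word with argument"
    # Anything else, including {{!:x}} or {{=:x}}, is treated as a template.
    return "template"
```

With these assumptions, `{{PAGENAME}}` and `{{!}}` are never looked up as templates, while `{{msg:PAGENAME}}` and `{{!:x}}` are.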

Der3, Rel3, Col3

Please replace derx, relx in all Philippine languages (especially Tagalog) with colx templates. Thank you. Ysrael214 (talk) 02:36, 1 April 2024 (UTC)[reply]

Welsh word 'hambon'

I'm trying to add the Welsh word 'hambon'. I've given various sources for the word used in context, but I get the following message:

"This action has been automatically identified as harmful, and therefore disallowed. If you believe your action was constructive, please start a new Grease pit discussion and describe what you were trying to do. A brief description of the abuse rule which your action matched is: various specific spammer habits"

Could someone help resolve this? Wemblydumblediddle (talk) 08:58, 2 April 2024 (UTC)[reply]

Your entry does not follow the correct formatting. Please see WT:EL and existing Welsh noun entries for examples. The "Cultural Significance" should perhaps at best be a "Usage notes" section. The most likely reason for the filter though is the usage of external links to e.g. YouTube. — SURJECTION ^{/ T / C / L /} 09:03, 2 April 2024 (UTC)[reply]

Thanks for getting back so quickly, I'm new to this. I can't think of a way around this, there isn't much in writing about the word. The only decent written source I could find is that Guardian article. Is there a way around the automatic filtering? Does someone have the authority to verify the entry, modifying it if necessary? Wemblydumblediddle (talk) 09:12, 2 April 2024 (UTC)[reply]

Try publishing the entry again, but without the YouTube links (and ideally also with formatting changes you can gather from the two links I posted). — SURJECTION ^{/ T / C / L /} 09:48, 2 April 2024 (UTC)[reply]

Thanks, that worked. It's a shame I can't include the video, though. That Hansh video is probably the best example of the word in use; it features hambons explaining what it means to be a hambon, and it's very entertaining for West Wales Welsh speakers. Wemblydumblediddle (talk) 10:41, 2 April 2024 (UTC)[reply]

Wiktionary doesn't seem to block youtube links per se, because I successfully added a youtube link in the third quotation of this revision via Template:quote-av in order to confirm pronunciation and the stressed syllable. Does this go against the WT:CFI#Durably_archived policy? --Ssvb (talk) 20:15, 2 April 2024 (UTC)[reply]

"Durably archived" is only a policy requirement when it comes to proving the existence of the word (as a WT:RFV/WT:ATTEST question); it's OK to add links to (ideally reliable or at least representative and inoffensive) youtube videos to show pronunciation, and we regularly provide References or Further reading links to various reliable online dictionaries. Brand-new users are currently prevented from doing so, because most such users are spammers, but we do get feedback like this maybe once a year(?) from legitimate users whom the filter has stopped... it's a question of whether the (large) amount of spam that gets stopped is worth the (small) amount of valid edits which get stopped. - -sche (discuss) 05:13, 3 April 2024 (UTC)[reply]

@Ssvb: Generally, abuse filters are much stricter on new accounts: vandals, spammers and self-promoters almost always get blocked long before they stop being new. As for YouTube: it shouldn't be used to meet WT:CFI, but it can be used occasionally for other purposes. In general we try to avoid linking to anything commercial or promotional, so it's best to be as judicious and selective as possible. Chuck Entz (talk) 05:17, 3 April 2024 (UTC)[reply]

@Chuck Entz: Thanks, that's good to know. The documentation of the quote-av template says "Do not link to any webpage that has content in breach of copyright" and this is very useful, but other than this, the information is pretty scarce and maybe it could be improved? I think that the new contributors would appreciate that.

In my quotation I provided a link to a fragment of a news report published by a news agency on their own official youtube channel, so it should be okay from the copyright standpoint. As for the "avoid linking to anything commercial or promotional" guideline, I'm afraid that even a quotation from a book of a modern author may be potentially twisted as a commercial promotion of that particular author. I guess, "don't quote the same legit source too often and don't quote any shady sources at all" could be a good plan, though the distinction between legit and shady sources may be subjective in some cases. --Ssvb (talk) 07:23, 3 April 2024 (UTC)[reply]

@Ssvb These are guidelines, and you should use your common sense when it comes to things like "avoid linking to anything commercial or promotional". Copyright infringement could lead to legal consequences for Wikimedia concerning Wiktionary, which is why it says "Do not" rather than "Avoid". Benwing2 (talk) 07:57, 3 April 2024 (UTC)[reply]

alternative forms respect labels

I've fixed {{alt}} so if the tags specified after || can't be found in the "dialect data", they are looked up as labels. This respects omit_preComma and similar flags, so you can say something like

{{alt|en|Shi-jia-zhuang||also from|_|Pinyin|rare}}

and it correctly displays as

Shi-jia-zhuang (also from Hanyu Pinyin, rare)

Here, the tag rare is a recognized label so it automatically links to the glossary; Pinyin is a label that normalizes to Hanyu Pinyin and links to Wikipedia; and the underscore prevents a comma from appearing. Benwing2 (talk) 03:36, 3 April 2024 (UTC)[reply]
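The comma handling in this example can be sketched in a few lines. This is a simplified Python illustration, not the actual module code: the alias table and the function names `format_tags` and `format_alt` are hypothetical, and only the underscore behaviour and one alias are modelled (no linking, no other flags).

```python
# Hypothetical alias table: a label that normalizes to another display form.
ALIASES = {"Pinyin": "Hanyu Pinyin"}


def format_tags(tags):
    """Join tags with ", ", except where "_" asks for a plain space."""
    out = ""
    suppress_comma = False
    for tag in tags:
        if tag == "_":
            # The underscore is not displayed; it only suppresses the
            # comma before the next tag.
            suppress_comma = True
            continue
        text = ALIASES.get(tag, tag)
        if out:
            out += " " if suppress_comma else ", "
        out += text
        suppress_comma = False
    return out


def format_alt(term, tags):
    """Render a term followed by its parenthesized tags, if any."""
    return f"{term} ({format_tags(tags)})" if tags else term


print(format_alt("Shi-jia-zhuang", ["also from", "_", "Pinyin", "rare"]))
# prints: Shi-jia-zhuang (also from Hanyu Pinyin, rare)
```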

Ethiopic Letter Kurk

I cannot add the Ethiopic Letter Kurk. 2A09:BAC3:378F:D2:0:0:15:1B5 06:37, 3 April 2024 (UTC)[reply]

I don't think that's a thing. See the letter names at w:Geʽez script#Geʽez abugida. --kc_kennylau (talk) 00:44, 4 April 2024 (UTC)[reply]

@Kc kennylau|2A09:BAC3:378F:D2:0:0:15:1B5 That's not a complete list, though. But unless the IP can show us why he thinks it exists, we probably can't help any further. If it hasn't been encoded in Unicode (either as one character or a sequence), it can't be added. --RichardW57m (talk) 16:38, 8 April 2024 (UTC)[reply]

google:"Ethiopic Letter Kurk" turns up exactly one hit: this thread. I suspect that this is not the right name for an Ethiopic letter. ‑‑ Eiríkr Útlendi │^{Tala við mig} 18:19, 8 April 2024 (UTC)[reply]

Category:Terms written in multiple scripts

I notice that entries are categorized into this category manually. It seems like {{head}} et al could detect multiple scripts and add the category automatically, at least in most cases. No? Is the issue that checking would be too 'expensive'? Would it be more expensive than the code that adds the "Terms spelled with..." categories? - -sche (discuss) 14:53, 3 April 2024 (UTC)[reply]

@-sche: I think there may be some complex cases because Wiktionary scripts may overlap, e.g. the Beng and as-Beng scripts for Sanskrit, and I'm not sure that Arabic script variants don't overlap even for some varieties of Arabic. It gets worse if one considers scripts not recorded as being used for the language of the text they're found in. --RichardW57m (talk) 16:04, 3 April 2024 (UTC)[reply]

Sure, some cases could still have to be added manually, but it seems like most cases could be handled automatically. Re "scripts not recorded as being used for the language of the text they're found in": isn't that orthogonal, or what am I missing? A Sanskrit term written (e.g.) partly in Beng and partly in Arab is a "term written in multiple scripts", regardless of whether the language has used both scripts, or neither script, or only one script, and regardless of whether our modules record either, both, or neither script as being used for the language, isn't it? The headword template/module just has to look at the characters in the pagetitle/head, determine if they're from more than one script, and add the category if so. We only need to fall back on manually adding the category if a pair of characters appear to be from the same ISO- or Wiktionary-code-having script, but actually represent different scripts (like might've been the case for subvarieties of Mong until we split Mong and gave them their own sub-codes, and like might be the case for subvarieties of Egyh if Egyh ever becomes computer-encodable and font-supported). - -sche (discuss) 16:20, 3 April 2024 (UTC)[reply]

@-sche: Consider Sanskrit কামো (kāmo). It's correctly recorded as being in both the Assamese and the Bengali scripts. A dumb algorithm could consider it to be written in a mixture. It's also a Pali word. Now, Pali is currently recorded as using the Bengali script but not the Assamese script, so there is no ambiguity.

Now consider Pali ৰরো (varo). We don't have a record of an attestation yet, but I think it's only a matter of time before it turns up. The word is currently treated as being in the Bengali script, but the first letter belongs to the writing system used for Assamese but not Bengali, while the second letter belongs to the writing system used for Bengali but not Assamese. If you don't like this word, look at the last word of Example 20 on page 8, the 20th displayed page at https://archive.org/details/pali-grammar/Ucchatar%20Pali%20Bhasha%20Shikkha%20by%20Karunabangsha%20Bhikkhu/page/n19/mode/2up. That word also has both the letters in it. To keep things clean, we might need to declare a new script (pi-Beng) for Bengali script Pali, and prevent the analysis considering the other scripts. So far I've preferred to avoid the complication of doing that, and put up with the inconveniences occasioned by the word ৰ, which is written entirely in a letter from as-Beng, which shows up in Example 4 (the second example on that same page).

Now, we may be able to do a reasonable job if we partition the scripts as Unicode does, and ignore the 'inherited' and 'common' characters. We might miss some interesting examples in Burmese script Pali, where different local groups have rather different sets of characters, and for Pali, I'm not talking about the difference between NGA and MON NGA, which are distinguished only by the encoding in real Pali words. --RichardW57m (talk) 17:18, 3 April 2024 (UTC)[reply]

@RichardW57m @-sche Category:Chinese terms written in multiple scripts is autogenerated by simply looking for terms that have both Hanzi and non-Hanzi characters in them. I don't see why we can't automate this everywhere by simply taking whatever is the autodetermined script (which is based on which script has the most characters in the term) and looking to see whether all characters belong to that script. There's no problem in this approach if two scripts share some characters. Benwing2 (talk) 21:13, 3 April 2024 (UTC)[reply]
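The approach proposed here (pick the script with the most characters, then check whether every character belongs to it) might look roughly like the sketch below. It is a Python illustration only, not the headword module: the range table is a coarse, hand-made assumption, the names `script_of` and `has_multiple_scripts` are hypothetical, and skipping spaces and hyphens is an assumption of this sketch.

```python
from collections import Counter

# Coarse, hand-made ranges for illustration only (assumption); real script
# data is far more detailed and allows characters shared between scripts.
SCRIPT_RANGES = {
    "Latn": [(0x41, 0x5A), (0x61, 0x7A)],
    "Grek": [(0x370, 0x3FF)],
    "Hani": [(0x4E00, 0x9FFF)],
}
IGNORED = {" ", "-"}  # assumption: punctuation left out of consideration


def script_of(ch):
    """Return the script code for a character, or None if unknown here."""
    cp = ord(ch)
    for code, ranges in SCRIPT_RANGES.items():
        if any(lo <= cp <= hi for lo, hi in ranges):
            return code
    return None


def has_multiple_scripts(term):
    chars = [ch for ch in term if ch not in IGNORED]
    counts = Counter(s for s in map(script_of, chars) if s is not None)
    if not counts:
        return False
    # "Autodetermined" script: the one with the most characters in the term.
    main_script = counts.most_common(1)[0][0]
    # Any character outside that script marks the term as mixed.
    return any(script_of(ch) != main_script for ch in chars)
```

In this sketch a Latin term containing σ is flagged, and so is any term containing a character the table does not know, such as a digit. That second behaviour is a consequence of the simple "everything must belong to the main script" rule rather than a deliberate choice.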

And worst-case scenario, if Indian scripts are actually problematic, just exclude those from being auto-categorized (so people still have to add entries in those scripts to the category manually, just like they currently do: they're no worse off). - -sche (discuss) 21:41, 3 April 2024 (UTC)[reply]

@-sche I implemented this. It started having false positives with spaces and hyphens, so I excluded them from consideration. However, there's still an issue with things like Area 51, where numbers aren't considered part of Latn. What do you think we should do here? Should we consider numbers as Latn, so that e.g. a Greek term with numbers in it still gets considered a "term written in multiple scripts", or should we exclude numbers entirely, or do nothing? Benwing2 (talk) 22:26, 3 April 2024 (UTC)[reply]

Also issues with apostrophes (devil's advocate), slashes (K/S), etc. Thoughts? Maybe all ASCII chars should be considered Latn? Benwing2 (talk) 22:28, 3 April 2024 (UTC)[reply]

@RichardW57m There are no terms so far in Category:Pali terms written in multiple scripts, and only one in Category:Sanskrit terms written in multiple scripts, which is उपेक्षिन्द्रिय. Do you know why that term is there? Benwing2 (talk) 23:12, 3 April 2024 (UTC)[reply]

NVM, the term wrongly contained a U+200B (zero-width space) at the end. Benwing2 (talk) 23:17, 3 April 2024 (UTC)[reply]
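Stray invisible characters like this one are easy to miss by eye. A small Python sketch for listing them is given below; the function name `invisible_characters` is hypothetical, and it relies only on the standard `unicodedata` module, using the Unicode general category `Cf` (format characters) as the test.

```python
import unicodedata


def invisible_characters(title):
    """List (index, code point, name) for each format-category character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(title)
        if unicodedata.category(ch) == "Cf"
    ]


print(invisible_characters("abc\u200b"))
# prints: [(3, 'U+200B', 'ZERO WIDTH SPACE')]
```

A hit is a prompt for review rather than proof of an error, since joiners and non-joiners are legitimate in some orthographies.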

@Benwing2, -sche: My first cut solution would be to ignore all characters in the Unicode script Common, aka Zyyy, and Inherited, aka Zinh. See https://www.unicode.org/Public/UCD/latest/ucd/Scripts.txt for definitions. The first includes ASCII non-letters. Note that many Thai abbreviations end in full stops - just look at Category:Rhymes:Thai/ɔː - and they're being assigned to Category:Thai terms written in multiple scripts. --RichardW57 (talk) 23:54, 3 April 2024 (UTC)[reply]
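A rough Python sketch of this "ignore Common and Inherited" rule follows. The sample data is hand-abbreviated in the general layout of Scripts.txt (code point or range, semicolon, script name) and is an assumption for illustration: the real file is much longer and splits its ranges more finely. The function names `parse_scripts` and `scripts_in` are hypothetical.

```python
import re

# Hand-abbreviated sample in the Scripts.txt layout (assumption, not verbatim).
SAMPLE = """\
0020..0040    ; Common
0041..005A    ; Latin
0061..007A    ; Latin
0300..036F    ; Inherited
03A3..03E1    ; Greek
0E01..0E3A    ; Thai
0E40..0E5B    ; Thai
"""

LINE = re.compile(r"^([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s*(\w+)")


def parse_scripts(text):
    """Parse 'XXXX..YYYY ; Script' lines into (lo, hi, script) tuples."""
    ranges = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            lo = int(m.group(1), 16)
            hi = int(m.group(2) or m.group(1), 16)  # single code point
            ranges.append((lo, hi, m.group(3)))
    return ranges


def scripts_in(term, ranges, ignored=("Common", "Inherited")):
    """Return the set of scripts used, skipping the ignored pseudo-scripts."""
    found = set()
    for ch in term:
        cp = ord(ch)
        for lo, hi, script in ranges:
            if lo <= cp <= hi:
                if script not in ignored:
                    found.add(script)
                break
    return found
```

A term would then count as mixed only when `scripts_in` returns more than one script, so digits, full stops, hyphens and combining accents never trigger the category on their own.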

There is at least one term in Category:Pali terms written in multiple scripts, but you have to look at the categories of এৰ to see it. These two categories could conceivably take a week for all the members to be recorded in the category views. --RichardW57 (talk) 23:54, 3 April 2024 (UTC)[reply]

Is Thai โควิด-19 written in multiple scripts? --RichardW57 (talk) 23:54, 3 April 2024 (UTC)[reply]

@RichardW57 I have already excluded periods (full stops) from consideration for all scripts, along with commas, hyphens and spaces. I would argue that โควิด-19 contains multiple scripts; certainly it looks that way on first glance. Benwing2 (talk) 23:56, 3 April 2024 (UTC)[reply]

Can you tell me what's going on with এৰ? Does this legitimately have two scripts? If not, why not? Benwing2 (talk) 23:57, 3 April 2024 (UTC)[reply]

@Benwing2: I didn't design the Wiktionary script concept. As far as Unicode is concerned, it's in a single script, the Bengali script, and from its usage, it would seem that at least some Bengalis think it is. There's a relevant discussion at Template talk:pi-alt#ৰ. For script determination, it's the same as ৰরো (varo) discussed above. Pali in Bengali script is a mixture of Beng (uses র) and as-Beng (uses ৰ, but for /v/, not /r/). We could put it in a single script pi-Beng created by adding 'ৰ' to Beng. --RichardW57 (talk) 00:16, 4 April 2024 (UTC)[reply]

@RichardW57 If it's just a single char (or a fixed set of chars), I can add an exclusion for it, just like I've done for things like apostrophes in Cyrillic. Benwing2 (talk) 00:17, 4 April 2024 (UTC)[reply]

And U+200C in fa-Arab. Benwing2 (talk) 00:18, 4 April 2024 (UTC)[reply]

And U+200D in Sinh, and in whatever we use for the Bengali script for Pali. (It's needed in the latter to stop 'vy' being rendered with a repha.) --RichardW57 (talk) 00:28, 4 April 2024 (UTC)[reply]

I think U+200C may be needed in Deva to support pedantic Hindi and also the faking of Sanskrit quotations (though possibly the latter doesn't matter for this application). Also needed in Tham for some Lao words where the ᨶᩣ ligature was deliberately not used. Possibly also for some odd-looking Tham-script Pali. --RichardW57 (talk) 00:34, 4 April 2024 (UTC)[reply]

Also, some cases where the ligature isn't formed in Northern Thai when the consonant and vowel are in different syllables might not be errors --RichardW57 (talk) 01:17, 4 April 2024 (UTC).[reply]

@Benwing2: COVID-19 clearly contains a heathen (Arabic to be precise) number in it - shouldn't that similarly be categorised as mixed script? --RichardW57 (talk) 00:20, 4 April 2024 (UTC)[reply]

@RichardW57 Maybe; but Arabic numerals are the native numeral set for Latin script whereas Thai script has Thai numerals natively. Benwing2 (talk) 00:21, 4 April 2024 (UTC)[reply]

@Benwing2: What? Roman numerals are the native set for the Latin script, not these newfangled (Western) Arabic numerals, which incidentally are the usual set for Maghribi Arabic. And Thais do their arithmetic in European-style Western Arabic numerals, and may convert the results to use Thai digits. --RichardW57 (talk) 00:38, 4 April 2024 (UTC)[reply]

There may be an exception if an abacus is used, but I think that would be by Chinese Thai, and I've never seen a Thai use an abacus. --RichardW57 (talk) 00:44, 4 April 2024 (UTC)[reply]

Serious books in English often start their page numbering using Roman numerals; less commonly, serious books in Thai start with numbering in letters. Contrariwise, magazines in English and in Thai generally use Western Arabic digits for their page numbering throughout. --RichardW57 (talk) 01:07, 4 April 2024 (UTC)[reply]

١٢٣٤٥٦٧٨٩٠ ≠ 1234567890 Chuck Entz (talk) 05:41, 4 April 2024 (UTC)[reply]

The first set are the near eastern digits ('ARABIC-INDIC' digits in Unicode parlance), not the Western Arabic digits nor, in Unicode parlance, the EXTENDED ARABIC-INDIC digits (a slightly dodgy concept). --RichardW57 (talk) 20:36, 4 April 2024 (UTC)[reply]

@Benwing2: Lithuanian is going to have a problem with U+0301 COMBINING ACUTE ACCENT and U+0303 COMBINING TILDE not being included in Latn. For future-proofing, we should also include U+0300 COMBINING GRAVE and U+0307 COMBINING DOT ABOVE. You're probably better off ignoring characters from the combining diacritics block altogether - there are issues with Romanian (combining comma below) and Thai-Script Patani Malay. I'll dig into them on request. --RichardW57 (talk) 01:41, 4 April 2024 (UTC)[reply]

@RichardW57 I agree; done. Benwing2 (talk) 01:50, 4 April 2024 (UTC)[reply]

I am thoroughly confused by this category. I went to see what could be an example in English other than terms that have numbers in them, which is a pretty suspect inclusion, and I saw Holy Wednesday is in Category:English terms written in multiple scripts. 1.) Why? 2.) How??? That category does not appear when I look at the entry, it is not a hidden category, and I assumed that it must have been in the entry as a manual addition that was recently removed, so there was just a lag time in the MediaWiki software generating the category, but it hasn't been edited in a year! How is "Holy Wednesday" multiple scripts??? Note that this is just a random example but there are many more that seem to have no clear reason for inclusion. —Justin (koavf)❤T☮C☺M☯ 00:33, 4 April 2024 (UTC)[reply]

@Koavf That is because of MediaWiki lag. When I first added the category, I forgot to exclude spaces from consideration, so some terms with spaces got added. They will clear in time. Benwing2 (talk) 00:35, 4 April 2024 (UTC)[reply]

So it seems like most of the legit entries are letters-with-numbers, letters-with-@, and Roman-and-Greek-letters mixes, which is more-or-less sensible. As noted above, Arabic numbers are the native numeral system in English, so it's maybe arguable that this is "multiple scripts", but other typographic characters like "@" are definitely not a standalone "script", but perfectly normal parts of English-language writing. An entry like Borel σ-algebra seems legitimate. —Justin (koavf)❤T☮C☺M☯ 00:40, 4 April 2024 (UTC)[reply]

Letters-with-numbers and letters-with-@ aren't considered multiple scripts; I exclude all non-letter ASCII symbols from consideration when the script is Latin. Any of this nature that you see are due to MediaWiki lag. Benwing2 (talk) 00:53, 4 April 2024 (UTC)[reply]

@Koavf the category is now clear of all stray (lagged) entries.

@Benwing2 we still need to dismiss en rules (Einstein–de Haas effect) from the category. Also not so sure about superscript numerals like I²C. This, that and the other (talk) 04:55, 4 April 2024 (UTC)[reply]

The Unicode rules say they no more count for script determination than do ASCII digits. --RichardW57 (talk) 05:06, 4 April 2024 (UTC)[reply]

FWIW, I would also not consider B♭ to be "multiple scripts". Would it work to (a) only categorize entries if they use multiple code-having scripts (so, using one script like Latn + using characters that are not script-specific won't get categorized, only the use of 2+ scripts like Latn + Arab would get categorized), and (b) also exclude any non-script scripts that need to be excluded, like if ♭ or ' (etc) is in Zsym, then have things in Zsym count as "not script-specific" for this purpose. ? - -sche (discuss) 05:33, 4 April 2024 (UTC)[reply]

@-sche Sort of. I think your idea is a good one but there are still some special cases, e.g. I just had to add a case for Cyrillic ъ and ь used in Proto-Slavic Latin terms, and it is a bit trickier to implement than what I'm doing so far. Benwing2 (talk) 06:04, 4 April 2024 (UTC)[reply]

@Benwing2: A case could be made for exempting the entire Reconstruction namespace, since they're in effect not spelled so much as notated. Chuck Entz (talk) 06:14, 4 April 2024 (UTC)[reply]

@Chuck Entz I agree, and have added this exemption. Benwing2 (talk) 06:43, 4 April 2024 (UTC)[reply]

Well, it's clearing, but it still has (e.g.) in hysterics on my end. It went from 443 to 233, so MediaWiki is doing its magic, so thanks to whoever (BW?) did that. I reckon we will soon have it whittled down to the 60 or so semi-legitimate entries.

I would think that "letters-plus-numbers" terms are actually much more reasonable to put in Category:English terms with numerals or somesuch (note that Category:English terms containing Roman numerals exists), as that could plausibly be something that someone is searching. And I don't think that someone who wants to see "Latin-characters-with-Greek-characters" also wants to see COVID-19 or A♭. Since it seems like a substantial majority are actually entries with Greek characters, I could give a weak support to "Category:English terms with Greek characters" or somesuch. —Justin (koavf)❤T☮C☺M☯ 08:13, 4 April 2024 (UTC)[reply]

@Koavf I manually purged the whole category but some things have crept in afterwards. Benwing2 (talk) 08:23, 4 April 2024 (UTC)[reply]

I wasn't joking when I suggested it might take a week. I've certainly waited the best part of a week for a change to Pali categories to converge, and Pali is only a small part of Wiktionary. --RichardW57m (talk) 15:11, 4 April 2024 (UTC)[reply]

@-sche @RichardW57 I have redone the algorithm and made it simply elide the difference between e.g. Beng and as-Beng (in general ignoring the language-specific component of a script), which should fix the issue with এৰ. A side effect of this is that โควิด-19 no longer is considered to have multiple scripts (and wouldn't even if it mixed Thai characters with e.g. Devanagari numerals, I think). Benwing2 (talk) 03:27, 5 April 2024 (UTC)[reply]
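The idea of ignoring the language-specific component of a script code can be illustrated with a short Python sketch. It assumes that such codes have the form "language-Script" (as in as-Beng or fa-Arab); the helper names `base_script` and `distinct_scripts` are hypothetical and this is not the module's actual code.

```python
def base_script(code: str) -> str:
    """Drop a language-specific prefix: 'as-Beng' -> 'Beng'."""
    # Assumption: the script part is whatever follows the last hyphen.
    return code.rsplit("-", 1)[-1]


def distinct_scripts(codes):
    """Collapse script codes to their base scripts and deduplicate."""
    return {base_script(c) for c in codes}


print(distinct_scripts(["Beng", "as-Beng"]))
# prints: {'Beng'}
```

Under this normalization a term whose characters resolve to Beng and as-Beng counts as one script, while Latn plus Grek still counts as two.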

Thanks. I'll defer to people who edit Thai, but my impression is that Thai uses Arabic numerals so normally that a text using them would not strike speakers as mixing scripts the way a mixture of Thai and Arabic letters would; certainly, I see that many languages like Chinese use Arabic numerals regularly enough that they don't seem to be part of a different script. So I think โควิด-19 not being considered to have multiple scripts is appropriate. - -sche (discuss) 04:30, 5 April 2024 (UTC)[reply]

Why is .nato in Category:Translingual terms written in multiple scripts? Equinox ◑ 08:46, 4 April 2024 (UTC)[reply]

Because it has "." Note that this will be purged and no longer appear in said category soon. E.g. I do not see it on my end. —Justin (koavf)❤T☮C☺M☯ 08:56, 4 April 2024 (UTC)[reply]

This is a good idea but there are still several terms being falsely categorized, including (within the English category) 5′ cap, Ger⁺⁶, H₂O, ni🅱️🅱️a, o͝o, and others. Now I realize that I've been criticized for the same thing, but in this case there really was a severe lack of testing before making a change. I think a much more conservative approach is required, where two scripts (e.g. Latin and Greek) are explicitly set as "different". It might even have to be done on a per-language basis, since Japanese being written using Chinese characters is clearly different from the other way around. By the way, @Koavf, your idea would exclude the entries い-adjective and な-adjective, which are definitely the most interesting of the bunch. Ioaxxere (talk) 19:24, 4 April 2024 (UTC)[reply]

@Ioaxxere I agree in general about testing, but this kind of stuff is difficult to test completely beforehand and the effect of getting things a bit wrong is fairly minor (just a false positive in a category). But I am going to implement User:-sche's approach of excluding all symbols and anything not a proper "script" from consideration; just had to get some sleep :) ... Benwing2 (talk) 19:44, 4 April 2024 (UTC)[reply]

Does anyone know how to do this?

Does anyone know how to check for changes on a language as a whole? So say I wanted to keep an eye on what changes are made to English as a whole, including entries, categories and whatever else, is there a way to easily view them instead of having to see the ‘newest changes’ table of every category? Melithius (talk) 10:07, 4 April 2024 (UTC)[reply]

@Melithius This is kind of possible: Go to Category:English lemmas and click "Related changes" on the left sidebar. For completeness, you would also need to monitor Category:English non-lemma forms' related changes page too. All English entries are in one or other category.

The big drawback, which will become obvious as soon as you attempt this, is that all changes for the entries concerned will be shown, even those relating to other language sections of the entry. But it may still be workable for you depending on what you want to do. It is likely to be very workable for languages written in scripts other than Latin. This, that and the other (talk) 11:41, 4 April 2024 (UTC)[reply]
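For bookmarking, the "Related changes" pages mentioned above can also be reached directly, since that sidebar link goes to Special:RecentChangesLinked for the page in question. A small Python sketch for building such links is given below; the helper name `related_changes_url` is hypothetical and the default site URL is an assumption about which wiki is wanted.

```python
from urllib.parse import quote


def related_changes_url(title, site="https://en.wiktionary.org"):
    """Build a Related changes link for a page or category title."""
    # MediaWiki uses underscores for spaces; keep ':' and '/' readable.
    path = quote(title.replace(" ", "_"), safe=":/")
    return f"{site}/wiki/Special:RecentChangesLinked/{path}"


for cat in ("Category:English lemmas", "Category:English non-lemma forms"):
    print(related_changes_url(cat))
```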

Ah ok yes it worked, especially with the other languages I wanted to view, as you mentioned. Thanks! Melithius (talk) 13:02, 4 April 2024 (UTC)[reply]

Horizontal toclimit2

Would you like to test e.g. at te or something like {{Template:User:Sarri.greek/toc2-hor}}?
If you think it looks better than the vertical toclimit, could a real programmer take a look? (my amateurish Module:User:Sarri.greek/toc2-hor, style.css, Template:User:Sarri.greek/toc2-hor) alert programmers MM Benwing2, Surjection. PS Would editors of 3phased languages like something like wikt:el:Tempalte:test-ol? Thank you ‑‑Sarri.greek ^♫ I 05:13, 5 April 2024 (UTC)[reply]

Template:ja-new some changes

Accelerated Japanese entry creation {{subst:ja-new|へん-のう|s|returning|to return}} didn't work when creating 返納. Anatoli T. ^{(обсудить}/^вклад) 08:20, 5 April 2024 (UTC)[reply]

@Atitarev What went wrong? It looks OK to me, although maybe I missed something. Benwing2 (talk) 08:54, 5 April 2024 (UTC)[reply]

@Benwing2: To reproduce, paste the full code above on an empty line in the same entry and preview.

I didn’t generate the entry, I made it manually. The code above is supposed to create a verbal noun and verb entry simultaneously. Anatoli T. ^{(обсудить}/^вклад) 09:52, 5 April 2024 (UTC)[reply]

@Atitarev Hmm, I tried it and it seems to work fine for me. What is the error you're seeing? Benwing2 (talk) 20:16, 5 April 2024 (UTC)[reply]

@Benwing2: Thanks for checking. Something happened between yesterday and today, I was getting some string concatenation error. Anyway, it's working now. Anatoli T. ^{(обсудить}/^вклад) 23:06, 5 April 2024 (UTC)[reply]

@Benwing2: Hi. It happened again on 返納金(へんのうきん) (hennōkin): Lua error in Module:template_parser at line 402: bad argument #1 to 'find' (string expected, got nil)

I used {{subst:ja-n|へん-のう-きん||refund, repayment}}

It fixed itself on the 2nd edit but I saved this revision. Anatoli T. ^{(обсудить}/^вклад) 05:22, 6 April 2024 (UTC)[reply]

Also calling @Theknightwho. It's your module. Anatoli T. ^{(обсудить}/^вклад) 05:27, 6 April 2024 (UTC)[reply]

@Atitarev Hmm, I took a look at the error but I'm not sure why it happened. Usually this would mean someone accidentally introduced a bug and then quickly fixed it, but I don't see evidence of this. The error is in Module:template parser, which has been edited recently by User:Theknightwho but not in the last few minutes (and he hasn't contributed anything in a few hours). Benwing2 (talk) 05:27, 6 April 2024 (UTC)[reply]

@Benwing2: I think it's the same as yesterday. It fails on the preview or first edit on a NEW page. Then it can be fixed by a new edit with the same code. Anatoli T. ^{(обсудить}/^вклад) 05:33, 6 April 2024 (UTC)[reply]

@Atitarev Hmm. Does it always happen on a new page? If so I may be able to fix it. Benwing2 (talk) 05:36, 6 April 2024 (UTC)[reply]

@Benwing2: Yes, on a new page. I don't know when it started to occur but I only noticed yesterday. It may have been a few weeks since I made new Japanese entries. Anatoli T. ^{(обсудить}/^вклад) 05:39, 6 April 2024 (UTC)[reply]

@Atitarev Yes, I can reproduce this, but I can't figure out how to get a full stack trace due to the substing that's going on. Hopefully User:Theknightwho should be able to fix this; I imagine it is a simple fix. Benwing2 (talk) 05:47, 6 April 2024 (UTC)[reply]

@Atitarev @Benwing2 I’ll need to check when I’m on my laptop, but that error suggests that something is feeding nil into the parser instead of the page content. I know that subst sometimes causes a page to need to be saved twice to fully take effect, so I wonder if that’s a relevant factor here. Theknightwho (talk) 12:49, 6 April 2024 (UTC)[reply]

@Theknightwho, @Benwing2: Thanks, please do check.

I've made a three-language (including four Chinese varieties) entry on 再起 with:

{{subst:zh-n|v|to rise again, to make a comeback||resurgence, comeback|k=재기}}

{{subst:ja-new|さい-き|s|resurgence, comeback|to rise again, to make a comeback}}

Only the Japanese entry failed, you can see in the edit history. The error was different this time.

The only sort of strange behaviour with "subst" I observed before was when something was reliant on the entry's existence and the entry hadn't been created yet: it showed some temporary errors, e.g. in Thai readings in a usex or even a headword, but that behaviour has changed for the better.

Please fix. It may discourage users from making new accelerated Japanese entries; they will just think it's not working at all. Anatoli T. (обсудить/вклад) 00:03, 7 April 2024 (UTC)[reply]

For experimenting, you can try creating a new entry on e.g. 才気(さいき) (saiki, “wisdom”) with this:

{{subst:ja-new|さい-き|n|wisdom}} Anatoli T. (обсудить/вклад) 00:07, 7 April 2024 (UTC)[reply]

@Atitarev This should be fixed. Let me know if you're still having issues. Benwing2 (talk) 07:28, 8 April 2024 (UTC)[reply]

Why do some Wikipedia images not show up when used on Wiktionary?

e.g. the cartoon I just added at Colonel Blimp. Equinox ◑ 13:32, 6 April 2024 (UTC)[reply]

@Equinox Non-free images are uploaded to Wikipedia directly rather than Commons (where they’re not allowed). You could do the same, but we don’t really have any infrastructure for it. Theknightwho (talk) 13:39, 6 April 2024 (UTC)[reply]

I see. Had noticed it seemed to happen with commercial-ish stuff like screenshots and comics. Equinox ◑ 14:06, 6 April 2024 (UTC)[reply]

@Equinox If you do decide to reupload here, one other thing to be careful of is that permission to use non-free images is sometimes only given to Wikipedia by the copyright-holder. Theknightwho (talk) 15:02, 6 April 2024 (UTC)[reply]

@Equinox, Theknightwho: A related discussion is Wiktionary:Beer_parlour/2023/August#Image_upload_rights, where people seemed to oppose including fair use images. I still think that Wiktionary is being seriously hampered by copyright paranoia. Ioaxxere (talk) 22:07, 6 April 2024 (UTC)[reply]

It's not the culprit in this case, but FWIW another reason I've seen some images not display (anymore) here recently is that we added a bunch of images to our blacklist recently (because vandals started to put a few of them on irrelevant entries), and it turns out we were using at least one of them (to correctly illustrate nipple). (Perhaps someone could check whether any of the other images on MediaWiki:Bad image list are actually being used.) - -sche (discuss) 15:53, 6 April 2024 (UTC)[reply]

There is a protocol for allowing the use of an otherwise banned image on an appropriate page, though I don't know the procedure offhand. bd2412 T 16:23, 6 April 2024 (UTC)[reply]

In the nipple case, I just removed the image from the blacklist (it had been added as part of a mass import of WP's blacklist and not because anyone was specifically misusing it; I think we have abuse filters which stop most bad-image addition anyway). — This unsigned comment was added by -sche (talk • contribs) at 19:36, 6 April 2024 (UTC).[reply]

I am willing to provide a free, tasteful image of my nipple. Equinox ◑ 22:09, 6 April 2024 (UTC)[reply]

Only the one? DCDuring (talk) 23:36, 6 April 2024 (UTC)[reply]

@Equinox we currently have a single non-free image at thagomizer. Indeed, we have a policy specifically to allow this file: WT:NFCC. If you want to upload a non-free file in the same vein as the Far Side strip we already have, you would need to ensure that "its presence significantly increases readers' understanding of the topic" (per point 5 of that policy). I'm not sure that a picture of Colonel Blimp would qualify. This, that and the other (talk) 02:54, 7 April 2024 (UTC)[reply]

@Ioaxxere also. This, that and the other (talk) 02:54, 7 April 2024 (UTC)[reply]

@This, that and the other See also Wiktionary:Beer_parlour/2024/April#Modify/deprecate_NFCC_or_request_re-enabling_Special:Upload_for_all_users? Liuxinyu970226 (talk) 04:08, 24 April 2024 (UTC)[reply]

As noted above, we have a very restrictive media upload policy and only four pieces of local media, two of which are basically required by MediaWiki software, one as redundant in case there is some vandalism to the item at c:, and a single fair-use file. While these are the only files, there are several discussions of deleted and moved ones as well and those could also be instructive about what the requirements are to upload locally. —Justin (koavf)❤T☮C☺M☯ 07:45, 7 April 2024 (UTC)[reply]

Automatic acute stress addition to Belarusian (and possibly also Russian/Ukrainian) words in book quotations.

A somewhat relevant old discussion: https://en.wiktionary.org/wiki/Wiktionary:Beer_parlour/2014/February#Should_quotations_be_normalized?. Ping @Benwing2, Atitarev, Insaneguy1083.

The current (unwritten?) rule is to add acute diacritics to mark stressed syllables in the quotations taken from the Belarusian, Russian and Ukrainian books (ex. дзот, бревно, завдовжки). However this is an annoying and time consuming chore for the native speakers and possibly a much more challenging and error prone task for the others. Not to mention possible typos. Also touching the original spelling just doesn't feel right.

I think that the stress marks can be added automatically for the majority of words. I have created two proof-of-concept modules: Module:User:Ssvb/be-autostress-simple and Module:User:Ssvb/be-autostress-bloom-filter. The former is simple and doesn't scale. The latter makes it possible to squeeze up to ~30-40K lemmas and all their inflected forms (~200-300K words total) into a ~2MB Lua module without becoming a resource hog. It's possible to use data from https://github.com/Belarus/GrammarDB for the Belarusian words. For Russian, it's possible to just extract the inflection and stress information for the words from the Wiktionary dump (~53K lemmas). May I integrate it into the transliteration module? Does anyone see any pitfalls or have objections? --Ssvb (talk) 01:49, 7 April 2024 (UTC)[reply]
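For readers unfamiliar with the data structure, here is a minimal Python sketch of how a Bloom filter could back a stress guesser. It is not the code of the Lua modules named above: the class, the SHA-256 double hashing, the filter size, and the idea of storing stressed forms and probing one candidate per vowel are all assumptions made for illustration.

```python
import hashlib

ACUTE = "\u0301"  # combining acute accent used as the stress mark
BE_VOWELS = set("аеёіоуыэюя")  # Belarusian vowel letters


class BloomFilter:
    """Fixed-size Bloom filter backed by one bytearray (hypothetical layout)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, word):
        # Double hashing: derive all bit positions from two halves of one digest.
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, word):
        # May report false positives, never false negatives.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(word))


def guess_stress(word, known_forms):
    """Return the stressed form if exactly one candidate is in the filter."""
    hits = []
    for i, ch in enumerate(word):
        if ch.lower() in BE_VOWELS:
            candidate = word[:i + 1] + ACUTE + word[i + 1:]
            if candidate in known_forms:
                hits.append(candidate)
    return hits[0] if len(hits) == 1 else word


forms = BloomFilter(num_bits=1 << 16, num_hashes=4)
for form in ("штодня\u0301", "мовазна\u0301ўства"):
    forms.add(form)
print(guess_stress("штодня", forms))
```

A filter only answers "probably present" or "definitely absent", so a real module would have to size it so that false positives (a wrong stress being accepted) are acceptably rare.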

@Ssvb I don't have major conceptual objections to this but there are a large number of considerations and edge cases that should be worked out *BEFORE* you integrate this into any transliteration module. I actually wrote an offline script awhile ago [2] to add automatic accents as well as lemma links to Russian terms, and it runs to 1,200 lines and took weeks of development effort to work out the kinks. Benwing2 (talk) 06:40, 7 April 2024 (UTC)[reply]

@Benwing2: Thanks for the interesting link. I'm curious, how is this offline script used in practice? For example, К.Артём.1 has been adding some nice Russian quotations recently, but without annotating stressed syllables in them. Do you periodically run a bot to fix such quotations? How is this process organized? --Ssvb (talk) 13:53, 7 April 2024 (UTC)[reply]

@Benwing2: As for the stress annotation in my Lua module, I want to keep it very simple and reliable without any extra bells and whistles. Your offline script has a lot more features, which are nice, but don't seem to be strictly necessary. Right now the Belarusian transliteration module already automatically annotates stress for the letter "o" and this doesn't seem to cause any problems. This algorithm guesses correctly in more than 90% of cases. But it isn't perfect and makes mistakes, because compound words like "мовазна́ўства" or "штодня́" don't fit this model. This problem can be addressed by adding a small dictionary of these few problematic compound words. Once this is implemented, we just get a better user experience and no disadvantages at all! And once we have a dictionary framework up and running, nothing stops us from adding even more words to it. Conceptually this is still just an extension of the already existing letter "о" stress auto-guesser functionality.

As for the edge cases, the obvious ones are "гады́" vs. "га́ды". Also some capitalized proper nouns are tricky, such as "Та́ні" (genitive form of a girl's name) vs. "тані́" (imperative form of "to drown") or "Я́на" (genitive form of a boy's name) vs. "яна́" ("she"). The module needs testcases with a good coverage for such things, but handling them is pretty straightforward. At least that's how I see it right now. --Ssvb (talk) 14:24, 7 April 2024 (UTC)[reply]
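The edge cases listed above can be illustrated with a small Python sketch. It assumes a hypothetical index that maps each unaccented spelling to the set of stressed forms seen for it, and marks stress only when that set has exactly one member. The word list is taken from the examples in this thread; the function names are invented for illustration.

```python
from collections import defaultdict

ACUTE = "\u0301"  # combining acute accent


def build_index(stressed_forms):
    """Map each unaccented spelling to every stressed form that shares it."""
    index = defaultdict(set)
    for form in stressed_forms:
        index[form.replace(ACUTE, "")].add(form)
    return index


def accent(word, index):
    """Mark stress only when the spelling has exactly one known stressed form."""
    # The lookup is case-sensitive on purpose, so "Тані" and "тані" stay apart.
    candidates = index.get(word, set())
    if len(candidates) == 1:
        return next(iter(candidates))
    return word  # unknown or ambiguous: leave untouched


index = build_index([
    "гады\u0301", "га\u0301ды",    # two stress patterns for one spelling
    "Та\u0301ні", "тані\u0301",    # proper noun vs. imperative
    "штодня\u0301",
])
for w in ("гады", "Тані", "тані", "штодня"):
    print(w, "->", accent(w, index))
```

A case-sensitive lookup does not settle sentence-initial capitals, where "Тані" could still be the imperative; that would need separate handling or a manual override.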

I'm not really a coder at least when it comes to Wiktionary, so I'm probably not one to answer here. I'm perfectly happy doing the stresses by hand, although as you mentioned, it's error-prone for non-native speakers like myself. Insaneguy1083 (talk) 11:40, 7 April 2024 (UTC)[reply]

@Insaneguy1083: Thanks for your response. I can handle Lua coding myself and I'm primarily interested in your feedback as a user. I think that the Belarusian part of English Wiktionary needs a lot more editors to add a lot of the currently missing content, but the learning curve unfortunately seems to be too steep for many potential contributors. --Ssvb (talk) 14:37, 7 April 2024 (UTC)[reply]

Adding accent marks to the first form of the quotation is deeply wrong. If you want to add editorial opinion to the line, there are {{quote-book}} options such as |norm= for this. While I understand why we don't do transliteration for Thai, it bothers me that there is no necessary relationship between the apparent transcription and how the original utterer would have intended the sentence to be said. For comparison, imagine transcribing "the ignominy of either economic controversy". --RichardW57 (talk) 17:21, 7 April 2024 (UTC)[reply]

@RichardW57: I'm completely ignorant about Thai, do you mean that you would prefer |norm= instead of |transliteration= for Thai word quotations, such as the quotation used for "ระกาศก"?

I agree that it seems natural for |text= to precisely reproduce the original spelling of the quoted book, but these things are rather loosely documented in WT:QUOTE#Spelling_and_typography ("Generally, the original spelling of the word or phrase should be kept in the citation. In practice, however, this doesn't always happen") and new contributors tend to mimic the formatting of the existing entries. The language-specific guidelines in WT:ARU could potentially provide clarifications specifically for the Russian entries, but currently it has no clear explanations for book quotations.

I propose the following:

In a Russian quotation like |text=Мама мыла раму|t=Mom was washing a window frame, the Lua module can automatically create its normalization |norm=Ма́ма мы́ла ра́му using a dictionary and then the template can create transliteration |tr=Máma mýla rámu from this normalization. But if |text= already contains acute stress marks like it is done now, then the generation of normalization can be suppressed.
In a Belarusian quotation, Cyrillic normalization can be automatically created even from Łacinka and automatically stress annotated using a dictionary: |text=Ulezła ŭ chatu jak sztodnia|norm=Уле́зла ў ха́ту як штодня́|tr=Uljézla ŭ xátu jak štodnjá|t=Sneaked into the house like it was a daily routine.
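The first part of this proposal (derive a normalization from |text= unless it already carries stress marks) could look roughly like the Python sketch below. The three-word dictionary and the function names are hypothetical, and the check for existing marks simply looks for the combining acute accent, which is an assumption about how stress is encoded in the quotation.

```python
import re

ACUTE = "\u0301"  # combining acute accent

# Hypothetical toy dictionary: lowercase word -> index of the stressed vowel.
STRESS_INDEX = {"мама": 1, "мыла": 1, "раму": 1}


def add_stress(word):
    """Insert the stress mark after the stressed vowel, if the word is known."""
    pos = STRESS_INDEX.get(word.lower())
    if pos is None:
        return word
    return word[:pos + 1] + ACUTE + word[pos + 1:]


def make_norm(text):
    """Return an auto-stressed normalization, or None if text is already marked."""
    if ACUTE in text:
        return None  # the editor supplied stress marks; do not second-guess them
    # [^\W\d_]+ matches runs of letters, including Cyrillic ones.
    return re.sub(r"[^\W\d_]+", lambda m: add_stress(m.group(0)), text)


print(make_norm("Мама мыла раму"))
print(make_norm("Ма\u0301ма мыла раму"))
```

Keeping the original capitalization and only inserting the mark means the generated normalization differs from |text= by stress marks alone, which makes the two easy to compare.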

The downside is that having both |text= and |norm= adds extra visual clutter, so I understand why the existing practice of replacing text with its normalization in Russian quotations has its appeal. --Ssvb (talk) 02:44, 8 April 2024 (UTC)[reply]

@Benwing2: I just noticed that the |norm= parameter and the https://en.wiktionary.org/wiki/Wiktionary:Beer_parlour/2023/July#Adding_a_normalization_param_to_{{ux}},_{{quote}},_etc. discussion about it was a relatively new development. Is there a framework and some sort of standardized Lua modules naming convention planned for hooking the automated conversion from |text= to |norm=? I mean something similar to the Module:languages#Language:transliterate functionality. --Ssvb (talk) 04:49, 8 April 2024 (UTC)[reply]

For example, ภรรยา (pan-yaa, “wife”) may also be pronounced pan-rá-yaa. When we transliterate it in a quotation, we unintentionally attribute the 2-syllable pronunciation to the author. With โควิด-19, we have no idea whether the number part would have been pronounced as in Thai or (approximately) as in English. This problem is inherent in unfaithful transliteration, and I think it's rampant in Japanese with its multiple readings. With such systems, reason for scepticism increases as one goes from text to normalisation to 'transliteration' to translation. --RichardW57m (talk) 17:07, 8 April 2024 (UTC)[reply]

@Ssvb: I didn't have a chance to review it thoroughly but to me, it seems like an almost, if not completely impossible task. You will get a lot of false positives, especially for Russian and ~~Belarusian~~ Ukrainian. For users not familiar with East Slavic languages, I wouldn't recommend "guessing" the main stress or stress and inflection pattern without consulting dictionaries. Anatoli T. (обсудить/вклад) 05:37, 8 April 2024 (UTC)[reply]

@Atitarev: The essence of my suggestion is to incorporate a small dictionary with the most common ~30K lemmas and all their inflections into a Lua module. So that the stressed vowels can be marked automatically when generating the transliterated text. And do this automatically only for those words, where this can be done unambiguously. Stress markup for the remaining words can be handled with the |subst= parameter. Alternatively, User:RichardW57 also mentioned the |norm= parameter, which could be used too. --Ssvb (talk) 06:01, 8 April 2024 (UTC)[reply]

@Ssvb: I see, thanks. If you're going to browse through the list of entries to be updated, as me and @Benwing2 did with the Russian accentuating effort, then that's fine. Cases like (Russian) тра́ктора (tráktora) and трактора́ (traktorá), or words with multiple possible stress patterns, should always be kept in mind as well. Anatoli T. (обсудить/вклад) 22:09, 9 April 2024 (UTC)[reply]

@Atitarev: I have actually imported all Russian words from English Wiktionary and managed to fit them into a ~6MB Lua module. It's split into three ~2MB chunks due to module size limits. And now the Module:User:Ssvb/ru-autoaccent module can automatically mark stress and recover ё letters where it's possible. Feel free to experiment with its Module:User:Ssvb/ru-autoaccent/testcases to see if you can come up with something that this module can't handle correctly. --Ssvb (talk) 22:52, 9 April 2024 (UTC)[reply]

I'm confused (as apparently is RichardW57) why it would be the norm to add acute accents to quotations in this context, any more than we would add accents to mark stress position in English, Italian, etc. quotations. It seems preferable to keep the original spelling.--Urszag (talk) 05:45, 8 April 2024 (UTC)[reply]

@Urszag: The pronunciation of Belarusian, Russian and Ukrainian words can be relatively easily deduced from their spelling in most cases, except for the positions of the stressed syllables, which happen to be unmarked in books (other than the children's textbooks used for learning the language). I myself am in favor of keeping the original spelling intact as far as the book quotations are concerned, and I would prefer to only add accents to the romanized transliterations. But the existing practice is to add acute accents to the original Cyrillic text. Maybe User:Atitarev can provide a much better explanation of the reasons for doing it this way. --Ssvb (talk) 06:21, 8 April 2024 (UTC)[reply]

We can also add Latin. For our head words and inflected forms, we mark long vowels, but not in quotations. --RichardW57m (talk) 17:12, 8 April 2024 (UTC)[reply]

Stress marks are very helpful for language learners, because they add almost all the necessary information to pronounce the word correctly (and mispronounced words in Russian are often incomprehensible). Everyone knows that they are not in the original text and can ignore them, so they are not "falsifying" too much. The advantages are greater than the disadvantages in my opinion. MrBeef12 (talk) 14:17, 27 April 2024 (UTC)[reply]

@MrBeef12: There's a concern that stress marks clutter the text and make it less readable. Books normally don't include stress marks, so encountering excessive amounts of them is unusual and uncomfortable. Additionally, Wiktionary quotations turn words into wikilinks. Such links are useful, but they may also feel distracting and uncomfortable to some people. Adding stress marks is just one sub-task of the text normalization, but there's also restoration of "ё" and pre-reform orthography modernization. I'm personally in favor of keeping both the original text and its enhanced/normalized version, because in some cases it may be ambiguous in a way that can't be automatically resolved. Some examples from Russian Wikisource:

"К. для удобства заряжанія пришлось дѣлать большаго діаметра, чѣмъ каналъ, именно, немного большаго діаметра снаряда по ведущимъ поясамъ." (большого/большего is ambiguous in "дѣлать большаго діаметра" and clearly means "большего" in "немного большаго")
"Малый водоемъ докачивается изъ большаго, пополняется насосом." (большого/большего is ambiguous)
"И послѣ сего онъ желаетъ еще бо́льшаго!" (not ambiguous and definitely means "большего" because of having the explicit stress mark in the original text)
"Англичане устремились за нами. Видимо, они все никак не могли насмотреться на русского мужика, попавшего в барские хоромы." (все/всё is ambiguous and both may potentially fit)

One may argue that ambiguous sentences shouldn't be picked for quotations in the first place. But sometimes there isn't much to choose from. --Ssvb (talk) 06:34, 28 April 2024 (UTC)[reply]

These are very good points! MrBeef12 (talk) 07:07, 28 April 2024 (UTC)[reply]

@Ssvb I am fine with the approach of keeping the original text unaltered and using the |norm...= parameters for the normalized text with ё, accents, links, etc. Note that at the time I did the last auto-accenting run for Russian, we didn't have support for a separate normalized text. Benwing2 (talk) 07:11, 28 April 2024 (UTC)[reply]

BTW how are you storing your bloom filters? It is best to use one big string (or three strings if you need to split it), although I imagine you have already realized that. You should test to see how much actual memory the addition of auto-accenting takes up (things like memory usage are shown at the bottom when you preview a page). If it's around 6 MB and you are loading using mw.loadData() so that the data only gets loaded once per page, that is probably fine; we can always have an exclude-list (aka blacklist) of high-memory pages to not auto-accent, if that becomes an issue. Benwing2 (talk) 07:16, 28 April 2024 (UTC)[reply]

Maybe of interest: I also developed a tool in Python for my thesis to add stresses to Russian text here: https://github.com/Vuizur/add-stress-to-epub/tree/master It mostly focused on ebooks, but you can also use it as a library. It tries to make some inferences using grammar analysis (with spacy) and not only word lists. One thing that I learned is that you should definitely use Russian Wiktionary data, for example from here: https://kaikki.org/dictionary/rawdata.html. It definitely has better coverage than only the English one. MrBeef12 (talk) 14:03, 27 April 2024 (UTC)[reply]

pre-phab question (categorically track uses of nonexistent templates)

If you use a module that doesn't exist, like {{#invoke:foobarbaz/templates|foobar}}, the page is categorized into Category:Pages with module errors. If you use an image or audio file that doesn't exist, the page goes in Category:Pages with broken file links. But if you use a template that doesn't exist, no categories are added AFAICT: the situation is not tracked, so unless someone looks at the page and sees the red "Template:foobar" link (thanks for catching this one, Chuck), it won't be noticed. I don't think this is something we can change, I think we'd have to ask the devs, so: is there already a Phabricator task about this? If not, who wants to start one? I can do it, but it might be better if someone with more technical expertise / sense of what would need to be done did it. (I poked around on Phabricator looking to see if there was already a task about this, and just saw tasks about turning non-existent template links red; that seems to work, so if the devs can do that, maybe they can also make such links categorize...?) - -sche (discuss) 17:21, 7 April 2024 (UTC)[reply]

There's a list of "Entries using nonexistent templates" at Wiktionary:Todo/Lists JeffDoozan (talk) 17:29, 7 April 2024 (UTC)[reply]

Great! But it would be useful if this were automatically generated (not requiring someone to run a bot), and on all wikis (not just this one), right? - -sche (discuss)

Yeah, definitely. I think it's weird that template redlinks aren't tracked or categorized automatically, which is why I was happy when TTO built a tool to do it for us. JeffDoozan (talk) 22:39, 7 April 2024 (UTC)[reply]

Special:WantedTemplates does something like what seems to be asked for the 5,000 most common "wanted" templates, but it has been flooded with "wanted templates" like Template:tracking/inflection of/tag/Attic (14 links), which is the least "wanted" of the 5,000. The most "wanted" is [[Template:tracking/parameters/empty parameter]] (4,386,357 links). This does seem to be the detritus of some effort to eliminate the need to use dump processing to track "problems". I know of no documentation or explanation for this use. I don't know who uses this. If someone is using it, they seem to like being anonymous. DCDuring (talk) 00:08, 8 April 2024 (UTC)[reply]

I once used this category to identify missing taxonomic and other templates. Now, of course, the name "WantedTemplates" is a misnomer. I doubt that there are even 50 actually "wanted" templates among the 5,000. I wonder about the value of this amount of tracking. Perhaps we are tracking non-problems or tracking real problems in overly fine detail. DCDuring (talk) 00:26, 8 April 2024 (UTC)[reply]

@DCDuring The template tracking mechanism is incredibly useful but has as a side effect that Special:WantedTemplates gets filled up with tracking categories. It would be great if Wikimedia provided a way to specify prefixes to ignore in Special:WantedTemplates. @-sche maybe you can file a Phabricator request to this effect? This is equivalent to adding a simple blacklist, which is usually very easy to do programmatically. A conceivable alternative would be to use userspace (or maybe some other namespace that's ignored by the code that generates Special:WantedTemplates) for the template tracking mechanism. Maybe a Phabricator ticket could request more info on how Special:WantedTemplates works. Benwing2 (talk) 00:33, 8 April 2024 (UTC)[reply]

Is it possible to have a completely invisible link like the ExpandTemplates trick (for others: see Module:debug/track) when linking to other namespaces? I can definitely do an invisible ping, but I have to use a space to avoid the pipe trick. As for which namespace, I would recommend subpages of WT:tracking, since we can make sure that it doesn't conflict with anything else. Chuck Entz (talk) 01:23, 8 April 2024 (UTC)[reply]

@Chuck Entz I think this is a great suggestion. If no one complains in a couple of days, I will make the change. Benwing2 (talk) 02:03, 8 April 2024 (UTC)[reply]

Sorry, I'm trying to follow along: would a link being invisible (in the manner of that ping) make it not show up on Special:WantedTemplates, or what is making it invisible like that accomplishing...? It seems like the existing tracking links are already invisible; at least, they don't seem to generate visible redlinks in entries. As for changing "tracking" links to be links to some namespace the Special: list ignores, that sounds like a great idea. - -sche (discuss) 02:38, 8 April 2024 (UTC)[reply]

The way the tracking mechanism works is by calling the expandTemplate API call, as if a template call to Template:tracking/inflection of/tag/Attic or similar had been inserted; this causes the page to show up in e.g. Special:WhatLinksHere/Template:tracking/inflection of/tag/Attic, but also makes the tracking page show up in Special:WantedTemplates. However, templates don't actually need to be in the Template: namespace. In fact, you can transclude any page (including mainspace pages) into another page using the template calling syntax; see w:Help:Transclusion. It appears, however, that only pages in the Template: namespace show up in Special:WantedTemplates; if you look at w:Special:WantedTemplates on the English Wikipedia, for example, there are only 68 pages listed, all of which are in the Template: namespace. So if we use a different namespace for the tracking pages, the Special:WhatLinksHere/... trick should still work, but the pages won't pollute Special:WantedTemplates. Benwing2 (talk) 02:49, 8 April 2024 (UTC)[reply]
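The reasoning above can be restated as a toy Python model. It is not MediaWiki or Module:debug/track code: both function names are hypothetical, and the rule that only nonexistent pages in the Template: namespace are listed is simply the behaviour inferred in the comment above, taken here as an assumption.

```python
def tracking_title(key, namespace="Wiktionary"):
    """Build the title of a tracking page in the given namespace."""
    return f"{namespace}:tracking/{key}"


def wanted_templates(transcluded, existing):
    """Toy model: list transcluded pages that are missing AND in Template:."""
    return sorted(
        title for title in set(transcluded)
        if title.startswith("Template:") and title not in existing
    )


key = "inflection of/tag/Attic"
old_style = tracking_title(key, namespace="Template")
new_style = tracking_title(key)  # Wiktionary:tracking/...

existing_pages = {"Template:der"}
print(wanted_templates([old_style, "Template:der"], existing_pages))
print(wanted_templates([new_style, "Template:der"], existing_pages))
```

Under this model the old-style title is reported as wanted and the new-style one is not, while the page is still recorded as transcluded in both cases, so the Special:WhatLinksHere lookup is unaffected.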

@-sche: my point was that outputting a space isn't quite invisible. There's no redlink, but if the template is before other text, it will move it to the right. If I understand what Benwing2 is saying, specifying another namespace won't change the way the current code works- which is truly invisible. That's what I was asking about- whether it was possible to do something like that rather than my imperfect method. Chuck Entz (talk) 03:14, 8 April 2024 (UTC)[reply]

@Chuck Entz @-sche @DCDuring @Theknightwho I went ahead and switched the template tracking mechanism to use Wiktionary:tracking/.... Please let me know if anything goes wrong, and revert if so. Benwing2 (talk) 05:50, 8 April 2024 (UTC)[reply]

Specifically, my change to Module:debug/track. Benwing2 (talk) 05:52, 8 April 2024 (UTC)[reply]

after e/c: @User:Benwing2 How exactly is, say, [[Template:tracking/parameters/empty parameter]] used? Is there no other way to track 4.4 million empty parameters in templates, if that is indeed what is being tracked. What is the point of tracking them? I don't see how this (empty parameters) is even a problem at all. It isn't really even diagnostic of anything of substance. Do we need another user space for this kind of thing or just to rethink this tracking with more respect for the tools MW software provides and users use or at least used when they were usable? DCDuring (talk) 01:29, 8 April 2024 (UTC)[reply]

@DCDuring: Special:WhatLinksHere/Template:tracking/parameters/empty parameter, not to mention with hastemplate:"tracking/parameters/empty parameter" in searches. There may be some way for bots to use it, too. Chuck Entz (talk) 01:50, 8 April 2024 (UTC)[reply]

@Chuck Entz I can find many thousands of pages with that pseudo-template and Template:taxon, but what is the actual problem that category membership is diagnostic of? In Animalia the offending template is {{der}}, the documentation for which says that parameter 4 is optional. If 2/3 of all entries "want" the pseudo-template, how useful is it? We have lots of templates that have documented use of such optional numbered parameters. Is this kind of shoddiness to be found often in this morass? This seems like a case of boiling the ocean to me. DCDuring (talk) 02:13, 8 April 2024 (UTC)[reply]

@DCDuring It should be possible to exclude instances like that, which we already do elsewhere, since it's clearly a legitimate way to use empty parameters.

The reason why it's good to track this kind of thing is so you can see if things are likely to blow up if you turn on unused parameter checking for a particular template (which is helpful to catch mistakes), but that's done in a somewhat more intelligent way, and the two clearly should be aligned. Theknightwho (talk) 05:57, 8 April 2024 (UTC)[reply]

@-sche @JeffDoozan I'm pretty sure this is possible to do, but it may take some time for me to work out the kinks. Theknightwho (talk) 05:49, 8 April 2024 (UTC)[reply]

OK. FWIW, I was thinking it'd be useful if a category was generated "server/software-side" like Category:Pages with broken file links, which gets updated automatically in near-real time, so if you're thinking of generating such a category "locally" (with the parser, etc) instead, then I would say that at least in my opinion, that's not a priority, if you have other things to work on, or if it would take resources (Lua memory, etc), or if it'd be liable to have unintended side effects like auto-categorizing multi-script terms initially turned out to, since we have TTO's lists, and (hopefully) will soon have Special:WantedTemplates again. - -sche (discuss) 13:46, 8 April 2024 (UTC)[reply]

@-sche I was thinking of integrating it into the parse which is already done by Module:headword/page, but @Benwing2's solution makes more sense. Theknightwho (talk) 15:10, 8 April 2024 (UTC)[reply]

How can normal contributors who are unaware of the existence and names of all the tracking templates (obviously many more than 5,000 of them) make use of them for any purpose? For me, a useful bi-weekly MW run to populate one of the special pages has been rendered useless. Can we be assured that any need we feel for a list of entries that have some kind of problem will be quickly responded to by our techno-mavens? I think not. This seems like a reversion of capabilities.

Can those who can explain how useful these things are do whatever it takes to get MediaWiki to do/permit whatever is needed to restore the reversion and make room for technical innovation? Can @User:-sche explain it to them? If not, can someone else (BW, TKW, JD, TTO, ?) do it? DCDuring (talk) 16:20, 8 April 2024 (UTC)[reply]

How can normal contributors who are unaware of the existence and names of all the tracking templates (obviously many more than 5,000 of them) make use of them for any purpose?

I don't think you can. They are for Lua developers. Now that the tracking system has been moved to a different namespace, Special:WantedTemplates will gradually start to become more and more meaningful, and once MediaWiki finishes processing the changeover, it's unlikely you will come across or notice the tracking templates ever again.

How do Lua developers keep track of them?

What is your best guess as to when these will be gone from Special:WantedTemplates? That listing is updated, I believe, every 2 weeks. I wonder whether I will be alive to see it emptied of these. DCDuring (talk) 12:30, 9 April 2024 (UTC)[reply]

At the current rate of several hundred per minute, it will probably be a week or two. Template:tracking/parameters/empty parameter is currently at 3,450,164 Chuck Entz (talk) 13:40, 9 April 2024 (UTC)[reply]

3,418,788 now. ~300/minute, which implies 8 days to completion at linear pace. DCDuring (talk) 15:25, 9 April 2024 (UTC)[reply]
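The estimate above can be reproduced from the two counts quoted in this thread. The Python sketch below uses the comment timestamps as the measurement times, which is an approximation, and assumes the rate stays constant.

```python
from datetime import datetime

# (time of comment, remaining links to the tracking template), from the thread
first = (datetime(2024, 4, 9, 13, 40), 3_450_164)
second = (datetime(2024, 4, 9, 15, 25), 3_418_788)

minutes = (second[0] - first[0]).total_seconds() / 60
rate = (first[1] - second[1]) / minutes        # links cleared per minute
days_left = second[1] / rate / (60 * 24)       # linear extrapolation

print(f"{minutes:.0f} min elapsed, {rate:.1f} links/min, {days_left:.1f} days left")
```

This comes out at roughly 299 links per minute and just under 8 days, matching the figures above, but only under the assumption of a constant rate.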

It has always seemed to me that the amount of decrease per hour for such background updating diminished over time, but not quite exponential decay. Exponential decay would imply months or longer for total emptying. Does anyone understand the specifics of this updating process? DCDuring (talk) 14:12, 9 April 2024 (UTC)[reply]

3,113,477 at 01:21, 2024 April 10. But the least-frequent item is wanted only 4 times vs. 14 times when the process began. DCDuring (talk) 19:41, 10 April 2024 (UTC)[reply]

2,234,457 now, nearly halfway to being excluded from the page. Least frequent items are reported at 3, but some have 2, 1, or 0! links remaining, which means that many truly "wanted" templates will become visible the next time the 5,000-item list is generated. DCDuring (talk) 00:56, 12 April 2024 (UTC)[reply]

767,747 now. Clearly not simply linear. If it were, there would be none left. The pace of removal is much slower. DCDuring (talk) 11:59, 20 April 2024 (UTC)[reply]

731,379 now. DCDuring (talk) 01:00, 22 April 2024 (UTC)[reply]

@DCDuring: Try it now. I figured out how to change the parameters in the API Sandbox setup linked to from CAT:E so that it purges pages that transclude this. I have it set to purge 75 pages at a time, which takes about 20-30 seconds from when you click "Make request" (it times out if it hits 30 seconds). Of course, after a while you feel like one of those lab rats they trained to press a lever for a food pellet, so it's not practical to clear it all at once that way. I would recommend setting it up in a separate browser tab or window so you can click on it every once in a while and go do something else. Chuck Entz (talk) 17:44, 20 April 2024 (UTC)[reply]

Oops! Forgot the link: [3] Chuck Entz (talk) 18:41, 20 April 2024 (UTC)[reply]

More usefully, you can click on generator=embeddedin and replace the geititle field with the tracking template you want to remove from Wanted templates, then click "Make request". If it has fewer than 75 links, running this once or twice will clear it. Chuck Entz (talk) 19:11, 20 April 2024 (UTC)[reply]
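For anyone who would rather script this than click, the request described above might be built roughly as follows. The sketch only assembles the parameters and sends nothing; `purge_request` is a made-up helper name, and `forcelinkupdate` is an assumption about what the sandbox setup uses. Note that `action=purge` has to be sent as a POST request.

```python
from urllib.parse import urlencode

API = "https://en.wiktionary.org/w/api.php"

def purge_request(tracking_title, batch=75):
    """Build the request parameters for purging pages that transclude tracking_title."""
    return {
        "action": "purge",
        "format": "json",
        "generator": "embeddedin",   # pages that transclude geititle
        "geititle": tracking_title,
        "geilimit": batch,           # 75 per request, as in the post above
        "forcelinkupdate": 1,        # assumption: also refresh the link tables
    }

params = purge_request("Template:tracking/parameters/empty parameter")
print(API + "?" + urlencode(params))  # printed as a URL only for inspection
```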

@DCDuring I assume that what happens is that when you make a change to a module, it computes the graph of all pages needing recomputing and queues them up for recomputation. That means eventually all the pages will get recomputed. But a strict first-in-first-out approach wouldn't be fair, because in a case like this, where a change is made that touches a very large number of pages, it would block all other recomputation requests until all the large number of pages get recomputed. So those other changes must get prioritized ahead esp. if smaller, which means over time the remaining pages to compute will get more and more diluted by other requests. It can't be strictly exponential in its decay but there clearly is a decay. In a case like this I can do a purge operation to force this to speed up, but it's not clear it's worth it as all wanted templates are now represented in the list (the max-5000 list goes up only to 4,528 wanted templates). Benwing2 (talk) 01:12, 22 April 2024 (UTC)[reply]

On top of that, I've done hundreds of null edits on the 1-link and 2-link items, so I'm confident that the tracking templates will only number in the hundreds on the next list- mostly clustered at the top of the list, but fewer in number than some other types. For instance, old discussions dating to the era when language codes were separate templates are probably responsible for hundreds of items in Wanted templates.

As for the pattern of removals: my take on it is that the automated part is very slow, but the fact that these are so widely distributed means that the higher-traffic pages have been cleared by unrelated edits first. Thus the quick ones get done quickly, leaving the rarely-edited, slow ones to make up a higher and higher proportion of the remainder. Also, the more of these there are on one page, the higher the number that gets cleared by a single edit, and the higher the likelihood that a null edit or a purge will be performed on it. That means that the thinly distributed and out-of-the-way ones will be the last to go. Chuck Entz (talk) 02:55, 22 April 2024 (UTC)[reply]

An algorithm that was seeking to shorten the list would only work on the least frequent items that appear on the list or the least frequent items that do not appear on the list and leave the items that needed millions of changes for last. There might be some other ordering principle, like last edit date/time. In my ignorance, I would consider it random until authoritatively told otherwise. The effort to create an ordered list of items by frequency would probably not be worth doing multiple times, as evidenced by the relatively infrequent updates of many of the special pages. DCDuring (talk) 12:08, 22 April 2024 (UTC)[reply]

What I was talking about assumes random ordering. When anyone edits a page, all the waiting changes are processed independently of the automated processes during the saving of their edit. That's why we do null edits. These unrelated, unscheduled edits by normal editors are responsible for the faster pace at the beginning- the pages more likely to be edited are more likely to be cleared ahead of schedule, but once they're edited, they drop off the list. That leaves the ones that aren't edited to be processed by the slower, scheduled automated process. In other words, you have the completely random scheduled process combined with the work by people who just coincidentally edit one of these pages. The choice of pages edited by humans isn't random- no one bothers to edit a form-of entry that's already in good shape, but translations get added to English entries for basic concepts all the time. Chuck Entz (talk) 12:56, 22 April 2024 (UTC)[reply]
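The explanation above amounts to a mixture of groups that clear at different speeds. A toy model makes the consequence visible: the total shrinks quickly at first and slowly later, without following a single exponential. All group sizes and rates below are invented for illustration and are not fitted to the real counts.

```python
import math

def remaining(t_days, groups):
    """Expected pages left after t_days; each group is (count, daily clearing rate)."""
    return sum(n * math.exp(-rate * t_days) for n, rate in groups)

# Hypothetical split of the backlog (numbers are illustrative only).
groups = [
    (1_500_000, 0.30),   # frequently edited pages, cleared quickly
    (1_200_000, 0.10),   # moderately visited pages
    (750_000, 0.002),    # rarely touched pages left to the scheduled job
]

for day in (0, 2, 11, 13):
    print(day, round(remaining(day, groups)))
```

As the fast groups empty, the slow group makes up more and more of what is left, so the daily drop keeps shrinking.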

Do we get 100,000 edits per day? (There have been only 100,000,000 edits since enwikt began.) At 100,000 edits per day, we would have only had 1.5MM since the change. I expect that would mean many fewer than 1.0MM (~0.5MM?) entries have been edited. That would mean that the automated process is doing the bulk of the work. Do bot edits count as edits for these purposes? DCDuring (talk) 17:18, 23 April 2024 (UTC)[reply]

Can we be assured that any need we feel for a list of entries that have some kind of problem will be quickly responded to by our techno-mavens?

I'm always open for requests for a new WT:Todo/Lists list, although my time is finite. This, that and the other (talk) 11:57, 9 April 2024 (UTC)[reply]

Thanks for the offer. I'll try not to take up your time frivolously, but I am not good at guessing at what is hard and what is easy. Also, JeffDoozan has waded into taxonomy, so working with him to perfect and extend what he has already done is probably my best course. I can also do a lot with Cirrus regex searches. DCDuring (talk) 12:30, 9 April 2024 (UTC)[reply]

It occurs to me... as the Special pages update, won't moving tracking "templates" from the Template: namespace to the Wiktionary: namespace just mean they're going to swamp Special:WantedPages instead of Special:WantedTemplates? Pages in the Wiktionary namespace do seem to show up in Special:WantedPages, e.g. Wiktionary:Ushojo transliteration. Will something stop the tracking "templates" from showing up there? Perhaps it would be prudent to evaluate how many tracking templates we actually need, and whether we are actually getting use out of 5,000+...? (Alternatively, do they have to be wanted pages, redlinks? Could we clean them out of the Special pages by just mass-creating the pages, having a bunch of empty subpages of "Wiktionary:Tracking/..."?) - -sche (discuss) 23:17, 13 April 2024 (UTC)[reply]

@-sche I don't think that's going to happen. By now there should be lots of such pages in Special:WantedPages but in fact there are none, so something in the way the tracking mechanism works must not be triggering Special:WantedPages (I think it's because they're only linked to and not actually transcluded). Benwing2 (talk) 23:34, 13 April 2024 (UTC)[reply]

@Benwing2 vice versa actually: WantedPages only tracks links. WantedTemplates only tracks transclusions. This, that and the other (talk) 05:10, 14 April 2024 (UTC)[reply]

@This, that and the other I see. I guess what's happening then is that expandTemplates is transcluding the tracking page and throwing away the result, and Special:WantedTemplates only includes pages in Template space, so the tracking pages don't end up anywhere (as is desired). Benwing2 (talk) 05:13, 14 April 2024 (UTC)[reply]

@User:Benwing2 Special:WantedTemplates is, even now, dated 11 April 2024, so we should probably be waiting for its next update before celebrating victory. DCDuring (talk) 11:53, 20 April 2024 (UTC)[reply]

@DCDuring Are you sure? It's dated 00:27 19 April 2024 for me. Benwing2 (talk) 19:38, 20 April 2024 (UTC)[reply]

@User:Benwing2 Sorry, I meant to type Special:WantedPages, which still has 11 April 2024 as date of last update. I was thinking of -sche's concern about merely shifting the problem from WantedTemplates to WantedPages. Conclusive evidence may not be in until next update. DCDuring (talk) 20:19, 20 April 2024 (UTC)[reply]

@DCDuring I see. I think this page must update 3x/month (whereas the others update every 3 days); this presumably means there should be an update tomorrow. Benwing2 (talk) 20:21, 20 April 2024 (UTC)[reply]

@-sche: There was an incident not that long ago when someone created one of these tracking templates and used it to insert objectionable material on all of the pages that transcluded it. There's now an abuse filter to prevent that recurring. I don't know if there's any difference between a page with no content and a redlink as far as the "expand templates" trick is concerned, but I would want to be real sure that it wouldn't have unwanted side effects before doing what you're suggesting. Chuck Entz (talk) 23:37, 13 April 2024 (UTC)[reply]

For any admins who don't know what I'm talking about, I managed to find the deleted page: [[Template:tracking/links/redundant wikilink]]. I'm not going to undelete it because it was an attack page about someone offwiki that the perpetrator wanted to force every site visitor to see. It does show the need to be careful about this kind of thing. Chuck Entz (talk) 00:29, 14 April 2024 (UTC)[reply]

suppress talk page links to old templates

As tracking templates clear out of Special:WantedTemplates, I notice that another source of cruft filling it up is links to old templates on talk pages and the like: Template:ru-noun-old, Template:onym, Template:pos_vi, etc are not "wanted" anywhere in mainspace, but are mentioned/linked on some talk pages, so they seem to be most of what's on the Special page apart from tracking templates. Do we want to (1) systematically unlink these, and/or (2) request a category after all, with capacity to "hide" "wantings" outside of mainspace, the way module errors in unimportant namespaces are in a subcategory rather than the main CAT:E? On one hand, 2 is how broken file links, module errors, parser function errors, etc are handled; on the other hand, 1 would address the fact that it doesn't really make sense to leave Special:WantedTemplates in a permanently unusable or cruft-filled state. - -sche (discuss) 15:16, 10 April 2024 (UTC)[reply]

Clearly a good idea. If they can't be suppressed by changing how WantedTemplates works, perhaps we could have a mass replacement of the template link instances wherever they occur by something that doesn't link, essentially a nowiki wrapper. In the event an admin-archaeologist wants to find the old template, they could, after all, type the template name into the search box. DCDuring (talk) 02:30, 14 April 2024 (UTC)[reply]

My concern about this is that it would essentially render some old discussions unintelligible (though this is already the case where templates have been broken etc). It might be better to replace the templates in those discussions with the current equivalents (where that's possible to do). Theknightwho (talk) 03:45, 14 April 2024 (UTC)[reply]

I have to question the intent of this push. Isn't it less effort to simply use Wiktionary:Todo/Lists/Entries using nonexistent templates, which in my mind exactly reflects the Special:WantedTemplates output that is ultimately desired? Of course, I acknowledge -sche's point that this list is dependent on an external system, unlike the native WantedTemplates, but I don't plan on disappearing any time soon. This, that and the other (talk) 05:18, 14 April 2024 (UTC)[reply]

@User:Theknightwho One implication of the unintelligibility argument is that we should stop deleting templates and restore all those that were actually implemented and have some functionality. The discussions that are the locations of many of the links are already unintelligible to someone who doesn't remember the templates because the templates have been deleted or never implemented and their invocation cannot display their former functionality. Maybe we could link them to an archived copy of the modules and/or other Templates (and CSS, etc.) they used when they were functional. Try looking at the links for a few of these.

@User:This, that and the other The vast majority of the 'wanted templates' (other than the thousands of "Template:tracking templates" that remain) that we are once again beginning to see are templates that have been deleted or were proposed and not implemented, at least not under the redlinked name. Many Talk, Wiktionary, User talk, User pages (eg, sandbox), and some others have discussions that mention them, usually in discussions of their deletion and/or replacement or of the proposed functionality. They are (almost?) entirely gone from principal namespace and those remaining instances in principal namespace are exactly what we would hope the WantedTemplates would help us find, once we are able to see the very-low frequency-of-occurrence templates.

Admittedly, the value to admin-archaeologists of these discussions is slightly diminished in that there is no direct link to the deleted template page, where they could try to begin to reconstruct the functionality should they wish to do so. They would have to type the template name into the search box.

Special:WantedTemplates will be in its ideal state when it is empty of templates wanted in principal namespace. It is unlikely to reach that state when it is filled with the crufty, irremediable 'wants' that are still the vast majority of the items in that page. DCDuring (talk) 14:37, 14 April 2024 (UTC)[reply]

Another maintenance category

@Benwing2: Where LANG is a dummy substring, can we please have Category:Pages using bad params when calling LANG templates made a subcategory of Category:LANG_entry_maintenance. The contents of the contents of the former will often be of interest to those who review entries. --RichardW57m (talk) 11:55, 8 April 2024 (UTC)[reply]

@RichardW57m Done. Benwing2 (talk) 21:45, 8 April 2024 (UTC)[reply]

@Benwing2: Thank you. And thank you for making the language the chief part of the key within Category:Pages using bad params when calling a template. --RichardW57 (talk) 06:32, 9 April 2024 (UTC)[reply]

Toki Pona auto hyphenation

I'm creating Template:tok-IPA (based on Template:eo-IPA in function) and the only missing piece is hyphenation, as @Spenĉjo and I don't know any Lua. For anyone who does, this should hopefully be simple because of the regular (C)V(n) syllable structure (list of letters; possible test cases: a‧nu, an‧pa, si‧te‧len, ki‧je‧te‧san‧ta‧ka‧lu). Thanks in advance if anyone can help! AgentMuffin4 (talk) 01:34, 9 April 2024 (UTC)[reply]
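The (C)V(n) rule can be sketched in a few lines. This is an illustration in Python, not the Lua module that was eventually written; the function name `hyphenate` is made up here, and the only rule assumed beyond the request is that an n before a vowel starts the next syllable rather than closing the current one.

```python
import re

# One syllable: optional consonant, vowel, optional coda n.
# The lookahead keeps an n that starts the next syllable out of the coda.
SYLLABLE = re.compile(r"[jklmnpstw]?[aeiou](?:n(?![aeiou]))?")

def hyphenate(term, sep="‧"):
    """Split each word of a Toki Pona term into (C)V(n) syllables."""
    words = []
    for word in term.lower().split():
        syllables = SYLLABLE.findall(word)
        if "".join(syllables) != word:
            # Some letters did not fit the pattern, so refuse to guess.
            raise ValueError(f"not a (C)V(n) sequence: {word!r}")
        words.append(sep.join(syllables))
    return " ".join(words)

for test in ("anu", "anpa", "sitelen", "kijetesantakalu"):
    print(hyphenate(test))
```

Words are handled one at a time, so a multiword term keeps its spaces and each word is split separately.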

@AgentMuffin4 Hey, has anyone reached out about doing this yet? Chernorizets already wrote a very sensible syllabifier for Bulgarian last summer, which I ported over to Lua, and I think we can very nicely adapt it to Toki Pona if no one's done it yet! In fact, it could be even easier than needing to port it, since having that exact syllable structure makes it even more uniform than Bulgarian. Kiril kovachev (talk・contribs) 21:03, 17 April 2024 (UTC)[reply]

@AgentMuffin4 Update, I just went ahead and did it lol. Please check out Module:User:Kiril kovachev/tok-hyph. We can change the name to "tok-syllab" or something else, but this is basically how it works. If you want to integrate it into the tok-IPA template, you can just use {{tok-hyph}} or invoke the module directly.

I'm now in the process of porting this to the mainspace - hope this is okay! Kiril kovachev (talk・contribs) 22:51, 17 April 2024 (UTC)[reply]

Excellent, thanks! I managed to use it to fix an oversight with the template, as well (the IPA stress marker appearing for monosyllables). AgentMuffin4 (talk) 00:17, 18 April 2024 (UTC)[reply]

@AgentMuffin4 Nice one. Do you think we should try to summarize the existing pronunciation sections using this template instead? I noticed last night there were only about 4 entries using it, but you look to have proliferated it a bit onto some more — do we want this on all the Toki Pona entries? Kiril kovachev (talk・contribs) 20:55, 18 April 2024 (UTC)[reply]

I think on all the one-word entries at least, unless we want to refactor the whole thing to auto-process multiword terms, which doesn't seem urgently needed. (Also, for reference, we're similarly replacing the giant sitelen pona images with {{tok-sitelen}} under ===Glyph origin===.) AgentMuffin4 (talk) 22:04, 18 April 2024 (UTC)[reply]

@AgentMuffin4 Okay, that's good — I don't know sufficient template code to work on any upgrades to the IPA part (is there anything else that needs to change for multiword terms?), but fortunately the syllabification logic already does work for multiword terms, in case we do ever feel the need to deploy it for them as well. Kiril kovachev (talk・contribs) 17:39, 19 April 2024 (UTC)[reply]

It just adds a leading stress marker if the syllabification has any hyphenation point. So on mi tawa, it returns ˈmi tawa instead of mi ˈtawa, checking the whole string instead of iterating over each word.

I guess if this were Lua'd, then for the default secondary transcription (as on nanpa), that output could be fed into another function that replaces np with mp, nk with ŋk, and nj with ɲ(j), since that part of the template code is currently messy. Then, the rest could conceivably be handled at the template level, with an {{#ifeq:}} to check whether the broad and narrow transcriptions are actually different.

I expected there to be other problems with using the template on multiword terms, but I suppose if the other lines work, and if you're willing to write those extra functions, we might as well equip it for them. I'm still fine either way. AgentMuffin4 (talk) 20:47, 19 April 2024 (UTC)[reply]
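A per-word version of the two steps discussed above might look like the following. It is a Python sketch rather than template or Lua code; `broad` and `narrow` are made-up names, the spelling is assumed to double as the broad transcription, and the replacement for nj is copied literally from the post.

```python
VOWELS = "aeiou"
# Replacement pairs from the post above; "ɲ(j)" is copied literally.
ASSIMILATION = [("np", "mp"), ("nk", "ŋk"), ("nj", "ɲ(j)")]

def broad(term):
    """Mark stress word by word: only words with more than one syllable get ˈ."""
    out = []
    for word in term.split():
        syllable_count = sum(ch in VOWELS for ch in word)  # one vowel per syllable
        out.append("ˈ" + word if syllable_count > 1 else word)
    return " ".join(out)

def narrow(term):
    """Apply the nasal assimilation replacements to the broad form."""
    text = broad(term)
    for old, new in ASSIMILATION:
        text = text.replace(old, new)
    return text

print(broad("mi tawa"))
print(narrow("nanpa"))
print(broad("nanpa") != narrow("nanpa"))  # the comparison {{#ifeq:}} would make
```

Because each word is checked separately, a monosyllable at the start of a phrase no longer picks up the stress mark.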

Lua error

It seems most Wiktionary pages display Lua errors for some reason. Kwékwlos (talk) 23:17, 9 April 2024 (UTC)[reply]

This was because Module:utilities and Module:utilities/data got into an infinite loop for a brief period, because each would try to unconditionally load the other when first loaded. I've changed Module:utilities so that it only loads Module:utilities/data when it's actually needed for something. Theknightwho (talk) 00:04, 10 April 2024 (UTC)[reply]
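The failure and the fix can be illustrated with a toy loader. This is a Python stand-in, not the actual Lua: the loader, the function names and the `answer` value are all hypothetical, and the real modules are certainly structured differently.

```python
loading = set()   # modules currently being loaded
loaded = {}       # modules that finished loading

def load(name, loaders):
    """Tiny stand-in for a module loader that refuses circular loads."""
    if name in loaded:
        return loaded[name]
    if name in loading:
        raise RuntimeError(f"loop while loading {name}")
    loading.add(name)
    try:
        loaded[name] = loaders[name](loaders)
    finally:
        loading.discard(name)
    return loaded[name]

def data_module(loaders):
    load("utilities", loaders)                # the data module needs utilities
    return {"answer": 42}                     # placeholder content

def eager_utilities(loaders):
    data = load("utilities/data", loaders)    # unconditional load at start-up
    return {"get": lambda: data["answer"]}

def lazy_utilities(loaders):
    # Deferred: the data module is only loaded when get() is first called.
    return {"get": lambda: load("utilities/data", loaders)["answer"]}
```

With `eager_utilities`, loading either module immediately re-enters the other and fails. With `lazy_utilities`, the first module finishes loading before the data module is ever requested.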

triple brace abuse filter

Just received notice of this. There has been previous discussion, but nothing current. At the time of writing, the editing guidelines recommend the use of triple braces as the "currently preferred method." https://en.wiktionary.org/wiki/Wiktionary:Templates#Formatting_the_headword 203.158.37.134 09:03, 10 April 2024 (UTC)[reply]

Wow. This wasn’t current ten years ago, and I don’t see how it ever was. I think the documentation author tried to say that one can pass a modified headword there; he should have used var tags to express this parameter. Fay Freak (talk) 09:58, 10 April 2024 (UTC)[reply]

I am too chill to rewrite the documentation page today, and another option, moving for its deletion, I am not gonna pursue since evidently people use the page, which contains references to necessary templates, as it is general to up-to-date pages for particular languages like WT:About Arabic. Somebody needs to go through it with the bulldozer, like I want to delete the whole section about headwords and also the same about inflections implying that one would add inflections outside of template, so I encourage some similar action. Fay Freak (talk) 10:07, 10 April 2024 (UTC)[reply]

That help page is targeted towards people writing templates, not for dictionary entries. — SURJECTION ^{/ T / C / L /} 13:13, 10 April 2024 (UTC)[reply]

We should probably get rid of the "older method" and "still older method", which are both totally unacceptable in headword templates as they lack any kind of categorisation. The only thing the page has to say is that the second is "deprecated and discouraged", which isn't enough. Theknightwho (talk) 17:13, 10 April 2024 (UTC)[reply]

Yep, bulldozer still; on the other hand, which template authors does this page speak to? Nobody can do anything about infrastructure as complicated as that behind {{head}} without reading lots of modules; this page won’t teach a template author a damn, only divert his attention, so you can argue for deletion. It is redundant to categories, some sections should be moved (e.g. etymology templates → Wiktionary:Etymology; others already refer to other project pages or categories), and it is misunderstood. Cumulatively there is a lot against this page. Fay Freak (talk) 17:32, 10 April 2024 (UTC)[reply]

User talk pages in CAT:E

I'm used to the occasional appearance of userspace pages in CAT:PFE due to the mysteries of Wikimedia transclusion updating, but those go away with a null edit. What I'm seeing now is user talk pages showing up in Category:Pages with module errors instead of Category:Pages with module errors/hidden even after a null edit. What has changed? Chuck Entz (talk) 14:16, 10 April 2024 (UTC)[reply]

@Theknightwho: I see you've been changing things in the MediaWiki namespace. If nothing else, having a Lua module decide which category module errors go in is asking for trouble- what happens if your Lua module has a module error? Chuck Entz (talk) 15:09, 10 April 2024 (UTC)[reply]

@Chuck Entz This is caused by an annoying bug in MediaWiki: the module checks whether the current namespace is a talk namespace by checking whether title.nsText (the current namespace) is the same as title.talkNsText (the talk namespace for that particular namespace), but for user talkpages, the first is "User talk" and the second is "User_talk", so it fails the equality check. The reason I did it like this is that that was how the old template worked, but it turns out in Lua there's a very simple title.isTalkPage check you can do instead, which is all we really care about.
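The mismatch described above is easy to reproduce outside the wiki. The Python sketch below uses the two strings quoted in the post; `same_namespace` and `is_talk_namespace` are made-up helper names, and the odd-number test relies on MediaWiki's convention that talk namespaces have odd, positive IDs.

```python
def same_namespace(a, b):
    """Compare namespace names, treating underscores and spaces as equal."""
    return a.replace("_", " ") == b.replace("_", " ")

def is_talk_namespace(ns_id):
    """MediaWiki talk namespaces have odd, positive numeric IDs."""
    return ns_id > 0 and ns_id % 2 == 1

ns_text, talk_ns_text = "User talk", "User_talk"   # values quoted above

print(ns_text == talk_ns_text)                 # naive equality check
print(same_namespace(ns_text, talk_ns_text))   # normalised comparison
print(is_talk_namespace(3))                    # 3 is the User talk namespace
```

A direct talk-page test avoids the string comparison altogether, which is what switching to title.isTalkPage achieves in Lua.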

In terms of using a Lua module: the reason I converted this template to Lua is because (a) it gives us much, much finer control over which pages go into the "hidden" category versus those which don't, (b) it gives us a Lua interface to call it from other modules, which Module:headword/page is now using, (c) I did check beforehand, and it seems to be exempt from things like out-of-memory errors when called automatically after things like that happen - presumably because it's done in a special way, and (d) the old version was just as prone to parser function issues anyway. However, to make sure we never run into that situation, I'll integrate some kind of error-catching mechanism so that it never ends up throwing raw errors directly when called from the template, since that's how the automatic error system uses it. Theknightwho (talk) 16:59, 10 April 2024 (UTC)[reply]

@Chuck Entz, Benwing2, Erutuon, This, that and the other, Surjection I've come across an underlying bug in MediaWiki's error handling. Currently, error handling is determined by two kinds of special pages in the MediaWiki namespace:

Pages like MediaWiki:Scribunto-common-error-category, which contain the name of a category, and the page is categorised when a certain event is triggered. These pages simply contain {{maintenance category|category name}}, which determines whether it should be the hidden category or not.
Pages like MediaWiki:Pfunc expr unrecognised word, which contain an error message, and that error message also happens to contain a category. These ones use {{maintenance category|category name|cat=1}}, which returns the category as a standard category link.

The first type of page works fine, but the second will treat the current page as "Special:BadTitle/Missing" unless an error which works in the first way already exists further up the page. This means that, for example if you put {{#expr:foo}} at the top of a talkpage (which gives an "unrecognized word" error of the second kind), it will be categorised in the unhidden category, since we don't hide errors in the Special namespace. If you then put {{#invoke:bar}} above it (which causes a Scribunto error; an error of the first type), it then gets categorised in the hidden category. However, if you move that invoke below, then it's unhidden again.

Importantly, this bug is not related to using the new Lua module: it affects all the magic words like {{PAGENAME}} as well. Theknightwho (talk) 18:34, 10 April 2024 (UTC)[reply]

Pinging @Umherirrender who might know about this type of thing. The offending MediaWiki code appears to be here. This, that and the other (talk) 23:38, 10 April 2024 (UTC)[reply]

PAGENAME only works correctly for real tracking categories added by the software (like scribunto or others listed on Special:TrackingCategories), not by categories added with the help of messages (like pfunc here), to use real tracking categories in ParserFunctions extension the task T25959 exists.

The title used for PAGENAME is set on the parser when transform the message, but it seems that is get reused sometimes, created T362364 for a solution. Der Umherirrende (talk) 21:36, 11 April 2024 (UTC)[reply]

clear out Special:WantedPages next

Following up on the partial cleaning-out of Special:WantedTemplates (above), I notice that Special:WantedPages is currently filled with a lot of things like "Module:labels/data/lang/mk/functions"; can we clean those out? I also notice there are a lot of links to SOP-seeming strings like administrative atolls, regional units, unincorporated communities, autonomous islands, and that these are coming from pages like Category:Chemical elements for no clear reason (maybe they're linked in the modules that generate category boilerplate, so every category is treated as linking to them??). Similarly, Unsupported titles/`lcub``lcub``lcub`1`rcub``rcub``rcub` is supposedly linked-to from 1,800 pages such as Template:sv-noun-irreg-c; can we clear up why that's happening? - -sche (discuss) 15:27, 10 April 2024 (UTC)[reply]

Support Overdue. Many are not at all correctable manually. Seems to be some kind of artifact of some module generating some kind of inherited link. Whenever there are 500K+ members of such, it's a good bet that some automagical process of the black variety is generating it. {{auto cat}} is often involved. DCDuring (talk) 16:23, 10 April 2024 (UTC)[reply]
Some of the problem seems to simply be that fewer contributors are motivated to add the pages than are motivated to add the wants (bright shiny objects). Generating many of the pages would seem almost as automatable as generating the wants. DCDuring (talk) 16:46, 10 April 2024 (UTC)[reply]

@-sche The reason why that's happening is that that's the procedurally-generated title for the hypothetical page {{{1}}}, which means that someone has probably put that into a link template somewhere. Theknightwho (talk) 17:06, 10 April 2024 (UTC)[reply]

So, how does one find the problem and eradicate it? If it occurs in Module space, few of us can be trusted to correct it. DCDuring (talk) 17:46, 10 April 2024 (UTC)[reply]

Looking for "unincorporated communities" in Module space one finds

Module:place/shared-data

"[[census-designated place]]s", ["unincorporated communities"] = "[[w:unincorporated community|unincorporated communities]]", ["places"] = "places of all... 128 KB (13,792 words) - 16:24, 2024 March 12

Module:category tree/topic cat/data/Places

[[city]]"}, {"towns", {"polities"}}, {"townships", {"polities"}}, {"unincorporated communities", {"places"}}, {"valleys", {"places", "water"}}, {"villages"... 28 KB (3,599 words) - 20:53, 2023 November 12

Also some in Module:User. None have square double brackets, so some code must add the brackets. Who other than our technomavens can save us from this kind of problem? DCDuring (talk) 17:57, 10 April 2024 (UTC)[reply]

@-sche @Theknightwho regarding Unsupported titles/`lcub``lcub``lcub`1`rcub``rcub``rcub` and friends, the todo list WT:Todo/Lists/Entries linking to raw template syntax is relevant. Unfortunately the entries on this todo list are very difficult to clean up, because the large majority of entries relate to inflection templates that have not been filled in, and these require knowledge of the language in question. Moreover, the todo list is a bit of a mess due to the limitations of the way it's generated. I have ideas to improve this list so it's easier to work through – showing the template name, for instance. This, that and the other (talk) 22:49, 10 April 2024 (UTC)[reply]

A lot of the pages I noticed "wanting" such things were in userspace, and sometimes quite old; perhaps we could HTML-comment-out or otherwise suppress 'linking' on such pages (old userspace and sandbox pages). - -sche (discuss) 22:55, 10 April 2024 (UTC)[reply]

I've done that to the most repetitive lists on my own user subpages. The easiest cases should be the ones that have dates on them, for which we could comment out or nowiki all but the most recent page. We could also solicit the views of the user. DCDuring (talk) 00:51, 11 April 2024 (UTC)[reply]

Intermittently, I go to these special pages reports to clear them out and there are so many pieces of old cruft that has been hanging around for a decade-plus. —Justin (koavf)❤T☮C☺M☯ 04:27, 11 April 2024 (UTC)[reply]

@-sche @DCDuring @Theknightwho The things like Module:labels/data/lang/mk/functions are because the code in Module:labels checks for the existence of a .../functions module and loads it if so, to get a postprocessing function. Currently the only one that exists is Module:labels/data/lang/zh/functions. This doesn't occur with e.g. nonexistent versions of Module:labels/data/lang/LANG because we have a manually curated list of all the languages with data modules (this exists because I thought it would reduce memory and/or be faster but I don't think it is). We could create such a manually curated list for functions modules as well but it would be extra manual effort for no gain other than keeping Special:WantedPages cleaner. As for the things like unincorporated communities, that is coming from the {{place}} code and *might* be fixable, I'd have to take a look at the code (but it might not be; the code might check for the existence of a plural page before falling back to the singular version, and the check for a plural automatically generates a link that gets added to Special:WantedPages, and there's no way to tell the code to not count this particular check). In general though I think the ideal solution would be a customizable Lua function or template that can be run to determine whether to include a given page in Special:WantedPages, or failing that, a blacklist containing regexes listing the pages we don't want included. (If the blacklist could be generated on the fly, it would be essentially as good as the ideal solution.) So you might want to file a Phabricator ticket requesting this functionality. Benwing2 (talk) 09:00, 11 April 2024 (UTC)[reply]
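The regex blacklist idea could work along these lines. MediaWiki has no such hook today, so this Python sketch only shows the filtering step on a list of titles; the patterns and the name `filter_wanted` are hypothetical.

```python
import re

# Hypothetical patterns; a real list would need community agreement.
BLACKLIST = [
    re.compile(r"^Module:labels/data/lang/[^/]+/functions$"),
    re.compile(r"^Unsupported titles/"),
]

def filter_wanted(titles, blacklist=BLACKLIST):
    """Drop wanted-page titles that match any blacklist pattern."""
    return [t for t in titles if not any(p.search(t) for p in blacklist)]

wanted = [
    "Module:labels/data/lang/mk/functions",
    "unincorporated communities",
    "Wiktionary:Ushojo transliteration",
]
print(filter_wanted(wanted))
```

Plural place names such as "unincorporated communities" pass through unchanged here, since no single pattern would safely identify them.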

But why does the software check for the existence of hundreds or thousands of modules when only one exists? Who are developing them? What is the schedule for their development? Why are resources gobbled up far in advance of need? DCDuring (talk) 13:56, 11 April 2024 (UTC)[reply]

@DCDuring That isn't what's happening: it's checking whether such a module exists, and uses it if it does. Theknightwho (talk) 14:01, 11 April 2024 (UTC)[reply]

@User:Theknightwho But @User:Benwing2 stated above "Currently the only one that exists is Module:labels/data/lang/zh/functions." Why make it harder for the humans who use such pages to clean up errors that are beneath the notice of our error-filtering and -correcting software when there is so little benefit? DCDuring (talk) 14:36, 11 April 2024 (UTC)[reply]

@DCDuring Because presumably there will be more in the future. It's brand new. Theknightwho (talk) 14:52, 11 April 2024 (UTC)[reply]

@DCDuring I've changed Module:labels to use a different method to check whether the module exists which I think will stop these showing up, but I'm not certain. It's also a more efficient way of checking this anyway, so it was worth doing. Theknightwho (talk) 15:07, 11 April 2024 (UTC)[reply]

Thanks. I appreciate your doing it even before you knew it would be more efficient. DCDuring (talk) 15:21, 11 April 2024 (UTC)[reply]

Unfortunately, this hasn't worked - they're still showing up. Theknightwho (talk) 15:10, 13 April 2024 (UTC)[reply]

The other alternative is for me to reimplement Special:WantedPages as a todo list. This has the advantage that it can be customised to exclude unneeded pages and potentially be updated more frequently. This, that and the other (talk) 23:50, 11 April 2024 (UTC)[reply]

@User:This, that and the other Now that you mention it, an advantage of having our own version of this would be that we could make sure that only redlinks from principal namespace (and possibly others) were included. Getting rid of User space was mentioned by -sche, but Appendices, Wiktionary pages, Rhymes, talk pages, and others are of lesser importance. IMO, principal namespace most merits frequent updating, but any namespace might merit a run if someone was committed to work on it. Sorting by script and, to the extent possible, by language might make the lists much easier to work with. Cleaning out an entire list for a language (or anything else) one is interested in can be motivating. DCDuring (talk) 00:54, 12 April 2024 (UTC)[reply]

Template garbage in drag noun senses 9 and 10

Red error text says "Template:tracking/defdate/hyphen". Equinox ◑ 19:49, 11 April 2024 (UTC)[reply]

@Equinox Fixed. Benwing2 (talk) 22:24, 11 April 2024 (UTC)[reply]

-ment cleanup

Hello,

The following categories do not distinguish between terms suffixed with -ment (forms adverbs) and -ment (forms nouns):

Would it be possible to have a bot run through these categories, check a given term's part of speech, and assign id2=nominal or id2=adverbial to its etymology? The senseid's are all set up and the categories are ready to be populated. Nicodene (talk) 02:12, 12 April 2024 (UTC)[reply]

One can probably create a good list of the would-be members of such categories easily using Cirrus Search, depending only on uniform use of inflection-line templates. Do we really need permanent categories? DCDuring (talk) 14:27, 12 April 2024 (UTC)[reply]

For readers who do not know how to do that, a proportion just a hair under 100%. Nicodene (talk) 20:55, 12 April 2024 (UTC)[reply]

I was thinking that the categories are mostly useful to contributors and regular users, both classes of which might come to learn Cirrus Search (esp. regexes). I'd bet relatively few others use our categories at all. (I'd love it if we had facts about such matters.) DCDuring (talk) 14:52, 13 April 2024 (UTC)[reply]

@Nicodene I implemented this but haven't run it yet because I notice under Category:Catalan terms suffixed with -ment we have both the empty categories Category:Catalan terms suffixed with -ment (nominal) and Category:Catalan terms suffixed with -ment (adverbial) that you created, as well as partly-filled categories Category:Catalan nouns suffixed with -ment‎ and Category:Catalan adverbs suffixed with -ment‎ that predate your latest changes. What do you think should be done here? Should we adopt your new naming, the old naming, or something else? Benwing2 (talk) 07:43, 14 April 2024 (UTC)[reply]

@Benwing2 Thank you very much.

That’s interesting- I hadn’t noticed that about Catalan. I would favour the new naming as consistency across the languages is nice to have. Nicodene (talk) 18:01, 14 April 2024 (UTC)[reply]

@Nicodene Should be done except for Middle French estrangement, which needs cleanup. Benwing2 (talk) 02:55, 15 April 2024 (UTC)[reply]

Done. I suppose the confusion was because there are/were two different estrangement's. Nicodene (talk) 00:56, 16 April 2024 (UTC)[reply]

Character U+0486: COMBINING CYRILLIC PSILI PNEUMATA breaks Old Cyrillic transliteration

For whatever reason it seems that any Old Cyrillic quotations that contain the character U+0486: COMBINING CYRILLIC PSILI PNEUMATA fail to render a transliteration; I've been unable to discover why. Anyone have any ideas? The relevant module is at Module:Cyrs-translit; examples of broken quotes are at даждь (daždĭ) and кънигꙑ (kŭnigy). — Vorziblix (talk · contribs) 19:06, 12 April 2024 (UTC)[reply]

I believe this is because the transliterate function in Module:languages, if it finds any characters of the original script in the transliteration, removes all Latin characters from the transliteration and then checks if the language-agnostic majority script of the remaining characters is not equal to None. But it was checking if a script object (table) was equal to a string, so I changed it to compare the script code. I guess this problem has occurred for a while, but because transliteration functions usually convert all characters in the original script, the problem wasn't very prominent. — Eru·tuon 04:12, 13 April 2024 (UTC)[reply]
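A minimal sketch of the kind of bug described above, written in Python rather than the Lua of Module:languages. The `Script` class, the toy `find_best_script` detector and both check functions are hypothetical stand-ins, not the real module API; the sketch also assumes that a lone combining mark such as U+0486 ends up classed as script "None".

```python
import unicodedata


class Script:
    """Hypothetical stand-in for a script object (a table in the Lua code)."""

    def __init__(self, code):
        self.code = code


def find_best_script(text):
    """Toy detector: 'Cyrs' if a non-combining Cyrillic character is present."""
    for ch in text:
        if unicodedata.category(ch).startswith("M"):
            continue  # assumption: combining marks carry no script of their own
        if "\u0400" <= ch <= "\u052f" or "\ua640" <= ch <= "\ua69f":
            return Script("Cyrs")
    return Script("None")


def has_leftover_script_buggy(leftover):
    # Bug: an object never equals a string, so this is always True
    # and every transliteration with leftovers is treated as failed.
    return find_best_script(leftover) != "None"


def has_leftover_script_fixed(leftover):
    # Fix: compare the script *code* with the string.
    return find_best_script(leftover).code != "None"
```

With this toy model, a leftover consisting only of U+0486 is rejected by the buggy check but accepted by the fixed one, while a real leftover Cyrillic letter is still flagged by both.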

@Erutuon: Many thanks for the fix! — Vorziblix (talk · contribs) 13:37, 16 April 2024 (UTC)[reply]

Exempt Template:REEHelp from CAPTCHA confirmation?

Could the REEHelp template be exempted from needing anti-robot/anti-spam CAPTCHA confirmation for adding external links? Or is there a realistic risk of this being hijacked with usage like famous.celebrity@private-emailaddress.co - OneLook - Google (Books • Groups • Scholar) - WP Library or www.self-promotion-for-my-own-website.com - OneLook - Google (Books • Groups • Scholar) - WP Library? —DIV (1.145.112.83 09:10, 13 April 2024 (UTC))[reply]

I don't think this risk is worse than various other simple ways of accomplishing more or less the same thing. DCDuring (talk) 16:11, 13 April 2024 (UTC)[reply]

Apparently we can add the relevant URLs as regexes to MediaWiki:Captcha-addurl-whitelist.

There is a workaround in the meantime though: you could create an account for yourself. That would save you a lot of trouble! This, that and the other (talk) 22:44, 13 April 2024 (UTC)[reply]

@This, that and the other: Well, then, add quran.com and sunnah.com, as chosen for {{RQ:Qur'an}} and {{RQ:Sunna}}, for a beginning. It will greatly improve our closure rates and quotation coverage, you know those pesky IPs, being socialized muslim, dealing with the Qurʔān and the Sunna every day, so there will be a low-threshold motivation to expand our dictionary. I would add even more frequently needed domains, but I may have an unusual risk profile. However it be, at least some websites will be supported in the long run and do not maintain user-supported content nor ads. Fay Freak (talk) 22:57, 13 April 2024 (UTC)[reply]

Thanks for the input, This, that and the other.

It sounds like it would be worthwhile to consider a moderate expansion in that whitelist. (By the way, the whitelist is currently unpopulated??)

For comparison,

usage of the KJV template:
1611, The Holy Bible, […] (King James Version), London: […] Robert Barker, […], →OCLC, Genesis 32:15:
Thirtie milch camels with their colts, fortie kine, and ten bulles, tvventy ſhee aſhes, and ten foales.

likewise triggers a CAPTCHA confirmation; whilst

cross-references to W:Wikipedia use a different syntax and don't trigger a CAPTCHA confirmation.

Trouble? Some might say that's my middle name ;-)

Following Fay Freak's comments, I suggest that IP editors are underrepresented among correspondents on the WT Community pages.

—DIV (1.145.112.83 23:34, 15 April 2024 (UTC))[reply]

Rather than us having to whitelist the URLs from templates that you use on a regular basis, we would all have an easier time if you created an account. WikiDIV is not taken, for instance.

Whitelisting REEHelp is probably worth doing either way, but preparing the regexes will take some effort. This, that and the other (talk) 23:45, 16 April 2024 (UTC)[reply]
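To give a feel for what "preparing the regexes" might involve, here is a rough Python sketch. It assumes a whitelist made of one regex fragment per line with `#` comments, and it uses the two domains suggested earlier in this thread; the helper names and the host-matching rule are illustrative assumptions and do not reproduce how the CAPTCHA extension actually evaluates MediaWiki:Captcha-addurl-whitelist.

```python
import re

# Hypothetical whitelist text: one regex fragment per line, '#' starts a comment.
WHITELIST_TEXT = r"""
# domains suggested in this thread
quran\.com
sunnah\.com
"""


def load_fragments(text):
    """Strip comments and blank lines, returning the regex fragments."""
    fragments = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            fragments.append(line)
    return fragments


def is_whitelisted(url, fragments):
    """True if the URL's host is, or is a subdomain of, a whitelisted host."""
    match = re.match(r"^https?://([^/?#]+)", url, re.IGNORECASE)
    if not match:
        return False
    host = match.group(1).lower()
    return any(
        re.fullmatch(r"(?:.*\.)?(?:" + fragment + ")", host)
        for fragment in fragments
    )
```

Anchoring the fragment to the whole host, rather than searching anywhere in the URL, is the design choice that keeps a link such as `https://evil.example/quran.com` from slipping through.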

It's up to you (all), of course. But I strongly recommend that you don't think of it as doing me a personal favour. Think of it as whether it's worthwhile for the broad population of editors. For instance, I don't think I had ever used the KJV template until I posted the example in this discussion. —DIV (1.145.112.83 06:03, 18 April 2024 (UTC))[reply]

XFAIL feature for Module:UnitTests?

Would it make sense to be able to label "expected failures" in Lua module tests? Sometimes a new feature is still under development and doesn't work yet or there's a workaroundable bug that doesn't need urgent attention. So having failing tests for these corner cases is useful, but they should not affect the overall module test verdict and there's no need to list the module in Category:Failing_testcase_modules. --Ssvb (talk) 07:36, 14 April 2024 (UTC)[reply]

@Ssvb Yes, absolutely, this should be present. Benwing2 (talk) 07:37, 14 April 2024 (UTC)[reply]
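As a rough illustration of the proposal, the sketch below shows one way an expected-failure flag could work. It is written in Python, not Lua, and `run_tests` and its `(name, func, xfail)` tuples are hypothetical; nothing here reflects the actual Module:UnitTests interface.

```python
def run_tests(tests):
    """Run tests given as (name, func, xfail) tuples.

    Returns (verdict, report). Only unexpected failures make the verdict
    False; expected failures are reported as XFAIL and do not count.
    """
    report = {}
    for name, func, xfail in tests:
        try:
            func()
            passed = True
        except AssertionError:
            passed = False
        if xfail:
            # XPASS marks an expected failure that has started passing.
            report[name] = "XPASS" if passed else "XFAIL"
        else:
            report[name] = "PASS" if passed else "FAIL"
    verdict = all(status != "FAIL" for status in report.values())
    return verdict, report
```

Under this scheme a module would only be categorised as failing when the verdict is false, so known corner cases can stay in the test suite without adding the module to Category:Failing_testcase_modules.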

Checkparams related cat:E flood

This morning, at about 9:30 UTC, I made an edit to {{lt-noun-m-is-1}}, a template that nowadays invokes function error from Module:checkparams, and I thereby unleashed a flood of 'module errors' because, since I first looked at it, it uses parameter {{{2}}}, but not {{{1}}}, for which callers have typically provided a value to be consistent with other Lithuanian noun and adjective inflection templates. I suspect someone was experimenting with a key part of the module's functionality, for it had completely failed to report an attempt to provide a then unsupported parameter, namely |n=. I then added support for that parameter, and cat:E then started to flood. At about 9:45 UTC I then switched off the unhelpful reporting of supplies of non-blank values for {{{1}}}, mostly relevant for pages that have not been edited recently, and cleaned up cat:E.

I don't know what was going on. Module:checkparams has not been edited for days, and @Theknightwho and @JeffDoozan had not been active for hours. Also, the change to {{lt-noun-m-is-1}} by 'AutoDooz' to invoke Module:checkparams has change history comment, "no existing calls with bad parameters, throw error instead of warning to avoid future misuse", but page truputis has been passing non-blank {{{1}}} to the template for years.

Are there any other lurking issues like this, or was it just a one off? --RichardW57m (talk) 12:52, 15 April 2024 (UTC)[reply]

It's the <onlyinclude> tags. The old version had them so the call to checkparams never actually ran and the category Category:Pages using bad params when calling Template:lt-noun-m-is-1 stayed empty. The bot interpreted the empty category as a sign that there were no bad calls and switched from 'warn' to 'error'. When you removed the <onlyinclude> tags in your edit, it caused checkparams to start running and throwing errors on the pages with bad calls. I'm sure there are more templates using <onlyinclude> where this is lurking, but now that we know it's a problem, it should be pretty easy to find and fix this automatically. Thanks! JeffDoozan (talk) 13:43, 15 April 2024 (UTC)[reply]

@JeffDoozan: Well done reading past my error - it was {{lt-noun-m-tis-1}} that I changed. I don't understand your explanation - <onlyinclude>...</onlyinclude> didn't bracket the call into Module:checkparams. Have you got some logic that says 'Don't check if no parameters are used'?

There are several Lithuanian declension templates that have this construct, and I find it detracts from the documentation page. It may explain why the checking seemed not to be working in some other cases. --RichardW57 (talk) 21:08, 15 April 2024 (UTC)[reply]

@RichardW57: The existence of <onlyinclude>...</onlyinclude> caused the parser to treat only the text inside the tags as code and everything outside the tags as documentation, including the call to checkparams. When you removed the <onlyinclude>...</onlyinclude>, it reversed the logic and made the parser treat everything on the page as code except the documentation inside <noinclude>...</noinclude>. See here for the MediaWiki documentation that might explain it better than I can. I manually adjusted the ~10 other templates that had checkparams with <onlyinclude> (and switched them from 'error' to 'check' to avoid flooding :CAT:E) so AFAIK everything should be working as expected. I'm not sure what you mean by logic that says 'Don't check if no parameters are used' or the construct used by the other Lithuanian templates so if I haven't answered your question, please give me an example or a link to the other templates. JeffDoozan (talk) 23:27, 15 April 2024 (UTC)[reply]

@JeffDoozan: I'd misunderstood the tag. What's weird is that text inside the tags was not displaying when viewing the template - or there was something else going on that was suppressing the output when viewing template pages. --RichardW57 (talk) 07:49, 16 April 2024 (UTC)[reply]

@RichardW57 It sounds like you're confusing <includeonly> and <onlyinclude>. The three tags are:

<noinclude>: this text won't be transcluded.
<includeonly>: this text will only be transcluded.
<onlyinclude>: only the text between these is allowed to be transcluded.

In other words, <onlyinclude> effectively determines what is treated as the page for the purpose of transclusion (so you can imagine the default position to be the whole page being between a pair of <onlyinclude> tags), and then the other two sets of tags are then applied on top of that. Theknightwho (talk) 13:52, 16 April 2024 (UTC)[reply]
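The interaction of the three tags can be approximated with a small Python model. This is only a sketch: the real parser handles nesting and edge cases that these regexes ignore, and the template text used in the usage note (including the exact `{{#invoke:checkparams|error}}` call) is illustrative rather than copied from any actual template.

```python
import re


def transcluded_text(source):
    """Approximate what is substituted when the page is transcluded."""
    only = re.findall(r"<onlyinclude>(.*?)</onlyinclude>", source, re.DOTALL)
    if only:
        # If any <onlyinclude> exists, everything outside it is dropped.
        source = "".join(only)
    source = re.sub(r"<noinclude>.*?</noinclude>", "", source, flags=re.DOTALL)
    return re.sub(r"</?includeonly>", "", source)


def page_view_text(source):
    """Approximate what is rendered when the template page itself is viewed."""
    source = re.sub(r"<includeonly>.*?</includeonly>", "", source, flags=re.DOTALL)
    return re.sub(r"</?(?:noinclude|onlyinclude)>", "", source)
```

For a template shaped like `{{#invoke:checkparams|error}}<onlyinclude>TABLE</onlyinclude><noinclude>{{documentation}}</noinclude>`, `transcluded_text` returns just `TABLE`, so the checkparams call never runs on the calling pages; once the `<onlyinclude>` tags are removed, the call becomes part of the transcluded text, which matches the behaviour described in this thread.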

Should "the" be linked?

Template:en-noun#Other_parameters gives a way to put "the" in the inflection template for an entry, which is: "def=1". I recently saw that removed in favor of adding "the" (linked)- see [4]. Under "def=1", the "the" is not linked. Is this intentional? I have no opinion on the matter. --Geographyinitiative (talk) 19:54, 15 April 2024 (UTC)[reply]

@Geographyinitiative I don't know why User:LlywelynII made that change, which seems counterproductive. I have no strong opinions either on whether to link the word "the". Benwing2 (talk) 21:25, 15 April 2024 (UTC)[reply]

@Geographyinitiative Ditto. No strong feeling. Nearly anyone able to make sense of the entry presumably already knows how the English definite article works. Two minor caveats are (a) future improved machine translation might change that for some users and (b) it's just better to have all the headwords linked for the curious imo. You'll notice the of was already linked in the previous version anyway. It's a very minor point and I know where the coder was coming from, but my own preference is that the utility of online dictionaries is linking and people who just have an aesthetic aversion to blue links can always tell their browser not to display them. Extra weight to the link for being part of a headword, even though I probably wouldn't link it in the definition of some sense. Ofc will defer to standing policy if there is one. — LlywelynII 08:26, 16 April 2024 (UTC)[reply]

Two questions, then.

What is the ultimate explanation of why "Fitz" can mean "Fitzwilliam College" while "the Fitz" means "the Fitzwilliam Museum"? (We currently lack entries for both of them.) A popular meme was students accidentally misdirecting tourists looking for the Fitz to Fitz - they're at opposite ends of the city.

How does a Wiktionary user switch off blue links on the definite article, but not other words? --RichardW57m (talk) 12:09, 16 April 2024 (UTC)[reply]

I really don't like linking the in this way, for the same reason we don't link every word in a definition. I think @Fay Freak has expressed similar opinions on this. LlywelynII seems to display the same misunderstanding of the problem now as he has in the past, which is that linking everything removes the prominence of the words which are important to link; it's not that people dislike the colour blue. Theknightwho (talk) 13:41, 16 April 2024 (UTC)[reply]

We could, theoretically, have a separate wikilink to each individual character, but, like this, it would just waste the time of anyone clicking the links. Chuck Entz (talk) 13:54, 16 April 2024 (UTC)[reply]

Oppose linking "the". Ioaxxere (talk) 15:47, 16 April 2024 (UTC)[reply]

Parameters type2 and journal2 of Template quote-journal

The examples of {{quote-journal}} show |journal2=, but its use in zacusi was causing the module error "Lua error in Module:quote at line 2660: Parameter "journal2" is not used by this template.". I have just fixed it by inserting |type2=journal, but neither |type= nor |type2= is documented in the lists of parameters. When is it needed? When is it allowed? What are its values? --RichardW57m (talk) 11:36, 16 April 2024 (UTC)[reply]

Its mentions are a bit more helpful in {{quote-book}}, but the same complaints apply. --RichardW57m (talk) 11:43, 16 April 2024 (UTC)[reply]

Sorry, I need to document this. The use of |type2= is correct here; I did this and made it default to book because it was more common to have journal articles quoting book entries than quoting another journal article. Benwing2 (talk) 18:53, 16 April 2024 (UTC)[reply]

MediaWiki Common.css issues

Several things:

There are three errors in MediaWiki:Common.css. These are on lines last edited by User:Erutuon. Are these real errors or just cases where the CSS editor isn't up to date?
How do you really force changes to MediaWiki:Common.css to take effect? The instructions say to "Reload" for Chrome but this does nothing. Eventually (10 minutes?) it seems to take hold, but that's a long time to wait.
Opinions on how to best display deprecated labels like color ((color)). Currently I made them show as green (same as deprecated templates) and struck-through, although I don't (yet?) see the strike-through; maybe I have to wait awhile.

Pinging User:This, that and the other as our resident CSS expert. Benwing2 (talk) 02:22, 17 April 2024 (UTC)[reply]

@Benwing2

The three errors relate to the use of :has(), which is a very recent addition to CSS. Indeed, Firefox only gained support for it in December 2023. The MediaWiki CSS editor must not be up to date with this new feature.
I believe the site CSS is cached for 5 minutes. After making a change, wait 5 minutes, then press Ctrl+Shift+R or Cmd+Shift+R in your browser to witness the effects. It's always a good idea to test changes in an incognito/private browsing window too, just to make sure they work for readers who do not have any gadgets enabled.
On the word "color" I see strike-through and a green color which is very subtle, although that may just be my poorly adjusted monitor.

This, that and the other (talk) 03:14, 17 April 2024 (UTC)[reply]

@This, that and the other OK thanks. I suppose there's no way to avoid waiting the 5 minutes? As for the color, it is set to darkgreen which maybe isn't the best choice. When not a link it appears as olivedrab, which looks like this: olivedrab; maybe we should make links that way too? Benwing2 (talk) 03:33, 17 April 2024 (UTC)[reply]

@Benwing2 The 5 minutes is in place to ease load on the servers. You can attempt to load the page with ?debug=true at the end of the URL, which bypasses all caches (both on the WMF server-side end and your client-side end), but this (intentionally) loads the page's JS and CSS very slowly, so may not actually be faster in the end. As for colors, I don't particularly have an opinion other than to say that dark green is a very difficult colour to "get right". On many displays it can be practically indistinguishable from black. This, that and the other (talk) 07:42, 19 April 2024 (UTC)[reply]

What put a bunch of Albanian entries into the "Latin terms with quotations" category?

All of the 'Newest pages ordered by last category link update' in Category:Latin terms with quotations are Albanian. I looked at the pages e.g. majth but couldn't notice what was causing it (most don't even seem to have quotations of any kind). They also weren't edited recently, so I assume the categorization change was caused by a bug somewhere else. Urszag (talk) 13:24, 17 April 2024 (UTC)[reply]

@Urszag: It's the use of {{R:sq:Bardhi:1635}}, which includes a |passage= from a Latin-Albanian dictionary and passed |lang=la to {{cite-book}}, plus my recent change to {{cite-book}}'s programming to use Module:quote instead of {{cite-meta}} to make it work like {{quote-book}}. I adjusted {{R:sq:Bardhi:1635}} to use |worklang=la,sq instead of |lang=la, which will avoid classifying the passage as a Latin quote. JeffDoozan (talk) 13:53, 17 April 2024 (UTC)[reply]

Aramaic and Nesting Dialects in English Translations

When translating words in English entries, I see some users add dialects of Aramaic (e.g. "Assyrian Neo-Aramaic", "Syriac", "Turoyo", "Mandaic", etc.) without nesting them under the banner "Aramaic". Is there a way to get a bot or something to do that automatically (like in this edit)? Would there be a technical page somewhere that has a master list of which dialects would be nested under which language? --334a (talk) 22:09, 17 April 2024 (UTC)[reply]

@334a Yes, this is possible, although I'd like to hear from other Aramaic editors to verify they are on board with this @Rhemmiel, Shuraya, Fay Freak. Benwing2 (talk) 23:17, 17 April 2024 (UTC)[reply]

Yes, this seems like a good idea instead of all the Aramaic languages being spread throughout. Shuraya (talk) 04:52, 18 April 2024 (UTC)[reply]

I too prefer this nesting. It must be like this because the labels are quite idiosyncratic and most usually the general interest of someone seeking translations is just hopefully getting anything in any Aramaic anyway. Fay Freak (talk) 23:53, 17 April 2024 (UTC)[reply]

OK, I modified my existing sort-translation-lines script to indent Aramaic lects. Just verifying however that we want all such lects indented. See the family tree under Category:Aramaic language. This includes not only lects ending in "Aramaic" but also "Mlahsö", "Turoyo", "Classical Syriac", "Hulaulá", "Hértevin", "Koy Sanjaq Surat", "Lishana Deni", "Lishanid Noshan", "Lishán Didán", "Senaya", "Classical Mandaic" and "Mandaic". Benwing2 (talk) 02:07, 18 April 2024 (UTC)[reply]
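For anyone curious what the indentation step amounts to, here is a simplified Python sketch. It is not the actual sort-translation-lines script: `nest_aramaic` is a hypothetical helper, the lect set is only a subset of the list above, and the input is assumed to be a flat, alphabetically sorted list of top-level translation lines with no existing "Aramaic" header.

```python
# Subset of the lects listed above, for illustration only.
ARAMAIC_LECTS = {
    "Assyrian Neo-Aramaic",
    "Classical Mandaic",
    "Classical Syriac",
    "Mandaic",
    "Turoyo",
}


def nest_aramaic(lines):
    """Move top-level Aramaic lect lines under a single '* Aramaic:' header."""
    nested = []
    others = []
    for line in lines:
        # Language name is the text between '* ' and the first colon.
        name = line[2:].split(":", 1)[0] if line.startswith("* ") else ""
        if name in ARAMAIC_LECTS:
            nested.append("*: " + line[2:])
        else:
            others.append(line)
    if not nested:
        return others
    block = ["* Aramaic:"] + sorted(nested)
    # Insert the block before the first language that sorts after "Aramaic".
    for i, line in enumerate(others):
        if line.startswith("* ") and line[2:] > "Aramaic":
            return others[:i] + block + others[i:]
    return others + block
```

Merging into an existing "Aramaic" header and handling already nested lines of other languages are left out to keep the sketch short.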

@Fay Freak @Rhemmiel @Shuraya Can one of you help me with Aramaic lects? I am trying to clean up existing indented languages. Under "Aramaic" we have 14 occurrences of "Jewish", 57 occurrences of "Jewish Aramaic", 5 occurrences of "Jewish Literary Aramaic", 199 occurrences of "Hebrew", 2 occurrences of "Hebrew Script" and 4 occurrences of "Biblical Aramaic". Some examples:

month: Jewish Aramaic: יַרְחָא m (yarḥā)
eagle: Jewish Aramaic: נִשְׁרָא m (nišrā)
Greece: Jewish Aramaic: יון f (yāwān)
Syria: Jewish Aramaic: סוּרְיָא f (sūryā)
this: Jewish Aramaic: הנא m (hānā), הדא f (hāḏē)
all: Jewish Aramaic: כּוֹל (kôl), כָּל (kol)
daughter: Jewish Aramaic: בְּרַתָּא f (bərattā)
rue (plant): Jewish Aramaic: פיגנא m (peygānā)
tooth: Jewish Aramaic: שִׁנָּא f (šinnā)
with ("chez"): Jewish Aramaic: עם (ʿam, ʿim)
with ("by means of"): Jewish Aramaic: ב־ (b'-)
sleep: Jewish Aramaic: שֵׁינְתָא f (šênəṯā), שִׁנְתָא f (šinṯā)
word: Hebrew: מלתא c (melthā, meltho)
elephant: Hebrew: פילא m (pīlā’), פילתא f (pīltā’)
noun: Hebrew: שמא m (šmā’)
hour: Hebrew: שעתא f (šāʕtā’)
verb: Hebrew: מלתא f (miltā’)
two: Hebrew: תרין m (trēn), תרתין f (tartēn)
because: Hebrew: מִטּוּל (miṭṭūl)
weapon: Hebrew: זינא m (zaynā’)
Lebanon (mountain range): Biblical Aramaic: לִבְנָן (liḇnān)
Jerusalem: Biblical Aramaic: יְרוּשְׁלֶם (yərûšlem)
sambuca (musical instrument): Biblical Aramaic: סַבְּכָא m (sabbəḵā)

Based on these examples, is there any way to convert them to the correct lect? I also notice under "date (fruit)", we have three distinct entries for Jewish Babylonian Aramaic, Jewish Literary Aramaic and Jewish Palestinian Aramaic. What is Jewish Literary Aramaic? Benwing2 (talk) 22:17, 25 April 2024 (UTC)[reply]

@Benwing2: JLA is JPA with JBA vocalization upon it. As it was mixed, Jewish Aramaic is all three. Pure JPA has no vowels attested, since Tiberian vocalization, Palestinian vocalization, Babylonian vocalization was not even invented, though we often can assume the same ones, and especially for such basic words the terms will be the same. So we saved some space and Lua memory by not repeating the templates. Given the claims made by transcriptions given, we save time again and move all mentioned entries to Jewish Babylonian Aramaic, and I guess fewer than 10 doubtful ones will be left.

Christian Palestinian Aramaic is the same chronolect and regiolect as Jewish Palestinian Aramaic by sectarian division, likewise not attested with diacritics, and I suppose that I am the only one who ever added it, so any occurrences of mere “Aramaic” in Syriac script will also be Classical Syriac, unless it was me who entered such a thing, which I don’t remember, I haven’t even used the more likely label “Christian Aramaic”, nor facetiously misleadingly wrote “Syriac” when meaning both Classical Syriac and so-called Palestinian Syriac, which is a synonym of CPA. Fay Freak (talk) 23:20, 25 April 2024 (UTC)[reply]

@Benwing2: I have not mentioned that Hebrew-script Aramaic entries by 334a (talk • contribs), at least the older ones, are not even Jewish Aramaic, created by acquaintance with secondary or primary sources, but by reverse-transcribing Classical Syriac :/ – sometimes the inflections given in the headwords are even wrong for this, and sometimes the Jewish impression is feigned: Talk:פלסטין. It is plausible that some clueless driveby editors added them as translations.

Clearly I must have dug around and avoidance-doomscrolled too much to know this circumstance as the sole person of Wiktionary, not even properly learning Aramaic. Fay Freak (talk) 23:35, 25 April 2024 (UTC)[reply]

@Fay Freak Yuck. What would you suggest I do? Should we remove all translations by User:334a? Benwing2 (talk) 23:48, 25 April 2024 (UTC)[reply]

@Benwing2: I cannot suggest it if you don’t make a list; I have not seen if and how 334a added translations since this was long before I could have grasped the import of such a website, when I was 13 or something and had my first Latin exercises, so you just gave me the idea, previously dismissed by me, that she might have expanded the translation tables of English entries as well, in that dark time when Google Books was new, with later everything buried in the backlogs of the contribution lists. If that be so (or we can retrieve entries created by 334a which are also linked in translation tables, if only the driveby editors added them there), it can serve a to-do list for someone else and then we can ignore it. Perhaps you can automatically retrieve all fitting links to CAL so one can just click and look whether it exists in the claimed lects and/or corresponding references.

I just mentioned a point that you could think about, I don’t want to press or suggest anything since with respect to our mental hygiene, myself I didn’t want to make hands dirty with commencing Aramaic readings and you are also way too important to be urged into action 🧡. We won’t reach up to CAL’s standing as a comprehensive reference work in the foreseeable future, only can save face as not being a ghostword-sink, which we so far do successfully, and the close-to-zero access frequency of most Old and Middle Aramaic entries quietens the matters. I have learned that the limbic system magnifies fear of future disadvantageous outcomes 2× whereas we only circuitously conclude eventual rewards. So I reframe the present bias to delayed gratification.

We are at a terrific point if we have to argue our case to do nothing. The problem itself was found in an unhealthy manner, problematic internet use. This is why I try not to suggest anything, only pass on some of my otherwise rare observations and cognitive capabilities, in case someone is absolutely morally obliged to act out something. I can go through Aramaic when I am retired in the 2060s, who knows which tools, corpora and capabilities one will have available to compare with and against. If man has such an identity. There is little to regret. We are super meticulous. Fay Freak (talk) 01:24, 26 April 2024 (UTC)[reply]

@Fay Freak, @Benwing2, in my defence, I started editing Wiktionary back in 2007. At the time, standard practices were vastly different, and the rules and regulations regarding languages and their dialects were not what they are today. I didn't even have access to a Syriac keyboard; I had to make the very first entries by constantly flipping back and forth between tabs and painstakingly copy-and-pasting each letter individually from the summary table at the Wikipedia entry for the Syriac alphabet.

When I started editing, there had already been a few "Aramaic" entries and, as far as I know, they had all been written in the Square (Hebrew) script. No Syriac script or Imperial script or anything else. As far as I could tell, there was no distinction Wiktionary was making between dialects, and it's not like I was adding Syriac words under the heading of "Imperial Aramaic" or "Jewish Palestinian Aramaic"--it was all just "Aramaic". It was a free-for-all compared to today's practices. It's not uncommon for dictionaries to use a nontraditional script for a language anyway, even today: Sokoloff's Syriac Lexicon (published in 2009) mentions Mandaic words but uses Hebrew script to transliterate them.

Today, I would argue that it wouldn't be correct to label anything under the broad heading "Aramaic" on Wiktionary. It'd be like labelling all Mandarin entries under "Chinese" and then forcing every other dialect of Chinese to specify which dialect they are (Cantonese, Wu, etc.). There is no standard dialect of Aramaic in the same way there is, say, of Arabic. This is an entirely different topic though and should have its own discussion.

The vast majority of Hebrew script entries which only have a Syriac definition have been deleted for over a decade now. There are still some lingering here and there, for which I apologize. I do try to make a point of fixing them to the best of my knowledge and ability when I come across them, though life is keeping me busy and I'm not as active as I used to be. --334a (talk) 17:46, 3 May 2024 (UTC)[reply]

cleanup run on Nordic language lemmas

I am planning on doing a cleanup run on the lemmas in Swedish, Danish, Norwegian Bokmål, Norwegian Nynorsk and Icelandic. Hopefully these changes are noncontroversial. I have done similar runs on several languages before without complaints. The cleanups are:

Templatize raw links occurring in list format in certain sections (e.g. ==Derived terms==, ==Related terms==); e.g. * [[meio ambiente]] occurring in a ==Derived terms== section of a Portuguese lemma would turn into * {{l|pt|meio ambiente}}.
Detemplatize English links occurring in definitions, e.g. {{l|en|ambient}} -> [[ambient]]; but the opposite change happens when the English term is spelled the same as the pagename (because raw links to the same page turn into unlinked bolded terms). Note that since the JavaScript change of User:This, that and the other, raw links automatically link to the English section, so the extra templated linking has no effect except to make the Wikicode harder to read and edit.
Templatize raw category references to use {{C}} (for topical categories) or {{cln}} (for poscat categories), if possible; other categories are left alone. Also standardize category references using different aliases (e.g. {{topics}}) to use these names.
Convert synonyms and antonyms in ==Synonyms== and ==Antonyms== sections into inline synonyms and antonyms specified using {{syn}} and {{ant}}, when it is safe to do so. (Approximately, either (a) there's only one definition, or (b) there are {{sense}} tags associated with each synonym or antonym and all of them can be uniquely matched up with definitions.)
Convert raw links and {{l}} links in ==Alternative forms== sections into {{alt}} links.
Put Wikipedia boxes in a standard position. (Approximately, if there's only one part of speech in a given Etymology section, the Wikipedia box goes at the top of the Etymology section. If there's more than one part of speech, the box stays where it is because it might be associated with that part of speech.) Note, this only affects Wikipedia boxes, not inline Wikipedia links (using {{w}} or similar) or single-line Wikipedia links (using {{pedia}} or similar).

Benwing2 (talk) 05:11, 18 April 2024 (UTC)[reply]
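The first cleanup in the list above can be pictured with a short Python sketch. This is an assumption-laden illustration, not the script that will actually be run: `templatize_list_links` is a hypothetical helper, and it expects to be given only the lines of a section such as ==Derived terms==.

```python
import re

# A list item consisting solely of one plain [[link]] (no pipe, no anchor).
RAW_LIST_LINK = re.compile(r"^\* \[\[([^\[\]|#]+)\]\]$")


def templatize_list_links(lines, lang_code):
    """Turn '* [[term]]' list items into '* {{l|lang|term}}'; leave the rest."""
    out = []
    for line in lines:
        match = RAW_LIST_LINK.match(line)
        if match:
            out.append("* {{l|%s|%s}}" % (lang_code, match.group(1)))
        else:
            out.append(line)
    return out
```

With `lang_code="pt"`, the item `* [[meio ambiente]]` becomes `* {{l|pt|meio ambiente}}`, while piped links and free text are passed through untouched.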

@Benwing2: I suggest you keep {{l|en}} links with disambiguating parameters, especially |id= but also |pos=. You should probably also keep those with the alternative parameter. --RichardW57m (talk) 09:38, 18 April 2024 (UTC)[reply]

@Benwing2: just wanted to point out that I do use {{l}} in definitions when I need to provide a gloss for a term. — Sgconlaw (talk) 09:39, 18 April 2024 (UTC)[reply]

@Benwing2, Sgconlaw: Probably best to only detemplatise calls of {{l}} only when its only parameters are |1= and |2=. --RichardW57m (talk) 11:26, 18 April 2024 (UTC)[reply]

@Sgconlaw @RichardW57m My current script only replaces links of the form {{l|en|foo}} -> [[foo]] and {{l|en|foo|bar}} -> [[foo|bar]]; I should have clarified this. If there are any other params, the template is left alone. It's also smart enough to replace e.g. {{l|en|olive tree|olive trees}} with [[olive tree]]s. Benwing2 (talk) 19:52, 18 April 2024 (UTC)[reply]
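The replacement rule described here can be sketched roughly as follows in Python. This is not the real script: `detemplatize` is a hypothetical function, the link-trail handling assumes only lowercase ASCII letters can trail a link, and links to the page's own name are simply left as templates.

```python
import re

# {{l|en|target}} or {{l|en|target|display}}, with no other parameters.
L_EN = re.compile(r"\{\{l\|en\|([^{}|=]+)(?:\|([^{}|=]+))?\}\}")


def detemplatize(text, pagename):
    """Replace simple {{l|en|...}} calls with raw links."""

    def repl(match):
        target, display = match.group(1), match.group(2)
        if target == pagename:
            # A raw self-link would render as bold text, so keep the template.
            return match.group(0)
        if display is None or display == target:
            return "[[%s]]" % target
        suffix = display[len(target):]
        if display.startswith(target) and re.fullmatch(r"[a-z]+", suffix):
            # Use a link trail: [[olive tree]]s instead of a piped link.
            return "[[%s]]%s" % (target, suffix)
        return "[[%s|%s]]" % (target, display)

    return L_EN.sub(repl, text)
```

Because the pattern excludes `=` and requires the closing braces right after the second or third argument, calls with |id=, |pos= or further positional parameters do not match and are left alone, which is the behaviour requested above.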

I support 1,2,3,5,6. I would not touch the synonyms/antonyms. Thadh (talk) 11:14, 18 April 2024 (UTC)[reply]

Derived terms tool

We really need a tool to quickly add all these unlinked Derived terms. Doing it manually destroys my soul. P. Sovjunk (talk) 18:49, 18 April 2024 (UTC)[reply]

I can't help you with a JavaScript tool but you might be able to make use of a new-entry creation template similar to the ones that exist for Japanese, Thai, etc. if that is what you're looking for. Benwing2 (talk) 22:10, 18 April 2024 (UTC)[reply]

Maybe. Whatcha got? P. Sovjunk (talk) 22:24, 18 April 2024 (UTC)[reply]

@P. Sovjunk Nothing yet but I could maybe be persuaded to write something if you'd actually use it. Take a look for example at the documentation of {{ja-new}} and {{th-new}} and tell me if something along these lines would be helpful. Benwing2 (talk) 22:41, 18 April 2024 (UTC)[reply]

If what needs to be done is "make amiability contain a link (in the Derived terms section) to unamiability", it seems like a bot could do that, at least for cases where amiability has only one part of speech (which is probably a large percentage of cases). I couldn't write such a bot, but it seems like the sort of thing a bot could be written to do, working from that list. - -sche (discuss) 23:06, 18 April 2024 (UTC)[reply]

@-sche Hmmm, you are right, somehow I assumed the terms in question needed to be created but I see they already exist. Benwing2 (talk) 23:08, 18 April 2024 (UTC)[reply]

Yeah, useful though those templates might be, not really what's needed. -sche hit the nail on the head with the desired function: Quickly add term fooable to Derived terms section of foo. I'd like to point out that after 20 years here I still am useless at the computing side of things (and to be fair, only slightly better at the lexi-stuff and equally as lame with the social side, TBH). However, I do love making my way through a big juicy cleanup list. P. Sovjunk (talk) 06:14, 19 April 2024 (UTC)[reply]

Yes, it was pretty lame not to mention that these were only English derived terms. --RichardW57 (talk) 08:04, 19 April 2024 (UTC)[reply]

Template:lq

I am thinking of creating a template {{lq}} that would be a combination of {{lb}} and {{q}}; essentially it works like {{lb}} but doesn't categorize. The idea is it could be used in cases where {{q}} is currently used but with proper linking of lects as well as terms like archaic, dated. Does this seem like a good idea?

I should add that if we add a language code to {{a}} and change it to accept labels, there might not be a need for this; or we could have two templates, one that takes a language code and one that doesn't, both of which process labels but the latter one only processing language-independent labels (things like archaic and dated, but not Southern US, Louisiana or the like). Thoughts? Benwing2 (talk) 23:17, 18 April 2024 (UTC)[reply]

A good idea.

A change of {{a}} would of course be massive, I figure you already ballpark over a week to execute multiple bot-runs.

You seem to have no clear idea yet though, just as I, where else than in pronunciation sections this {{lq}} would be used, though I remember the feeling that I wanted to use such a thing, somewhere in the past already, which wasn’t necessarily beside pronunciations. Fay Freak (talk) 00:28, 19 April 2024 (UTC)[reply]

@Fay Freak Examples would be ==Derived terms==, ==Synonyms== sections and the like. Benwing2 (talk) 01:01, 19 April 2024 (UTC)[reply]

Also ==Translations== sections. Benwing2 (talk) 08:08, 19 April 2024 (UTC)[reply]

I'm not opposed, but will caution that the more templates that do similar things (especially if used in the same places), the more likely people will not grasp the distinction and will use one where we want the other. (We already see people use T:q, T:a, T:lb or bare formatting in place of each other, e.g. T:a for T:q in dust, fright; T:ib/T:italbrac used to also be in that mix.) If neither T:q nor T:lq (even in translations sections) categorize, I guess that's not really a problem for anything but our sense of "this should be x, not y", if the only difference is "oops, sometimes 'Hakka' isn't a link". And if we want to have a {{q}}-like thing that links, adding T:lq is easier and less disruptive to people's habits than requiring every use of T:q include a language code.
Iff we add a language code to T:a, it does sound like it could become the same thing as this, but I suppose we could always create this now and, if we later add a langcode to T:a and make it use labels, reduce it to an alias of this [or vice versa] at that time.
BTW, T:a asserts it should only be used for {{a|UK|rare}} but that ~~{{a|rare}}~~ should use {{q}}, which is evidently too arbitrary a distinction because I often see entries use T:a even for non-accent labels, like horned, devil, ゑ; if we add a langcode to T:a (which would undoubtedly take some getting used to, but again, not opposed), or even if we don't, maybe we can also abandon that "{{a|UK|rare}} but {{q|rare}} {{IPA}}" distinction? (BTW I guess bots cleaning up uses of "wrong langcode to be using in this L2" will need to know that T:lq in a ====Derived terms==== section uses the L2's langcode but in a ====Translations==== section uses the translation's?) - -sche (discuss) 16:17, 19 April 2024 (UTC)[reply]

@-sche Yeah I get your point and I appreciate your thoughtful responses. I agree that the current idea that {{a|UK|rare}} is OK but not {{a|rare}} is silly. One possibility is to add a lang code to {{a}} and repurpose it as a general alternative to {{q}} for label-like qualifiers (although it might require a bit of thought to figure out what it ought to stand for :) ...). It has the advantage of being one character shorter than {{lq}}. In any case the current state of Module:accent qualifier/data is super messy and needs cleaning up and merging with the label data. Benwing2 (talk) 20:39, 19 April 2024 (UTC)[reply]

This could be useful, or at least something similar to it. Vininn126 (talk) 16:18, 19 April 2024 (UTC)[reply]

Reduplicated emoji in citation about said emoji triggered "emoji spam" abuse rule

I tried adding an extra definition to the "🍅" entry as it is also often used (particularly reduplicated) to express disapproval, like when booing, to mimic the act of audiences throwing tomatoes at bad performances. Found a citation for it, but it seems it got auto-flagged for emoji spam (and really, any citation I would've found/used would've triggered it due to the way this emoji is used in this sense):

2024 January 15, @cragmites, Twitter[5], archived from the original on 2024-04-20:
BOOOOOOOO [five tomato emoji in a row]

Big Sprinkler (talk) 15:44, 20 April 2024 (UTC)[reply]

Seems legitimate enough, so done. This, that and the other (talk) 10:02, 21 April 2024 (UTC)[reply]

Translation adder langname to langcode functionality

I'm not sure when this broke (for all I know, it could've been broken for years), but it used to be possible to type a canonical language name (rather than only a code) into the "Add translation:" field, and some javascript(?) would automatically convert the name (right there in the field, before you preview or post the translation) to the corresponding code. Now, it only converts language names to "languages/javascript-interface at [[Module". Not sure how easy this is to fix, or how much of a priority it is. - -sche (discuss) 07:07, 22 April 2024 (UTC)[reply]

@Benwing2, re diff: at one time, it used to be possible to type a language name into the "Add translation:" field, e.g. to type "French" where Aaron Liu in his example screenshot typed "cmn", and ...something... would automatically convert "French" to the code, "fr". These days, it converts to "languages/javascript-interface at [[Module" instead. Perhaps it broke when 'convert langname to langcode' was moved into a separate module rather than Module:languages? As I said above, I don't know how long it's been broken, so it's perhaps also not a high priority to fix (but it was probably useful to casual users who are probably more likely to know a language name than a language code). - -sche (discuss) 01:50, 26 April 2024 (UTC)[reply]

@-sche OK thanks, I'll take a look. Benwing2 (talk) 01:52, 26 April 2024 (UTC)[reply]

@-sche It looks like {{#invoke:languages/javascript-interface|GetSingleLanguageByLanguagePrefix|French}} is broken. Benwing2 (talk) 02:02, 26 April 2024 (UTC)[reply]

@-sche I think this got broken by this diff [6] by User:Theknightwho on Jan 14, 2024. I don't think it was a good idea to add a non-English translation of a language into the data in the first place, much less do it without properly updating all the callers to support it, so I'm going to remove it. Benwing2 (talk) 02:18, 26 April 2024 (UTC)[reply]

Fascinating: I'm not sure why that would've broken this, but sure enough the "convert langname to code" functionality works again. Thanks for figuring out the issue!
Thinking about the reason that non-English names were added in the first place (discussed in the BP discussion TKW's edit summary linked) I wonder: do we actually need the non-English names to be present in a way that templates or modules can call like that, or would it be enough just to stick them in comments next to the relevant language codes? That way they're still present for anyone searching through the module trying to work out what code the language they want to add is. (Alternatively: like we have a separate "canonical name to code" module, put them in a separate module? if we assume we're not going to be changing what Loloish languages we include overly often.) - -sche (discuss) 02:55, 26 April 2024 (UTC)[reply]

@-sche I think TKW's idea was that they could be displayed along with the aliases/varieties/otherNames data on the respective language page (maybe?). But the reason it broke was that the code for GetSingleLanguageByLanguagePrefix depends on another module that generates a map from language names (including aliases and otherNames) to codes, and that code couldn't handle the change in structure of the aliases field with the result that a table instead of a string got inserted into the key portion of the map. Benwing2 (talk) 04:03, 26 April 2024 (UTC)[reply]
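As a rough illustration of the failure mode described above, here is a minimal sketch of building a name-to-code map that tolerates structured alias entries. The data layout, the field names, and the sample entries are hypothetical and are not the real language data. Note also that a Lua table can silently become a map key, whereas Python would raise an error for an unhashable key, so the sketch shows only the defensive check.

```python
def build_name_to_code(languages: dict) -> dict:
    """Hypothetical helper: map every plain-string name to its language code."""
    name_to_code = {}
    for code, data in languages.items():
        names = [data["canonical_name"]]
        for alias in data.get("aliases", []):
            # Only plain strings are usable as lookup keys; a structured
            # entry (e.g. a translated name with metadata) is skipped.
            if isinstance(alias, str):
                names.append(alias)
        for name in names:
            # First language to claim a name keeps it.
            name_to_code.setdefault(name, code)
    return name_to_code

# Made-up sample data, for illustration only.
SAMPLE = {
    "fr": {"canonical_name": "French", "aliases": ["Example alias"]},
    "xx": {"canonical_name": "Example language",
           "aliases": ["Sample", {"lang": "zh", "name": "示例"}]},
}
```

With this sample, looking up "French" gives "fr", and the structured alias never enters the map.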

@Benwing2 @-sche My bad - I had meant to implement this, and then never got round to it. I'd still like to be able to input the data properly, because the intention is for the translated names to be listed at the language's category page. As I said in the original thread, I would strongly oppose mass-adding translations - they're only supposed to be for languages where the academic literature is primarily non-English, since it facilitates research into the language. The example I used in that thread was the Loloish languages, since there are lots of them, many have very similar names, they all have at least 2 (some up to 5) in both English and Chinese, and there are several instances of separate languages sharing the same name. Theknightwho (talk) 10:16, 26 April 2024 (UTC)[reply]

Latin-script footer not responding to expansion

I may be editing in the wrong place, but I expanded the list of letters with palatal and retroflex hooks at Template:mul-script/Latn-list, and they're not visible on pages. For retroflex what displays is ᶏ ᶒ ᶖ ɭ ɳ Ʈʈ ᶙ ʐ ᶚ, which isn't even in the same order, and for palatal it's just ᶀ ᶁ, which is missing the most common letters. If the list is kept elsewhere, should the one in the template be replaced with a note on its current location? kwami (talk) 10:05, 22 April 2024 (UTC)[reply]

@Kwamikagami: If I'm reading the code right, the main template is just for choosing which group in {{Template:mul-script/Latn/groups-list}} is displayed. Chuck Entz (talk) 10:33, 22 April 2024 (UTC)[reply]

Ah, I figured it out. Thanks. It's at Template:mul-script/Latn/groups-list. kwami (talk) 10:45, 22 April 2024 (UTC)[reply]

Wrong warning by categorisation

It appears that gėlė is being put in categories:

because it contains [[:Category:lt:Flowers|Flowers in Lithuanian]]. The categories are for where sorting within other categories may go wrong, so isn't this categorisation wrong? I don't know which code needs correcting, and suspect I might not be able to edit anyway. --RichardW57m (talk) 14:01, 22 April 2024 (UTC)[reply]

@RichardW57m Yes, that's wrong, since it's just a regular link which should be ignored. I'll do a fix. Theknightwho (talk) 14:19, 22 April 2024 (UTC)[reply]

Fixed. The issue was in Module:headword/page, which parses categories on the page at line 676. It now ignores category links where a colon precedes "category". Theknightwho (talk) 14:31, 22 April 2024 (UTC)[reply]
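A minimal sketch of the distinction, assuming a simple regular-expression scan rather than whatever Module:headword/page actually does; the pattern and function name are hypothetical.

```python
import re

# Captures an optional leading colon, then the category name.
# [[Category:X]] categorises the page; [[:Category:X]] is only a visible link.
CATEGORY_LINK = re.compile(r"\[\[\s*(:?)\s*category\s*:\s*([^\]|]+)", re.IGNORECASE)

def real_categories(wikitext: str) -> list:
    """Hypothetical helper: return categories the page is actually placed in."""
    return [
        match.group(2).strip()
        for match in CATEGORY_LINK.finditer(wikitext)
        if not match.group(1)  # skip links where a colon precedes "category"
    ]
```

Applied to text containing both [[:Category:lt:Flowers|Flowers in Lithuanian]] and an ordinary [[Category:...]] link, only the latter is returned.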

@Theknightwho: Thank you for fixing it, and thank you for telling us how you fixed it. --RichardW57m (talk) 15:37, 22 April 2024 (UTC)[reply]

And for the record, the corresponding Pali category has now emptied seemingly automatically overnight - just over 3 days to work through (almost?) all of Wiktionary. --RichardW57 (talk) 04:00, 26 April 2024 (UTC)[reply]

Template junk at human being

Template:tea room sense seems to be producing junk. Equinox ◑ 15:51, 22 April 2024 (UTC)[reply]

@JeffDoozan This is because of a bad parameter, but the warning message is corrupted; can you take a look? Benwing2 (talk) 22:02, 22 April 2024 (UTC)[reply]

This seems to be caused by @Theknightwho's addition of some very clever stuff to extract and format additional error details: diff. I don't completely understand what's going on with that, so I'm kicking this over to knight. JeffDoozan (talk) 22:39, 22 April 2024 (UTC)[reply]

@JeffDoozan @Benwing2 I'll take a look. The reason I made that change is because Scribunto error messages have standard wiki formatting applied to them (e.g. multiple spaces are compressed into one space etc.), which is a problem when you want to accurately display which argument is causing the problem: e.g. if a template contains {{{some arg}}}, it won't work if you accidentally put

|some  arg=

as there are two spaces. Module:checkparams (correctly) identifies this is a problem, but if you try to display that in a standard error message it'll get normalised to a single space, which is really confusing for the user since that looks identical to the correct input.

The normal solution to this would be to use <pre></pre> tags, but they don't work if you put them in a Scribunto error message. I also tried preprocessing the pre tags before throwing the error, but if you do that it simply displays the raw strip marker, which is even worse. Another alternative would be to display a manual error message (which can be formatted however we like), but that loses the benefit of things like automatic categorisation in CAT:E, traceback and so on (i.e. it's not considered a "real" error by the MediaWiki software).

To get the best of both worlds, the module (effectively) preprocesses {{#invoke:checkparams|placeholder_error}} using pcall, where placeholder_error simply throws an error with a placeholder string. This generates a real Scribunto error (i.e. it's automatically categorised in CAT:E, traceback works properly etc.), but because of the pcall the error block is caught and returned as a string to the main Scribunto instance. The placeholder can then be swapped out for the real message, which contains preprocessed <pre></pre> tags, and returned. Since the main module is simply returning a string, the strip markers for the pre tags expand into the desired output; it just so happens that output is a Scribunto error message.

At some point, I'll probably add a formatted_error function to Module:debug to handle this, since I expect Module:parameters (and a few other modules) would benefit as well. Theknightwho (talk) 14:16, 23 April 2024 (UTC)[reply]
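The placeholder-swap idea described above can be loosely modelled outside Scribunto. The sketch below is only an analogy: render_error_block stands in for the host software's error formatting (which collapses whitespace), and every name in it is hypothetical. It does not reproduce strip markers, CAT:E categorisation or tracebacks.

```python
import html

PLACEHOLDER = "\x00PLACEHOLDER\x00"

def render_error_block(message: str) -> str:
    """Stand-in for the host's error renderer: collapses runs of whitespace."""
    return '<strong class="error">' + " ".join(message.split()) + "</strong>"

def placeholder_error() -> None:
    """Throws a genuine error whose text is only a placeholder."""
    raise ValueError(PLACEHOLDER)

def formatted_error(real_message: str) -> str:
    """Catch the placeholder error, then swap in the preformatted message."""
    try:
        placeholder_error()
    except ValueError as exc:
        block = render_error_block(str(exc))
    # The real text never passes through the whitespace-collapsing renderer.
    return block.replace(PLACEHOLDER, "<pre>" + html.escape(real_message) + "</pre>")
```

In this model the two spaces in "|some  arg=" survive, because only the placeholder goes through the normalising step.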

Also, more specifically to the issue at hand, it seems to be some kind of weird interaction between the "catch my attention" tag and wiki list formatting. Theknightwho (talk) 14:27, 23 April 2024 (UTC)[reply]

Is there anything here we actually need to fix, per se, besides which parameters the entry itself is using (change them from the non-working parameters year= month= to the parameters the template uses, y= m=)...? (Or if it's not too hard, maybe just support year= month= as aliases of y= m=?) - -sche (discuss) 01:53, 26 April 2024 (UTC)[reply]

@-sche Yes, ideally the error message would not be garbled like it is. Benwing2 (talk) 01:56, 26 April 2024 (UTC)[reply]

I've not had time to look at this yet but will do today - hopefully it's a quick fix. Theknightwho (talk) 10:05, 26 April 2024 (UTC)[reply]

@Theknightwho When you have a chance, can you take a look? Benwing2 (talk) 02:15, 1 May 2024 (UTC)[reply]

Chiromantis

Clearly Greek and not Italian; chir and mantis are both Greek. 2A02:587:471B:F472:B659:C76E:8411:604 21:38, 22 April 2024 (UTC)[reply]

Not sure what this is about: Chiromantis is a Translingual entry with no etymology section. Perhaps they meant chiromante, where the etymology we give is a valid surface analysis but probably not the true original formation of the term. This, that and the other (talk) 07:59, 23 April 2024 (UTC)[reply]

I added an etymology. Not sure why brought to GP. DCDuring (talk) 19:40, 1 May 2024 (UTC)[reply]

Styling error with ئ

Hey all. I couldn't find another place to report this.

In the entry ئ (also عرب, among others), the character is displayed in a specifically Nastaliq font style, even though the entry is not solely for Urdu.

The character is used in Arabic, too, and therefore should not be marked with any style.

Marking font-style in the header is very excessive and unnecessary. Headers should remain consistent.

font-family: 'Noto Nastaliq Urdu', Tahoma, 'Arial Unicode MS', 'UT Cairo', 'UT Naskh', sans-serif;

font-family: 'Noto Naskh Arabic', 'Iranian Sans', 'Segoe UI', Tahoma, 'Microsoft Sans Serif', 'Arial Unicode MS', sans-serif;

The previous are the font specification for the header.

For those unfamiliar with the topic, w:Nastaliq is a specific rendering style commonly used for Urdu and Persian, but not other languages which use Arabic script.

Thanks. --Esperfulmo (talk) 13:19, 23 April 2024 (UTC)[reply]

@Esperfulmo Yes this is a known issue. It is a side effect of the current implementation of the code, which formats the title according to each language processed in turn. This is necessary in general to get the correct fonts for all sorts of different characters, but it has the weird side effect you've noticed when Urdu is the last language on the page (which is usually the case). I have proposed adding a special check which looks to see if there is more than one Arabic script language on a given page and if so disables the code mentioned above for Urdu. But I haven't gotten around to implementing it. Benwing2 (talk) 20:26, 23 April 2024 (UTC)[reply]

Sounds like a proper proposal. Let's wait and see. -Esperfulmo (talk) 23:30, 23 April 2024 (UTC)[reply]

Strange unwanted linking of Japanese transliterations

Somebody introduced linking of Japanese transliterations. I oppose it, even if it worked correctly. It happens with multipart terms

肌 (はだ, hada) - OK
肌の色 (はだのいろ, hada no iro) - OK but it's a sum of parts, split below
肌の色 (はだのいろ, hada no iro) - wrong, nothing should be linked. The current link is on はだのいろ, hada no iro

Anatoli T. (обсудить/вклад) 21:51, 23 April 2024 (UTC)[reply]

@Theknightwho: Hi. It must be to do with your work on Module:ja. Please undo the linking. Anatoli T. (обсудить/вклад) 21:58, 23 April 2024 (UTC)[reply]

@Atitarev It wasn't caused by those changes, but I'm not sure why this has happened. Theknightwho (talk) 22:02, 23 April 2024 (UTC)[reply]

@Theknightwho: Thanks for replying. Could you please try fixing it? I see it wasn't intentional but it worked until recently and Module:ja is the module that does it. Anatoli T. (обсудить/вклад) 22:07, 23 April 2024 (UTC)[reply]

I can have a look, but not right this minute. I can see that it definitely wasn't caused by any of the recent changes to Module:ja, though, since it still happens if I preview old versions of it. Theknightwho (talk) 22:17, 23 April 2024 (UTC)[reply]

@Theknightwho: Thanks, it must be some other recent module change (not necessarily yours). Also calling @Benwing2 for help. Anatoli T. (обсудить/вклад) 23:04, 23 April 2024 (UTC)[reply]

@Atitarev: I looked around but I can't see any recent changes that would have triggered this. Do you know when this happened approximately? Benwing2 (talk) 23:15, 23 April 2024 (UTC)[reply]

@Benwing2: I can't tell you exactly but it's rather recent. No more than a month ago. Anatoli T. (обсудить/вклад) 23:19, 23 April 2024 (UTC)[reply]

Another case is with Roman letters: UNICEF (Yunisefu). Anatoli T. (обсудить/вклад) 23:27, 23 April 2024 (UTC)[reply]

@Atitarev link_tr is set to true in Module:languages/data/2, which normally triggers Japanese transliteration linking. But this has been the case since this diff [7] in Aug 2023. User:Theknightwho will have to look into this more as I don't know the ins and outs of how Japanese transliteration is handled. Benwing2 (talk) 23:43, 23 April 2024 (UTC)[reply]

@Benwing2 @Atitarev Yeah, it definitely post-dates that change by quite a long time; that was added because it made it simpler to link transliterations in Japanese headwords. Theknightwho (talk) 23:55, 23 April 2024 (UTC)[reply]

@Benwing2, @Theknightwho: Thanks. Weird linking of SoP terms happened not so long ago, I would have noticed.

(Notifying Eirikr, TAKASUGI Shinji, Fish bowl, Poketalker, Cnilep, Marlin Setia1, Huhu9001, 荒巻モロゾフ, 片割れ靴下, Onionbar, Shen233, Alves9, Cpt.Guapo, Sartma, Lugria, LittleWhole, Chuterix, Mcph2): Hi. Would someone know who made the change and why? Anatoli T. (обсудить/вклад) 01:41, 24 April 2024 (UTC)[reply]

Fix script requirement in Chinese translation additions

Currently, when I try to add a Chinese translation to the box using the GUI, I cannot do that due to the script requirement not accepting Hans nor Hants because Hans is not in the list of valid scripts for Mandarin Chinese and Hants is not a valid script code. Thus, I propose to replace all occurrences of "Hants" in Module:languages/data/3/c with "Hans, Hant".

Also, it's weird how the edit notice under view source lacks the instructions in {{Edit protected}}. Aaron Liu (talk) 01:29, 25 April 2024 (UTC)[reply]

@Aaron Liu No, we need to fix the translation-adder. The whole point of Hants in the language data is to (a) avoid triplication (since there's Hani as well), and (b) let findBestScript choose the most suitable one by treating them together. I also don't know why you're manually specifying the script anyway - it works fine if you just use the default. We don't want manual script codes when they aren't needed, and with Chinese they basically never are. Theknightwho (talk) 03:27, 25 April 2024 (UTC)[reply]

@Theknightwho It does not work fine if I use the default as Hants apparently isn't a valid script code, as I've said in the opening comment. Aaron Liu (talk) 12:57, 25 April 2024 (UTC)[reply]

@Aaron Liu Yes, I'm aware, but when I tested it it worked fine if I made no changes to the script code field; the failures happened only after I made any changes (even if it was resetting it back to the default). Does it still fail for you if you don't touch it at all? If so, what browser/device are you using? Theknightwho (talk) 13:00, 25 April 2024 (UTC)[reply]

No, I did not touch it at all. I'm not sure why this is happening, I'm using Firefox 115 with some userscripts you can find in my global.js, and either way it seems like a mistake to open up that field to new users without any instructions of "don't fill this out". When logged out, the scripts field defaults to Hani and seems to be able to preview.
Also, to my knowledge, Hani also means Chinese characters, and I don't really get your triplication comment. I feel like we should make new additions specify which one of Hans or Hant it actually is. Aaron Liu (talk) 13:06, 25 April 2024 (UTC)[reply]

Yes, we definitely need to fix the translation-adder to prevent this issue.

To explain the issue: until very recently, (almost) all Chinese languages had Hani, Hant and Hans listed as script codes, because we use Hant and Hans to display traditional/simplified (e.g. 馬／马 (mǎ)), and Hani when terms are the same in both (e.g. 人 (rén)). The problem with listing all three separately is that it causes a headache with automatic script detection, because that works by choosing the script that matches the most characters in the input text; for obvious reasons, Hani will always match at least as many characters as the other two (since it contains everything in Hant and Hans). To get around this, we had to have a bunch of dedicated code for Chinese, but it was quite slow because the script detection loop was still running 3 times (once for each of them) - this caused a noticeable impact on very large pages like 人. To get around this, I created the special code Hants, which is handled in a special way that allows all three script codes to be handled together for the purposes of script detection; it's meant to be completely internal, and for all other purposes it's supposed to be treated as though the three Han scripts had been listed separately. Evidently, that's not happening with the translation-adder, for some reason.

Also, on your final point, we don't want new users to be specifying anything as Hans, because links are always supposed to be traditional, with the simplified being generated automatically. As I've just explained, script detection is completely automatic for Chinese, so users manually specifying script codes mostly just gets in the way. (There are a tiny number of exceptions to this, of course.) Theknightwho (talk) 22:09, 25 April 2024 (UTC)[reply]
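To make the script-detection problem described above concrete, here is a toy sketch. The character sets are tiny made-up samples, and best_han_script is only one guess at what "handling the three codes together" could look like; it is not the actual findBestScript logic.

```python
# Toy character sets, for illustration only.
HANT_ONLY = set("馬")   # traditional-only sample
HANS_ONLY = set("马")   # simplified-only sample
SHARED = set("人")      # identical in both

SCRIPTS = {
    "Hani": HANT_ONLY | HANS_ONLY | SHARED,  # contains everything in Hant and Hans
    "Hant": HANT_ONLY | SHARED,
    "Hans": HANS_ONLY | SHARED,
}

def count_matches(text: str, chars: set) -> int:
    return sum(ch in chars for ch in text)

def naive_best_script(text: str, codes: list) -> str:
    """One counting pass per script; Hani can never score lower than the others."""
    return max(codes, key=lambda code: count_matches(text, SCRIPTS[code]))

def best_han_script(text: str) -> str:
    """Hypothetical grouped handling: decide among the three Han codes together."""
    has_trad = any(ch in HANT_ONLY for ch in text)
    has_simp = any(ch in HANS_ONLY for ch in text)
    if has_trad and not has_simp:
        return "Hant"
    if has_simp and not has_trad:
        return "Hans"
    return "Hani"
```

In this toy model the naive loop picks Hani even for 馬, which is why special handling is needed when all three codes are listed separately.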

@Theknightwho The relevant file is MediaWiki:Gadget-TranslationAdder-Data.js and it lists Hani as the script for all Chinese varieties. Do you want me to change that everywhere to Hants? Benwing2 (talk) 22:49, 25 April 2024 (UTC)[reply]

However, I don't think my issue is related to Hani at all.
Perhaps the actual fix is to add Hants to Wiktionary:List of scripts in some way. Aaron Liu (talk) 23:25, 25 April 2024 (UTC)[reply]

@Benwing2 @Aaron Liu Hants is basically a shorthand in the data that has special handling in one situation. It’s not a real script code in any conventional sense.

I’ll have a look once I’m on my laptop in more detail. Theknightwho (talk) 09:16, 26 April 2024 (UTC)[reply]

Either that or add an exception for the script validation. Aaron Liu (talk) 13:12, 26 April 2024 (UTC)[reply]

I wonder if Wiktionary:Grease pit/2024/April#Translation_adder_langname_to_langcode_functionality is related, because I notice it happens similarly to what is described above: 1) sometimes, if you use "expected" input, the translation-adder works as expected, but if you input something with spaces ("Old French"), it loads the error message, and everything you type after that (even just "fro") will reset to the error message, but 2) sometimes it just doesn't work from the start. I'm not sure how recently that breakage happened, if one or more things changed in such a way as to make aspects of the translation-adder not work. - -sche (discuss) 22:53, 25 April 2024 (UTC)[reply]

@-sche If you can give me a reproducible test case, I will look into it and see if I can fix it. (No promises, I am not very good with Javascript.) Benwing2 (talk) 22:56, 25 April 2024 (UTC)[reply]

@Aaron Liu, Theknightwho, Benwing2, -sche Can someone tell me if this has been fixed? I currently cannot add any translations into Mandarin. Could we just go back to how it was before? This is a real headache. Thanks. ---> Tooironic (talk) 22:18, 27 April 2024 (UTC)[reply]
@Tooironic Not yet - the intention is for it to be as it was before. Theknightwho (talk) 22:21, 27 April 2024 (UTC)[reply]
I don't mean to sound rude, but who broke it? Can they revert it, please? ---> Tooironic (talk) 22:25, 27 April 2024 (UTC)[reply]
@Tooironic I can't replicate the issue - when I try to use an invalid script for Chinese, it says " Available script codes for this language are Hani, Hant, Hans, Latn, Bopo". Does it still happen if you do a null-edit to the page? Theknightwho (talk) 22:37, 27 April 2024 (UTC)[reply]
Yes. ---> Tooironic (talk) 22:44, 27 April 2024 (UTC)[reply]
@Tooironic Can you give me an example page and the term? It seems to work for me on Chrome and Firefox. Theknightwho (talk) 22:46, 27 April 2024 (UTC)[reply]

@Tooironic can you tell me exactly what you did to get the error? I will try on my end. Benwing2 (talk) 22:47, 27 April 2024 (UTC)[reply]
@Theknightwho I can reproduce it. If I go to "food" and type zh into the code box and "foo" into the translation box, I get an error "please use a valid script code (e.g. fa-Arab, ...)" with a link to WT:List of scripts. The default script shows up as Hants. If I change it to Hani, I get a slightly different error listing the valid script codes, which include Hants but not Hani. I think the problem is something in the translation adder is validating the script code against the list visible in WT:List of scripts, which doesn't include Hants. Benwing2 (talk) 22:50, 27 April 2024 (UTC)[reply]
@Benwing2 I copied exactly that and it worked fine. If I try using "Cyrl" with "cmn" it gives me "Available script codes for this language are Hani, Hant, Hans, Latn, Bopo".

I've noticed the translation-adder seems to remember the previous settings, so I wonder if that's corrupted something somewhere for a few people. Theknightwho (talk) 22:55, 27 April 2024 (UTC)[reply]
But Hants tells you it's an invalid script code, yes? Shouldn't we just add it to the list? I don't see why "it's internal" is that important. Aaron Liu (talk) 22:59, 27 April 2024 (UTC)[reply]
@Aaron Liu Nothing in the code is currently exposing Hants to the translation-adder - this seems to be a cache problem. The reason why "it's internal" is important is because it doesn't actually exist, and adding it to the list of scripts is just going to cause massive synchronisation issues between anything that is (correctly) handling it as 3 separate scripts, and anything that is (wrongly) treating it as a fourth, separate script. It's just a shorthand in the data, which has special handling in one particular situation which is completely unrelated to this. Theknightwho (talk) 23:01, 27 April 2024 (UTC)[reply]

For me the message is "Available script codes for this language are Hants, Latn, Bopo". Can you try it logged out? Benwing2 (talk) 23:03, 27 April 2024 (UTC)[reply]
@Benwing2 Still works fine, with Chrome and Firefox. Theknightwho (talk) 23:07, 27 April 2024 (UTC)[reply]

Yeah it works for me under Safari but not under Chrome. Dunno. Benwing2 (talk) 23:07, 27 April 2024 (UTC)[reply]
@Benwing2 I had a look at where the translation-adder gets the script data, and it comes from MediaWiki:Gadget-LanguageUtils.js, which in turn generates it from {{#invoke:languages/javascript-interface|AllLangcodeToScripts}}. That module gets its data from Module:languages/data/all, so this has to be an artefact from before that module was updated to split Hants into Hani, Hant, Hans, which was 4 days after Hants first got added to the data. Theknightwho (talk) 23:13, 27 April 2024 (UTC)[reply]

@Theknightwho I was able to get it working by clearing "cookies and other site data" over the last hour from Chrome, so it is in fact a caching issue of some sort. Benwing2 (talk) 23:14, 27 April 2024 (UTC)[reply]

I get that thing only when logged in and when I attempt to input a valid script code. Hants also doesn't work for aforementioned reasons. I've cleared my cache. Aaron Liu (talk) 23:28, 27 April 2024 (UTC)[reply]
@Aaron Liu Sorry, you get what when logged in? Benwing2 (talk) 23:34, 27 April 2024 (UTC)[reply]
What blocks me from adding a translation. When logged out, I can preview it, though the field is incorrectly pre-filled with "Hani" despite how my characters are simplified (魔's forms have a subtle difference) and, of course, changing the field presents an error again.
Also, I think that we should also fix what happens after you make changes and back again in the script field. That is still confusing, and maybe fixing that will also fix this. Aaron Liu (talk) 23:37, 27 April 2024 (UTC)[reply]
@Aaron Liu How did you clear your cache? Under which browser? Maybe you didn't do what I did? Also (a) you shouldn't have to change the script code, (b) I don't know "what happens after you make changes and back again in the script field", you have to be more specific. Benwing2 (talk) 23:44, 27 April 2024 (UTC)[reply]
I basically did the same steps for adding the translation as you did. You may read the first four comments in this thread to have my message self-clarify. Aaron Liu (talk) 00:40, 28 April 2024 (UTC)[reply]
@Aaron Liu You didn't answer my question about how you cleared your cache. Can you try under Chrome? Benwing2 (talk) 01:32, 28 April 2024 (UTC)[reply]
I cleared my cache for the website by holding shift and reloading. Under Edge the problem does seem to disappear. Aaron Liu (talk) 02:27, 28 April 2024 (UTC)[reply]
@Aaron Liu That may or may not have worked in Firefox; I had to go into the settings for Chrome and explicitly clear "cookies and other site data" for the last hour. Benwing2 (talk) 02:31, 28 April 2024 (UTC)[reply]

@Aaron Liu The field will automatically say Hani, but that doesn't have any bearing on automatic script detection. However, you shouldn't be adding simplified characters (and we don't add duplicates when the codepoints are the same - that difference is handled by the entries themselves). Theknightwho (talk) 23:51, 27 April 2024 (UTC)[reply]
1. Shouldn't it say "Hants" or be deactivated? (Not sure what to do with dialects that have alternative writing systems.) Saying "Hani" is just inviting any knowledgeable person to change it and encounter the error, though this wasn't the avenue where I encountered the error.
2. Why shouldn't I be adding simplified characters and why do you expect any stumbling reader to only add traditional translations? Doesn't the template auto-convert them anyway?
3. There are no existing entries where I tried to add this. Aaron Liu (talk) 00:44, 28 April 2024 (UTC)[reply]
@Aaron Liu As I have now said many times, Hants is purely internal, and is not something that you should ever need to be concerned with when adding information to entries. The only reason it ever became an issue here is because something wasn't synchronised properly - had that not happened, it's likely you would never have encountered it.

Why shouldn't I be adding simplified characters and why do you expect any stumbling reader to only add traditional translations? Because it's the longstanding consensus of the Chinese editing community. Simplified forms are generated automatically when you add traditional links, but not the other way around. You do not need to worry about manually specifying the script code, either - please just take my word for that. Theknightwho (talk) 01:33, 28 April 2024 (UTC)[reply]
Why shouldn't traditional links be automatically generated by simplified characters? Thanks to this conversation, I now know the many caveats of translations, but anyone new to Wiktionary wouldn't. Is this translation adder really something we should be displaying to everyone on Earth without listing out all the caveats? Aaron Liu (talk) 02:32, 28 April 2024 (UTC)[reply]
AFAIK several traditional characters map to the same simplified character so there's no automated way of converting simplified back to traditional. Benwing2 (talk) 02:40, 28 April 2024 (UTC)[reply]
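To illustrate the point about many-to-one mappings, here is a minimal Python sketch. The `TRAD_TO_SIMP` table is a tiny hand-picked sample chosen for illustration, not the data Wiktionary's modules actually use, and the function names are hypothetical. It shows that traditional to simplified is a plain lookup, while the reverse direction can yield more than one candidate.

```python
# Tiny illustrative sample of traditional -> simplified mappings.
# This is NOT Wiktionary's conversion data; it only demonstrates the shape of the problem.
TRAD_TO_SIMP = {
    "發": "发",  # "to emit"
    "髮": "发",  # "hair"
    "後": "后",  # "behind, after"
    "后": "后",  # "empress" (unchanged by simplification)
    "魔": "魔",  # same codepoint in both scripts
}


def to_simplified(text):
    """Convert each character with a single lookup; unknown characters pass through."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text)


def traditional_candidates(simp_char):
    """Return every traditional character in the sample that simplifies to simp_char."""
    return sorted(t for t, s in TRAD_TO_SIMP.items() if s == simp_char)


print(to_simplified("發"))          # 发
print(traditional_candidates("发"))  # ['發', '髮'] : ambiguous without knowing the meaning
print(traditional_candidates("魔"))  # ['魔'] : unambiguous
```

Choosing between 發 and 髮 for 发 needs knowledge of the word, which a character-level table cannot supply.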
Oh yeah, the meaning thing, right? Forgot about those. That's indeed a good reason to not offer conversion in the template, but still, something like this needs to be communicated, otherwise IMO there shouldn't be such a public interface. Aaron Liu (talk) 02:43, 28 April 2024 (UTC)[reply]
It is working now that I've cleared my browser cache. Thank you everyone for resolving this issue. ---> Tooironic (talk) 21:42, 28 April 2024 (UTC)[reply]
I've decided to stop being so smart and use the button to clear everything, and it also works now. Still, I think that there is a larger issue of caveats to adding not being specified. Aaron Liu (talk) 22:20, 28 April 2024 (UTC)[reply]
@Aaron Liu It's a symptom of a bigger issue that the translation-adder dates from a time when a lot more of this information had to be provided manually (script codes, transliterations etc), and the move to automation has been a gradual one over many years. It probably needs a bit of an overhaul, as it's a perennial problem that new users (especially IPs) often supply manual data that they don't need to. At best it's redundant, which wastes their time (and might cause issues later if we change the automation), and in many cases it's wrong because they've used a different transliteration system or whatever. Sometimes automation needs to be overridden (e.g. 蘋／苹 (píng) vs 蘋／𬞟 (pín)), but we should probably make it clearer which languages need manual input every time and which don't. Theknightwho (talk) 13:44, 29 April 2024 (UTC)[reply]
In the short term, we could probably set the irrelevant fields (like script code) to be hidden for cmn, the way the script code field is currently hidden for e.g. ru (you have to click "more" to see it, and to see e.g. the "qualifier" field). Which fields are hidden, and which fields are present at all, varies by language, so seems to be customizable by language. - -sche (discuss) 03:12, 1 May 2024 (UTC)[reply]
Agreed; this is a good idea. Benwing2 (talk) 03:15, 1 May 2024 (UTC)[reply]
Weirdly, though the script code field was hidden for ru when I wrote that, it now displays for ru, so sub enm or frm into my comment above (the script code field does not display for those... though it does display for fro)... that is about what I would expect from that gadget, which is so useful but so finicky and complex that it is like a living organism with a mind of its own. Some years ago, when people wanted me to help port it to another wiki because I had worked on it, I was struck by how much it resembled Palmerston's famous quip about the Schleswig-Holstein Question: only a few people have ever fully understood it, and one is dead, (another is inactive,) one is a German who went mad (and got WMF-permabanned), and I have unfortunately forgotten how it worked. If the translation-adder does not add an explicit script code at all (e.g., in this diff, neither "Hani" nor any other script code is anywhere to be seen), we could perhaps simply make the script-code field not be present at all in the first place (for cmn)... I will see if I can refresh myself on how to do that later, if no-one beats me to it... - -sche (discuss) 22:46, 2 May 2024 (UTC)[reply]

Image upload requests

(continued from Wiktionary:Beer_parlour/2024/April#Modify/deprecate_NFCC_or_request_re-enabling_Special:Upload_for_all_users?) Most of these are for words which specifically designate a particular image (i.e., memes), thus making a non-free image practically mandatory to properly define the term.

amogus: add an image of a crewmate (e.g. [8]) and (possibly) the original amogus edit [9]
trollface: add an image of it [10]
The Dress: add an image of it [11]
Gigachad: add an image of the original [12]
Wojak: add an image of it [13]
galaxy-brain: add (what I think is) the original [14]
shocked Pikachu: add the original [15]
Derpina: add an image of it [16]
rage comic: add the original [17]
soyjak: add the prototypical image of it [18]
doge: add the original [19]

Pinging @This, that and the other, Koavf. Ioaxxere (talk) 06:31, 25 April 2024 (UTC)[reply]

I appreciate that having the non-free media would enhance these entries, but I think that linking to Wikipedia articles or external images is sufficient. The form of link on Gigachad is very ugly to me, but I don't know that it's something I feel like I need to push back on. If others think we need to start having uploads for memes, then I also won't try to stand in the way, but I would definitely prefer if we didn't. —Justin (koavf)❤T☮C☺M☯ 06:38, 25 April 2024 (UTC)[reply]

Hah, I didn't expect a test case to come so soon. I think the best process is for someone (maybe me, tomorrow) to upload and add these images, then if there are specific issues with any of the files, further debate can occur at RFDO.

I'd also make the following observations. Since our NFCC is based very closely on Wikipedia's, those images which are already uploaded on WP would comply with our NFCC too. Many of the images are highly, even centrally, relevant to the respective entries (how do you define trollface without showing the precise image that is the referent of the word?). If the images were free there is no question that they would already be in the entries; the only reason they're not is because they're nonfree and nobody with the necessary rights thought to upload them. This, that and the other (talk) 13:25, 25 April 2024 (UTC)[reply]

Thank you very much! Ioaxxere (talk) 15:14, 25 April 2024 (UTC)[reply]

I've so far uploaded the Wojak and trollface images. I had to create {{file metadata}} to allow the MultimediaViewer tool to pick up the relevant items of metadata and remove the images from Category:Files with no machine-readable license etc. I decided to keep this separate from the actual human-readable description on the file information page, unlike what most wikis do. This gives us more freedom to describe the file as we like, rather than just making it a form-filling exercise. I will continue to upload the other images as time permits.

@Ioaxxere could you please determine the author of the non-Wikipedia images? WT:NFCC requires us to acknowledge the source, and I think it is a bit lazy to just paste in the URL without crediting the image's actual author/creator, where known. If the author is unknown we can simply say so. This, that and the other (talk) 03:43, 28 April 2024 (UTC)[reply]

@This, that and the other:

[20] By the artists at Innersloth.
[21] By StoneToss, edited by Redditor u/Lewdvik.
[22] By Krista Sudmalis via the Instagram account https://www.instagram.com/berlin.1969/.
[23] Unknown.
[24] By the Pokémon artists.
[25] Unknown.
[26] Unknown.

Ioaxxere (talk) 04:08, 28 April 2024 (UTC)[reply]

@Ioaxxere Thanks. I've now also done doge, The Dress, shocked Pikachu and soyjak, and I'll do the rest soon.

Can you convince me why we should have a non-free image at Derpina? The referent of the word does not seem to be this image or anything related to it. It seems like it can just refer to any derp-y female. I'd have trouble writing a fair use rationale for that image. This, that and the other (talk) 04:51, 28 April 2024 (UTC)[reply]

@This, that and the other: Derpina is a stock character in rage comics: you can see examples at https://knowyourmeme.com/memes/derpina#notable-examples. But it's admittedly not as important of an image as some of the other ones on this list. Ioaxxere (talk) 05:02, 28 April 2024 (UTC)[reply]

af

At dunzo the etymology is categorising the term as being suffixed with -s and suffixed with -o. Is there a mechanism to allow only the last argument to categorise as suffix ? Leasnam (talk) 16:51, 26 April 2024 (UTC)[reply]

@Leasnam: but if, according to the current etymology, the word is formed by the addition of -s and -o, why isn't it also suffixed with -s? (In any case, is the etymology correct?) — Sgconlaw (talk) 21:42, 26 April 2024 (UTC)[reply]

I imagine the formation was done + -'s (cf. done's-ville, splitsville, hell's no, etc.) + -o. So it presumes first that there is a suffix added to done, and then that is suffixed with -o. Even if this etymology is challenged, the fact that the entire term is only suffixed with -o and not both -s and -o is the root of the concern. Leasnam (talk) 00:19, 27 April 2024 (UTC)[reply]

So, translations subpages...

Now that the Lua memory limit has been raised, and (IIRC) better garbage collection has been rolled out, do we still need 100+ translations subpages, or could at least some of them be merged back into the entries, e.g. Angola/translations which is nominally just 3,416 bytes? Is there another limit we would hit, at least in bigger cases like water/translations (118k, with the rest of water 47k)? I do notice that e.g. WT:LOL, mentioned in a section above, takes a long time to load (roughly as long, despite being nominally just 3,484 bytes, as WT:RFM which is 938,565 bytes), even though it does eventually load.
How about Category:Derivative subpages? Could at least smaller ones (父/derived terms 4,700 bytes) safely move back into the main entries? - -sche (discuss) 17:42, 26 April 2024 (UTC)[reply]

@-sche: now it's time. Given the innumerable ways that things can be slowed down on the back end, the same entry can drift in and out (usually it requires a null edit) of CAT:E several times a day. Right now, a is the poster child for this. Chuck Entz (talk) 18:47, 26 April 2024 (UTC)[reply]

The only translation subpages I'd be concerned about moving back are water/translations (~3,600), and maybe woman/translations (~1,200) and man/translations (~800). The rest should be okay. None of the entries which are pushing the time limit have many (if any) translations, so are unlikely to be affected much by them. Theknightwho (talk) 19:40, 26 April 2024 (UTC)[reply]

I tried adding the translation table of water back, and there are no obvious Lua errors. CitationsFreak (talk) 06:22, 27 April 2024 (UTC)[reply]

"lb" template glitch

{{lb|en|intransitive|of a|musical instrument}} produces (intransitive, of a music). Mihia (talk) 21:18, 26 April 2024 (UTC)[reply]

@Mihia: that's not a glitch. It enables editors to categorize entries into "Category:en:Musical instruments". However, "music instrument" is an inappropriate label, so "music" is displayed instead. — Sgconlaw (talk) 21:38, 26 April 2024 (UTC)[reply]

Are you sure you read it correctly? Mihia (talk) 22:09, 26 April 2024 (UTC)[reply]

@Mihia Yes, Sgconlaw read it correctly. Many instruments have the label {{lb|en|musical instrument}}, which displays "music" but categorises in the musical instruments category (which would be inappropriate for your needs, by the looks of things). What you want to do is something like {{lb|en|intransitive|of a [[musical instrument]]}}. Theknightwho (talk) 22:21, 26 April 2024 (UTC)[reply]

Thanks, I didn't create this myself actually, but came across it at play, where whoever wrote it obviously believed that {{lb|en|intransitive|of a|musical instrument}} would create "of a musical instrument", or at least "of a musical instrument", and I have to say that I agree -- if that template usage is accepted at all, then, yes, that is what one would expect it to do. Mihia (talk) 22:45, 26 April 2024 (UTC)[reply]

It's a balancing act. It's very useful to have labels like "tincture" that display "heraldry" but categorize into the "heraldic tinctures" subcategory instead of the top-level "heraldry" category, or "Greek god" which displays "Greek mythology" but categorizes into the deities subcategory, because when editors don't have recourse to such shortcuts, then either things get labelled (and categorized) with the broad label and then manually double-categorized into the subcategory, which is not good, or editors resort to removing the broad {{label}}s (that have the correct display but unwantedly lump everything into just the general top-level category) and adding manual (''fake labels''), as discussed here and here (where we still probably need to add a "chemical isomer" label iff we want to categorize them), which is also not good. OTOH, in the rare case where someone wants to display "of a tincture" or "of a Greek god", they do run into this issue if they try to write "of a | musical instrument" with a vertical bar, instead of just "of a musical instrument". That is a con, but I'm not sure it outweighs the pros, the more numerous places where we want "musical instrument", "Greek god", etc to do exactly what they do at present, especially since the fix is relatively simple (remove the bar). I think Benwing mentioned adding the capacity for labels to display differently depending on their context / preceding labels, so perhaps "musical instrument", "Greek god" etc could be made to display as-written if preceded by "of a" (I'd want us to check for instances first, to confirm that we wouldn't be introducing a new problem, though), or perhaps we could just periodically monitor a database dump for such things. - -sche (discuss) 23:23, 26 April 2024 (UTC)[reply]

OK, thanks, I see that this is more complicated than I imagined. Originally I imagined that it would be a simple bug whereby the template was matching "musical instrument" to a known keyword "music" on the first five characters only, and erroneously ignoring the remainder, or something like that. Mihia (talk) 23:59, 26 April 2024 (UTC)[reply]

@-sche @Mihia I recently added the ability to override canonicalization of labels while keeping the categorization, by putting ! before the label (which I just fixed a bug in). So if you wrote {{lb|en|intransitive|of a|!musical instrument}}, it would display (intransitive, of a musical instrument) instead of (intransitive, of a music), and still categorizes. However, in this case I agree with User:Theknightwho and User:-sche that categorization would be wrong and it should not separate of a from musical instrument. Benwing2 (talk) 00:53, 27 April 2024 (UTC)[reply]
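The difference between canonicalised display, categorisation, and the `!` override described above can be sketched roughly as follows. This is a simplified Python model and not the real label module: the `LABELS` table, the `NO_COMMA_AFTER` set, and the `process_labels` function are all hypothetical names invented for illustration, and the comma handling after "of a" is only an assumption made to reproduce the outputs quoted in this thread.

```python
# Hypothetical label data: a label may display as something other than what was typed,
# and may add the entry to a category. Not the real label data.
LABELS = {
    "musical instrument": {"display": "music", "category": "Musical instruments"},
}

# Assumption: labels such as "of a" are joined to the next label with a space, not a comma.
NO_COMMA_AFTER = {"of a"}


def process_labels(lang, labels):
    """Return (displayed text, categories) for a list of raw label arguments."""
    shown, categories = [], []
    for raw in labels:
        keep_as_written = raw.startswith("!")  # "!" means: display exactly what was typed
        name = raw[1:] if keep_as_written else raw
        data = LABELS.get(name)
        if data is None:
            shown.append(name)  # unknown label: display as-is, no category
            continue
        shown.append(name if keep_as_written else data["display"])
        categories.append(f"{lang}:{data['category']}")  # categorised either way

    text = ""
    for i, item in enumerate(shown):
        if i == 0:
            text = item
        elif shown[i - 1] in NO_COMMA_AFTER:
            text += " " + item
        else:
            text += ", " + item
    return "(" + text + ")", categories


print(process_labels("en", ["intransitive", "of a", "musical instrument"]))
print(process_labels("en", ["intransitive", "of a", "!musical instrument"]))
print(process_labels("en", ["intransitive", "of a musical instrument"]))
```

In this model the first two calls both categorise into "en:Musical instruments" and differ only in display, while the third, with no bar between "of a" and "musical instrument", displays the text as written and adds no category, which is the fix suggested earlier in the thread.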

Clogging of borrowings/derivations categories with artificial transliterations

The Category:Hebrew terms borrowed from English contains for instance transliterations like קונלי, which is not an actual loanword yet clogs the category in question using the Template:name translit. What do you people think of this? On the talk page it seems that one user, @Fish bowl, decided this on their own. Shoshin000 (talk) 20:09, 27 April 2024 (UTC)[reply]

@Shoshin000 How did User:Fish bowl "decide" this? It is happening because of the {{bor}} template at the top, which wasn't added by them (they don't have any edits on the page). Benwing2 (talk) 22:53, 27 April 2024 (UTC)[reply]

I made the edit to the module here; however, it is precisely because people already use {{name translit}} along with {{bor}} that I decided it would be harmless. "Clogging" of the category is a bigger-picture question (what about placenames? what about names that do not use {{name translit}}? etc.). —Fish bowl (talk) 23:28, 27 April 2024 (UTC)[reply]

My edit has been reverted, but even without the bor template it still categorizes into the borrowing category. Shoshin000 (talk) 07:55, 28 April 2024 (UTC)[reply]

solder noun, see Greek Wikipedia: καλάι neuter (συγκολλώ is a verb)

. 2A02:587:4F0F:B600:E41E:76E:8149:289A 17:38, 28 April 2024 (UTC)[reply]

Ukrainian form-of 1928–1933 spelling template

From 1928 to 1933, the Ukrainian language had an orthography called "Kharkiv orthography" or "Skrypnykivka", and many different words had different spellings then, which differ from the current, and such forms already have pages on Wiktionary (for example Марію́піль (Marijúpilʹ, “Mariupol”), Евро́па (Evrópa, “Europe”), евфорі́я (evforíja, “euphoria”), альфабе́т (alʹfabét, “alphabet”), and others) without any proper labels. Given that, I would like to create a form-of template that would label the 1928–1933 spellings, but I am not sure about its significance on Wiktionary. If, after all, such a template has significance and the right to be on this site, I would like to clarify what name this template should have, since I have not yet had experience in creating templates and I do not want to create irrelevant or incorrect templates. Thank you. Rayreat (talk) 18:50, 28 April 2024 (UTC)[reply]

Do it 🔥. I morally support you. Invent something and we tell whether it is kooky or okay. I suggest taking the functionality of {{de-superseded spelling of}} as a model. Fay Freak (talk) 18:58, 28 April 2024 (UTC)[reply]

Thank you for the answer, I created the template: {{uk-1928 superseded spelling of}}, but I cannot figure out how to add translation (t=) support into {{m}} there. I thought it would work automatically, but it does not, as well as support of adding gender (g=) and other things. Also, when using the template, it creates an extra empty line, which, in fact, should not be there at all. I would like to ask you to see the code of my template and make edits to fix and improve it if you can. Thanks again. Rayreat (talk) 07:21, 29 April 2024 (UTC)[reply]

I suspect there may be an elegant answer using the form_of family of modules, which @Benwing2 can teach.

However, the obvious answer not using Lua would be that you have to tediously copy the parameters down, as can be seen in {{pi-link}}, though in your case it will be more direct, with your template passing them on directly to {{m}}. (The need for {{pi-link}} arises because some Pali writing systems represent the language less accurately than others and there are different spelling conventions.) I'm surprised {{pi-link}} didn't qualify for the attention of Module:checkparams. Your template probably does, as I can foresee people trying to pass new parameters to it.--RichardW57m (talk) 09:49, 29 April 2024 (UTC)[reply]

Done @Rayreat: I use Module:form of through a call of function form_of_t in Module:form_of/templates. --RichardW57m (talk) 11:27, 29 April 2024 (UTC)[reply]

@Rayreat @RichardW57m This is the right way to do it but I fixed it up a bit to use some invocation parameters of the function call. I also removed the final period since most non-English definitions are gloss-form and don't have a final period. Benwing2 (talk) 02:13, 1 May 2024 (UTC)[reply]

Integration of USDA National Agricultural Library Thesaurus Concept Space

Is it feasible to integrate the public domain USDA National Agricultural Library Thesaurus Concept Space into Wiktionary?

For example drip loss? GobsPint (talk) 00:15, 30 April 2024 (UTC)[reply]

I only looked at the taxonomic entries. It would generate yet another list of redlinks of taxa, of interest mostly to US residents. The downloads do not contain definitions and only sometimes contain English vernacular names. Importantly, the data for a taxon does not contain the name of higher taxa that include the taxon in question. Our basic definition template requires this. The greatest volume of data is taxonomic synonyms, which I don't think are a priority for inclusion in our taxon entries. IOW, it would be the basis only for stub taxonomic name entries and require supplemental information even for that.

OTOH, the drip loss definition seems to be good enough for us, though it might benefit from shortening. Unfortunately most of the terms in my convenience sample seem not to have any definition, mostly because they are SoP and don't need one. Extracting the terms with definitions would probably yield the basis for acceptable English noun entries. DCDuring (talk) 17:57, 30 April 2024 (UTC)[reply]

Thanks for the feedback. I must've stumbled across one of the few complete entries.GobsPint (talk) 18:03, 30 April 2024 (UTC)[reply]

Decent specialized definitions are hard to find. I'll try to get a count of the definitions. If there are just a few, I'll add them manually. If there are 'too' many, I'll ask for help. DCDuring (talk) 21:11, 30 April 2024 (UTC)[reply]