Wiktionary:Grease pit

Welcome to the Grease pit!

This is an area to complement the Beer parlour and Tea room. Its purpose is specifically for discussing the future development of the English Wiktionary, both as a dictionary and thesaurus and as a website.

The Grease pit is a place to discuss technical issues such as templates, Lua modules, CSS, JavaScript, the MediaWiki software, extensions to it, Toolforge, etc. It is also the second-best place, after the Beer parlour, to think in non-technical ways about how to make the best, free, open online dictionary of “all words in all languages”.

Others have understood this page to explain the “how” of things, while the Beer parlour addresses the “why”.

Permanent notice

  • Tips and tricks about customization or personalization of CSS and JS files are listed at WT:CUSTOM.
  • Other tips and tricks are at WT:TAT.
  • Find information and helpful links about modules, Lua in general, and the Scribunto extension at WT:LUA.
  • Everyone is encouraged to expand both pages, or to come up with more such stuff. Other known pages with “tips-n-tricks” are to be listed here as well.

February 2024

Bold marking in Lydian quotations (e.g. 𐤨𐤷𐤦𐤣𐤠𐤷)

Help! I added the first Lydian quotation (at 𐤨𐤷𐤦𐤣𐤠𐤷), but the bold-marking in the original quote seems to break. Currently, the correct word is displayed bold-marked, but the triple-apostrophe-marks are actually around the word next to it. Also, with different configurations of the bold-marking, the marking in the auto-transcription does not match that of the quote in the original script. Maybe it has something to do with Lydian being written right-to-left? I'm not that tech-savvy, so could someone help? AntiquatedMan (talk) 12:07, 2 February 2024 (UTC)[reply]

So correct me if I'm misunderstanding: the transliterated word in bold is "kλidaλ". Is that incorrect? If so, which word should be bold? —Justin (koavf)TCM 16:43, 2 February 2024 (UTC)[reply]
The word "kλidaλ" is the one that needs to be bold, yes. But when editing, I have to place the triple-apostrophe-marks in a weird spot to get the bold-marking to display correctly. When I simply place them before and after the word in the quote, it marks the entire quote bold, except for kλidaλ.
Might also just be a display error on my end, idk, I don't have experience dealing with right-to-left scripts. I mostly work with Ancient Greek. AntiquatedMan (talk) 18:59, 2 February 2024 (UTC)[reply]
Since it seems like everything is displaying correctly, I don't know that there needs to be a solution, but yes, dealing with mixed LTR and RTL script is a huge pain. Another option is to include spans but it seems unnecessary if it's rendering okay. Maybe I'm missing something, so let me know if I'm just ignorant. —Justin (koavf)TCM 19:09, 2 February 2024 (UTC)[reply]
@AntiquatedMan: What I usually do is create the surrounding formatting first, put the cursor in the right place, then enter the RTL text. The problem is that when LTR and RTL text are together, you may not really be typing where you think you are- you may be before the RTL section or after it.
In this case you would need to deal with the RTL text before the bolding, inside the bolding, and after it, so you might have to try a couple of ways. Here's one way:
  • First: enter the unformatted RTL text, and separately enter the formatting (I put an X to show where the cursor will go):
𐤨𐤷𐤦𐤣𐤠𐤷 𐤨𐤬𐤱𐤰𐤷𐤨 𐤲𐤦𐤭𐤠𐤷 𐤲𐤤𐤩𐤷𐤨 𐤡𐤦𐤩𐤷 𐤥𐤹𐤡𐤠𐤲𐤶𐤫𐤯
'''X'''
  • Then copypaste the word to be bolded inside the formatting:
𐤨𐤷𐤦𐤣𐤠𐤷 𐤨𐤬𐤱𐤰𐤷𐤨 𐤲𐤦𐤭𐤠𐤷 𐤲𐤤𐤩𐤷𐤨 𐤡𐤦𐤩𐤷 𐤥𐤹𐤡𐤠𐤲𐤶𐤫𐤯
'''𐤨𐤷𐤦𐤣𐤠𐤷'''
  • Then select the unformatted word and copypaste the entire block of word and formatting on top of it:
'''𐤨𐤷𐤦𐤣𐤠𐤷''' 𐤨𐤬𐤱𐤰𐤷𐤨 𐤲𐤦𐤭𐤠𐤷 𐤲𐤤𐤩𐤷𐤨 𐤡𐤦𐤩𐤷 𐤥𐤹𐤡𐤠𐤲𐤶𐤫𐤯
  • Here's the result:
𐤨𐤷𐤦𐤣𐤠𐤷 𐤨𐤬𐤱𐤰𐤷𐤨 𐤲𐤦𐤭𐤠𐤷 𐤲𐤤𐤩𐤷𐤨 𐤡𐤦𐤩𐤷 𐤥𐤹𐤡𐤠𐤲𐤶𐤫𐤯
Chuck Entz (talk) 19:57, 2 February 2024 (UTC)
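
One way to sidestep the cursor problem altogether is to add the markers programmatically, since a script works on the stored (logical) character order and not on what the edit window shows. A minimal sketch in Python, for illustration only; `bold_word` is a made-up helper, not an existing gadget or module function:

```python
def bold_word(text: str, word: str) -> str:
    """Wrap the first occurrence of `word` in wikitext bold markers (''')."""
    start = text.find(word)
    if start == -1:
        raise ValueError(f"{word!r} does not occur in the text")
    end = start + len(word)
    # Slicing uses logical order, so right-to-left display cannot mislead it.
    return text[:start] + "'''" + text[start:end] + "'''" + text[end:]


# First three words of the quotation discussed above, in logical order.
quote = "𐤨𐤷𐤦𐤣𐤠𐤷 𐤨𐤬𐤱𐤰𐤷𐤨 𐤲𐤦𐤭𐤠𐤷"
target = quote.split(" ")[0]  # the word that should be bold
marked = bold_word(quote, target)
```

The resulting string can be pasted into the quotation as one block, so the apostrophes never have to be typed next to right-to-left text.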

Formatting problem when |1= ends with ' in {{sic}}

See Template:sic/documentation for an example plus a workaround. I tried adding into the template itself but it didn't have any effect, possibly because it's only interpreted while processing {{sic}} and not actually returned as part of the result. JeffDoozan (talk) 17:05, 2 February 2024 (UTC)[reply]

Can you just use {{'}} instead of inserting the character <'>? —Justin (koavf)TCM 19:10, 2 February 2024 (UTC)[reply]
Related: as I've used ' lately because quote templates now break if I use e.g. &apos; to escape a quotation-apostrophe that's around a bolded word, I've noticed that because {{'}} was (apparently) originally intended to be used in the case of italicized possessives, it adds padding, which looks unaesthetic in other cases; if we're going to functionally require people to use {{'}} for more than just italicized possessives, we should remove the padding (or cleave "I just want an apostrophe" and "I want an apostrophe with padding" into separate templates). - -sche (discuss) 22:50, 4 February 2024 (UTC)[reply]
@-sche It looks like User:JeffDoozan fixed the issue with {{sic}}. Can you clarify what you mean by "quote templates now break if I use e.g. &apos; to escape a quotation-apostrophe that's around a bolded word"? Did this use to work? Benwing2 (talk) 03:58, 7 February 2024 (UTC)
@Benwing2: For some reason the html entity in @-sche's post was being converted in spite of being wrapped in nowikis. I think I fixed it to display the way they meant it to. Chuck Entz (talk) 04:38, 7 February 2024 (UTC)[reply]
Thanks, Chuck.
I mean that because HTML entities now get treated as being the actual character by the template when it comes to processing how they translate into bolding or italics, & apos ; -escaped single quote marks near italicized or bolded text now result in the quote displaying with bolding and italics in the wrong places, which means people have to use {{'}} rather than the HTML entity if they need to escape an apostrophe/single quote mark . . . but because {{'}} was AFAICT designed for a slightly different usecase — not merely replacing a single quote mark or apostrophe, because that used to be possible by just using & apos ;, but specifically for use between an italicized word and a non-italicized possessive s — {{'}} currently adds not only an apostrophe but also padding, which looks unaesthetic in a case where the whole possessive is italicized or the apostrophe is not possessive but quotative, like:
horse's doovers (where the {{'}} adds an awkward space between the horse and the possessive); or This they called 'oob', here mistranscribed as 'wob'., compared to the HTML-entity version
horse's doovers; or This they called 'oob', here mistranscribed as 'wob'.
(and the quote-template version, where HTML entities are converted so the text is bolded instead of quoted:)
  • 1605, Example, page 5:
    This they called oob, here mistranscribed as wob.
Hence, I'm wondering if we should have a version of {{'}} that is just an apostrophe without padding. - -sche (discuss) 15:27, 11 February 2024 (UTC)[reply]
@-sche: This seems related to Wiktionary:Grease pit/2023/June § Issue with bold apostrophes in quotations, which has not been fixed yet. J3133 (talk) 16:13, 11 February 2024 (UTC)[reply]

Past participle in Middle and Old Dutch verb inflection templates

Unfortunately, it looks like the Middle and Old Dutch verb inflection templates do not allow the past participle to start with any prefix other than ge-, while there are definitely a lot of verbs that would have a past participle starting with ver-, be- or far-, to name a few. While I am not capable of working with Lua, I'm sure someone else is and can help fix this problem. Preupellor (talk) 17:24, 2 February 2024 (UTC)

@Preupellor I fixed this for Old Dutch and Middle Dutch (and also Old High German; Old Saxon still needs work). These modules formerly had a |pastpart= param to override the past participle but I removed it and made it so that |4= needs to specify the full past participle (including the ending -an or -on). I fixed up all callers to follow this. Benwing2 (talk) 05:21, 4 February 2024 (UTC)[reply]
Ah, I see. I was using the template that didn't have this problem fixed (Template:dum-conj-st). The problem with the one that is fixed, however, is that vowels like 'ē' and 'â' aren't automatically doubled to 'ēe' and 'âe' when in a closed syllable, but I guess I should be able to fix that myself. Preupellor (talk) 07:51, 4 February 2024 (UTC)
@Preupellor I have fixed {{dum-conj-st}} as well; I didn't realize there were two different Middle Dutch conjugation modules in various states of completion. Benwing2 (talk) 01:35, 7 February 2024 (UTC)[reply]

Way for {{en-noun}} to handle nouns that can be both pluralia tanta and countable singular

Currently, {{en-noun}} can handle noun lemmas that're always countable, that're always uncountable singular, that're used either way with about equal frequency, that're usually countable but occasionally uncountable singular, that're usually uncountable singular but occasionally countable (with a rare-but-attested plural(s)), that're always pluralia tanta, and that're usually pluralia tanta but occasionally countable plural (with a rare-but-attested singular(s)). It doesn't currently have a way, however, to handle nouns that're usually pluralia tanta but which're occasionally used both as countable plurals (thus having a rare-but-attested singular form) and as countable singulars (thus also having a rare-but-attested plural form). For instance, labia is usually a plurale tantum, but is occasionally used as a countable plural (with the rare singular form labium) and occasionally used as a countable singular (with the rare plural forms labiae and labias); as {{en-noun}} doesn't support marking a noun as usually a plurale tantum but also having both rare singular and rare plural forms, only the rare singular can be included in the entry's headline alongside the plurale tantum, with mention of the rare plurals having to be relegated to the usage notes. Could someone please update {{en-noun}} to accommodate cases like this? Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 22:39, 2 February 2024 (UTC)[reply]

@Whoop whoop pull up I just updated {{en-noun}} yesterday in fact to be able to handle pluralia tantum (formerly you had to use a separate template {{en-plural noun}}). Can you give me some proposed template syntax as to how the template call for a term like labia should be formatted? Benwing2 (talk) 08:02, 3 February 2024 (UTC)[reply]
One possibility I can think of would be something like {{en-noun|p|labiae|labias|sg=labium}}, with the p parameter signifying a (usually-)plurale tantum, the unlabelled parameters being the additional plurals, and the sg= parameter being the singular (and sg2= etc. for cases when there's more than one attested singular). Whoop whoop pull up Bitching Betty ⚧️ Averted crashes 08:16, 3 February 2024 (UTC)[reply]
I have used {{en-noun|~}} ''or'' {{en-noun|p|sg=gnocco}} and {{en-noun|-|s}} ''or'' {{en-noun|p|sg=tagliatella}} at gnocchi and tagliatelle. J3133 (talk) 08:40, 3 February 2024 (UTC)[reply]

Homeric quotations

I've noticed that quote-links to the Odyssey don't seem to work properly. The Wikisource page does open on the correct book, but always on the first line. The issue seems to be that the generated link ends in e.g. #v320 for the verse number, while the Wikisource page works with simple #320 for the verse number. The Iliad quotes do link properly as those Wikisource pages do end with e.g. #v320 for the verse number. AntiquatedMan (talk) 09:51, 3 February 2024 (UTC)[reply]

The same problem seems to affect a number of Greek texts (and quotes). On the Wikisource pages three different templates are used to mark line-numbers: "r|", "fr|" and "χ|". The first one responds to simple #-links - the second one responds to #v-links - the third one responds to #p-links. The module for grc quotations, meanwhile, generates links with #, #v, or #p, depending on the author (or work), but these links don't always match what is needed for the Wikisource page. AntiquatedMan (talk) 13:49, 3 February 2024 (UTC)[reply]
@AntiquatedMan Can you be more specific about which templates are affected, which incorrect links are generated by these templates and what they should be? Benwing2 (talk) 03:55, 7 February 2024 (UTC)[reply]
Well, I already fixed it for the Odyssey (by changing all line numbers to fr| format on the Wikisource page). It may take a bit for me to inventory exactly which authors/texts have mismatches. But to give an example:
  • when I want to cite line 15 from Aeschylus' "Agamemnon", the link that is auto-generated is https://el.wikisource.org/wiki/Αγαμέμνων#v15 (i.e. ending in #v15). However, because the wikisource page is marked with {r|15}, only links ending in #15 would work. Wrong links like these do lead to the correct text, but don't lead to the correct line.
To fix such cases we would have to either: 1) change the formatting of line-numbers on the wikisource pages, or 2) change what links are generated by Module:Quotations/grc/data. I don't know what would be preferable. AntiquatedMan (talk) 07:45, 7 February 2024 (UTC)
For Sophocles:
  • Οιδίπους Τύραννος has {χ|} line numbers (in wikisource), which are correctly paired with links ending in #p in Module:Quotations/grc/data.
  • Αντιγόνη & Ηλέκτρα have {fr|} line numbers (in wikisource), which are correctly paired with links ending in #v in Module:Quotations/grc/data.
  • The other plays by Sophocles have {r|} line numbers (in wikisource), while quote-links ending in #v are generated by Module:Quotations/grc/data. These two do not match.
Which means that {quote|} does not link to the correct verse-line for this third group of Sophocles plays. AntiquatedMan (talk) 08:43, 7 February 2024 (UTC)[reply]
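
To make the mismatch concrete, here is a small Python sketch (illustration only) of the three anchor styles described above. The table and function names are invented for this example and are not taken from Module:Quotations/grc/data; the prefixes are simply the ones reported in this thread.

```python
# Anchor prefix produced by each el.wikisource line-number template,
# as reported in this discussion: {r|} -> #N, {fr|} -> #vN, {χ|} -> #pN.
ANCHOR_PREFIX = {"r": "", "fr": "v", "χ": "p"}


def verse_link(page: str, line: int, template: str) -> str:
    """Build the link that actually lands on `line` of a Wikisource page."""
    prefix = ANCHOR_PREFIX[template]
    return f"https://el.wikisource.org/wiki/{page}#{prefix}{line}"


def link_matches(generated_prefix: str, template: str) -> bool:
    """True if the prefix a quote link uses fits the page's template."""
    return generated_prefix == ANCHOR_PREFIX[template]


working = verse_link("Αγαμέμνων", 15, "r")
```

A link generated with the prefix "v" for a page numbered with {r|} fails `link_matches`, which is the situation described for the third group of Sophocles plays.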
@AntiquatedMan IMO it would be easier and better to fix Module:Quotations/grc/data to produce the right links (although I wonder why there are three different Wikisource syntaxes for line number instead of just one). If you can make a complete list of all the mismatches, I can look into fixing them. Benwing2 (talk) 22:14, 8 February 2024 (UTC)[reply]
I would suggest (@AntiquatedMan or whoever else wants to do this) asking the Wikisource folks why they have three syntaxes, and if they could please consolidate them, or just make each template output all three anchors... - -sche (discuss) 22:51, 8 February 2024 (UTC)[reply]
From what I can tell, the only difference formatting-wise between the syntaxes is that {fr|} makes the line-number float right regardless of further style, which IMO is the preferable way, at least for Greek poetry. {r|} and {χ|} both produce the same formatting (flush to the right edge of the line) - ugly - but with different anchors (-# vs. -#p). For the time being, I'd prefer standardizing to {fr|}. AntiquatedMan (talk) 07:53, 9 February 2024 (UTC)[reply]

@AntiquatedMan, you have spotted the inconsistency very well. @Benwing2, -sche: in 2023 I asked at the el.wikisource Secretariat about 'Kinds of anchors'. The editor who does a lot of Ancient Greek uploads and checks at el.wikisource is the excellent Ms. Αντιγόνη (=Antigone), who responded to my proposals and questions fully and in detail, as she always does. I asked what #v (verse) and #p (paragraph) are. At the time, I thought they served some kind of technical-differentiation purpose. It seems that at the s:el:Cat:Templates for numberings there is a wide variety of styles (poetry #vEvery5Verses, prose with #pParagraph.Section, some #p.dot.Paragraph.colon.section; plus pagination differences: some 'books' are named not Α, Β, Γ... but with other names). Some of the pages were changed to use s:el:Template:l (left simple anchor) or s:el:Template:r (right simple anchor), like the ones spotted by AntiquatedMan. The project was abandoned halfway, because it was explained to me that 'epub' needs some compulsory letter, and that a mere anchor-to-a-number does not work for epub (oh well, I do not know what that is); it needs id= with a letter. Here is the discussion of 2012, in English. That 'letter' could be anything, as I gathered. It could have been a unified x or n or whatever.
My proposal was: give us a simple anchor, give us the international anchor numbers as in the standard abbreviations at Liddell-Scott(example) for Ancient Greek -we could help with that project too-. What I do now, from wikt:el:Module:quoteHomer, wikt:el:Module:quoteHerodotus and the rest at wikt:el:Module:testing/grc is, either to change the #v to # or whatever is needed, or: to enter el.wikisource each time and place our own {{anchor|00.00}} where a quotation at wikt is needed. For some writers, like Strabo we swapped experimentally to perseus.tufts.edu (example) which offers parallel English translation, and we use similar platforms for Anc.Greek-to-ModGreek translated.
I have more info on the matter if you wish, and I can translate the documentation of Templates and Talks whenever you need it. I can help AntiquatedMan by checking which numbering template is used for each page (some pages do not have any numbering, though). This issue has occupied me for a long time -hhh Benwing2! I know you laugh at these test-modules I have done: but Sir, the only thing I know about Lua is if this == a do that, end. ‑‑Sarri.greek  I 15:11, 9 February 2024 (UTC)
@AntiquatedMan, Benwing2, -sche my s:el:User:Sarri.greek/anchors notes, if anyone wishes to take a look. Thank you. ‑‑Sarri.greek  I 16:22, 9 February 2024 (UTC)[reply]

Yes, might I suggest we continue this discussion in the future on (the discussion section of) that page?
BTW, I've started on "standardizing" the works of Sophocles on the model of https://el.wikisource.org/wiki/Αντιγόνη - with {fr|} etc. I think I will tackle Sophocles first. Perhaps we can start with the big three Tragedians (Aeschylus, Sophocles and Euripides), to at least get these three to use the same syntax - and then figure out what to do with the formatting of other genres?
The Iliad and Odyssey WikiSource-pages look fine to me in their current form, and they are properly linked to Wiktionary-quotes now, so these two can rest now. AntiquatedMan (talk) 16:55, 9 February 2024 (UTC)[reply]
Mr @AntiquatedMan! _continue discussion Of course. Perhaps a subpage [Module:Quotations/grc/workpage] or Module:Quotations/grc/data/workpage (@Erutuon might choose address) with proposals and things done
_Sophocles first Great! Note that sometimes wikisource has no source to verify numberings. Aeschylus s:el:Αγαμέμνων verse numbers are missing from manuscripts, especially at Chorica example, so I corrected the numbering according to the good @greek-language.gr.
en.wiktionary has links for every_10_verses, while normally Homer, tragedies, etc. have anchors every_5_verses. (@Erutuon) If el.wikisource does not have every_5, I renumber the whole page based on external sources. (I have done all of Herodotus, e.g. s:el:Ιστορίαι (Ηροδότου)/Κλειώ with #00.00 s:el:Template:lb.) Aristophanes' Νεφέλαι & Βάτραχοι do not have every_5 (I didn't have time to correct them).
Let me know any time if you need help with Greek. Also, en.wiktionary might consider perseus.tufts.edu (sourced text + English translation visible for our readers who might like to read more), or similar platforms. ‑‑Sarri.greek  I 17:41, 9 February 2024 (UTC)
If we need a different place to discuss this, I'd suggest Module talk:Quotations/grc/data because it's the talk page corresponding to the module that has an interest in the anchors. — Eru·tuon 19:15, 16 February 2024 (UTC)[reply]

What's going on with this huge diff?

Looks like a software bug. The diff doesn't change all that much, but the diff page is really, really long. (Or is it my stylesheet, and OK for other people?) [1] Equinox 12:38, 4 February 2024 (UTC)[reply]

@Equinox: I removed the whitespace after sense 1.2. J3133 (talk) 12:41, 4 February 2024 (UTC)
I reckon they fell asleep on their spacebar... This, that and the other (talk) 01:16, 5 February 2024 (UTC)
The previous edit added 74,620 bytes, of which apparently 73,326 were spaces; after they were removed, the whole entry was just 1,415 bytes. That's a whole lot of whitespace... Chuck Entz (talk) 04:23, 5 February 2024 (UTC)

Possible for template/module to check for and display systematically-named files?

I'm working on isolating hieratic letters from Möller's Hieratische Paläographie, to upload to Commons. (Per Commons, Möller's work is out of copyright and letters aren't copyrightable anyway.) I'm wondering if it's possible to make an *{{Egy glyph evolution}} + module similar to the Chinese glyph origin one, for hieroglyph entries, which would

  1. automatically determine what page it was on like T:character_info does,
  2. look for the relevant files (named systematically as below), and display files that exist (and not ones that don't, e.g. for some glyphs there's an Elephantine form, for others there isn't), and
  3. allow users to manually specify additional files that should also be displayed (e.g. images of colored examples of the hieroglyphs as in 𓁐, which I could also start naming systematically/predictably, or other ad-hoc files with random names).

I'm using the following naming scheme :

I have about 800 images of 100 glyphs (all the uniliterals and various other glyphs), but I want to check that this idea is workable and desirable before I upload them or go any deeper. - -sche (discuss) 23:06, 4 February 2024 (UTC)[reply]

@-sche Yes this is possible and shouldn't be very hard. In Lua you can retrieve the current page name, look to see whether files exist, handle manually-specified parameters, etc. I did something similar to this for handling country-specific flags for categories like Category:Languages of Indonesia; the code is here: Module:category tree/poscatboiler/data/languages#L-722. Benwing2 (talk) 21:52, 5 February 2024 (UTC)[reply]
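
The three requirements listed above can be sketched roughly as follows. This is Python for illustration only, not the Lua a real module would need; the file-name pattern `"<page> <variant>.png"` is purely hypothetical (the actual naming scheme is not reproduced here), and `file_exists` stands in for whatever existence check the module would use.

```python
from typing import Callable, Iterable, List


def files_to_display(
    page_name: str,
    variants: Iterable[str],
    file_exists: Callable[[str], bool],
    extra_files: Iterable[str] = (),
) -> List[str]:
    """Return the files to show for the glyph whose entry is `page_name`."""
    found: List[str] = []
    # Steps 1 and 2: derive candidate names from the page name and keep
    # only those that exist (hypothetical naming pattern).
    for variant in variants:
        candidate = f"{page_name} {variant}.png"
        if file_exists(candidate):
            found.append(candidate)
    # Step 3: append manually specified files, skipping duplicates.
    for name in extra_files:
        if name not in found:
            found.append(name)
    return found


# Toy data: only one systematically named file has been uploaded.
uploaded = {"X Elephantine.png"}
shown = files_to_display(
    "X", ["Elephantine", "variant-b"], uploaded.__contains__, ["Colored X.jpg"]
)
```

Missing variants are silently skipped, so an entry without an Elephantine form would simply show fewer images.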
OK, I've uploaded 1,700+ images of hieratic glyphs to commons:Category:Hieratic glyphs (Georg Möller), and 51 hieroglyphs, named in the format described above. They all link to an explanatory page, so Ctrl-F "hieroglyph" here to find just the hieroglyphs. The dimensions of the files are small (I magnified Möller's text quite a lot but there's only so much I could do), but comparable to the dimensions of the font-extracted hieroglyph images we were already using, and probably fine given the small size we'd be displaying them at. Cross-linking other relevant discussion: User talk:Vorziblix#List_of_hieratic_glyphs. I don't think I'll have time to try writing a glyph-evolution template soon, but @Vorziblix if you're interested you probably have more relevant skills than me, since you wrote Template:egy-glyph. In fact, I think it'd be neat to also make something like {{egy-glyph}} for displaying hieratic-script quotations (which might require making a list, in some module or perhaps template, of the "best" image of each hieratic letter). - -sche (discuss) 22:46, 8 February 2024 (UTC)[reply]

Using {{en-noun}} on -ussification produces the text "noun-forming suffix" which is incorrect. I think there's some code that incorrectly assumes that any term starting with "-" must be a suffix. @Benwing2, you recently overhauled Module:en-headword so I think you would know how to fix this. Ioaxxere (talk) 18:36, 5 February 2024 (UTC)[reply]

@Ioaxxere Use |nosuffix=1 to defeat this. I'm in the process of documenting all the changes made to Module:en-headword. Benwing2 (talk) 20:48, 5 February 2024 (UTC)[reply]

{{hu-conj-ok}} is uncategorized, its documentation is too large to be included

This template's documentation is too large to be included in the template's page. As a result, it is uncategorized and doesn't show its documentation. What is the correct way to handle this? It would be important to add it to Category:Hungarian verb inflection-table templates. Thank you. Panda10 (talk) 21:35, 7 February 2024 (UTC)[reply]

@Panda10 The documentation page is simply too long. I'm not convinced that all the example transclusions are necessary, particularly in the "Unnamed parameters" section - I don't see anything surprising happening there that warrants so many examples. Can you and @Adam78 remove three or so, and try again? This, that and the other (talk) 23:57, 7 February 2024 (UTC)[reply]
@Panda10: It's not just the template page: the documentation page itself can't handle all the transclusions- there are a few of them at the bottom that display as plain wikilinks. In general, multiple declension tables for agglutinative languages on the same page is just asking for trouble. I know of at least one mainspace Coptic entry that has similar problems with intermittent Lua timeouts. I would recommend having the main documentation page serve as more of an index to tables located in its own subpages than a repository for all of them. Maybe have one or two especially representative ones on the main documentation page, and have either links to larger subpages for groups of tables that have something in common, or have each table on its own subpage with a sort of abbreviated version or summary on the main page serving as a gateway to it. Chuck Entz (talk) 04:29, 8 February 2024 (UTC)[reply]
@This, that and the other, Chuck Entz Thank you both for the suggestions. I will work on restructuring it. Panda10 (talk) 14:34, 8 February 2024 (UTC)[reply]
@Chuck Entz I think the timeouts on those Coptic pages could be fixed by rewriting the template code in Lua, so essentially there's one entrance into Lua for the entire conjugation table instead of one per link, which amounts to a lot more overhead. However, I haven't seen any Coptic pages pop up recently in CAT:E so this can probably wait. Benwing2 (talk) 22:11, 8 February 2024 (UTC)[reply]
@Chuck Entz @Benwing2 These gigantic wikicode inflection templates are a recipe for trouble, and really should be rewritten in Lua in all cases. It's still possible to get timeouts with Lua, but there's a reason why (with about 4,000 wikilinks) doesn't timeout, whereas we have trouble with those Coptic pages with less than half that. Theknightwho (talk) 03:38, 10 February 2024 (UTC)[reply]
@Theknightwho Who says it doesn't? this is one of several entries that show up in CAT:E every week- far more often than the Coptic ones. By the time I see them, they're okay and they clear with a null edit or an API Sandbox purge, but they're close enough to the maximum on time that it's obvious that delays in the system as a whole are just slowing them down enough to go briefly over the time limit. I just checked , and it's currently at 8.195/10 seconds. Chuck Entz (talk) 04:28, 10 February 2024 (UTC)[reply]
@Chuck Entz Fair point, but making them all separate links would be hopeless. Theknightwho (talk) 06:58, 10 February 2024 (UTC)[reply]

Automating Baybayin spellings

I keep running across the hideous line {{tl-adj|b={{tl-bay sc}}}}. Can someone update the Tagalog headword module/templates to produce these spellings automatically? Ultimateria (talk) 06:13, 8 February 2024 (UTC)[reply]

@Ultimateria: Baybayin is a historical script, so I would guess the only hard part is getting them to not produce the spellings automatically most of the time. In the meanwhile, I wonder if it would work to subst: the {{tl-bay sc}}s when you find them. Chuck Entz (talk) 07:25, 8 February 2024 (UTC)[reply]
I realize I was being ambiguous with my use of "automate"; I don't mean that each entry would produce the spelling automatically, just that the parameter could look like |b=1 or something (rather than invoke a template directly in the headword). Ultimateria (talk) 03:50, 9 February 2024 (UTC)[reply]
@Ultimateria @Chuck Entz I have cleaned up and revamped Module:tl-headword so you can use |b=+ or generally specify the Latin script equivalent of the Baybayin script, instead of having to call {{tl-bay sc}} yourself. Lots of other changes, too; these are temporarily causing some module errors till my bot is able to clean them up. BTW out of 20,000 or so lemmas, over 16,000 had a value for |b= specified so I wonder if we shouldn't default to generating the Baybayin script and require you to use |nob=1 to turn it off. Benwing2 (talk) 03:13, 10 February 2024 (UTC)[reply]
Excellent, thank you! If there's an ongoing effort to revive the script then a full rollout makes sense, but I'll defer to the Tagalog editors. Ultimateria (talk) 17:39, 10 February 2024 (UTC)[reply]
Well, in regard to Baybayin, there's a revival of the script, mostly for artistic purposes (such as on the 2015 peso bills, emblems/seals of some Philippine government agencies, esp. cultural ones, signage, music album art, tattoos, etc.). There have even been proposed laws attempting to make it one of the Philippines' national scripts (none of them has passed yet).
Also note there is a difference between precolonial Baybayin and modern Baybayin due to the additional virama (final consonants were not rendered in precolonial Baybayin). Especially for older words, while we can have Baybayin automatically generated from the Tagalog headword module, I think we could still use {{tl-bay sc}} for cases such as older pre-virama spellings. TagaSanPedroAko (talk) 21:20, 11 February 2024 (UTC)
@TagaSanPedroAko Note that if you specify a Latin-script term using |b=, it's automatically passed through {{tl-bay sc}}. This might help reduce the manual calls to {{tl-bay sc}}. Benwing2 (talk) 21:27, 11 February 2024 (UTC)[reply]
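
A rough sketch of how the |b= handling described in this thread could be resolved, in Python for illustration only. It is not the code of Module:tl-headword; reading |b=+ as "use the page name" is an assumption, and `default_on` models the proposed (not adopted) behaviour of generating Baybayin unless |nob=1 is given.

```python
from typing import Optional


def baybayin_source(
    pagename: str,
    b: Optional[str] = None,
    nob: bool = False,
    default_on: bool = False,
) -> Optional[str]:
    """Return the Latin-script text to convert to Baybayin, or None."""
    if nob:
        return None        # |nob=1 switches the spelling off
    if b == "+":
        return pagename    # assumed meaning of |b=+
    if b:
        return b           # explicit Latin-script equivalent
    # No |b= at all: only generate if the proposed default were adopted.
    return pagename if default_on else None


current = baybayin_source("bahay")                   # no parameter given
proposed = baybayin_source("bahay", default_on=True)
```

Whatever string comes back would then be passed through the same conversion that {{tl-bay sc}} performs.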

Accelerated English plurals generate the wrong template

They still produce the old en-plural noun instead of the new en-noun|p. I don't see why we can't support both, but the old one now generates red template error garbage, so the accelerator needs fixing too. (As a general point of software development, we should have a mechanism allowing the people who change templates to fix related stuff like this automatically, like how we can find all linked pages after deleting an RFD-failed entry.) Equinox 20:45, 10 February 2024 (UTC)[reply]

@Equinox I see the problem here (at roguelike-likes): it (correctly) generated {{head|en|noun form}}, which is a generic language-neutral template that we generally use for non-lemmas. The reason it threw an error is because nolinkhead=1 was specified, which isn't a parameter that the generic head template has, since it was only added to the specific English headword templates. It isn't actually needed here, either.
Is it something you added yourself, or was it automatically generated? Theknightwho (talk) 22:49, 10 February 2024 (UTC)[reply]
Aha, yes, I added it, because (another recent change?) a hyphenated en-noun like "dog-biscuit" now turns into two links "dog-biscuit", which didn't use to be the case. Equinox 22:51, 10 February 2024 (UTC)[reply]
@Equinox See Module:en-headword/documentation, which should explain the (new) headword-linking algorithm in detail. Note also that I added |nolink=1 as a shorter alias for |nolinkhead=1. Benwing2 (talk) 23:24, 10 February 2024 (UTC)[reply]
@Benwing2: How do I fix "old guardists"? Sorry, not reading all those docs right now, too busy. Can you please make "nolinkhead=1" work on plurals as well as singulars? It should work everywhere. Equinox 22:25, 13 February 2024 (UTC)[reply]
@Equinox All right, give me a bit. Currently {{head}} itself doesn't support |nolinkhead=/|nolink= but I'll add that. Benwing2 (talk) 22:42, 13 February 2024 (UTC)[reply]
A problem with this hyphenation-link change is that it affects all existing entries, in many cases creating wrong links. See this edit I just had to make at All-Pro: [2]. Equinox 14:29, 12 February 2024 (UTC)[reply]
Yes, I am having this same problem with minor geographical terms that happen to include a hyphen. However, I am not "against" this. Instead, I am going through geographical terms I have worked on and adding headers like I did here: [3]. It will take a week or so for me to hit everything at my current pace. --Geographyinitiative (talk) 18:50, 12 February 2024 (UTC)[reply]
@Equinox @Geographyinitiative It occurs to me now that I can make the handling of terms like All-Pro smarter, by checking to see if the individual components exist as English terms before linking them. I'm gonna go ahead and implement that as the default. My instinct is to handle each term independently, i.e. just link whichever components in a hyphenated compound actually exist as English terms, but another possibility is only to link the components if all of them exist as English terms. This won't help for fan-tan as both fan and tan are English words, but it should help many cases. Benwing2 (talk) 22:38, 12 February 2024 (UTC)[reply]
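The two linking strategies described above (link each component that exists, or link only when all components exist) can be sketched roughly as follows. This is an illustrative Python sketch, not the actual Lua in Module:en-headword; the function name `link_hyphenated`, the `require_all` flag, and the toy set of "existing" terms are all assumptions made for the example.

```python
def link_hyphenated(term, existing_terms, require_all=False):
    """Return wikitext for a hyphenated headword.

    existing_terms stands in for a page-existence check (hypothetical).
    require_all=False links each component independently;
    require_all=True links components only if every one of them exists.
    """
    parts = term.split("-")
    exists = [part in existing_terms for part in parts]
    if require_all and not all(exists):
        return term  # leave the whole term unlinked
    linked = [f"[[{part}]]" if ok else part for part, ok in zip(parts, exists)]
    return "-".join(linked)


# Toy data only; not a claim about which pages exist on Wiktionary.
toy_terms = {"dog", "biscuit", "fan", "tan", "All"}

print(link_hyphenated("dog-biscuit", toy_terms))
print(link_hyphenated("All-Pro", toy_terms))
print(link_hyphenated("All-Pro", toy_terms, require_all=True))
print(link_hyphenated("fan-tan", toy_terms))
```

As noted in the comment, neither strategy helps with a term like fan-tan, where both components happen to exist as unrelated English words; such entries would still need an explicit override.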
Excellent my man. Keep up the great work. Geographyinitiative (talk) 22:40, 12 February 2024 (UTC)[reply]
@Benwing2 We'll still run into problems with various surnames (e.g. Little, Smith etc). Theknightwho (talk) 22:56, 12 February 2024 (UTC)[reply]
@Theknightwho I suppose yeah, but I'm not sure what can be done about it (and I would imagine most uses of Smith in a hyphenated compound are references to the surname rather than the occupation). Benwing2 (talk) 23:29, 12 February 2024 (UTC)[reply]
@Benwing2 That's true. Maybe Old English is a better counterexample, since Old is a surname. Theknightwho (talk) 23:42, 12 February 2024 (UTC)[reply]
An alternative would have been to require a switch for your new stuff, instead of turning it on universally. Equinox 22:23, 13 February 2024 (UTC)[reply]
@Equinox The problem with that is it would require adding that flag to thousands and thousands of pages, which would somewhat defeat the purpose of making it a default. Benwing2 (talk) 22:43, 13 February 2024 (UTC)[reply]

Charlatan creating Proto-Norse pages despite admitting in discord to not knowing anything about Proto-Norse.

I want everything added by User:Weltwehr removed in the category of Proto-Norse reconstructions. She is a charlatan who added pages despite knowing nothing about Proto-Norse and admitting to it on Discord. I told her to remove her faulty pages as soon as possible on the 6th of December, but instead she added 2 more (at least) the next day.

Vandalised articles are here Fjarrai Feu & fere

Please remove these as soon as possible. 2A02:AA1:1643:E9F3:10ED:9CC1:1FA:BE13 01:12, 11 February 2024 (UTC)[reply]

Personally I don't think reconstructed Proto-Norse terms belong here on Wiktionary unless one of the inflected forms is attested (which won't be the case at least for Reconstruction:Proto-Norse/ᚠᛃᚨᚱᚱᚨᛁ, because it's an adverb). Benwing2 (talk) 21:42, 11 February 2024 (UTC)[reply]
@Benwing2 I think reconstructions are okay, but only to the extent we do Latin reconstructions, for example. Theknightwho (talk) 20:13, 12 February 2024 (UTC)[reply]
The reconstructed terms that were there before occur as name elements or inflections, which is fine. 2A02:AA1:1004:730F:D60:E6B3:249E:A002 22:01, 12 February 2024 (UTC)
I think if we want to do this consistently we need to merge Proto-Norse with either Proto-Germanic or Old Norse. Otherwise it's fully valid to reconstruct a language based on its descendant(s) and ancestor. Thadh (talk) 17:31, 13 February 2024 (UTC)[reply]
@Thadh I'm confused about your response. Despite its name Proto-Norse is not a reconstructed language, which is why I don't see the point of random reconstructions (other than the sort I mentioned earlier, similar to what we do for Latin as noted by User:Theknightwho). Benwing2 (talk) 20:21, 13 February 2024 (UTC)[reply]
If tomorrow we find an attestation of PIE (say, HNER scribbled down somewhere), that will not invalidate any of our reconstructions. The same goes for Proto-Norse, but the other way around: The stage of the language that Proto-Norse represents also contains for a large part reconstructable language. If we have an ancestor and a descendant, we can be fairly sure there is a term in the middle and we can and should reconstruct it.
This has been my opinion for Proto-Italic as well, and while there this opinion wasn't popular, here we can run into serious issues making any rules of this kind because of countless loans from Proto-Norse into other languages, where calling these loans from PG or Old Norse will not be acceptable. Thadh (talk) 23:26, 13 February 2024 (UTC)[reply]
@Thadh Well, I actually agree with you concerning Proto-Italic, but I think the justification is much shakier for an attested language like Proto-Norse or Gothic (and IMO the possibility that written PIE will randomly turn up seems vanishingly small). Benwing2 (talk) 23:34, 13 February 2024 (UTC)[reply]
Gothic doesn't have descendants, so I'm completely on board with you on that. But Proto-Norse is (at least as it is handled on Wiktionary) an ancestor of Old Norse, and as such if any word is attested in Old Norse (or its descendants, provided it's inherited and not borrowed) and reconstructed to Proto-Germanic, it has to have existed in Proto-Norse, too, and since we know fairly well which changes have already occurred in PN, and how it developed into Old Norse, the reconstruction of such terms is usually pretty straightforward. Thadh (talk) 13:49, 14 February 2024 (UTC)[reply]
We have lots of inscriptions written in Proto-Norse. It is an attested, though fragmentary, language. We know it exists because there are around 100 or more examples of written Proto-Norse, written by actual Proto-Norse people. The unfortunate name Proto-Norse, combined with reconstructed languages always having the proto- prefix, is exactly why people pop in thinking they can add reconstructed Proto-Norse terms based on wayward middle-ground phonology (which nonsense reconstructions such as *fjarrai are). Vettlingr (talk) 19:32, 15 February 2024 (UTC)
Here is an extract from Weltwehr on Discord... It's not a very high level of maturity we are dealing with here.
"Look all due respect, and tjis was a while ago by now, í saw á clear lack of PN entries and a webpage that explained sound changes in north germanic. So í went and decided to do my part for a language that, Let's be honestly, most habe never heard of. I was operating with the best knoledge í had at the time, and I do NOT come from tge Academia with this, so for any mistake í make í do appoligize in advanced.
And í also kinda think it should be said that that incorrect entry has been out fir months now, easily, and NONE have said anything about it. Si í take it no one give á fucj but idk" Vettlingr (talk) 17:12, 13 February 2024 (UTC)[reply]

problems adding tanakh quote to חתונה

#* {{RQ:Tanach|30|3|11|tsrc=KJV|passage=צְאֶ֧ינָה ׀ וּֽרְאֶ֛ינָה בְּנ֥וֹת צִיּ֖וֹן בַּמֶּ֣לֶךְ שְׁלֹמֹ֑ה בָּעֲטָרָ֗ה שֶׁעִטְּרָה־לּ֤וֹ אִמּוֹ֙ בְּ'''י֣וֹם חֲתֻנָּת֔וֹ''' וּבְי֖וֹם שִׂמְחַ֥ת לִבּֽוֹ׃(ס)|translation=O maidens of Zion, go forth<br/>And gaze upon King Solomon<br/>Wearing the crown that his mother<br/>Gave him on '''his wedding day''',<br/>On his day of bliss.|tr=''11 ṣə’eynâ| ûrə’eynâ bənwōṯ ṣîywōn bammeleḵə šəlōmōh bā‘ăṭārâ še‘iṭṭərâ-llwō ’immwō bəywōm '''ḥăṯunnāṯwō''' ûḇəywōm śiməḥaṯ libwō: s'''''}} returns an error about parameter 4 (transliteration), as if it's invalid. Yet on the page בניו it works just fine. Shoshin000 (talk) 09:58, 11 February 2024 (UTC)

That's because you used the pipe character, "|", within your text, which made everything after it a separate parameter as far as the wikitext parser is concerned. Template syntax distinguishes between named parameters, like |passage= and |translation=, and positional parameters, which have no name and are referred to by the order in which they occur in the template. Thus |1=30, |2=3 and |3=11. That makes the part of your transliteration after the pipe |4=ûrə’eynâ bənwōṯ ṣîywōn bammeleḵə šəlōmōh bā‘ăṭārâ še‘iṭṭərâ-llwō ’immwō bəywōm ḥăṯunnāṯwō ûḇəywōm śiməḥaṯ libwō: s. This never worked, but in the past the extra parameter was just ignored and not displayed. Now that the module behind the template has been changed, it gives you a module error to tell you something's wrong.
You need to use something other than the pipe to correspond to the Hebrew ׀ character. If necessary, you can use the HTML entity "&vert;" instead of the pipe: #* {{RQ:Tanach|30|3|11|tsrc=KJV|passage=צְאֶ֧ינָה ׀ וּֽרְאֶ֛ינָה בְּנ֥וֹת צִיּ֖וֹן בַּמֶּ֣לֶךְ שְׁלֹמֹ֑ה בָּעֲטָרָ֗ה שֶׁעִטְּרָה־לּ֤וֹ אִמּוֹ֙ בְּ'''י֣וֹם חֲתֻנָּת֔וֹ''' וּבְי֖וֹם שִׂמְחַ֥ת לִבּֽוֹ׃(ס)|translation=O maidens of Zion, go forth<br/>And gaze upon King Solomon<br/>Wearing the crown that his mother<br/>Gave him on '''his wedding day''',<br/>On his day of bliss.|tr=''11 ṣə’eynâ &vert; ûrə’eynâ bənwōṯ ṣîywōn bammeleḵə šəlōmōh bā‘ăṭārâ še‘iṭṭərâ-llwō ’immwō bəywōm '''ḥăṯunnāṯwō''' ûḇəywōm śiməḥaṯ libwō: s''}} (note that I also corrected the bolding), which gives:
Tanach, Song of Songs 3:11, with translation of the King James Version:
צְאֶ֧ינָה ׀ וּֽרְאֶ֛ינָה בְּנ֥וֹת צִיּ֖וֹן בַּמֶּ֣לֶךְ שְׁלֹמֹ֑ה בָּעֲטָרָ֗ה שֶׁעִטְּרָה־לּ֤וֹ אִמּוֹ֙ בְּי֣וֹם חֲתֻנָּת֔וֹ וּבְי֖וֹם שִׂמְחַ֥ת לִבּֽוֹ׃(ס)
11 ṣə’eynâ | ûrə’eynâ bənwōṯ ṣîywōn bammeleḵə šəlōmōh bā‘ăṭārâ še‘iṭṭərâ-llwō ’immwō bəywōm ḥăṯunnāṯwō ûḇəywōm śiməḥaṯ libwō: s
O maidens of Zion, go forth
And gaze upon King Solomon
Wearing the crown that his mother
Gave him on his wedding day,
On his day of bliss.
I would also recommend getting rid of the "11" at the start of the transliteration, since it's not in the Hebrew or the translation. Chuck Entz (talk) 16:13, 11 February 2024 (UTC)[reply]
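To illustrate why a stray pipe ends up as |4=, here is a deliberately naive Python sketch of how a template call is split into positional and named parameters. It is an assumption-laden toy (the function `split_template_args` is hypothetical and ignores nested templates, links and other real parser rules), but it shows the same effect as the error described above.

```python
def split_template_args(body):
    """Naively split a template body on '|' into name, positional and named args.

    Simplified on purpose: no handling of nested {{...}} or [[...|...]].
    """
    name, *raw_args = body.split("|")
    positional, named = {}, {}
    next_index = 1
    for arg in raw_args:
        key, sep, value = arg.partition("=")
        if sep:
            # Named parameter such as tr=... or tsrc=...
            named[key.strip()] = value.strip()
        else:
            # No '=': the argument takes the next free positional number.
            positional[next_index] = arg
            next_index += 1
    return name, positional, named


# A literal pipe inside |tr= cuts the value short and creates parameter 4.
broken = "RQ:Tanach|30|3|11|tsrc=KJV|tr=11 ṣə’eynâ| ûrə’eynâ"
print(split_template_args(broken)[1])

# With the entity, the splitter never sees a pipe inside the value.
fixed = "RQ:Tanach|30|3|11|tsrc=KJV|tr=11 ṣə’eynâ &vert; ûrə’eynâ"
print(split_template_args(fixed)[1])
```

In the broken call the positional parameters are 1 to 4, with the tail of the transliteration landing in 4; in the fixed call only 1 to 3 remain and |tr= keeps its full value.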
And (without considering whether or not that character should be used, only how to use it) if the HTML-escaped &vert; doesn't work, e.g. if it gets unescaped by the templates/modules too early the way &apos; does, we used to have a {{!}} which worked similarly to {{'}} but displayed a vertical line; it was deleted, and I'm not sure where (if anywhere) the functionality was moved to. As to whether to use it, perhaps our Hebrew-speaking editors know how best the character in question is to be transliterated: ping some people from Category:User he? - -sche (discuss) 16:47, 11 February 2024 (UTC)
@Theknightwho Do you know what's going on with the early resolution of HTML entities? Is this something due to a change of yours made mid last year? If so, how come we're manually resolving the HTML entities into literal characters instead of letting them through? At least we should let &apos; and &vert; (and their numeric equivalents) stay unresolved. Benwing2 (talk) 21:36, 11 February 2024 (UTC)[reply]
Ah, I'm sorry, I think I was unclear: &vert; for its part currently "works" in quote templates (it displays as a vertical line and doesn't seem to cause unexpected behavior; OP's issue was using an actual vertical line instead of escaping it). I'm just saying that if someone didn't want to use HTML notation but wanted a template, to future-proof things against someone "resolving"/"unescaping" the HTML notation into the character it stands for, as e.g. AWB is set to do by default, I'm not sure if we have such a template anymore. It's &apos; which gets unescaped/converted a little "too early" now, and results in things like this if for some reason there's a single apostrophe in a text (that particular example is made-up, but I have encountered the issue with real quotes), where bolding propagates backwards onto the line where the book title is and such. (There is a template for that, sort-of, but it has the issues discussed above.) - -sche (discuss) 21:55, 11 February 2024 (UTC)
@-sche @Benwing2 I *think* the reason why that example is doing weird things is because the software does some special trickery if the number of '' and ''' on a line are both odd, as it interprets one of the ''' as '' preceded by a plain apostrophe since it thinks that's probably what you intended. The rules are quite complex. Theknightwho (talk) 22:27, 11 February 2024 (UTC)
@Theknightwho Right but didn't you write some software to convert HTML entities to their plain equivalents during some sort of processing, maybe in Module:languages? IMO this shouldn't be being done. Benwing2 (talk) 22:43, 11 February 2024 (UTC)[reply]
@Benwing2 Yes, but it isn't a simple naive conversion. It's done after the formatting escapes are applied, and then it's re-escaped again before the formatting is changed back, so that there's a clear line of separation between the two. It's necessary, because otherwise we run into problems with transliteration modules (or whatever) trashing the HTML entities, and even aside from that there's the fact that most users will not expect characters input as HTML entities to be treated differently outside of situations directly relevant to formatting. Theknightwho (talk) 22:53, 11 February 2024 (UTC)[reply]
@Theknightwho Can you explain more what "after the formatting escapes are applied" means? (BTW honestly I've never really bought your explanation for why we need to do all this extra processing to make translit modules easier. It seems a bit like a solution in search of a problem.) Benwing2 (talk) 22:59, 11 February 2024 (UTC)[reply]
@Benwing2 If a user inputs ''a&#39;'', it's because they want to input a' in italics, and in every other respect it should be treated like a conventional plaintext apostrophe. What they don't want is to suddenly find that it does weird, unexpected things because it's being processed as the literal &#39;, and there's no way we can reasonably expect all of the hundreds of various string processing modules we have to all be able to handle that. Theknightwho (talk) 23:04, 11 February 2024 (UTC)[reply]
@Theknightwho OK but (a) that's not explaining why there is now this breakage in quote templates that didn't use to happen, and (b) were there actual issues happening when you decided to implement this or was it preemptive? (This is what I mean by "solution in search of a problem". You've effectively added a great deal of complexity to Module:languages that no one besides you understands, because it's not well-documented. In general before doing something like this there needs to be a clear need, not just a hypothesis.) Benwing2 (talk) 23:08, 11 February 2024 (UTC)[reply]
Also can you answer the first part of my question (i.e. what "after the formatting escapes are applied" means)? I'm still rather in the dark how all this stuff actually works. Benwing2 (talk) 23:09, 11 February 2024 (UTC)[reply]
The formatting escapes are the part where formatting characters get converted into PUA characters, so that we know what's left can be safely treated as actual text. Once that's happened, it's safe to convert HTML entity apostrophes (and any other characters used in formatting), since we can safely treat them as literals. Once the processing is done, we convert them back to HTML entities, and then re-convert the PUA characters into the original formatting characters.
It's not a hypothesis - it just centralised the work that was already being done by some other modules anyway, but on a language-by-language basis: e.g. Korean, Chinese, Tibetan. I don't just do it for fun. Theknightwho (talk) 23:13, 11 February 2024 (UTC)[reply]
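The order of operations described above (escape formatting to placeholder characters, decode entities, process, re-encode, restore formatting) can be sketched as below. This is a simplified Python illustration, not the code in Module:languages: the function `process_text`, the two private-use code points, and the choice to re-escape only apostrophes are assumptions made for the example.

```python
import html

# Hypothetical private-use (PUA) placeholders for wiki formatting markers.
PUA_BOLD = "\uE000"    # stands in for '''
PUA_ITALIC = "\uE001"  # stands in for ''


def process_text(text, transform):
    """Run `transform` on text so that it only ever sees plain characters."""
    # 1. Formatting escapes: swap formatting markers for PUA placeholders.
    text = text.replace("'''", PUA_BOLD).replace("''", PUA_ITALIC)
    # 2. Decode HTML entities; any apostrophe left now is literal text.
    text = html.unescape(text)
    # 3. Do the actual string processing (e.g. transliteration).
    text = transform(text)
    # 4. Re-escape literal apostrophes so they cannot be read as formatting.
    text = text.replace("'", "&#39;")
    # 5. Convert the placeholders back into the original formatting markers.
    return text.replace(PUA_BOLD, "'''").replace(PUA_ITALIC, "''")


# str.upper stands in for an arbitrary string-processing module.
print(process_text("''a&#39;''", str.upper))
# → ''A&#39;''
```

The point of the ordering is that the processing step in the middle sees a' as ordinary text, while the output still carries the apostrophe as an entity, so it cannot merge with the surrounding italic markers.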
And yes, it's imperfect, but that's one of the motivations for the wikitext parser, because character escapes are a massive pain and they shouldn't be something anyone has to consider when writing modules. It's the kind of thing that leads to some of the terrible workarounds we can see in the old Chinese code. I'm not saying this is great code either, to be clear, but it's better than some. Theknightwho (talk) 23:17, 11 February 2024 (UTC)[reply]
@Theknightwho OK. Thanks for the explanation. I am still skeptical, but if there was actual code that got centralized, I suppose this is a win. I am still confused, though, about why the issue comes up with &apos; in quote templates when it wasn't there before; can you explain this? Also, once you added the centralized stuff to Module:languages, did you clean up the old hacks in the Korean, Chinese and Tibetan etc. modules, or is that code still present? Finally, when will you be able to document how this works in the code? I've been asking for that for over a year now. Benwing2 (talk) 23:23, 11 February 2024 (UTC)
@Benwing2 I'm not sure exactly what's going on with the quote template, as it seems really weird to me that the apostrophe appears at the start. I'll have to look at it in more detail.
I totally rewrote the Tibetan code a while back, and cleaning up the Chinese modules is an ongoing monumental task which I've been doing along with @Wpi. I've cleaned up some Korean stuff, but not to the same extent.
To be honest, I've been mainly focused on finding a way to replace this code, since there are a number of things I don't like about it - not least of which is the complexity and unnecessary duplication of work. Theknightwho (talk) 23:33, 11 February 2024 (UTC)[reply]
@Theknightwho OK thanks, sounds good. Benwing2 (talk) 23:55, 11 February 2024 (UTC)[reply]
@-sche the {{!}} template (once Template:!) is now built into the MediaWiki software as a magic word. This, that and the other (talk) 22:19, 11 February 2024 (UTC)[reply]

{{quote-song}} issue

When |lyricist= is filled in, the name of the lyricist is displayed after the title of the song and before the name of the album (e.g. see the quotation at 豔紅 / 艳红 (yànhóng)). To me, this is misleading because the lyricist wrote the lyrics of the quoted song, not the lyrics of all the songs in the album, and so the name of the lyricist should come before the title of the song. However, the author of the song is displayed before the title of the song when |author= is filled in but not |lyricist=. Is this an issue that needs to be fixed? RcAlex36 (talk) 18:24, 11 February 2024 (UTC)

@RcAlex36 For reference it currently looks like this:
1992, “相思風雨中”, in 簡寧 (lyrics), 真情流露‎[1], performed by 張學友 [Jacky Cheung] and 湯寶如 [Karen Tong]:
Maybe it should look more like this?
1992, “相思風雨中”, 簡寧 (lyrics), in ‎真情流露[1], performed by 張學友 [Jacky Cheung] and 湯寶如 [Karen Tong]:
Alternatively, if we put the lyricist first, it looks like this:
1992, 簡寧 (lyrics), “相思風雨中”, in ‎真情流露[1], performed by 張學友 [Jacky Cheung] and 湯寶如 [Karen Tong]:
Benwing2 (talk) 21:33, 11 February 2024 (UTC)[reply]
@Benwing2: I prefer the last one (i.e. putting the lyricist first) as it's the format we've been doing when |author= is filled in. I think a similar issue occurs when |composer= is filled in—the name of the composer is shown after the song title before the album name. RcAlex36 (talk) 03:04, 12 February 2024 (UTC)[reply]
@Benwing2: May I request that the module be edited accordingly? RcAlex36 (talk) 06:08, 15 February 2024 (UTC)[reply]
@RcAlex36 Sure, I will do this in the next couple of days. Benwing2 (talk) 06:14, 15 February 2024 (UTC)[reply]

Place name text for asteroids

At Liszt we have a sense "An asteroid in Asteroid Belt, Solar System". That looks kind of silly to me. Can it be improved to something like "An [[asteroid]] in the [[Asteroid Belt]]"? This, that and the other (talk) 10:30, 12 February 2024 (UTC)[reply]

@This, that and the other Yeah go ahead and make the fix, it looks good to me. You might check the 19 things that link to Asteroid Belt here (Special:WhatLinksHere/Asteroid Belt) to see if there are others needing fixing. Benwing2 (talk) 00:57, 14 February 2024 (UTC)[reply]
I looked at the code of Mod:place and accompanying data modules and couldn't see any logic for handling extraterrestrial locations. Perhaps the entries should just use plain text instead. Otherwise we end up with stupid stuff like at Translingual Mohorovičić. This, that and the other (talk) 05:49, 22 February 2024 (UTC)[reply]

URLs with square brackets

There is a URL I'm trying to use as the |url= parameter in the {{quote-book}} template, but a part of it is enclosed in double square brackets, so it renders incorrectly on the page, with the bracketed part showing up as a wikilink.
I've tried using zero-width spaces before and after each bracket, and replacing each bracket with its HTML equivalent, but to no avail.
Does anyone know a way to fix this? —— GianWiki (talk) 11:02, 13 February 2024 (UTC)[reply]

You need to encode the URL query string. Copy everything after the ? in your URL, paste it into a URL encoder, click encode, and use the resulting text to replace everything after the ? in your URL. For example, https://foo.com/search?complex [stuff@here] becomes https://foo.com/search?complex%20%5Bstuff%40here%5D JeffDoozan (talk) 13:24, 13 February 2024 (UTC)
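If no online encoder is at hand, the same encoding can be done with Python's standard library. The sketch below is one possible way to do it; the helper name `encode_query` is made up for the example, and keeping = and & unencoded is an assumption that suits ordinary key=value query strings.

```python
from urllib.parse import quote


def encode_query(url):
    """Percent-encode everything after the first '?' in a URL."""
    base, sep, query = url.partition("?")
    if not sep:
        return url  # no query string, nothing to encode
    # Keep '=' and '&' readable; spaces, brackets and '@' get encoded.
    return base + sep + quote(query, safe="=&")


print(encode_query("https://foo.com/search?complex [stuff@here]"))
# → https://foo.com/search?complex%20%5Bstuff%40here%5D
```

Note that this would double-encode a query string that is already percent-encoded (each % becomes %25), so it should only be run on the raw, unencoded URL.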

Template:transclude and multiple uses of the same template on the same line

@Theknightwho: There are now 14 (and counting) pages in CAT:E because the definition uses {{place}} twice on the same line: not nested, but separately. Why? Chuck Entz (talk) 14:11, 13 February 2024 (UTC)

@Chuck Entz I'll have a look. It's probably the template parser. Theknightwho (talk) 14:14, 13 February 2024 (UTC)[reply]
@Chuck Entz I'm not sure this has ever worked. It's due to the code in {{transclude}} not knowing how to process such lines. Let me see if I can fix it. Benwing2 (talk) 20:34, 13 February 2024 (UTC)[reply]
Thanks Ben. Theknightwho (talk) 14:25, 14 February 2024 (UTC)[reply]

Welsh pluralia tantum

Welsh has some pluralia tantum. Most of these nouns have a derived singulative, but some do not, like gyrgoed and athletau. I'm also not sure that the ones that do have derived singulatives should be called "pluralia tantum", as opposed to "collective nouns", as pluralia tantum are generally defined by not existing in the singular.

Welsh is like German in that it doesn't distinguish gender in the plural, so it isn't appropriate to label pluralia tantum that lack a singulative as "m pl or f pl", as such words currently are.

The Geiriadur Prifysgol Cymru's policy is not to assign a gender to such words, and I can see that the same is done with German pluralia tantum on Wiktionary such as Händel. Could Template:cy-noun be altered to do the same please?

Collective nouns with both Welsh genders

Also, there are some collective nouns with singulatives that can be either masculine or feminine, such as talch.

Template:cy-noun isn't currently able to fully describe these:
talch m or f (collective, singulative telchyn or talchen)
doesn't contain the information that telchyn is the singulative for the masculine noun and talchen is the singulative for the feminine noun, rather than the two forms being completely interchangeable.

It would be nice to use the template to form something like:
talch m pl or f pl (m singulative telchyn, f singulative talchen)
instead of having to type this out manually.

Arafsymudwr (talk) 13:05, 14 February 2024 (UTC)[reply]

@Arafsymudwr I added p as a possible gender for use with pluralia tantum that don't have an obvious gender, and added |msg= and |fsg= for indicating masculine and feminine singulatives on pluralia tantum. I also changed the display so that if you specify a plurale tantum gender along with a singulative, the following happens: (1) the genders change to not say "plural" (and if the gender is just p it disappears); (2) the word "collective" appears before the inflections; and (3) the term is added to Category:Welsh collective nouns. Let me know if that handles all your issues. Benwing2 (talk) 05:19, 15 February 2024 (UTC)[reply]
Also, please review the documentation at Template:cy-noun/documentation and fix any mistakes. Thanks! Benwing2 (talk) 05:29, 15 February 2024 (UTC)
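The three display rules listed above can be sketched as follows. This is only an illustration in Python of the described behaviour, not the Lua in Module:cy-headword; the function `headword_parts` and its return shape are invented for the example, and it ignores the separate |msg= and |fsg= parameters.

```python
def headword_parts(genders, singulatives):
    """Return (genders to show, inflection labels, categories).

    genders: codes such as "m-p", "f-p" or bare "p".
    singulatives: singulative forms, possibly empty.
    """
    plural_only = all(g == "p" or g.endswith("-p") for g in genders)
    if plural_only and singulatives:
        # (1) Drop the "plural" part of each gender; a bare "p" disappears.
        shown = [g[:-2] for g in genders if g != "p"]
        # (2) The word "collective" comes before the inflections.
        inflections = ["collective", "singulative " + " or ".join(singulatives)]
        # (3) The term is categorised as a collective noun.
        categories = ["Welsh collective nouns"]
        return shown, inflections, categories
    # Otherwise nothing is changed (other cases are out of scope here).
    return list(genders), [], []


print(headword_parts(["f-p"], ["gwenynen"]))
print(headword_parts(["p"], []))
```

For gwenyn this gives the gender f with the labels "collective" and "singulative gwenynen", matching the display "gwenyn f (collective, singulative gwenynen)" quoted later in this thread.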
Thanks @Benwing2, that does seem to fix all issues! Arafsymudwr (talk) 13:55, 15 February 2024 (UTC)[reply]
Actually @Benwing2, could you modify it so the word "collective" appears before the parentheses, for consistency with the display for singular/plural nouns such as cath?
cath f (plural cathod)
gwenyn f (collective, singulative gwenynen) would be better as
gwenyn f collective (singulative gwenynen)
Having sorod pl (not mutable) appear as
sorod pl (plural only, not mutable) like how German treats its pluralia tantum would also help make clear its plural-only nature.
Also, words like coed that have both a derived singulative and a derived plural are still appearing in the "pluralia tantum" category - can this be fixed please?
It might also be a good idea to set up the template so that if an editor defines a word as plural-only, then no gender is displayed even if the editor tries to specify one, as these words can't meaningfully have a gender.
I'd do all this myself, but it looks like it involves tinkering with Module:cy-headword - not something I have much experience with! Arafsymudwr (talk) 01:07, 17 February 2024 (UTC)[reply]
I tried mocking this up using the gloss feature in Mod:headword but it doesn't do exactly what is desired here. @Benwing2 any thoughts? This, that and the other (talk) 09:14, 27 February 2024 (UTC)[reply]
@This, that and the other @Arafsymudwr IMO what we need to do is add a coll = "collective" gender code (type "number") to Module:gender and number/data. I've been meaning to get to this; I'll take a look tomorrow in the AM (US time). Benwing2 (talk) 09:17, 27 February 2024 (UTC)[reply]

Multiword terms

Is there any way to set the way multiword terms are determined on a language-specific basis? There are a number of terms in CAT:Scottish Gaelic multiword terms that are in fact single words that happen to be spelled with a hyphen in order to show that stress is on the second syllable (e.g. a-mach, a-màireach, a-nall). Is there any way to suppress that categorization? —Mahāgaja · talk 12:45, 14 February 2024 (UTC)[reply]

@Mahagaja You can suppress categorization for individual words using |nomultiwordcat= on {{head}}. (If there's a Scottish-Gaelic-specific template that calls {{head}}, you might have to add the parameter to that template and pass it to {{head}}.) Also see Module:headword/data, in particular data.hyphen_not_multiword_sep, which lets you turn off multiword status for all hyphenated terms in a given language. Benwing2 (talk) 22:13, 14 February 2024 (UTC)[reply]
Thanks! It's really only adverbs starting with a- that need to be removed from the category, so I'm doing it manually rather than adding gd to the data.hyphen_not_multiword_sep list. —Mahāgaja · talk 08:09, 15 February 2024 (UTC)[reply]
@Mahagaja OK. If there are very many I can work up a solution involving language-specific exclude or include patterns, so that e.g. you can say "exclude terms in a- for lang=gd". Benwing2 (talk) 09:07, 15 February 2024 (UTC)[reply]
@Benwing2: I've actually done them all now. I thought about finding a way of excluding adverbs beginning with a-, but then there are terms like a-nall thairis, which is a multiword term because of the space between a-nall and thairis, not because of the hyphen between a and nall. Anyway, they're all done now. —Mahāgaja · talk 09:13, 15 February 2024 (UTC)[reply]
@Mahagaja OK great. Yeah what I was thinking of implementing would only take effect when there are hyphens but no spaces, so a term like a-nall thairis wouldn't be affected. Benwing2 (talk) 11:07, 15 February 2024 (UTC)[reply]
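The options discussed in this thread (a per-entry |nomultiwordcat=, a per-language switch for hyphens, and the proposed per-language exclude patterns that apply only when a term has hyphens but no spaces) might combine roughly as in this Python sketch. It is not the logic of Module:headword; the function `is_multiword`, both data tables and their contents are hypothetical.

```python
import re

# Hypothetical stand-in for data.hyphen_not_multiword_sep: languages in which
# a hyphen never makes a term multiword ("xx" is a made-up code).
HYPHEN_NOT_MULTIWORD_SEP = {"xx"}

# Hypothetical per-language exclude patterns, as proposed in the thread.
EXCLUDE_PATTERNS = {"gd": [r"^a-"]}


def is_multiword(term, lang, nomultiwordcat=False):
    """Decide whether a headword should go in the multiword-terms category."""
    if nomultiwordcat:
        return False  # per-entry override
    if " " in term:
        return True   # a space always counts, whatever the hyphen rules say
    if "-" not in term:
        return False
    if lang in HYPHEN_NOT_MULTIWORD_SEP:
        return False
    # Exclude patterns are only consulted for hyphenated terms without spaces.
    return not any(re.search(p, term) for p in EXCLUDE_PATTERNS.get(lang, []))


print(is_multiword("a-mach", "gd"))
print(is_multiword("a-nall thairis", "gd"))
print(is_multiword("dog-biscuit", "en"))
```

Checking for spaces first is what keeps a term like a-nall thairis in the category even though it starts with a-.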

senseid and similar links from multiword inflection-line templates

I note that I cannot follow a hard link to say#Noun from {{en-verb|have a say}}. Does {{senseid}} work for that? DCDuring (talk) 19:47, 15 February 2024 (UTC)[reply]

Apparently not, since the adapted link, with URL anchor, gets stripped by the module behind {{en-verb}}, as seen by the version you have saved. Fay Freak (talk) 20:13, 15 February 2024 (UTC)[reply]
@DCDuring Hmm, that shouldn't be happening. Let me take a look. Benwing2 (talk) 20:42, 15 February 2024 (UTC)[reply]
@DCDuring This was broken by my recent overhaul of default-linking in Module:en-headword, and should be fixed now. Specifically, if you specify brackets in the value of 1= in a multiword verb, it will respect the brackets (but also autolink the verb, unless noautolinkverb=1 is given); otherwise you get the default headword-linking algorithm. Benwing2 (talk) 02:09, 16 February 2024 (UTC)[reply]
As we used to say in the late 60s and 70s, dig yourself (~ dig#Etymology 2 + yourself). DCDuring (talk) 14:02, 16 February 2024 (UTC)[reply]
@Benwing2: Is, e.g., a move on supposed to be unlinked in get a move on? This seems to be an issue with many verb headwords (only linking one word). J3133 (talk) 17:24, 16 February 2024 (UTC)[reply]
Oops, let me fix that. Benwing2 (talk) 17:52, 16 February 2024 (UTC)[reply]
@J3133 @DCDuring Should be fixed. Question: Currently we leave the entire inflected term as a red link e.g. gets a move on, got a move on. Should we instead link the individual components, e.g. [[get|gets]] [[a]] [[move]] [[on]] or [[gets]] [[a]] [[move]] [[on]]? This is what we do for multiword expressions in Spanish and Italian. Benwing2 (talk) 18:15, 16 February 2024 (UTC)[reply]
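For reference, the first component-linking option mentioned above, with the inflected verb piped to its lemma, could be generated along these lines. This is a small Python sketch under the assumption that the verb is the first word of the expression; the function `link_inflected` is hypothetical and is not how Module:en-headword is written.

```python
def link_inflected(lemma, inflected_verb):
    """Link each word of a multiword verb, piping the inflected verb to its lemma.

    Assumes the verb is the first space-separated word of `lemma`.
    """
    verb, *rest = lemma.split(" ")
    if inflected_verb == verb:
        first = f"[[{verb}]]"
    else:
        first = f"[[{verb}|{inflected_verb}]]"
    return " ".join([first] + [f"[[{word}]]" for word in rest])


print(link_inflected("get a move on", "gets"))
# → [[get|gets]] [[a]] [[move]] [[on]]
```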
The worst, at least for an English expression of this kind, is that it is a red link. It really seems quite silly to have links to the inflected forms, even when they exist. There is a (very, very) small value to having links to the inflected forms of the verbs. I'd vote for no links. DCDuring (talk) 18:27, 16 February 2024 (UTC)[reply]

Hundreds of Incomplete Greenlandic entries need to be cleaned up

There are currently 949 edits in the log for Abuse Filter 68 for edits done by @GabMarquetto in entries that were lacking headword templates. These were all done over a period of about 6 hours, along with about 800 other edits (that's about 5 edits per minute, on average). I've now blocked them from mainspace and Reconstruction space for running an unauthorized bot.

That leaves us with hundreds of Greenlandic lemma entries without headword templates. They also added an etymology section with {{rfe}} and a pronunciation section with {{kl-IPA}} to each entry.

Would someone with a[n authorized] bot be so kind as to fix these defective entries? I think it's safe to just remove all the etymology sections, and I have my doubts about the pronunciation sections- who knows how many of these would require respelling to render the correct IPA? That just leaves adding the appropriate headword templates.

Thanks! Chuck Entz (talk) 09:33, 16 February 2024 (UTC)[reply]

At the very least we could add {{attention}} to them. Vininn126 (talk) 09:44, 16 February 2024 (UTC)[reply]
As we don't have an (active) Greenlandic editor, it may be worth just deleting them all, as we need to verify every single one of them to make sure the bot didn't mess anything up. Thadh (talk) 11:30, 16 February 2024 (UTC)[reply]
@Chuck Entz To be honest, it may be best to just nuke them. Theknightwho (talk) 16:44, 16 February 2024 (UTC)[reply]
I can delete them if someone can help me compile a list. Note that the user may have made additional edits of the same nature that didn't trigger the filter. Benwing2 (talk) 18:13, 16 February 2024 (UTC)[reply]
I'm happy to clean up the ones with organism names, but do we know that they are linguistically accurate? I won't be doing any declensions or even high-quality inflection lines, just the basics. DCDuring (talk) 18:32, 16 February 2024 (UTC)[reply]
@DCDuring It looks like the user simply copied a bunch of entries directly from a Greenlandic-English dictionary, which may or may not be accurate. Benwing2 (talk) 18:40, 16 February 2024 (UTC)[reply]
IOW, the entries are sourced from a reasonable reference, which can be linked to, though not directly to the (one-line) entry. If there isn't a WT:COPY problem, the sensible course would seem to be to add {{head|kl}} with the appropriate PoS to the ones I have not gotten to, as Chuck suggested above. I've done mostly nouns, a few proper nouns, and a verb or two. DCDuring (talk) 19:47, 16 February 2024 (UTC)
@DCDuring There's also the issue of pronunciation. The user simply added {{kl-IPA}} in all cases. I don't know Greenlandic well enough to know whether this is a reasonable course in all circumstances, but I have my doubts. Benwing2 (talk) 20:00, 16 February 2024 (UTC)[reply]
Why do we have the template if it is not a reasonable course? DCDuring (talk) 20:01, 16 February 2024 (UTC)[reply]
@DCDuring What I'm saying is that most pronunciation templates require an additional respelling argument in some circumstances. This may be the case here as well, but I can't say because I don't know anything about Greenlandic spelling vs. pronunciation. Benwing2 (talk) 20:29, 16 February 2024 (UTC)[reply]
I don't see any such argument in the documentation for {{kl-IPA}}. DCDuring (talk) 20:52, 16 February 2024 (UTC)[reply]
It's in |1=. Benwing2 (talk) 00:30, 17 February 2024 (UTC)[reply]
Yes, but "1" is the headword automatically. If, to get correct IPA, someone has to understand the template well enough to feed it something other than the headword, they might as well add the IPA themselves. In any event, this is a wiki, an evolving entity, relying on volunteers with knowledge of many different languages of many kinds, with all levels of linguistic and technical skill, not to mention patience. Erroneous pronunciations will eventually get someone's attention and some kind of correction (one-at-a-time, a complaint, or change in the module data). If {{kl-IPA}} is so poor as to have, say, a 50% error rate (however measured) in its automatic operation, then it should be deprecated or disabled until its accuracy could be brought up to, say, 75% (or 90%, or whatever). DCDuring (talk) 16:47, 17 February 2024 (UTC)[reply]

Deleting لىٔک

This Khowar term needs to be deleted: it doesn't show up in the "A digital Khowar-English dictionary with audio" dictionary they cite. It was probably added by accident. I added a new entry for the same word with the correct spelling. Akhaeron (talk) 17:02, 16 February 2024 (UTC)[reply]

Deleting رأىک

This Khowar term also needs to be deleted; again, it doesn't show up in the dictionary. I replaced the word with the correct spelling. Akhaeron (talk) 10:32, 17 February 2024 (UTC)[reply]

Deleting لىٔک

This: لیئک is the correct version of "لىٔک". This page must be deleted. Akhaeron (talk) 11:00, 17 February 2024 (UTC)[reply]

@Akhaeron As you're asserting that the words don't exist in Khowar, feel free to move these discussions to WT:RFVN and add the {{rfv|khw}} tag to the entries. On the other hand, if you think there is a systemic problem with the way our Khowar entries are written, that would be a topic for the Beer parlour. Thanks! This, that and the other (talk) 09:24, 18 February 2024 (UTC)[reply]

Vowel length in rhymes pages

I was creating a rhymes page for words that rhyme with eon, but wasn't sure if I should make it under /-iɒn/ or /-iːɒn/. Ultimately, I chose the former as most pages at /-iː.../ don't have the long vowel sign (the only ones that do are followed by a schwa; I think this is because some templates or bots render as a single phoneme, as it is in New Zealand English. iːə is used because it is definitively two phonemes). However, depending on whether the long vowel sign is added or not, the rhymes nav either has to link to both /-i-/ and /-iɒn/, or both /-iː-/ and /-iːɒn/. In either case, it links to a nonexistent page. Here, I opted for the latter as it is linked from Rhymes:English, but it makes it seem like the rhymes page is /-iːɒn/ when it is actually /-iɒn/. It would be better to have the rhymes nav link to real pages you can navigate.
I'm wondering why pages (like the ones at /-iː.../) have removed the long vowel sign just because it is followed by another vowel. I think it would make more sense to have pages like /-iːɒn/ where the long vowel sign is kept. When people add rhymes to words, it seems like they more often use the long vowel sign, as /-iːɒn/ had three links and /-iɒn/ had one link (until I added pleon and freon). Donopi (talk) 05:32, 21 February 2024 (UTC)[reply]

Rhymes on stressed /iː/ should include the length marker regardless of whether it is followed by a vowel or a consonant. As you noted, /iː/ + schwa can in some accents be 'compressed' into a single phoneme: this is traditionally transcribed /ɪə/ for Received Pronunciation. Unstressed /i/ before a vowel should omit the length marker, e.g. fermion, going by John Wells' convention for representing the variable unstressed vowel in such words.--Urszag (talk) 07:06, 21 February 2024 (UTC)[reply]

Quotations containing multiple languages, or not needing a translation

Recently the quotations I added to octarius were edited to put them into the quote-book template. That would be fine, except one of the cited sources is primarily in English (with Latin portions) and the translation of the Latin is given in the English-text portion of the quotation: "R. Pilularum Catharticarum Compositarum octarium unum; that is, Take one pint of Compound Cathartic Pills." I don't see any way to mark different languages in one passage using the quote-book template; also, the template automatically inserts a request for an English translation if no separate translation is given. For that reason, I had originally just added the text outside of the template. Is there a better solution?

I've also seen another editor resort to using an empty translation parameter with "| " in the ux template on Sulla, but that actually adds an extra line of empty space, which looks odd to me. Urszag (talk) 16:14, 23 February 2024 (UTC)[reply]

@Urszag: I have added |t=- to the quotation to remove the translation request. J3133 (talk) 17:19, 23 February 2024 (UTC)[reply]

Pagename fonts

Has something changed in the way page titles are written?

  • 1) I do not recall a problem linking to dictionary suffix entries before: the template {{R:DSMG}} at -αίνω gives &#45;αίνω, and now I have to rewrite all such refs from {{R:DSMG}} to {{R:DSMG|"-xxx"}} for a correct link like -αίνω@DSMG. Probably all refs with diacritics might need rewriting ('ωωω, -ωωω, ωωω-, *ωωω). Ah, no, it is the dictionary's problem. It works fine: test at κατα- (kata-).
  • 2) Also, I see that the Greek grave βαρεία (vareía) is corrected automatically, e.g. μολὼν λαβέ (molṑn labé), so as not to show μολωˋν λαβέ. We used to create entries with the acute, μολών λαβέ (molṓn labé), and correct the bareia in the body text. (Now the interwikis are lost, because other wiktionaries cannot do this.)

Is it my browser, or has something changed? Thank you. ‑‑Sarri.greek  I 17:02, 23 February 2024 (UTC)[reply]

Unclickable links to diacritic entries

While cleaning up incorrect language headers, I've found several diacritical mark entries in categories that are (barely) visible on the page, but have no clickable link. If you inspect the right spot on the page, there's a target URL wrapped in span class="None", but I've hovered my mouse pointer over the whole area and there's absolutely nothing that registers. If you look at Category:Translingual diacritical marks they're scattered through the listings. Some examples: [ ͚ ] (U+035A), [ ׁ ] (U+05C1), [ ] (U+01772), [ ] (U+01773) and [ 𖿰 ] (U+16FF0). The last one is interesting because it uses the {{diacritic}} template, so the headword displays differently than on the top of the page or on the category page.

If I try linking to them myself without giving it something else for the display, whether I'm using a bare wikilink, as in [[%CD%9A]], or a template, as in {{l|mul|%CD%9A}}, I still get something unclickable: " ͚ ".

Is there any way to add some kind of clickable object through the css/js? It's kind of silly to have things displayed in the categories that no one can get to through the user interface. @Erutuon, This, that and the other. Chuck Entz (talk) 01:15, 24 February 2024 (UTC)[reply]

My understanding is that these entries should be moved so that the entry title includes the ◌ character before the combining diacritic character. That ought to solve the problems. I could be wrong though. This, that and the other (talk) 01:50, 24 February 2024 (UTC)[reply]
Probably, if it works equally with other-than-left-to-right characters, e.g. ٔ◌ instead of ٔ ARABIC HAMZA ABOVE; hasn’t been there before in Category:Arabic script characters, Category:Hebrew script characters, I think chiefly because eager editors created all pages for Unicode characters without thinking, as less thinking was admitted in the Tbot era. Alternatively we might programmatically add these characters on the displays in the category just as we add sorting, fonts and formatting stuff I know naught about, but this does not solve that we can barely link the pages either with brackets or linking templates; so at least unless {{m}}, {{l}} etc. get fixed then we have to move the pages. Fay Freak (talk) 02:03, 24 February 2024 (UTC)[reply]
No need to move, IMO. To link to e.g. the Arabic fatha diacritic, use ـ before it: ـَ (this is a redirect).
The same way Thai ◌ะ links to ะ, even though it's clickable. Anatoli T. (обсудить/вклад) 02:04, 1 March 2024 (UTC)[reply]
Unfortunately I've never found a way to identify unclickable characters using their properties in the Unicode Character Database. Many of the unclickable diacritics (there are other zero-width characters such as the zero-width space) have the General Category of Nonspacing Mark and can be made clickable by putting them on a nonbreaking space or dotted circle, such as the acute accent ( ́ or ◌́), or on a tatweel like Arabic vowels ( ؘ or ◌ؘ or ـؘ), but other Nonspacing Marks are clickable (like Devanagari vowels: , ि; no dotted circle) and some fonts even display them as if they had a dotted circle (like in the font that my browser uses when I add class="Deva": , ि).
Theknightwho and kc_kennylau were reading the Unicode Standard but didn't find anything very clear (so it may be "undocumented behavior"). So I guess it's up to font makers whether to make a lone diacritic clickable or not.
If so, to identify unclickable characters, we could assume all Nonspacing Marks are unclickable, except in scripts (like Devanagari) where we have identified that some of the Nonspacing Marks are clickable in commonly used fonts. In relatively recent browser versions, JavaScript has added Unicode property-based matching to regular expressions, so a gadget could identify links where the text is probably unclickable based on this rationale. Though whether we want a gadget to fix this, or Module:links or Module:script utilities, I don't know. — Eru·tuon 23:38, 5 March 2024 (UTC)[reply]
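The rationale in the comment above could be sketched roughly as follows. This is only an illustration, not an existing gadget: the function name `isProbablyUnclickable` and the exemption list are hypothetical, and only the General Category Nonspacing Mark (`\p{Mn}`) is checked, so other zero-width characters such as the zero-width space are not covered.

```javascript
// Scripts whose lone nonspacing marks are assumed to be clickable in common fonts.
// Hypothetical list: only Devanagari is mentioned in the discussion.
const SCRIPTS_WITH_CLICKABLE_MARKS = ["Devanagari"];

// Returns true when the link text consists only of nonspacing marks (Mn)
// and is not written entirely in one of the exempted scripts.
function isProbablyUnclickable(linkText) {
  const text = linkText.trim();
  if (text === "") return false;
  // Unicode property escapes require the "u" flag.
  if (!/^\p{Mn}+$/u.test(text)) return false;
  return !SCRIPTS_WITH_CLICKABLE_MARKS.some((script) =>
    new RegExp(`^\\p{Script=${script}}+$`, "u").test(text)
  );
}

console.log(isProbablyUnclickable("\u035A"));       // lone combining double ring below
console.log(isProbablyUnclickable("\u25CC\u0301")); // dotted circle + combining acute
```

A gadget built on this idea would still need a per-font or per-script review, since the discussion suggests clickability is ultimately decided by the font.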

Bug in Hindi transliteration: bolded words are not processed correctly

I found the following problem at मनुष्य (manuṣya):

{{hi-x|प्रति '''मनुष्य''' मरणशील है।}} transliterates मनुष्य as 'manuṣy':

प्रति मनुष्य मरणशील है।prati manuṣya maraṇśīl hai.(please add an English translation of this usage example)

While without the apostrophes for bold font, {{hi-x|प्रति मनुष्य मरणशील है।}} shows

प्रति मनुष्य मरणशील है।prati manuṣya maraṇśīl hai.(please add an English translation of this usage example)

How to fix this? Exarchus (talk) 13:24, 25 February 2024 (UTC)[reply]

Putting <nowiki/> between the word and the closing set of apostrophes fixes it, but obviously there should be a proper fix that doesn't require that hack. —Mahāgaja · talk 13:37, 25 February 2024 (UTC)[reply]
@Mahagaja It works when I change line 102 of Module:hi-translit to:
local vowel, vowel_sign = '*aिुृेोाीूैौॉॅॆॊ', 'अइउएओआईऊऋऐऔऑऍ\
So removing \' from the vowel list. Is the apostrophe ever used in the same way as a virama/halant? Exarchus (talk) 23:05, 25 February 2024 (UTC)[reply]
To retain the same functionality of the apostrophe, I added some extra lines instead. Exarchus (talk) 10:21, 26 February 2024 (UTC)[reply]

Bug in template:Also

If you take a look at ɣ, you'll see RTL languages cause a problem with template:also. The template on that page produces the following:

See also: γ [U+03B3 GREEK SMALL LETTER GAMMA], ⁧ץ [U+05E5 HEBREW LETTER FINAL TSADI]⁩, Ɣ [U+0194 LATIN CAPITAL LETTER GAMMA], ૪ [U+0AEA GUJARATI DIGIT FOUR]

Regardless of whether the script is LTR or RTL, the character should appear before the Unicode number and name, which it doesn't in the case of Hebrew tsadi. kwami (talk) 05:49, 27 February 2024 (UTC)[reply]

@Kwamikagami Yeah there is something in the module already that hacks around R2L issues (Module:also#L-92) but it evidently doesn't do what it needs to do. Benwing2 (talk) 09:20, 27 February 2024 (UTC)[reply]
But the 'tsadi' does appear before Unicode name! Do you not see the character to the right of the name? I would, however, argue that we do not want to embed character plus name, but rather, embed them separately within the overall LTR context, so that the characters appear to the left of the names. (Alternatively, embed character and name in a LTR context, and then embed the whole.) --RichardW57m (talk) 10:31, 27 February 2024 (UTC)[reply]
As Benwing2 noticed, this situation was caused by this edit by Theknightwho, putting the tsadi and its code point label in right-to-left isolation, causing them to be ordered from right to left. I have moved the isolation to apply only to the tsadi, so it displays to the left of the code point label, which I agree is the desired behavior. I am not sure why isolation is needed because we have direction: rtl; unicode-bidi: embed; for most horizontal right-to-left scripts in line 521 of MediaWiki:Gadget-LanguagesAndScripts.css (that could be changed to direction: rtl; unicode-bidi: isolate;), but I don't clearly understand the difference between embedding and isolation in the Unicode Bidirectional Algorithm and I didn't see the problem that made the change necessary. Hopefully my edit didn't cause the problem to resurface. — Eru·tuon 20:23, 27 February 2024 (UTC)[reply]
@Erutuon Thanks - I forgot about code point labels when making that change, so your new change makes sense. Theknightwho (talk) 20:30, 27 February 2024 (UTC)[reply]
I read a section of the Unicode Standard, and it is apparently recommended to use isolation rather than embedding. So I will change the CSS to direction: rtl; unicode-bidi: isolate;, which will make the bidirectional control characters in the module unnecessary. — Eru·tuon 20:36, 27 February 2024 (UTC)[reply]
looks good now. thanks. kwami (talk) 23:45, 27 February 2024 (UTC)[reply]
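The interim fix described in this thread (isolating only the right-to-left character, not its code point label) could be sketched like this. It is an illustration under assumptions, not the code of Module:also: the function name `formatAlsoEntry` and the `isRtl` flag are hypothetical, while U+2067 and U+2069 are the standard isolate controls from the Unicode Bidirectional Algorithm.

```javascript
const RLI = "\u2067"; // RIGHT-TO-LEFT ISOLATE
const PDI = "\u2069"; // POP DIRECTIONAL ISOLATE

// Wraps only the character in an RTL isolate, so the bracketed label
// stays in the surrounding left-to-right context and follows the character.
function formatAlsoEntry(char, label, isRtl) {
  const shown = isRtl ? RLI + char + PDI : char;
  return `${shown} [${label}]`;
}

const entries = [
  formatAlsoEntry("\u03B3", "U+03B3 GREEK SMALL LETTER GAMMA", false),
  formatAlsoEntry("\u05E5", "U+05E5 HEBREW LETTER FINAL TSADI", true),
];
console.log("See also: " + entries.join(", "));
```

With the stylesheet change to `unicode-bidi: isolate` mentioned above, explicit control characters like these would no longer be needed.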

can the label "in the world" be suppressed from conlang pages such as Category:Esperanto language and Category:Lojban language?

Esperanto and Lojban are described as being spoken in the world on Category:Esperanto language and Category:Lojban language. The template forces the inclusion of the word in when I think "across" or "throughout" would be more appropriate. Or we could reword it to say worldwide. We could also eliminate the sentence entirely, since it tells us little. Since it seems the output is already being modified in some way (only world is linked when "the world" is provided), I think this should be possible. But it's well beyond my ability and I wouldn't even know where to look for the code. Thanks, Soap 13:04, 28 February 2024 (UTC)[reply]

inglese should count as containing -ese

Problem is, the Italian word inglese (English) was borrowed from Old French, so isn't formed synchronically or superficially from Inghilterra + -ese. The similarly positioned francese gets around this by using both {suffix} and {bor} (slightly cheating), but it looks much more similar to a suffixed form. Is there any way inglese can be added to the list of words suffixed with -ese, without cheating? --Hiztegilari (talk) 15:38, 28 February 2024 (UTC)[reply]

Well, it's possible to manually add a word into a suffix category. If that's what you mean by cheating, then while I think it's less than ideal, it's something we do and have been doing for a while now. For example, scientific words in English that are considered direct borrowings from Greek will sometimes be added into the proper affix categories using the {{cln|en}} template. An example of this is amblyopia, which apparently has attestations going back to classical Greek. Another strategy is to use {{surf}}, but in this case it would probably leave a redlink for the root; still, an example of a word described like this is presbyopia. Soap 15:49, 28 February 2024 (UTC)[reply]

Template use stats

I generated some stats about each of our templates. It lists the # of times a template is directly called, whether the template is static/lua/wiki or mixed lua/wiki, all modules invoked by the template, and all other templates called. The list of invoked modules and called templates is recursive, so if A includes B, B includes C, and C includes D and E, then it will show that A uses B, C, D, and E. This is a neat way of seeing which are our most commonly used templates and getting a quick idea of how complex each template is. It only shows templates with at least 2 calls in order to stay below the 2M page limit, but if anyone is interested in culling unused templates, I can post a list of lesser used templates. JeffDoozan (talk) 17:50, 28 February 2024 (UTC)[reply]

Don't forget that some templates are (or should be) always subst'ed in, so they may seem unused when actually they are widely used. —Mahāgaja · talk 20:08, 28 February 2024 (UTC)[reply]
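The recursive listing described in the opening post (A includes B, B includes C, C includes D and E, so A uses B, C, D and E) amounts to a transitive closure over the "directly uses" relation. A minimal sketch follows, assuming the direct relations are already available as a map; the function name `allDependencies` and the sample data are hypothetical and are not taken from the actual stats script.

```javascript
// directUses maps a template name to the templates or modules it calls directly.
function allDependencies(name, directUses) {
  const seen = new Set();
  const stack = [...(directUses.get(name) ?? [])];
  while (stack.length > 0) {
    const dep = stack.pop();
    if (seen.has(dep)) continue; // already expanded; also guards against cycles
    seen.add(dep);
    stack.push(...(directUses.get(dep) ?? []));
  }
  seen.delete(name); // a template is not listed as its own dependency
  return [...seen].sort();
}

const directUses = new Map([
  ["A", ["B"]],
  ["B", ["C"]],
  ["C", ["D", "E"]],
]);
console.log(allDependencies("A", directUses)); // [ 'B', 'C', 'D', 'E' ]
```

As noted in the reply above, call counts derived this way would still miss templates that are always subst'ed.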

Japanese accel module

Does anyone know why Module:accel/ja isn't working? I wanted to add accelerated entry creation for romanizations and kana spellings (which appear in the headwords of Japanese entries), but it turns out there already seems to be "support" for these in the module — albeit somehow in a broken way, since the gadget doesn't produce any green links despite the module export. Is there some sort of problem with the headword template not generating accel tags or something? Kiril kovachev (talkcontribs) 23:20, 29 February 2024 (UTC)[reply]

@Kiril kovachev There must be a problem with the headword templates, but can you give me an example of something that isn't working? Benwing2 (talk) 00:53, 1 March 2024 (UTC)[reply]
@Benwing2, @Kiril kovachev: It must be about accelerated kana and romaji forms, which used to work. The kana spelling used to show green on kanji entries.
The rules about how to format kanji and kana and where the lemma is keep changing. Not sure what the latest is.
I checked a random Japanese entry without kana and romaji entry: 鎮西(ちんぜい) (Chinzei). Anatoli T. (обсудить/вклад) 01:17, 1 March 2024 (UTC)[reply]
The romaji spelling entry generation is supposed to work from kana entries, not from kanji entries. Anatoli T. (обсудить/вклад) 01:18, 1 March 2024 (UTC)[reply]
@Theknightwho I haven't touched anything in this area recently. Could you have done something that broke this? Benwing2 (talk) 01:19, 1 March 2024 (UTC)[reply]
For the record, from what I can tell, it's been broken at least since I first started using accel, which must've been at least since last year. Kiril kovachev (talkcontribs) 16:50, 1 March 2024 (UTC)[reply]
I suspect it was the major overhaul in March/April 2023, but I couldn't say for sure. Theknightwho (talk) 16:51, 1 March 2024 (UTC)[reply]
@Theknightwho Can you point me to some of the modules in question and explain a bit more about the overhaul? Benwing2 (talk) 01:36, 2 March 2024 (UTC)[reply]
@Benwing2 I didn't do it. It was @Huhu9001. Theknightwho (talk) 01:54, 2 March 2024 (UTC)[reply]
@Atitarev, do you remember roughly when it did use to work? I am trying to look back at the revision history of Module:Jpan-headword but I can't find any reference to any of the "form" names like "romanized"/"kana noun"/... explicitly in the code for several years. Presumably, it's not there at all explicitly, and it was generated in some other way before? Kiril kovachev (talk・contribs) 17:29, 2 March 2024 (UTC)[reply]
@Kiril kovachev: No sorry, I don't remember but it's years, not months. I am not even sure that Module:accel/ja is the original module for it. I thought it was Module:ja.
It may coincide but have nothing to do with the creation of {{ja-see}} in 2018. Anatoli T. (обсудить/вклад) 23:51, 3 March 2024 (UTC)[reply]

March 2024

Japanese kanji appear as orange links

Hello, adjacent to my other post about accelerated editing, this time the orangelink gadget seems to be acting up somehow when linking to some Japanese kanji, a problem I noticed maybe a month or two ago now but am only now reporting; see e.g. 倒す, 不倒, 押掛ける, 怖気づく, 圧し殺す for some examples. I don't know why it affects some kanji and not others; I assume if the gadget's working by detecting headings, somehow there's been an interference with the way the page text is being parsed as of more recently. If I'm reading the gadget source correctly, the content is considered absent if the entry doesn't belong to a category like "Japanese lemmas" or "Japanese non-lemma forms", but no recognition appears to be made if "Japanese" is followed by "kanji" (despite "logograms" and "Han tu" being checked for); would it be possible that this is the culprit? Kiril kovachev (talkcontribs) 20:13, 1 March 2024 (UTC)[reply]

@Kiril kovachev When you say it's "acting up somehow" can you clarify what the issue is? I looked at those examples but I'm not sure what the desired behavior is vs. what you're actually seeing. Benwing2 (talk) 02:04, 2 March 2024 (UTC)[reply]
I believe the problem was that 倒 in {{ja-kanjitab}} in 倒す#Japanese was orange (or green in my case because of my CSS) because the kanji page doesn't have Category:Japanese lemmas but does have Category:Japanese Han characters, and the gadget doesn't recognize the latter as a lemma-like category. Adding "Han characters" to the regex for lemma-like categories seems to have fixed the problem. — Eru·tuon 03:25, 2 March 2024 (UTC)[reply]
@Benwing2 Sorry, I didn't make it clear what I meant, but as Erutuon pointed out, e.g. 倒 should not be an orange link; rather, it should just be blue because it does have a Japanese entry for it. Up until now those kanji were displayed as orange inside the respective kanjitabs despite definitely existing on the linked pages. @Erutuon, thanks a lot for making the change. Looks good now. Kiril kovachev (talk・contribs) 15:21, 2 March 2024 (UTC)[reply]
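The category check discussed in this thread could look roughly like the sketch below. It is an assumption-laden illustration, not the orange-link gadget's actual source: the regex, the function name `hasEntryFor`, and the premise that category names arrive without the "Category:" prefix are all hypothetical. The suffixes are the ones named in the thread.

```javascript
// Category suffixes treated as evidence that a language section exists.
// "Han characters" is the suffix that was reported as added to the gadget's regex.
const LEMMA_LIKE_CATEGORY =
  /^(.+) (lemmas|non-lemma forms|logograms|Han tu|Han characters)$/;

// Returns true if any category marks the page as having an entry in the language.
function hasEntryFor(languageName, categories) {
  return categories.some((category) => {
    const match = LEMMA_LIKE_CATEGORY.exec(category);
    return match !== null && match[1] === languageName;
  });
}

console.log(hasEntryFor("Japanese", ["Japanese Han characters", "Chinese lemmas"]));
console.log(hasEntryFor("Japanese", ["Chinese lemmas"]));
```

A link would then be shown as orange only when the page exists but `hasEntryFor` is false for the linked language.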

Access to Raw Transliteration

For some languages, the transliterate method fails because of a workaround to the problem that some transliteration modules fail when the text to be transliterated includes mark-up. A formal statement of the problem with the current solution is given in the link above. One solution to the failure of the method is to bypass the workarounds in the method and access the transliteration modules directly. We have been discussing the issue for Sanskrit at Module talk:sa-translit#Getting_Text, where the context for interpreting accentuation marks is usually larger than the scope of mark-up, such as pairs of triple ASCII apostrophes for mandatory emboldening of words.

Should we have a generic template, analogous to {{xlit}}, and a generic Lua method or function to bypass the workarounds, or should we use ad hoc language-specific templates (e.g. sa-tr) to do the bypassing? I believe the major use of such templates would be to generate 'manual' transliteration strings for quotation templates. --RichardW57 (talk) 21:23, 1 March 2024 (UTC)[reply]

@RichardW57 IMO neither approach you're suggesting is good. The issue you're running into has been a point of contention between me and User:Theknightwho; the decision he made to chop up transliteration into parts and not pass formatting such as apostrophes through to the translit method seems to be causing problems in several languages. AFAIK this is only needed for certain languages with complex transliteration methods so I would recommend we switch it to be opt-in on a per-language basis, and pass the unmodified source text by default. User:Theknightwho do you have any objections to this approach and can you let me know which languages should opt into the chop-up functionality? Benwing2 (talk) 02:00, 2 March 2024 (UTC)[reply]
@Benwing2 It would potentially take a lot of work to determine that, and the current method is not something I want to keep around much longer, as obviously this is a major shortcoming. For now, I'd really prefer that we simply add languages to the opt-out list as necessary. Theknightwho (talk) 02:03, 2 March 2024 (UTC)[reply]
@Theknightwho Is there an opt-out list? I remember our discussion for Thai concluding that there wasn't a simple way to opt out entirely. Also when you say "not something I want to keep around much longer" are you planning on reworking the code? Benwing2 (talk) 02:05, 2 March 2024 (UTC)[reply]
@Benwing2 For what Richard needs, the opt-out list in Module:languages/data should be sufficient. In terms of replacing it, the wikitext parser should make that possible, since it knows how to work around formatting without needing to split up the string into chunks. Theknightwho (talk) 02:09, 2 March 2024 (UTC)[reply]
@Theknightwho: Is the opt-out list the table contiguous_substitution? --RichardW57 (talk) 02:45, 2 March 2024 (UTC)[reply]
@RichardW57 Yes. Theknightwho (talk) 02:48, 2 March 2024 (UTC)[reply]
@RichardW57 @Theknightwho Then I suggest to add Sanskrit to the list, so I can check if the Vedic accents work fine (and perhaps check for other bugs) Exarchus (talk) 12:04, 2 March 2024 (UTC)[reply]
@Exarchus I have added this. I doubt it will work for your purposes, though, as the opt-out is (IMO) badly designed in that it passes munged versions of the formatting characters rather than the formatting characters themselves. Let me know if you have issues; if so I'll create a second opt-out that does the right thing and opts out entirely of all processing. Benwing2 (talk) 12:19, 2 March 2024 (UTC)[reply]
@Benwing2 It doesn't work perfectly now, although there is a difference. Exarchus (talk) 12:40, 2 March 2024 (UTC)[reply]
If I can somehow know what characters these munged versions consist of, that might work too. Exarchus (talk) 12:47, 2 March 2024 (UTC)[reply]
If I rewrite the code somewhat (by including whatever characters that aren't Devanagari vowels etc.), then it seems to work as intended

So a workaround can be found with the current opt-out, the code would just be a bit weird. Exarchus (talk) 12:55, 2 March 2024 (UTC)[reply]

One thing that doesn't work is using <br\> (without backslash) to detect the start of a new prosodical unit. So people should use danda । there (which is normal practice). Exarchus (talk) 14:03, 2 March 2024 (UTC)[reply]
I think the module is working fine now. I don't think there's currently a use case for adding different accentuation schemes than the Rigvedic one (I could try adding the Samaveda one, or the Atharvaveda symbol for independent svarita, U+1CE1). Exarchus (talk) 16:53, 2 March 2024 (UTC)[reply]
If there's no danda in the source, the only way to add it is as an explicit emendation of the source, which is supported using |norm= with {{quote-book}}. Or are fraudulent quotations acceptable now? --RichardW57 (talk) 23:13, 2 March 2024 (UTC)[reply]
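The difference between chopping the text around formatting and passing it through unmodified, as proposed earlier in this thread, can be illustrated with a toy sketch. Everything here is hypothetical: `OPTED_OUT` merely stands in for the contiguous_substitution table, the toy transliterator is not a real scheme, and the real opt-out reportedly passes munged formatting characters rather than the raw text.

```javascript
// Hypothetical stand-in for the per-language opt-out list.
const OPTED_OUT = new Set(["sa"]);

// Splits text into alternating plain chunks and runs of 2 to 5 apostrophes
// (wiki italic/bold markup); the capture group keeps the markup in the result.
function splitOnMarkup(text) {
  return text.split(/('{2,5})/);
}

function transliterate(text, langCode, translitFn) {
  if (OPTED_OUT.has(langCode)) {
    return translitFn(text); // whole string: context spans the markup
  }
  return splitOnMarkup(text)
    .map((chunk) => (/^'{2,5}$/.test(chunk) ? chunk : translitFn(chunk)))
    .join("");
}

// Toy context-sensitive "transliterator": capitalises the start of its input.
const toy = (s) => s.replace(/^./, (c) => c.toUpperCase());

console.log(transliterate("ab '''cd''' ef", "hi", toy)); // chunked
console.log(transliterate("ab '''cd''' ef", "sa", toy)); // unchunked
```

In the chunked call the toy rule fires again inside the bold span, which mirrors how a context-dependent scheme such as Vedic accent marking can go wrong when its input is split at markup boundaries.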

This is currently in CAT:E with the message 'The language or etymology language name "Hindustani languages" is not valid.' Looking at entries in this category, I can find at least one where {{translit|en|inc-hnd}} has been in place since October, but this category was only created today. That leads me to think that something has changed in the modules recently. The code "inc-hnd" must have already been in existence since October, or its absence would have resulted in module errors. I can only conclude that one or more of the following has changed:

  1. The behavior of {{translit}}
  2. The settings for the code "inc-hnd"
  3. The behavior of one or more of the modules called by {{auto cat}}

It seems reasonable to me to have a category for transliterations where the exact language within a group isn't specified, so how do we fix this? @Theknightwho. Chuck Entz (talk) 21:49, 2 March 2024 (UTC)[reply]

@Chuck Entz I can't find anything that would explain why this category would've suddenly appeared recently, so it may be that no-one got round to creating it until now. I've certainly refrained from creating categories in the past if I notice the preview throws an error.
I agree that this should be allowed, though, so it's worth updating the category tree to permit families for this type of category.
Theknightwho (talk) 22:11, 2 March 2024 (UTC)[reply]
@ChuckEntz: Unless we are publishing lies again, Hindustani is not a *group* of languages, but another name for Hindi and Urdu. Again, always explaining a language designation by its Wikipedia entry is a bad idea, especially for Indian languages. --RichardW57 (talk) 23:27, 2 March 2024 (UTC)[reply]
@RichardW57: There's reality, and then there's the way the modules work. This is the Grease Pit, so I was talking about the latter. We have Hindi and Urdu as separate language codes, each with their own infrastructure. Combining them into one language code would cause massive disruption and require a huge amount of work. You would have to ask at the Beer parlour about whether there's a consensus to make that change. I'm just trying to fix something that's broken. Chuck Entz (talk) 00:00, 3 March 2024 (UTC)[reply]

5,705 errors

What is going on here? Whatever caused the error has been fixed but not before completely trashing CAT:E. This happened yesterday, too, just not as extreme. Whoever is editing core modules here needs to be more careful. Benwing2 (talk) 02:10, 3 March 2024 (UTC)[reply]

@Benwing2 It's really odd - the number keeps climbing, but I can't find any pages which are actually throwing the error. Theknightwho (talk) 02:41, 3 March 2024 (UTC)[reply]
@Theknightwho: that's not uncommon. More often than not, I never see more than a page or two still displaying the error. I believe that's because the category updates are a separate process from the page updates. The display goes back to normal fairly quickly, while the propagation of the category updates can take as much as a week. When I'm around, I do my best to clear everything using the API Sandbox link. In this case I had it going in two tabs for what seemed like an hour, starting with well over 8,000 down to the current 3. I'm sure I wasn't the only one working on it, though. I expect there to be a few every few minutes or so for an hour or more. Chuck Entz (talk) 04:19, 3 March 2024 (UTC)[reply]
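The thread does not say which request the API Sandbox link sends. One plausible candidate is a MediaWiki `action=purge` call with `forcelinkupdate`, run over the category's members through the `categorymembers` generator. The sketch below only builds the parameters for such a request; the function name `buildPurgeParams` and the category title passed to it are assumptions, and the request itself would have to be sent as a POST by a logged-in user.

```javascript
// Builds query parameters for purging every page in a category and
// forcing its links tables (and thus category membership) to be refreshed.
function buildPurgeParams(categoryTitle, limit = 50) {
  return new URLSearchParams({
    action: "purge",
    format: "json",
    forcelinkupdate: "1",
    generator: "categorymembers",
    gcmtitle: categoryTitle,
    gcmlimit: String(limit),
  });
}

// Assumed title for the category that CAT:E points to.
const params = buildPurgeParams("Category:Pages with module errors");
console.log(params.toString());
```

Because each call handles only a limited batch, repeating it until the category empties would match the experience described above of running it for a long time in several tabs.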
@Chuck Entz Yeah, I also spent about 30 minutes trying to clear it before I gave up and went to bed. Thanks for sorting it. Theknightwho (talk) 10:10, 3 March 2024 (UTC)[reply]

Gothic script

The Wikipedia link on the page Category:Gothic script is to w:Gothic script which is a disambiguation page. It should link to w:Gothic alphabet instead. 212.179.254.67 08:14, 4 March 2024 (UTC)[reply]

Fixed. Theknightwho (talk) 20:27, 5 March 2024 (UTC)[reply]

der2

On the page cut off the der2 template is used to make a show more / show less list of derived terms. But there is only one line revealed by "show more". So the "show more" doesn't save any space... you may as well just show the one hidden line (and it will save the user the click). 212.179.254.67 12:08, 4 March 2024 (UTC)[reply]

FYI: Major romanization change coming in Japan

May impact modules, etc. if this is a change we want to adopt as well: https://languagelog.ldc.upenn.edu/nll/?p=62827

Tangentially related: Wiktionary:Grease_pit/2024/February#Japanese_accel_module —Justin (koavf)❤T☮C☺M☯ 19:08, 4 March 2024 (UTC)[reply]

@Koavf The change in question is a switch from Kunrei to Hepburn romanization. It looks like we already use Hepburn romanization, e.g. the page 松下 is transliterated Matsushita not Matusita. BTW there's some category breakage on that page; User:Theknightwho it looks like something to do with sort key generation, do you have any idea what's going on? Benwing2 (talk) 23:23, 5 March 2024 (UTC)[reply]
@Benwing2 I think it's related to this diff @Erutuon. Theknightwho (talk) 23:39, 5 March 2024 (UTC)[reply]
Sorry about that! I should have checked a few more cases in {{tracking category}}. Fixed, I think. — Eru·tuon 23:49, 5 March 2024 (UTC)[reply]
@Erutuon Thanks. What does this template do? Can you add a bit of documentation? Benwing2 (talk) 23:51, 5 March 2024 (UTC)[reply]
@Benwing2: Okay, done. — Eru·tuon 00:02, 6 March 2024 (UTC)[reply]
Nice. Thanks. I'm glad it at least surfaced this issue that got swiftly fixed. —Justin (koavf)TCM 00:27, 6 March 2024 (UTC)[reply]

Time range with time ranges

This is about {{quote-book}}. If, for example, a work of literature was started somewhere between 1900 and 1905 (something to indicate using |startyear=), and finished somewhere between 1915 and 1920 (something to indicate using |year=), considering date ranges use an en dash (–), I would think to simply type:
{{quote-book|[LANGUAGE]|startyear=1900–1905|year=1915–1920|[...]}}
which would produce:
1900–19051915–1920 []
I'm wondering: is this too confusing? or is it a good enough way of rendering it? —— GianWiki (talk) 16:16, 5 March 2024 (UTC)[reply]

@GianWiki Definitely a very edgy edge case. Does this ever actually happen? I think the display form 1900–19051915–1920 is not going to be interpreted correctly. Maybe we could add some code to change the display if there are en dashes or em dashes in the |startyear= or |year= parameters but I think it's probably better to just put the appropriate explanatory text in the |year= param, something like |year=from '''1900–1905''' to '''1915–1920''' (exact years unknown). The code should not boldface the year if there's already boldface in the param value. Benwing2 (talk) 23:18, 5 March 2024 (UTC)[reply]
@GianWiki why not just put c. 1900–1920? Ioaxxere (talk) 22:20, 6 March 2024 (UTC)[reply]
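As a rough illustration of the display rule Benwing2 describes above, here is a minimal Python sketch (the real template logic lives in Lua and is not shown in this thread). The helper name `format_year` and the exact output wording are assumptions made for illustration only; the sketch spells out "from … to …" when either value is itself a range, and skips boldfacing when the value already carries boldface.

```python
import re

DASHES = re.compile("[\u2013\u2014]")  # en dash, em dash


def format_year(startyear, year):
    """Hypothetical display helper; not the actual {{quote-book}} logic."""

    def bold(value):
        # Leave the value alone if the editor already supplied boldface.
        return value if "'''" in value else f"'''{value}'''"

    if not startyear:
        return bold(year)
    if DASHES.search(startyear) or DASHES.search(year):
        # At least one endpoint is itself a range, so spell the relation out.
        return f"from {bold(startyear)} to {bold(year)}"
    # Simple case: two plain years joined by an en dash.
    return bold(f"{startyear}\u2013{year}")
```

With this sketch, two plain years still collapse to a single bold range, while a hand-written |year= value containing its own boldface passes through unchanged.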

Fix Module:place and Module:place/data for bugs

The function unpack is wrongly used in Module:place and Module:place/data.

Example code:

local prev_qualifier, this_qualifier, bare_placetype = unpack(split)

In cases where split is a table whose indices 1 and 2 are nil (i.e. a sparse table, e.g. { [3] = 'continent' }), this will not work as expected (all 3 variables will be nil). The code should be corrected to:

local prev_qualifier, this_qualifier, bare_placetype = unpack(split, 1, 3)

I do not have the rights to fix it myself, but this should be fixed. Dodecaplex (talk) 19:17, 5 March 2024 (UTC)[reply]

@Dodecaplex You are right about that; it's unfortunate the unpack function was implemented in a broken fashion. Whether this correction needs to happen depends on whether the values can ever be nil; I'll need to take a look at the code in question. Benwing2 (talk) 23:13, 5 March 2024 (UTC)[reply]
For instance, on the page Mexico, the template is called for a continent holonym (e.g. in the first definition), which has no qualifier. So, yes, it is nil in many places. As I extract all pages using an alternate Lua environment, I got at least 151849 errors of this kind. Dodecaplex (talk) 08:22, 6 March 2024 (UTC)[reply]
I think it will be better to specify the start and end indices. What happens when we don't specify the start and end indices, and split[1] or split[2] is nil, is that unpack sets the end index to basically #split, and the length operator (ultimate implementation found here) gives oddball results based on undocumented implementation details of tables. unpack and the length operator are only designed to work properly for sequence tables, because they don't traverse all keys in the table to find the actual maximum integer key. So the only way to ensure that this_qualifier and bare_placetype are always set to the values of split[2] and split[3] is to set the start and end indices ourselves. — Eru·tuon 22:16, 6 March 2024 (UTC)[reply]
I've gone ahead and added the start and end indices to unpack in Module:place and Module:place/data. — Eru·tuon 22:37, 7 March 2024 (UTC)[reply]
Thanks! Dodecaplex (talk) 17:39, 8 March 2024 (UTC)[reply]

Disappearing text

My talk history page on Βικιλεξικό here shows edits which do not appear on the talk page itself here. There is obviously an explanation — I hope that it isn't me!!   — Saltmarsh🢃 19:25, 5 March 2024 (UTC)[reply]

@Saltmarsh, the user was blocked and the edit on your talk page was reverted. It was a text by Shāntián Tàiláng, a user blocked at en.wikt (and now also at el.wikt), who asked these questions: Request for English Wiktionary. Hello, I have noticed that όρισμα (modern Greek) may be derived from modern Greek ορισμός (from ancient Greek ὁρισμός). I do know that English orismology needs an etymology section added; that section should state that it derives from ancient Greek ὁρισμός and {{suffix|en||logy}}.
Also, Category:grc:Woodworking should be created, because πρίσμα needs that same category added to it.
Incidentally, tenpenny nail really needs "w:" placed just before "The Old Curiosity Shop" in its first quotation. Shāntián Tàiláng (συζήτηση) 20:18, 27 Φεβρουαρίου 2024 (UTC)
1) After that, he was blocked by me (2024.02.28.@el.wikt#Block) for continuing to annoy admins with questions.
2) After that, he repeated the text as an IP, and another admin reverted it and changed its visibility. HistoryOfYourTalk
3) He is trying to apply again at en.wikt for unblocking, and asked me @meta how he could apply for unblocking. ‑‑Sarri.greek  I 20:05, 5 March 2024 (UTC)[reply]
Dear @Saltmarsh, tell me, if you wish me to unblock ‑‑Sarri.greek  I 20:12, 5 March 2024 (UTC)[reply]
@Sarri.greek I strongly advise not unblocking ST on el.wikt. Pinging @Surjection, who is most familiar with them. Theknightwho (talk) 20:23, 5 March 2024 (UTC)[reply]
ST should not be unblocked under any circumstance. — SURJECTION / T / C / L / 20:53, 5 March 2024 (UTC)[reply]

Deleting and moving of public sandbox submodules

User:Theknightwho and User:Benwing2 have been getting rid of /sandbox submodules by moving them to Module:User:Erutuon/ and deleting them. I'm uneasy about this idea, but haven't cared enough to complain before today, when my module sandbox subpages are filling up with various sandbox modules I've created in the past. (At least they're more discoverable in my module sandbox subpages than when they're deleted.)

I recall some sort of discussion (Wiktionary:Grease pit/2022/July#Sandboxes in CAT:E I guess) a while ago, but I'm not aware of a vote that says that these public sandbox modules are banned.

I think it's counterproductive to remove the sandbox modules. Ideally we'd have a whole set of sandbox modules and lots of testcases in the main modules and sandbox modules, so casual users could just test a change in the sandbox and see what happens, without causing thousands of module errors. IP users don't have a place (Module:User:IPAddress/) to put sandbox modules in, and casual users who notice an error are also probably not going to know or bother to copy over modules to Module:User:whatever/modulename and test changes. User sandbox modules are hard to find and it's tedious to ask if User:whatever will mind you editing them, if you do find them. So I think it's good to have "public" sandbox modules.

Granted also that sandbox modules are not very useful when they are not in sync with the main module, which is very likely to be the case when production modules are being edited often. And we don't have very extensive testcases for main modules, much less sandbox modules, so it's currently hard for editors of sandbox modules to see what their edit actually does. It takes valuable time to add testcases for new changes. So I don't know how realistic my reasoning actually is.

To solve Wiktionary:Grease pit/2022/July#Sandboxes in CAT:E, I've expanded Template:tracking category so that it identifies all the types of sandbox modules listed in Template talk:tracking category#Identifying sandbox modules and templates. I did a bunch of regex on the list of titles in the dump to figure out all the formats of titles of sandbox modules, and then I ran some JavaScript code to make sure that the new version of the template identifies all the sandbox modules I listed. Now MediaWiki:Scribunto-common-error-category should put all the usual types of sandbox modules out of sight in Category:Pages with module errors/hidden rather than in CAT:E. — Eru·tuon 21:58, 5 March 2024 (UTC)[reply]

@Erutuon Hi. I moved some of them yesterday. The ones I moved were almost exclusively years old, almost exclusively yours, and usually not worked on by anyone else. My logic is that sandbox modules should not be cluttering the mainspace. In practice I have never found a need to use mainspace sandbox modules and I definitely believe that such modules should be in userspace. Mainspace sandbox modules by their nature don't support more than one person working on them at a time and there's no mechanism provided for multiple people to synchronize their edits to a given sandbox module. In general, all sorts of problems can potentially arise with mainspace sandbox modules. In addition they get out of date quickly since production modules do get edited fairly often. In practice, anyone working on sandbox modules has to copy over the latest production modules anyway, so I don't see how there's any benefit to having the sandbox modules in mainspace vs. in your own userspace. I understand that theoretically they could help IP users but I'm not sure how commonly this ever actually happens. Also, given the reality that testcases take effort to maintain that most people don't want to spend, I think it's unlikely we'll ever have a reasonable sandbox testcase infrastructure. That said, I won't move any more modules for the time being but I do hope you'll consider switching to userspace sandbox modules. Benwing2 (talk) 22:28, 5 March 2024 (UTC)[reply]
Just to chime in to say the same thing: the ones I deleted were in all cases hopelessly out of date, and none had been edited within the last year; many hadn't been edited since before 2020. Theknightwho (talk) 23:49, 5 March 2024 (UTC)[reply]

change to module categorization

FYI I made a change to Module:documentation so that modules are categorized even when documentation is present, as long as there is no <includeonly> section present on the page. I also made the module categorization smarter. Benwing2 (talk) 02:17, 6 March 2024 (UTC)[reply]

CJK Compatibility Ideographs in ranges for Hani script

I haven't run the bot that converts between {{t}} and {{t+}} since December, because when I tried, I ran into a problem: the entry-name rules for Korean (ko) contain a pattern whose Perl analogue is invalid, causing my code to blow up with Invalid [] range "豈-舘" in regex.

I took some time to investigate this yesterday, and I believe I now understand how to fix it (so no real action is required), but I figured (some) people might be interested in what I found, because it involves some MediaWiki tech stuff that we don't usually think about but does have user-facing effects.

Some background:

So anyway, the issue turns out to be with the character range from U+F900 to U+FA6D, which ends up as a character range in a Lua pattern in the Korean entry-name rules [link · link].

The problem is that U+F900 and U+FA6D are CJK Compatibility Ideographs, and MediaWiki applies Unicode Normalization Form C (NFC) to inputs and outputs, so by the time my bot sees the range, it's become the range from U+8C48 to U+8218, which Perl rejects because the greatest character in the range would be less than the least character. And that's actually kind of good luck; the range immediately below it, from U+FA70 to U+FAD9, gets normalized to the range from U+4E26 to U+9F8E, which includes a whole bunch of characters that it's not intended to, but is valid so far as Perl can tell, so I would never have noticed it.

For purposes of the translation-bot, I plan to fix this by just changing its server-side component to escape non-ASCII characters in some way, and the bot proper to de-escape them. That should completely circumvent MediaWiki's application of NFC.

More broadly, it may be worth asking if we really want ranges of characters that MediaWiki literally won't even let be saved; I can see arguments either way. Feel free to discuss. :-)     (FYI @Theknightwho.)

RuakhTALK 08:27, 7 March 2024 (UTC)[reply]
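The normalization effect described in this post can be reproduced outside MediaWiki. The following is a minimal Python sketch (the bot itself is Perl, and none of this is the bot's actual code); it uses only the standard `unicodedata`, `re` and `json` modules, and the helper names `nfc`, `describe` and `range_is_valid` are made up for illustration. It shows the range endpoints after NFC, checks whether a regex engine accepts the resulting range, lists the code points in U+F900..U+FA6D that NFC leaves alone, and round-trips the range through an ASCII-only escape as one possible way to sidestep normalization.

```python
import json
import re
import unicodedata


def nfc(text):
    """Apply Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


# Endpoints of the two ranges discussed above, written as escapes so that
# this source text is itself immune to normalization.
first_range = ("\uF900", "\uFA6D")
second_range = ("\uFA70", "\uFAD9")


def describe(endpoints):
    """Return the code points the endpoints turn into under NFC."""
    return tuple(f"U+{ord(nfc(ch)):04X}" for ch in endpoints)


first_after = describe(first_range)    # per the thread: U+8C48, U+8218
second_after = describe(second_range)  # per the thread: U+4E26, U+9F8E


def range_is_valid(lo, hi):
    """Try to compile [lo-hi] as a regex character class."""
    try:
        re.compile(f"[{lo}-{hi}]")
        return True
    except re.error:
        return False


# Code points in U+F900..U+FA6D that NFC does not change.
survivors = [chr(cp) for cp in range(0xF900, 0xFA6E) if nfc(chr(cp)) == chr(cp)]

# An ASCII-only escaped form passes through NFC untouched and can be
# decoded again on the receiving side.
escaped = json.dumps("\uF900-\uFA6D")  # ensure_ascii=True is the default
restored = json.loads(nfc(escaped))
```

Python's `re` rejects a reversed range much as Perl does, so `range_is_valid` returns False for the normalized first range and True for the normalized second one, which is the silent-failure case mentioned above.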

@Ruakh Thanks for doing the investigation! I know about the conversion to NFC form but I didn't suspect it would affect CJK chars in this fashion. The current code is probably OK since it doesn't store the characters literally but rather as numbers, and constructs the ranges on the fly (hence they never get saved and converted to NFC form). Whether the ranges are OK depends on whether there are any characters in the middle of the range that aren't canonicalized out of existence during the NFC conversion, and that I don't know. User:Theknightwho will hopefully comment on this. Benwing2 (talk) 22:23, 7 March 2024 (UTC)[reply]
@Benwing2 I did this for a couple of reasons:
  1. I wanted to cover any edge-cases which involved these compatibility ideographs, since I didn't know if they were used anywhere (e.g. in the Unicode modules).
  2. There are actually 12 CJK characters in the "compatibility ideographs" range which aren't actually compatibility ideographs, and don't get normalised to other characters in NFC (which I assume got added to that range by mistake many years ago, or have since been disunified for some reason): 﨎, 﨏, 﨑, 﨓, 﨔, 﨟, 﨡, 﨣, 﨤, 﨧, 﨨, 﨩. They don't form a continuous range, so it was slightly more efficient to simply include the whole block.
Theknightwho (talk) 22:32, 7 March 2024 (UTC)[reply]
@Erutuon Just letting you know about this approach of Ruakh's, since we were discussing something similar a while ago (which I have not had time to revisit). This, that and the other (talk) 11:47, 8 March 2024 (UTC)[reply]
I should say that if I had to do the bot over from scratch, given the current state of Wiktionary and given what I know now, I probably would not implement it this way. I think a better approach would involve some degree of asking the server-side to do transformations, plus aggressive client-side caching (storing previously-computed transformations in timestamped files and reusing them for an extended period, e.g. six months), a bunch of client-side special-casing for high-volume cases (e.g. "if the language code is [foo] and the translation matches [simple pattern] then compute the entry-name by [simple function] and don't bother querying the server"), and various other such optimizations. In fact, even though I have all the code/etc. for my current approach, I'm still considering migrating to an approach like that at some point.
So if you're planning on writing something from scratch, I think that's what I'd recommend. (But if you're comfortable with Perl, and would rather just reuse my code than write something from scratch, let me know and I can try to get it into a shareable state.)
RuakhTALK 10:01, 9 March 2024 (UTC)[reply]
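The caching approach outlined above could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the class name `EntryNameCache`, the `FAST_PATHS` table, the JSON file layout and the `fetch` callback (standing in for a query to the server side) are all hypothetical, and the `"xx"` rule is a placeholder rather than a real rule for any language.

```python
import json
import re
import time
from pathlib import Path

SIX_MONTHS = 182 * 24 * 60 * 60  # rough reuse period, in seconds

# Hypothetical special cases: language code -> (pattern, local function).
# "xx" is a placeholder, not a real entry-name rule.
FAST_PATHS = {"xx": (re.compile(r"[a-z]+"), lambda text: text)}


class EntryNameCache:
    """Timestamped on-disk cache of previously computed transformations."""

    def __init__(self, path, ttl=SIX_MONTHS, now=time.time):
        self.path = Path(path)
        self.ttl = ttl
        self.now = now
        if self.path.exists():
            self.data = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.data = {}

    def get(self, lang, text, fetch):
        # 1. Cheap client-side special case: no cache, no server round trip.
        fast = FAST_PATHS.get(lang)
        if fast and fast[0].fullmatch(text):
            return fast[1](text)
        # 2. Reuse a stored answer while it is younger than the TTL.
        key = f"{lang}\t{text}"
        hit = self.data.get(key)
        if hit is not None and self.now() - hit["ts"] < self.ttl:
            return hit["value"]
        # 3. Otherwise ask the server side and remember the answer.
        value = fetch(lang, text)
        self.data[key] = {"value": value, "ts": self.now()}
        self.path.write_text(json.dumps(self.data), encoding="utf-8")
        return value
```

Writing the file with `json.dumps` keeps it ASCII-only by default, which also avoids the normalization issue raised earlier in this thread if the cache is ever passed through a system that applies NFC.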

T:km-xi got worse

At ថៃ (thay), the template is linking words to #English rather than Khmer.

E.g. ព្រះរាជាណាចក្រថៃ  ―  prĕəh riəciənaacak thay  ―  Kingdom of Thailand

Not asking for any improvement, just fixes. Anatoli T. (обсудить/вклад) 05:21, 8 March 2024 (UTC)[reply]

I suspect a general problem. The same symptoms are showing with {{th-x}}, which invokes {{#invoke:th|usex}}, and earlier this week I found the same problem with plain double square brackets that link to translingual words rather than English words in glosses. Experimentation suggests that it only shows up on lines formatted by '#', so it afflicts quotations and glosses. --RichardW57m (talk) 10:07, 8 March 2024 (UTC)[reply]
It occurred to me that Module:th should be corrected to specify that the linked-to words are Thai, because Thai entries are usually the last on their page, and got as far as changing Line 223 of Module:th from
table.insert(exSet, "[[" .. thaiWord .. "]]")
to
table.insert(exSet, "[[" .. thaiWord .. "#Thai|"..thaiWord.."]]")
, but then I realised that that wouldn't handle normal numbers or even idiomatic ones - Thai 555 bears no relation to Translingual 555, so I abandoned the edit. Test cases were อมฤต (à-má-rít) and โควิด-19 (with '14' as a normal number). More thought is needed on that one - (Notifying Alifshinobi, Octahedron80, YURi, Judexvivorum, หมวดซาโต้, Atitarev, GinGlaep, RichardW57, Noktonissian): . --RichardW57m (talk) 12:15, 8 March 2024 (UTC)[reply]
The cause of this is a JavaScript change mentioned in Wiktionary:Beer parlour/2024/February § Use of T:lang. I can fix this, but I think the templates should link to the correct language section. If ASCII numbers shouldn't be linked to the Thai section, that would be easy to fix in the module with if thaiWord:match("^%d+$") .... Granted I suppose that won't be the only thing that you don't want linked to the Thai section. Generally bare links to no section should be avoided when you know the probably correct language section to link to (which is wrong in the case of 14 here). {{th-x}} probably needs some way to link to 14#Translingual, and to disable linking. [[14#Translingual|1{สิบ} 4{สี่}]] doesn't work.
However, I've prevented the link-changing code in MediaWiki:Common.css from running within lang="..." text other than lang="en(-...)". — Eru·tuon 16:06, 8 March 2024 (UTC)[reply]
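The digit check suggested above can be sketched as follows, in Python rather than the module's Lua. The helper `link_for` is hypothetical and is not code from Module:th; it only illustrates sending ASCII-digit strings to the Translingual section and everything else to the template's own language section. As noted in this thread, that rule is too crude for idiomatic numbers such as Thai 555, so it should be read as a starting point, not a fix.

```python
import re

# Equivalent in spirit to the Lua pattern "^%d+$" applied to a byte string:
# ASCII digits only. Python's \d would also match Thai and Khmer digits.
ASCII_DIGITS = re.compile(r"[0-9]+")


def link_for(word, lang_section="Thai"):
    """Hypothetical helper: build a wikilink that names its language section."""
    section = "Translingual" if ASCII_DIGITS.fullmatch(word) else lang_section
    return f"[[{word}#{section}|{word}]]"
```

Under this sketch, Thai-script digits such as ๑๔ still link to the Thai section, because only the ASCII range is treated as translingual.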
@Erutuon Thank you, that change seems to have removed most of the problems. However, I'm still confused how taxonomic names as definitions should be linked to. @DCDuring. It seems that {{l|mul}} isn't the recommended way.
The Thai ASCII numbers are now acting tolerably again, though they obviously can't all be translingual - we hit a limit at 101, though I would expect that one to have specific semantics in Thailand as a place name. I'm fairly happy with treating them mostly as semantically digit sequences, though I think there may be lurking chauvinistic problems, and possibly trouble with line-breaking. Roman script acronyms (CD, DVD, VDO and OT come to mind, though the last one may be overseas Thai and it's a word I've heard, rather than seen) and taxonomic names may cause problems for {{th-usex}}, though I've mostly seen the latter as definitions in Thai dictionaries. Again, nationalism may have stored up problems. --RichardW57m (talk) 17:46, 8 March 2024 (UTC)[reply]
@RichardW57m. I agree. Thai usex templates have the same problem now but the formatting colours don't reveal the problem. With Khmer, I am sure colours were right before but I can't say when exactly this problem occurred.
Pinging @Theknightwho, @Benwing2: Are you able to fix the language in the links? Anatoli T. (обсудить/вклад) 05:41, 12 March 2024 (UTC)[reply]
@Atitarev What's the problem you're seeing now? Erutuon's fix of 8 March seems to have removed the problem you were talking about. The outstanding issue with the {{th-usex}} is that there doesn't seem to be a mechanism to specify the language of the elements in the quotation, which causes at the least a colouring problem with translingual elements if we try tagging the elements for language. (I noted this problem nearly 4 years ago.) --RichardW57m (talk) 09:52, 14 March 2024 (UTC)[reply]
@RichardW57m:
I think the displaying colour is now fixed. When I posted, the linked terms showed in orange for the Khmer template. Which edit on which module was it? Can you ping me the {{diff|}}, please?
However, please compare the output by hovering over the word components. Only the last line shows expected links, like [[王國#Chinese]], the first two just show [[រាជាណាចក្រ]] without the language. So, if any of the words were shared by multiple languages, the links wouldn't connect to the correct ones.
  1. ព្រះរាជាណាចក្រថៃ  ―  prĕəh riəciənaacak thay  ―  Kingdom of Thailand
  2. ราชอาณาจักรไทย  ―  râat-chá-aa-naa-jàk tai  ―  Kingdom of Thailand
  3. 王國王国  ―  Tài wángguó  ―  Kingdom of Thailand
Anatoli T. (обсудить/вклад) 23:34, 14 March 2024 (UTC)[reply]
@Atitarev: I believe the fixing diff is Special:diff/78355928. The problem, as I said above, is tagging the entries in the quotation correctly - a quotation in Thai is not always composed only of Thai elements. The Chinese quote template seems to make the assumption that all the entries are in the same variety of Chinese; I don't know how well it handles translingual words within the quotation. For Thai and Khmer, the links actually connect to the page, rather than the first entry, which is correct, but not very helpful if the Thai or Khmer entry is not the first entry. (At least Khmer occurs before Pali.) --RichardW57m (talk) 10:17, 15 March 2024 (UTC)[reply]
@RichardW57m: Thanks. The Chinese template works like other language templates when the words are wikified (linked), in case you're not familiar, e.g.
В чужо́й монасты́рь со свои́м уста́вом не хо́дят (proverb)  ―  V čužój monastýrʹ so svoím ustávom ne xódjat  ―  when in Rome, do as the Romans do (literally, “You don't go to another monastery with your own charter”)
All the words above link to Russian entries.
You can also unlink foreign words in a Chinese usex:
  1. X什麼意思X什么意思  ―  X shì shénme yìsī?  ―  What does X mean?
As for varieties, of course, it's linking to "Chinese", since the varieties are merged under the "Chinese" L2 header. It defaults to Mandarin transliterations. It works with other varieties too, via parameters, e.g. |C= for Cantonese:
  1. X乜嘢意思 [Cantonese]  ―  X mat1 je5 ji3 si1 aa3? [Jyutping]  ―  What does X mean?
Delinking should work in the Thai and Khmer usexes as well. The trouble is, nobody seems to have been able to make sense of these language-specific modules, let alone fix or enhance them, since Wyang left. Anatoli T. (обсудить/вклад) 08:15, 16 March 2024 (UTC)[reply]
@Atitarev: Can you make a list of things that are broken and what the correct behavior should be, with an example for each issue? I am going to sleep now but when I get up I will take a look and see about fixing them. Benwing2 (talk) 08:49, 16 March 2024 (UTC)[reply]
@Benwing2: Thanks.
If User:Atitarev/Khmer translit test cases and User:Atitarev/Thai translit test cases are still on your watchlist, you can start there. I will start with simple fix requests, since I don't know if you guys still plan to make it work like the Chinese counterparts.
  1. I made a comment about the ។ symbol problem (and other punctuation symbols, foreign symbols) on the Khmer page.
  2. Khmer is behind Thai in handling ៗ (repetition symbol). Thai ๆ can, at least repeat the last full word.
  3. Unlike the Thai template, the Khmer template demands an English translation parameter. It should be optional, though the template can still ask for it, as regular templates do.
  4. As above, it's desirable to delink certain words with @ without making the output fail.
  5. Delinked foreign words (e.g. English words, numerals) should transliterate as they are, without trying to "transliterate" from Thai/Khmer. E.g. โควิด-19
  6. A harder fix. Please see @Erutuon's example above regarding numerals for re-spelling of numerals. I pinged you on re-spelling numerals but I have to find that topic. It's a harder fix. Remind me if you still have the motivation later.
(You can move/split this discussion, if you wish.) Anatoli T. (обсудить/вклад) 09:20, 16 March 2024 (UTC)[reply]
@Benwing2:
Here's the numeral respelling topic: Wiktionary:Grease_pit/2024/January#Transliterating_foreign_language_usage_examples_with_numerals
Chinese templates can respell numerals. Thai or Khmer can't.
หนองคายอยู่ห่างจากกรุงเทพฯ ๖๑๔กิโลเมตร  ―  nɔ̌ɔng-kaai yùu hàang jàak grung-têep 614 gì-loo-méet  ―  Nong Khai is 614 kilometers from Bangkok.
Delinking @๖๑๔ doesn't work either. ๖๑๔ (614) doesn't need to be linked in the usex.
๖๑๔ (614) is pronounced (hòk rɔ́ɔi sìp sìi)
respelling "6{หก ร้อย} 1{สิบ} 4{สี่}" doesn't work.
In words: หกร้อยสิบสี่  ―  hòk rɔ́ɔi sìp sìi  ―  six hundred fourteen. Anatoli T. (обсудить/вклад) 10:13, 16 March 2024 (UTC)[reply]
@Benwing2, @Atitarev: But
{{th-xi|หนองคาย อยู่ ห่าง จาก กรุงเทพฯ  6{หก-ร้อย} 1{สิบ} 4{สี่}  กิโลเมตร|Nong Khai is 614 kilometers from Bangkok.}}
หนองคายอยู่ห่างจากกรุงเทพฯ 614 กิโลเมตร  ―  nɔ̌ɔng-kaai yùu hàang jàak grung-têep hòk-rɔ́ɔi sìp sìi · gì-loo-méet  ―  Nong Khai is 614 kilometers from Bangkok.
does work.
Apart from irrelevantly fixing the punctuation errors - the number in the parameter should be flanked by double spaces so as to give visible spaces - the trick is not to have a space in the Thai phonetic spelling. Join the components with hyphens. The original example, which is an odd form of Thai, can be achieved by using Thai digits.
Of course, the documentation needs improvement. --RichardW57m (talk) 09:09, 18 March 2024 (UTC)[reply]
@Atitarev: Actually, the second error may matter. If one omits all spaces before the last word, it disappears. That's a problem with lax parsing. --RichardW57m (talk) 09:29, 18 March 2024 (UTC)[reply]
@RichardW57m, thanks. I see.
We need to have the ability to use both Arabic and Thai numerals (the example I provided earlier used Thai numerals, even if it's less common, not sure).
They need to be simply displayed, transliterated (if no respelling is provided) or transliterated with respellings - both Thai and Arabic numerals.
Does your example or Thai orthography require any VISIBLE space with numerals?
The example you gave also works with the Thai numerals!:
{{th-xi|หนองคาย อยู่ ห่าง จาก กรุงเทพฯ  '''๖{หก-ร้อย} ๑{สิบ} ๔{สี่}'''  กิโลเมตร|Nong Khai is '''614 '''kilometers from Bangkok.}}
หนองคายอยู่ห่างจากกรุงเทพฯ กิโลเมตร  ―  nɔ̌ɔng-kaai yùu hàang jàak grung-têep hòk-rɔ́ɔi sìp sìi · gì-loo-méet  ―  Nong Khai is 614 kilometers from Bangkok.
In my book the text appears exactly with this spacing, including the Bangkok spelling : หนองคายอยู่ห่างจากกรุงเทพ ฯ ๖๑๔ กิโลเมตร
Hope it all makes sense, @Benwing2, at least we know there is a way to work with numerals. Anatoli T. (обсудить/вклад) 23:52, 18 March 2024 (UTC)[reply]
Both Thai and Khmer modules need fixes and enhancements but Khmer modules are in a worse state than Thai.
{{km-xi|វា ជា ភាសា មួយ ដ៏ ចំណាស់ ដែល ប្រហែល ជា មាន ដើម កំណើត តាំង តែ ពី  '''២០០០'''  ឆ្នាំ មុន មក ម្ល៉េះ '''។'''|It is an ancient language that probably dates back to 2000 years ago.}}
Lua error in Module:km at line 211: The word ២០០០ was not romanised successfully. Please supply its syllabified phonetic respelling, enclosed by {} and placed after the word (see Template:km-usex).
{{km-xi|វា ជា ភាសា មួយ ដ៏ ចំណាស់ ដែល ប្រហែល ជា មាន ដើម កំណើត តាំង តែ ពី  '''2000'''  ឆ្នាំ មុន មក ម្ល៉េះ|It is an ancient language that probably dates back to 2000 years ago.}}
វាជាភាសាមួយដ៏ចំណាស់ដែលប្រហែលជាមានដើមកំណើតតាំងតែពី 2000 ឆ្នាំមុនមកម្ល៉េះ  ―  viə ciə phiəsaa muəy dɑɑ cɑmnah dael prɑhael ciə miən daəm kɑmnaət tang tae pii · chnam mun mɔɔk mleh  ―  It is an ancient language that probably dates back to 2000 years ago. Anatoli T. (обсудить/вклад) 00:26, 19 March 2024 (UTC)[reply]

──────────────────────────────────────────────────────────────────────────────────────────────────── Thanks Anatoli. I will take a look. I am still planning on fixing the scraping of Thai and Khmer, it's just that it requires some non-trivial changes and I have some other things I'm also working on :) ... but let me see if I can make number handling work better. Benwing2 (talk) 22:00, 16 March 2024 (UTC)[reply]

How are we supposed to link to a page rather than an entry in lines formatted with '#'? --RichardW57m (talk) 12:15, 8 March 2024 (UTC)[reply]
@RichardW57m: What do you mean? What page? Anatoli T. (обсудить/вклад) 00:27, 19 March 2024 (UTC)[reply]
@Benwing2: Hi. Any luck? Let me know if you need any clarifications. Anatoli T. (обсудить/вклад) 07:39, 18 March 2024 (UTC)[reply]
@Atitarev Apologies, I was dealing with Chinese stuff today. Heading to bed now but I'll definitely take a look when I wake up. Benwing2 (talk) 07:43, 18 March 2024 (UTC)[reply]

Taxon linking

split from "#T:km-xi got worse"
For taxonomic-name linking there are now two distinct templates: {{taxlink}} (Now with more Lua!!!), which is for taxonomic names for which enwikt DOES NOT have an entry, used as before, eg, {{taxlink|Rosa noentry|species}}, and {{taxfmt}} (New!!!, with Lua!!!), to be used for taxonomic names for which enwikt DOES have an entry, used just as {{taxlink}}, eg, {{taxfmt|Rosa multiflora|species}}. I hope that "we" (@User:JeffDoozan, @User:AutoDooz) will soon (months) have applied {{taxfmt}} automagically to all taxonomic names that currently have some link and eventually (many months) even to all now-unlinked taxonomic names. At present this just addresses formatting (various configurations of italics) and makes searches easier. In the more distant future it may make other changes (improvements???) easier. The formatting should be the same for both templates, but categorization will be different, mostly affecting only me or someone else with an active interest in taxonomic names. DCDuring (talk) 19:10, 8 March 2024 (UTC)[reply]
@DCDuring: Thank you for the clarification. Are there any plans to document the user interface of {{taxfmt}}? --RichardW57 (talk) 13:54, 9 March 2024 (UTC)[reply]
On the input side {{taxfmt}} is identical to {{taxlink}}. I have always accepted that contributors may have trouble determining taxonomic rank (as taxonomists also seem to), especially at generic and suprageneric ranks (eg, homonyms, uncertain and changing placement, changes in nomenclature rules and fashions). The purpose of having two templates is that it be easy to count instances of missing taxonomic names ({{taxlink|Taxon name|rank}}) and that it be easy to rename the instances to {{taxfmt|Taxon name|rank}}. Further, each instance of {{taxfmt}} should not necessarily have to test for existence of an entry at each loading of the page it is on. Finally, categorization needs for taxa in {{taxfmt}} should be more modest than for those in {{taxlink}}. Not all of this is fully settled. DCDuring (talk) 15:30, 9 March 2024 (UTC)[reply]
I have added 'temporary' documentation for {{taxfmt}}. DCDuring (talk) 15:42, 9 March 2024 (UTC)[reply]
@DCDuring: While an improvement, it implies that {{taxfmt}} should not be used! Is the only difference most editors need to know whether an appropriate multilingual entry exists? --RichardW57 (talk) 21:46, 9 March 2024 (UTC)[reply]
Should there be |id= for linking to taxonomic names with homonyms in the {{senseid}} and {{etymid}} systems? That might apply to clades, and will apply to generic names used in different kingdoms, and also for some taxons that have changed greatly, e.g. Hominidae and Reptilia. --RichardW57 (talk) 21:46, 9 March 2024 (UTC)[reply]
Quite likely, at least for homonyms from different kingdoms (or, rather, different current taxonomic codes). We now have some 300 taxonomic entries with distinct homonyms, but a good number of them include an archaic or obsolete definition, many being synonyms of current taxa. For now, most readers would get one of the appropriate definitions without the help an id parameter would offer. Trying to follow the twists and turns of taxonomic history in terms of circumscription and placement is not something I have seen any taxonomic database do. They just leave breadcrumbs. Their breadcrumbs are more complete than ours, which is why I believe we need links to multiple other taxonomic databases. When WP articles try to follow twists and turns, it is limited in scope to 'recent' (< or <<20 years) changes and can be quite confusing, often because article contributors don't seem to understand how ambiguous English can be. Wikispecies just lays out 'systems' (with dates and authors) of higher taxa on the same page (See species:Holozoa for a short example of a recent (2002) name.). I always try to update to the latest accepted term, circumscription, and placement to be found in the better current databases, and retain any older taxon in our entry as a synonym.
Our coverage will probably always be limited compared to the comprehensive taxonomic databases. (Would we want to have a million taxonomic entries?) Our value added is in etymology (at least potentially), gender, vernacular names/translations, linkage to multiple taxonomic databases, images, and (potentially) definitions that address relevance (location, economic value, use for food, medicine, etc). I doubt that imprecise linking to definitions is our biggest deficiency, though it should and, I'm sure, will be addressed. DCDuring (talk) 23:01, 9 March 2024 (UTC)[reply]
Given the massive instability in taxonomic names, it would be very useful to record older meanings, especially those of or as polyphyletic taxa. There are also dictionaries that have tried to anchor themselves in the sand of taxonomic names. Even now, I'm not sure that usages of 'crustacean' are usually intended to include butterflies, let alone in works from the 1980's. --RichardW57 (talk) 12:02, 10 March 2024 (UTC)[reply]
We can give it a try. Century 1911, MW 1913, MW Intl. 2d would be reasonable sources for relatively common, older names. Beyond those, we can leave breadcrumbs. DCDuring (talk) 14:33, 10 March 2024 (UTC)[reply]
@DCDuring: I think you are confusing names and meanings. To quote from the equivalent vernacular, when I was a young man, one would not say that a chimpanzee was a hominid, but would say that a mammal-like reptile (such as Dimetrodon) was a reptile. These changes don't reflect a change in knowledge, but a rejection of the notion that we are not fish. (And objectively, a dimetrodon was closer kin to a Jurassic allosaur than to us.) I don't see how 'breadcrumbs' help with such shifts in meaning. --RichardW57m (talk) 09:54, 11 March 2024 (UTC)[reply]
Perhaps you would like to take a run at multiple definitions for a taxon so I could see what you mean? It would be interesting to keep track of the degree of acceptance of names and their circumscription and placement by date. DCDuring (talk) 13:52, 11 March 2024 (UTC)
@DCDuring: I've got to do some work on taxonomic examples, but to get an idea before then, you might find it helpful to look at velociraptor. --RichardW57m (talk) 09:21, 18 March 2024 (UTC)[reply]
@User:RichardW57 Generally, I don't think the taxonomic part of any etymology of a 'vernacular' word derived from a taxon, like velociraptor, belongs at the vernacular name, rather than at the taxon, eg, Velociraptor. A definition like the second one would seem hard to justify in an English vernacular-name entry, but this may be an exceptional case.
The definition at velociraptor seems encyclopedic. As we have an encyclopedia as a sister project just a link away, there is little justification for encyclopedic material here. Therefore, stylistically, a definition shouldn't need more than one phrase, possibly with a subordinate clause or absolute if there is particularly relevant information. For a taxon or a vernacular name of an organism, such information might be location, use to humans, disease, scientific importance, or other cultural significance (like use in Jurassic Park), etc. DCDuring (talk) 12:26, 18 March 2024 (UTC)[reply]
The relevance is that there are two different meanings of velociraptor. The first one, with, as you complain, a rather encyclopaedic definition, is the one that is a popular synonym of Velociraptor, and is the meaning normally found in documentaries. The second one is actually Deinonychus, and is the one found in the context of Jurassic World, and probably toy shops.
In this particular case, I am not confident that the meaning of Velociraptor having Deinonychus as a hyponym actually meets CFI. Perhaps I am setting too high a bar for independence, but I have little confidence of finding two independent usages of the second sense of Velociraptor. This is not typical of evolving meanings of taxonomic names; G.S. Paul's proposed merger of the genera has not been accepted.
I think we should make it clear that 'velociraptor' may actually refer to Deinonychus. Likewise, we should not hide the fact that 'hominid' may be used to exclude Sivapithecus. RichardW57m (talk) 14:23, 18 March 2024 (UTC)[reply]
I'm skeptical that there is such a meaning in actual English usage. In any event, defining velociraptor as "a member of the genus Velociraptor" addresses the matter adequately, IMHO. DCDuring (talk) 14:54, 18 March 2024 (UTC)
It addresses the first meaning. It doesn't address the meaning used in association with Jurassic Park. --RichardW57m (talk) 16:14, 18 March 2024 (UTC)[reply]
You are so right and I so wrong. I am interested in how you would address the problem of multiple referents (or placements or circumscriptions) of a taxon, especially how they change over time. Taxonomic databases just leave breadcrumbs, of various kinds. DCDuring (talk) 00:28, 19 March 2024 (UTC)[reply]

aWa not working

Our archiving gadget, aWa, is broken. It is getting confused by the "[subscribe]" link which is now present on discussions, which makes it try to archive on the wrong page.

I disabled the gadget for now until it can be fixed (I haven't tried to debug the issue yet). Ping @Erutuon who last edited the gadget. This, that and the other (talk) 23:08, 8 March 2024 (UTC)[reply]

It's because of the changes to headers. The gadget was interpreting the "[subscribe]" link at the beginning of the header (in the HTML, though it displays as if it's after the header) as the link to the page to archive at. Also, the gadget wasn't going to the next HTML elements after the header correctly because they've added another layer of HTML elements in the header. I haven't fixed the fact that the "[subscribe]" link is interpreted as part of the header, which is a bug apparently tracked in phab:T13555#9592945 and due to be fixed soon. Not sure if the gadget works (because I don't really know where to test it), but give it a try and let me know. — Eru·tuon 01:18, 9 March 2024 (UTC)[reply]
@Erutuon it looks like you've fixed it. I just tested it and, although it displayed the [subscribe] text in its UI as part of the header, it didn't actually make a difference to the archival itself. See [4]. This, that and the other (talk) 03:25, 9 March 2024 (UTC)[reply]

Attempted to create a legitimate entry for "chmobik", tripping vague anti-spam measures

The specific abuse rule that was tripped was 'various specific spammer habits'. I'm not sure what that means, and the entry I wrote up has nothing I can find wrong with it.

Ishiura (talk) 10:27, 9 March 2024 (UTC)

I'm not sure exactly what it is, but my first instinct is that it's the Reddit/Twitter links. Those aren't considered durably archived sources for quotations either way. — SURJECTION / T / C / L / 10:33, 9 March 2024 (UTC)[reply]
OK. I actually modelled the "chmobik" entry on the "mobik" one, which uses pretty extensive Twitter quotations.
Ishiura (talk) 10:36, 9 March 2024 (UTC)

Derived terms tool

As I'm useless with programming, I asked AI to make a tool to quickly add Derived terms. It is stored at User:Denazz/Derived Terms Tool. Is it complete crap, as I suspect? Denazz (talk) 15:58, 9 March 2024 (UTC)

Lol, it looks very incomplete. Equinox 19:17, 9 March 2024 (UTC)

husband's

How should I resolve the red link on husband's stitches? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:50, 9 March 2024 (UTC)[reply]

What red link? Vininn126 (talk) 19:21, 9 March 2024 (UTC)
That fixes the immediate problem, but doesn't address the difference between the lemma and the plural entries in the way linking is handled in the headword. Chuck Entz (talk) 19:40, 9 March 2024 (UTC)

Template/deletion/inclusion error

Wiktionary:Beer parlour/2006/October and Wiktionary:Beer parlour/2006/August are showing up in pages for speedy deletion. Equinox 20:37, 9 March 2024 (UTC)

Resolved by deleting Template:zh-hanzi-box. This, that and the other (talk) 02:40, 10 March 2024 (UTC)

removing cruft from Module:labels/data/regional

Heads up, I am planning on moving quite a bit of stuff from Module:labels/data/regional to language-specific modules. There are over 4,000 lines of stuff in this module and 662 entries. Most of the entries are limited to one or two languages, but having them in the lang-independent data means that any use of the labels for any language will add the corresponding category. Hence we get CAT:French Translingual (with 5 current entries), CAT:Austrian English (with one current entry, which does not belong), CAT:Finland English (with one current entry, probably likewise), CAT:French Chinese (with one current entry, debatable), CAT:French Catalan (with one current entry that belongs rather in CAT:Northern Catalan), etc. I wrote a script to find all the existing per-language categories for each label in Module:labels/data/regional, which I am planning on using as a basis to move most entries out. There is a slight disadvantage to doing this in the case of a regional label that corresponds to several languages, in that the aliases and Wikipedia fields will get duplicated. For example, the current entry for France defines French as an alias with a link to the Wikipedia entry for France, and corresponds to six lang-specific categories: CAT:French French, CAT:French Ladino, CAT:French Latin, CAT:French Norman, CAT:French Vietnamese, CAT:French Yiddish. If we care enough about this, one way to minimize duplication is to support a field containing a list of allowed languages; I may do this. (OTOH the Wikipedia links should maybe be customized on a per-language basis. For example, rather than just linking to the Wikipedia entry on France, which is of questionable usefulness here, we could imagine linking to the Western Yiddish article for French Yiddish, the European French article for French French, etc.) Benwing2 (talk) 02:30, 11 March 2024 (UTC)[reply]

FYI I have written a function in Module:alternative forms to convert lang-specific labels data modules to {{alt}} data modules. I will eventually be merging the two sets of data modules so that all the info is found in the labels data and the separate dialectal information in Module:CODE:Dialects disappears. For now I have done Maltese and Albanian. @Catonif, Fenakhay Benwing2 (talk) 04:21, 12 March 2024 (UTC)[reply]
Nice, it's good to see this is gaining traction. Catonif (talk) 05:06, 12 March 2024 (UTC)
OK, I have written most of the necessary code. Tomorrow I will run some of the code. The plan is as follows:
  1. Allow a list of language restrictions to be added to lang-independent labels, esp. those in Module:labels/data/regional (DONE).
  2. Move most regional labels to lang-specific modules. The current criterion is as follows: A label remains in Module:labels/data/regional if either (a) it concerns more than 3 languages, or (b) it has more than 1 alias and concerns more than 1 language. This means something like Congo with 8 aliases (Democratic Republic of the Congo, Democratic Republic of Congo, DR Congo, Congo-Kinshasa, Republic of the Congo, Republic of Congo, Congo-Brazzaville, Congolese) and 3 languages (yom, fr, avu) remains, as does Nigeria with 6 languages and 1 alias, as does Erzincan with 3 aliases (Yerznka, Erznka, Erzinjan) and 2 languages (tr, hy). OTOH, Lānaʻi with 4 aliases (Lanaʻi, Lanai, Lāna'i, Lana'i) gets moved because it concerns only one language haw (Hawaiian). Overall this moves 591 out of 662 entries out, spreading them over 108 lang-specific modules, of which 34 are new. Sometimes there are clashes between a lang-independent and lang-specific label; in that case the code adds the moved lang-independent version in a Lua comment, for later manual fixing.
  3. Fix up the clashes noted in the previous step; needs to be done manually.
  4. Convert the existing {{alt}} dialectal data modules to label modules (I have a script to do this), and integrate them into existing label modules (I have another script to do this). I wanted to do this step after step (2) because there may be clashes between labels in a lang-specific {{alt}} data module and a lang-independent label data module (specifically Module:labels/data/regional), and I'd like to have as few of those as possible as they need manual handling. The integration/merging of the two modules may introduce clashes when there are conflicting specs; as in step 2, the code generates comments for later manual fixing.
  5. Fix up the clash comments generated in the previous step.
  6. Convert all the {{alt}} dialectal data modules to auto-convert from the corresponding {{lb}} data modules; or better, just use the latter directly in {{alt}}. Benwing2 (talk) 06:46, 13 March 2024 (UTC)[reply]
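The criterion in step 2 can be sketched as a small predicate. This is only an illustration in Python, not the script actually used (which is not shown in this thread); the function name `stays_in_regional` and the table `LABELS` are made up for the example, and the counts are the ones quoted in step 2.

```python
def stays_in_regional(num_languages: int, num_aliases: int) -> bool:
    """Return True if a label would remain in the lang-independent module.

    Criterion as stated in step 2: keep the label if (a) it concerns more
    than 3 languages, or (b) it has more than 1 alias and concerns more
    than 1 language.
    """
    return num_languages > 3 or (num_aliases > 1 and num_languages > 1)


# (languages, aliases) counts taken from the examples in step 2.
LABELS = {
    "Congo": (3, 8),
    "Nigeria": (6, 1),
    "Erzincan": (2, 3),
    "Lānaʻi": (1, 4),
}

kept = [name for name, (langs, aliases) in LABELS.items()
        if stays_in_regional(langs, aliases)]
moved = [name for name, (langs, aliases) in LABELS.items()
         if not stays_in_regional(langs, aliases)]

print("kept:", kept)
print("moved:", moved)
```

With these four examples the predicate keeps Congo, Nigeria and Erzincan and moves only Lānaʻi, which matches the outcome described in step 2.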
Great work @Benwing2!. ... French French? haaa ha ha ha. I cannot stop laughing. Why not placenames. France French. Belgium French. USA English. British English (a! an exception, ok say Britain...). I haven't seen them all, but the repetition is ... ‑‑Sarri.greek  I 07:04, 13 March 2024 (UTC)[reply]

Missing category

@Benwing2: Category:Places in Baja California seems to have been overlooked; there is one for Category:Places in Baja California Sur. A bit confusing, I know - anyway it's needed for English and Spanish. DonnanZ (talk) 17:41, 12 March 2024 (UTC)[reply]

@Donnanz Baja California Norte is in the place data (the erstwhile state of Baja California was split into two states some time ago). Benwing2 (talk) 20:20, 12 March 2024 (UTC)
@Donnanz NVM, I see that Baja California is the official name. I changed the place data and set up Baja California Norte as an alias. Benwing2 (talk) 20:25, 12 March 2024 (UTC)
@Benwing2: More confusing than I realised. I see it's coming up as a red link for now. Thanks. DonnanZ (talk) 20:37, 12 March 2024 (UTC)

Sicilian vowels

The last time this came up Cato and I found ourselves bogged down in consonant difficulties. No need to let the perfect be the enemy of the good, however.

It is uncontroversial that Sicilian has five, and only five, phonemic monophthongs: /i ɛ a ɔ u/. So let's simply start there. Could we run a bot to identify, and hopefully fix some of, the phonemic transcriptions featuring nonsense like /ɪ i̞ ɨ ɛ̃ ɐ̠ ɐ ʊ Vː/? I can clean up the rest manually. Might as well remove the various full-stops while we're at it, as there is no /./ phoneme in Sicilian. Nicodene (talk) 00:48, 14 March 2024 (UTC)[reply]

Keeping the vowel transcriptions to those 5 phonemes sounds good. I don’t find “there is no /./ phoneme” a compelling rationale for omitting syllable divisions in phonemic transcriptions.
Many languages have phonological contrasts that are not normally analyzed as reducible to the presence or absence of a specific phoneme in a sequence; e.g. there is a contrast in Spanish between the pronunciation of ame and amé, which we transcribe (properly, I think) as /ˈame/ vs. /aˈme/. I have never seen it argued, nor would I argue, that /ˈ/ in this transcription is a phoneme, but I don't think these should both be transcribed as /ame/.
There might be other good reasons to omit syllable divisions. E.g. in the case of English, a lot of the time there isn’t even consensus between phoneticians about where syllable divisions fall. In some languages, syllable divisions might be completely predictable just from the sequence of phonemes in a word. (In other languages, this is mostly but not entirely the case, with morphology also affecting syllabification in some circumstances: e.g. in Latin, Catalan, and Spanish, heteromorphemic /bl/ can only be syllabified as heterosyllabic /b.l/ (as in Latin sublātus) but morpheme-initial or internal /bl/ can or must be syllabified as an onset.)--Urszag (talk) 01:27, 14 March 2024 (UTC)
It was a tongue-in-cheek way of saying that syllabification is not phonemic in Sicilian (unlike stress in Spanish).
Perhaps I should state my concern more plainly. At the moment our transcriptions claim syllabification as a phonemic feature of Sicilian. That is an extremely bold claim, and one made accidentally by editors unaware of what phonemes are. It should be removed on those grounds alone. In the event that a groundbreaking paper surfaces to prove that Sicilian syllabification happens to be phonemic after all, then we will apply its findings carefully, and systematically, to our transcriptions. The chances that our current transcriptions would have got all the details right are nil.
As for morphology affecting pronunciation - that is a concern that applies to many if not most of the languages we have here. That is, morpheme (or word) boundaries often have consequences on the phonetic level. The solution would be adding morphophonemic transcriptions, if we reach such a level. Nicodene (talk) 02:38, 14 March 2024 (UTC)[reply]
@Nicodene I'm not sure that just the presence of periods/full stops between slashes asserts anything about their phonemicity. It is fairly common to include syllable dividers in phonemic representations, esp. if the phonemic representation is all we have (yet another reason, I think, to prefer broad phonetic representations between brackets; it lets you include relevant info without having to worry about whether such-and-such a distinction is phonemic). In any case I can easily do a bot run to find occurrences of the non-phonemic vowels you mention above; correcting them automatically is a bit trickier as it depends both on having rules to do the conversion (which might not be so hard to work out) and making sure the rules are correct in all cases (which might be harder, as people might be doing surprising things with these non-phonemes). Benwing2 (talk) 03:49, 14 March 2024 (UTC)[reply]
@Nicodene Here: User:Benwing2/bad-sicilian-vowels. There are about 600 instances. If you give me a list of replacement rules I'll see about implementing them. Benwing2 (talk) 04:03, 14 March 2024 (UTC)[reply]
I've gone through the list. (/fʷ/ was really quite something.) These are all straightforward:
/ɪ i̞ ɨ/ → /i/
/ɛ̃/ → /ɛ/
/ɐ̠ ɐ ä aː/ → /a/
/ʊ/ → /u/
As for the long vowels, most of them are spurious, but some are indicated in the spelling with a circumflex and derive from contractions of /VV/. I'll see if I can find a paper discussing these before I do anything with them.
As for the full-stops – well, to place something in a phonemic transcription is to indicate that it is phonemic, inevitably and by definition. This is something that is often not grasped, for instance by (I would estimate) more than half of the contributors here, including otherwise very knowledgeable ones, but that is just how it is. One can either use phonological notation correctly or not use it at all.
I'm in favour of phonetic transcriptions as well, so long as we actually know enough to do them properly for the language in question. From what I have seen perusing the existing transcriptions, that is not the case for Sicilian. Someone has to put together a properly sourced and cited Wiki page on Sicilian phonology. Maybe I'll do it if I can find it in me to. Nicodene (talk) 05:17, 14 March 2024 (UTC)[reply]
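As a rough illustration of the replacement rules listed above, here is a minimal Python sketch. It is not the bot code that was actually run. The name `normalize_vowels` is made up for this example, and handling both the precomposed and the decomposed encoding of ä is an assumption about what might occur in entries. Long vowels other than /aː/ are deliberately left alone, since only /aː/ appears in the list.

```python
import re

# Rules from the list above, written with escapes so the combining
# diacritics are unambiguous.
RULES = {
    "\u026a": "i",              # ɪ
    "i\u031e": "i",             # i̞ (i + lowering diacritic)
    "\u0268": "i",              # ɨ
    "\u025b\u0303": "\u025b",   # ɛ̃ -> ɛ
    "\u0250\u0320": "a",        # ɐ̠ (ɐ + retraction diacritic)
    "\u0250": "a",              # ɐ
    "\u00e4": "a",              # ä, precomposed (assumed to occur)
    "a\u0308": "a",             # ä, decomposed (assumed to occur)
    "a\u02d0": "a",             # aː
    "\u028a": "u",              # ʊ
}

# Longer sequences come first in the alternation so that "ɐ̠" is matched
# as a unit before the bare "ɐ" rule can apply.
PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(RULES, key=len, reverse=True))
)


def normalize_vowels(ipa: str) -> str:
    """Apply every rule in a single pass over the transcription."""
    return PATTERN.sub(lambda m: RULES[m.group(0)], ipa)


# A made-up transcription, not a real Sicilian entry.
print(normalize_vowels("/\u02c8p\u026at\u028a/"))
```

A single regex pass is used rather than a chain of `str.replace` calls so that the output of one rule is never fed into another.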
@Nicodene OK thanks. Are you sure about converting /aː/ to /ɛ/? That seems odd, while the others look totally fine. Benwing2 (talk) 05:22, 14 March 2024 (UTC)[reply]
OK, I see you changed it. Benwing2 (talk) 05:23, 14 March 2024 (UTC)
Yes. As for the 'legitimate' long vowels, we have the following (if the information in the entries is accurate):
Very interesting. Nicodene (talk) 05:37, 14 March 2024 (UTC)
@Nicodene Done. Benwing2 (talk) 06:05, 14 March 2024 (UTC)
Thank you. Nicodene (talk) 06:07, 14 March 2024 (UTC)
I don't mean to be a wet blanket but I have to add that your claim that "more than half of the contributors here, including otherwise very knowledgeable ones" are confused about what "phonemic" means (by implication, this includes anyone who disagrees with you, including me and User:Urszag) is a very strong statement. I also see you are going ahead and removing the existing syllable breaks in the phonemic notation despite there being no consensus for this (since the two other people in this discussion both disagree with it). Benwing2 (talk) 07:19, 14 March 2024 (UTC)[reply]
@Benwing2 Sorry I didn't mean at all to imply that. I ultimately disagree but your reasoning has to do, respectively, with morphological complications and acceptance of a common practice, not any kind of basic misunderstanding.
My point, which I could have conveyed better, was that the common practice itself comes ultimately from that misunderstanding. Sicilian transcriptions like /çɪɾɪ(ɨ)ˈv(ʲ)ɛɖːu/, Galician transcriptions like /baˈβuʃa̝/, and Neapolitan ones like /ʃkuŋˈtʃi.ʝʝə/ were all common practice until recently, and this sort of thing is self-reinforcing: the more such transcriptions there are, the more they come off as a legitimate model to emulate, and so they can spread and take on a life of their own. Which is what I think happened with syllable divisions in phonemic transcriptions becoming a sort of Wiktionary canon, across languages.
I've not seen a phonology paper with syllable divisions in phonemic transcriptions unless the author is really proposing that they are phonemic. And (aside from my finding the concept itself unlikely) I've not seen a proposal to that effect gain widespread acceptance, e.g. for English, or seen one made at all for Romance languages or Latin.
As for removing syllable breaks - since I was already there cleaning up the long vowels, I fixed other issues with the same transcriptions, such as /e̞ u̞/. In my view /./ is also incorrect but I didn't intend to edit any entries just for that reason. I'll now simply leave it as-is. Nicodene (talk) 14:02, 14 March 2024 (UTC)[reply]
@Nicodene OK, my apologies as I think the tone of that message was stronger than I intended. What you say makes sense (although I'm pretty sure syllable breaks are in fact phonemic in English, cf. the classic minimal pair nitrate vs. night rate, unless you make morpheme boundaries in compounds be phonemic, which is six of one vs. half a dozen of the other). I still think it's helpful to include syllable breaks. Again this leads to my conclusion that for practical purposes (given that our foreign-language entries are not meant for a linguistics paper but as a learner's dictionary for English speakers) we should abandon a purely "phonemic" transcription in favor of a broad phonetic one, which allows us to pick and choose which level of detail to show. This is already done, for example, in Russian, where e.g. we are choosing to show broad /l/ as [ł] and notate some of the more important vowel allophones such as [æ] between palatalized consonants. A pure phonemic representation is a theoretical construct and sticking with such a thing can often put us in a straitjacket, sometimes leading to bizarre results, e.g. per User:AG202 the Spanish terms fui [fwi] and muy [muj] should be notated phonemically as /fui/ and /mui/, where the very salient vowel differences between the two are considered non-phonemic and lexically determined and hence not displayed. (Now, I don't really believe it makes sense to have lexically determined phoneme -> allophone rules like this, but per AG202, this is the consensus view among linguists working on Spanish.) Benwing2 (talk) 00:39, 15 March 2024 (UTC)[reply]
There is a boundary in night-rate, and that boundary is what causes people to pronounce it differently from nitrate. I agree. But it is surely more economical to explain that as a word boundary, given that night-rate is plainly recognizable to a native speaker as night plus rate — and given also that we know speakers have a mental model that can treat words as fundamentally distinct units (or else language wouldn't be possible I think) — than it is to posit that speakers have a mental model which, in addition to that, treats t.r and .tr as fundamentally distinct units. What does the additional assumption contribute?
I agree about promoting [] over // at any rate so long as the phonetic details are known. It would save a lot of headaches, for more than one reason.
As for the thing about Spanish - it sounds by definition impossible. Has AG202 cited a paper to that effect? Perhaps there is some other factor involved, like regional differences which have been shoved into one phonemic representation. Nicodene (talk) 04:26, 15 March 2024 (UTC)[reply]
See the discussion here: User talk:Benwing2/2023 § Borrowing module es-pronunc for Spanish Wiktionary. Particularly the part citing The Routledge Book of Spanish Phonology when it comes to syllabification. It specifically lists "muy" as an exception and phonemically represents it as /mui/, which is what I've seen for the most part elsewhere too from authors that don't list /j/ & /w/ as separate phonemes (which is the consensus). There's no minimal pair with a hypothetical "mui" [mwi] as well. There's an argument that could be made that it's instead /'mu.i/ though. AG202 (talk) 04:51, 15 March 2024 (UTC)[reply]
@AG202 Thanks for the response. Keep in mind there are other terms in -uy. Looking through the lemmas we have produces the following: ababuy, Chuy, cocuy, cuy, espumuy, Esteguy, huy, Jujuy, Luy, muy, pijuy, Ruy, tepuy, uy, Yaracuy. If there are no minimal pairs with words in -ui, it seems a random gap not an inherent feature of language. (And in fact cf. huy and hui, both Spanish words.) Benwing2 (talk) 05:21, 15 March 2024 (UTC)[reply]
Why is it not simply /kui'dado/, /ku'iko/, /fu'i/, /'mui/? I don't follow any of this. Nicodene (talk) 06:00, 15 March 2024 (UTC)[reply]
Because, as Urszag stated below, /fu'i/ implies a disyllabic word, when in fact it's one syllable. AG202 (talk) 06:27, 15 March 2024 (UTC)
So you're saying /fuí/ can only be [fwí], and not [fuí], while /kuíko/ can be both [kwíko] and [kuíko]? Nicodene (talk) 06:29, 15 March 2024 (UTC)[reply]

The contrast between Spanish fui and muy can be analyzed as a matter of the position of the stress (like the contrast between ame vs. amé). The problem with the standard IPA stress notation (aside from the fact that the stress mark is not a phoneme) is that the IPA stress symbol is supposed to go at the start of the stressed syllable, which calls for /'fui/, /'kuiko/, etc. Some phonologists use the acute instead (/fuí/ vs. /múi/) to avoid that issue. "Quasi-Phonemic Contrasts in Spanish", by José Ignacio Hualde (2004:5), cites Quilis and Fernández 1985 as giving transcriptions like "[bjénto] /biéNto/; [porfiában] /poRfiábaN/; [kwál] /kuál/;[fwérte]". Ralph Penny, in A History of the Spanish Language, also makes use of the acute to mark stress in phonemic transcriptions e.g. /kantáis/. I agree with Benwing that broad phonetic transcriptions can often be preferable to phonemic transcriptions. Linguists discussing Spanish glides and syllabification seem to usually use broad phonetic transcriptions, but I've also seen a few uses of slashed transcriptions that the authors don't seem to have obsessed over getting perfectly theoretically accurate. E.g. "The Syllable", Alfonso Morales-Front (The Cambridge Handbook of Spanish Linguistics, 2018, pp. 190-210) gives a number of phonetic transcriptions such as [suβ.li.mi.ˈnal], [su.ˈβli.me], [ˈpje.ðɾa], [gwe.βo], but also gives in slashes the transcriptions /uebo/, /-ecito/, /ˈaman/. There's no explanation of why the stress symbol is included in the last but not the first two, or why the symbol /c/ was used in /-ecito/.--Urszag (talk) 06:10, 15 March 2024 (UTC)[reply]

Thanks! You explained it better than I could, and I agree that it looks to be a matter of stress like Routledge also posits. I'm a bit wary of using the acute accent though as it's usually used for tone. I'm not sure how else we can show it though. AG202 (talk) 06:28, 15 March 2024 (UTC)[reply]
The "calls for /'fui/, /'kuiko/" part doesn't follow for me. I understand specific languages can have some 'home-brew' IPA practices, to an extent, but this just seems misleading. To anyone else this reads as if /u/ is stressed, then stress migrates rightwards in every surface realization. And it causes a clash with the actually stressed /u/ in muy. Nicodene (talk) 06:42, 15 March 2024 (UTC)[reply]
To be clear, I wasn't recommending the transcriptions "/'fui/, /'kuiko/". My point was that these (also /'fiesta/, /'fuerte/, etc.) would fall as a natural but undesirable consequence of the convention of placing IPA stress marks before the onset of the stressed syllable. Then again, I can't find that principle explicitly stated anywhere in the online IPA chart or in the 1999 handbook (just implicitly conveyed by the examples), so maybe it doesn't even technically have official status anyway--I know some phoneticians have violated it and instead adopted the convention of placing the stress marker directly before the stressed vowel, but we don't generally do that on Wiktionary (e.g. we don't transcribe floro as /flˈoɾo/).--Urszag (talk) 07:41, 15 March 2024 (UTC)[reply]
Is it true that Spanish phonologists agree on phonemic representations like /'kuiko/ in a phonology that contains the vowels /i/ and /u/ and no phonemic diphthongs? I ask because it isn't clear to me how that would work. Given the phonology as described, if I've not missed something, that transcription could only stand for a phonemically stressed /u/.
Also, does the pronunciation [uˈi] occur? The linked discussion suggests so, at least for cuico, whereas the comments here seem to suggest otherwise. Nicodene (talk) 08:52, 15 March 2024 (UTC)[reply]
I also am not saying that Spanish phonologists generally recommend using the transcription /'kuiko/. But it is a possible phonemic transcription of the disyllabic pronunciation ['ku̯i.ko]. Stress is analyzed as a suprasegmental feature, so the placement of /'/ relative to other symbols in a phonemic transcription is a matter of convention. One convention is to put it directly before the stressed syllable. If you think that convention doesn't seem to work very well in this context, you're not alone, but as a convention, it isn't something that can be true or false: it isn't a fact about Spanish phonology or the position of Spanish phonemes (since /'/ is not a phoneme and doesn't actually come before or after any phoneme in the phoneme sequence). Here are some relevant transcriptions and commentary from José Ignacio Hualde's chapter "Spanish", in Gabriel, Christoph; Gess, Randall; Meisenburg, Trudel (eds.), Manual of Romance Phonetics and Phonology, 2022:790: "/ˈbiaxe/ [ˈbi̯a.xe]", "/ˈbaile/ [ˈbai̯.le]", "/liˈana/ [liˈa.na]", "/ˈioɡa/ [ˈʝo.ɣa]", "/iˈato/ [iˈa.to]" "/ˈkon.iuxe/ [ˈkonʲ.ʝu.xe]" (these are given in the context of explaining the analysis where [ʝ] is treated as a positional allophone of /i/). In footnote 1, Hualde notes: "In yoga the vowel /o/ is the phonologically stressed element, not the initial /i/, which becomes a consonant as it does not receive the stress on itself in this context, although the initial syllable is stressed. The lack of clarity introduced by the IPA convention of marking the stress at the beginning of the syllable in sequences like /io/ without a preceding consonant is the reason why in Hualde (2005) stress is indicated directly on the stressed vowel instead." I can't confirm whether cuico is potentially trisyllabic, but I have no reason to doubt it.--Urszag (talk) 10:21, 15 March 2024 (UTC)[reply]
There isn't anything in the phonemic representation /ˈfui/ to convey that it is one syllable as opposed to two like /ˈmio/ and /'tea/. If we're to assume that /u/ here is inherently non-syllabic, then what we are really saying is that it is the phoneme /u̯/ or /w/ and the transcription has to be revised.
If we attempt an allophonic rule turning /u/ in that context to [w], we'll have to find a way to make sure it doesn't affect the /u/ in /ˈmui/ or any of the other -uy words mentioned by Benwing earlier. Most difficult of all, we would have to differentiate /ˈui/ from /ˈui/, namely the pair huy/hui. Nicodene (talk) 11:48, 15 March 2024 (UTC)[reply]

Autocloseable.close

Every use of the {{tl}} template is now rendering as Autocloseable.close for some unknown reason. That's all I know. Thanks, Soap 01:59, 14 March 2024 (UTC)

It probably began when an IP editor changed it from a redirect to {{temp}} into a standalone page with the error. The error was actually appearing as plain text in the diff, so maybe this was just an odd form of vandalism? Either way, if {{tl}} is supposed to be a redirect to {{temp}}, it should be fine now. If not, we need to work out what the IP was trying to do. Soap 02:09, 14 March 2024 (UTC)
@Soap Thanks. Yes, {{tl}} is just supposed to redirect to {{temp}}. Benwing2 (talk) 03:42, 14 March 2024 (UTC)

Interesting failure (8,782,141,951 IDs)

Sure, this edit is just vandalism, but I'm intrigued by the effect it had: instead of just making the one use of {{af}} fail, it made every instance of a Lua-using template on the page fail, saying "The time allocated for running scripts has expired." Why? Was the module thinking that "id8782141951=agent noun" meant it should keep looking through the other parameters trying to find "id8782141950=", "id8782141949=", etc.? (If I change the parameter to e.g. "testbadparameter=agent noun", only that one instance of {{af}} fails.) Do I gather the module supports arbitrarily many id= parameters, even 8,782,141,951 of them, and times out when it thinks there are that many? Would it make sense to set some kind of sanity limit, e.g. more than 50 id= parameters per template makes it spit out an error, so that only the one instance of {{af}}, and not the whole page, breaks? - -sche (discuss) 06:53, 15 March 2024 (UTC)

@-sche Yes, that's more or less what's going on. More specifically, I think what's happening is that it checks the maximum index of all numbered parameters and iterates from 1 up to that index, processing arguments. The reason for doing this is that potentially e.g. the tr could be supplied but not the term or display, etc. Yes, it probably should have some sanity checks in it, although it's not especially high priority because (a) it only breaks one page, and (b) to make this properly robust we'd have to add sanity checks in lots of places, which is both a big undertaking and could backfire if we set the limits too low. Generally when I add sanity checks it's to prevent errors from swamping CAT:E, e.g. things like an alias loop in a label module used to cause all sorts of pages to get errors; now (if I remember aright) it only causes errors on certain pages (so we do get a few pages in CAT:E to alert us of the problem) and has some sort of fallback behavior on the rest (so they don't swamp the category). Benwing2 (talk) 07:07, 15 March 2024 (UTC)
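
For illustration only, here is a minimal Python sketch of the behaviour described above. It is not the module's actual Lua code: the function name `collect_numbered`, the constant `MAX_INDEX` and the error wording are all hypothetical, and the limit of 50 is simply the figure floated in the question. It finds the highest index among parameters such as id1=, id2=, ... and walks from 1 up to it, which is why a single id8782141951= is so expensive. With a cap, the call fails fast instead.

```python
import re

# Hypothetical cap; 50 is just the figure suggested earlier in this thread.
MAX_INDEX = 50


def collect_numbered(args, prefix, max_index=None):
    """Return the values of prefix1=, prefix2=, ... as a list (None for gaps).

    Mirrors the behaviour described above: find the highest index present,
    then iterate from 1 up to it. The cost therefore grows with the highest
    index, not with the number of parameters actually supplied.
    """
    pattern = re.compile(re.escape(prefix) + r"([1-9][0-9]*)")
    indices = []
    for name in args:
        match = pattern.fullmatch(name)
        if match:
            indices.append(int(match.group(1)))
    highest = max(indices, default=0)
    if max_index is not None and highest > max_index:
        # Fail fast, so that only this one template call reports an error.
        raise ValueError(
            f"{prefix}{highest}= exceeds the limit of {max_index} numbered parameters"
        )
    return [args.get(f"{prefix}{i}") for i in range(1, highest + 1)]


args = {"1": "act", "2": "-or", "id2": "agent noun"}
print(collect_numbered(args, "id", MAX_INDEX))  # [None, 'agent noun']
```

Without the `max_index` argument, the same function given `{"id8782141951": "agent noun"}` would try to build a list of several billion entries, which is the analogue of the script timeout.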

Mohawk stems

Many nouns in Mohawk contain noun stems that are useful for things like noun incorporation and historical linguistics (kéntsion is much more easily seen to be from Proto-Iroquoian *-tsjõɁt- when you can see that the stem is -itsion-), so I wanted to create a template moh-stem, but I'm not sure how to do that or whether I should be doing it. If anyone can help: I've been trying to change the etymology on the page for Mohawk onón:tsi to say "Noun stem -nontsist- from Proto-Iroquoian *-nõːtsiː-", but I'm not sure how to do that. ChromeBones (talk) 07:52, 15 March 2024 (UTC)

uh oh, script timeouts

@Theknightwho semen, laven and kennen are now running out of time halfway through the page. This has only happened in the last hour or two. Could you have made a recent change (e.g. your bug fix to Module:parameters or some other change) that inadvertently slowed things down? If not, any ideas? Benwing2 (talk) 08:15, 15 March 2024 (UTC)

@Theknightwho It is indeed this change, because when you preview the pages in question without it, you don't get timeout errors. Interestingly they happen only with Middle English verb conjugations; Module:enm-conj must be doing something strange with parameters that is triggering an edge-case bug in Module:parameters. Benwing2 (talk) 08:39, 15 March 2024 (UTC)
@Benwing2 I've fixed this. In essence, {{enm-conj}} was relying on the old way that defaults were handled for list parameters, where if item 1 of a list was empty then the default value would get used as the first item. This applied even if the list contained higher values, such as (in this case) class2= etc. This is only relevant when lists are allowed to contain holes, as in this case, so the solution was twofold:
  1. Revert to the old method of handling defaults, so that they're always added if item 1 of a list parameter is empty.
  2. Move the handling of default values so that it comes after the handling of holes in lists. This therefore means that item 1 of a list can only be empty at that point if allow_holes = true.
There might still be some other module which relies on lists not having holes while also relying on the old default handling, so it might be worth tracking any instances where allow_holes hasn't been specified and an input list contains a hole at item 1, since that should hopefully flush out the possible instances where it could occur.
Going forward, we might want to change the spec so that defaults can either be (a) inserted only if the list has 0 items, or (b) inserted if item 1 is empty. Theknightwho (talk) 16:45, 15 March 2024 (UTC)
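
The ordering in the two-step fix above can be sketched as follows. This is a Python illustration of one reading of the description, not the Lua code in Module:parameters. The function name `process_list`, the dict-of-indices input and the sample values "strong" and "weak" are hypothetical. Only `allow_holes` and the notion of a default come from the discussion.

```python
def process_list(items, default=None, allow_holes=False):
    """Process a list parameter given as {1-based index: value}.

    Missing, None or empty-string values count as holes.
    """
    present = {i: v for i, v in items.items() if v not in (None, "")}

    # Step 1: handle holes first.
    if allow_holes:
        # Keep every position; gaps are represented as None.
        highest = max(present, default=0)
        result = [present.get(i) for i in range(1, highest + 1)]
    else:
        # Compress the holes away, preserving order.
        result = [present[i] for i in sorted(present)]

    # Step 2: only now apply the default, and only if item 1 is empty.
    # After compression, item 1 can be empty only if the list has no items.
    if default is not None:
        if not result:
            result = [default]
        elif result[0] is None:
            result[0] = default
    return result


# Only item 2 supplied, holes allowed: the default fills item 1.
print(process_list({2: "weak"}, default="strong", allow_holes=True))  # ['strong', 'weak']
# Same input with holes compressed: item 1 is no longer empty, so no default.
print(process_list({2: "weak"}, default="strong"))  # ['weak']
```

The second call shows the case worth tracking: a caller that expected the default in position 1 gets a different result once the holes are compressed first.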
@Theknightwho Great, thank you for looking into this and fixing it! I think ideally we should have disallow_holes = true as the default but that might require a lot of work. Benwing2 (talk) 19:35, 15 March 2024 (UTC)
@Benwing2 That should be the default at the moment (or rather, allow_holes = true has to be set manually), but the issue is if a template relies on holes being removed automatically except for item 1, which is set as the default if empty. Theknightwho (talk) 20:28, 15 March 2024 (UTC)
@Theknightwho There are actually three states with regard to holes: allow holes (allow_holes = true), compress holes (the default) and disallow holes (disallow_holes = true). What I mean is that "disallow holes" should probably be the default, and the "compress holes" state should have to be requested explicitly using compress_holes = true. I think the behavior where holes can be present and are compressed away is surprising, esp. with named parameters. Benwing2 (talk) 20:42, 15 March 2024 (UTC)
@Benwing2 You're right - I'd forgotten (and it's not in the documentation, so I should update that). Theknightwho (talk) 20:46, 15 March 2024 (UTC)
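
As a rough illustration of the three states named above, here is a small Python sketch. It is not the Lua implementation in Module:parameters. The function name `resolve_holes`, the dict-of-indices input and the error messages are hypothetical, and rejecting the combination of both flags is an assumption rather than documented behaviour. Only the flag names `allow_holes` and `disallow_holes` come from the discussion.

```python
def resolve_holes(items, allow_holes=False, disallow_holes=False):
    """Turn {1-based index: value} into a list under one of three policies."""
    if allow_holes and disallow_holes:
        # Assumption: the two flags are treated as mutually exclusive.
        raise ValueError("allow_holes and disallow_holes cannot both be set")

    present = {i: v for i, v in items.items() if v not in (None, "")}
    highest = max(present, default=0)
    missing = [i for i in range(1, highest + 1) if i not in present]

    if allow_holes:
        # Allow: keep positions, representing each gap as None.
        return [present.get(i) for i in range(1, highest + 1)]
    if disallow_holes:
        # Disallow: a gap is an error rather than something to repair.
        if missing:
            raise ValueError(f"item {missing[0]} is missing")
        return [present[i] for i in range(1, highest + 1)]
    # Compress (the current default): silently close up the gaps.
    return [present[i] for i in sorted(present)]


sparse = {1: "a", 3: "c"}
print(resolve_holes(sparse))                    # ['a', 'c']
print(resolve_holes(sparse, allow_holes=True))  # ['a', None, 'c']
```

With `disallow_holes=True`, the same input raises an error naming item 2. That is the stricter behaviour being proposed as the default, since the silent renumbering in the first call is what can surprise template authors.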

Unicode 15.1 update for Appendix:Unicode

I just updated the Indonesian Wiktionary's version of Appendix:Unicode to Unicode 15.1 at Lampiran:Unicode. Here's the list of the relevant changes if anyone wants to update the Appendix:Unicode to Unicode 15.1 since I don't have permission to edit the modules:

Also slightly unrelated, I created a name rule for Lampiran:Unicode/Variation_Selectors so it doesn't need a name module anymore:

Thank you! Ekirahardian (talk) 20:30, 17 March 2024 (UTC)

Pinging @Erutuon who has edit-access to said modules. Ekirahardian (talk) 23:52, 18 March 2024 (UTC)