User talk:Benwing2: difference between revisions

Revision as of 08:09, 2 September 2023

Links to Thesaurus:vagina#Portuguese from mainspace

It would be ideal if you would undo the changes your bot has made in mainspace that resulted in Portuguese entries no longer pointing to Thesaurus:vagina#Portuguese. Example: diff. If you don't do it, someone else will have to fix it. --Dan Polansky (talk) 07:33, 4 January 2023 (UTC)

@Dan Polansky That's because you removed the Portuguese stuff from the translations page and moved it to Thesaurus:vagina#Portuguese, after I made the bot changes. IMO this is not my problem. I see you only moved Portuguese, not any of the other languages, which doesn't make a lot of sense. Benwing2 (talk) 04:31, 5 January 2023 (UTC)

diff created Thesaurus:vagina#Portuguese, 9 October 2022

diff changed the link via a bot, 19 November 2022

Moving content from /Translations to separate pages is the right thing to do, and can be helped by those who want to help. --Dan Polansky (talk) 07:09, 5 January 2023 (UTC)

@Dan Polansky Completely disingenuous. The Portuguese text was there in Thesaurus:vagina/translations at the time I did the bot edits (and I disagree that it's the "right" thing to do the sort of thesaurus move you did, esp. when done in a half-assed fashion). You have your own pseudo-bot account (something-Maid, I forget the name); why don't you clean it up yourself? I've spent a LOT of time cleaning up your mistakes and questionable decisions. Benwing2 (talk) 07:19, 5 January 2023 (UTC)

I do not know which mistakes of mine you cleaned up; any examples? Portuguese was in Thesaurus:vagina/translations because someone else had restored it after I had removed it; this restoration was a bad idea and was unjustified. --Dan Polansky (talk) 07:23, 5 January 2023 (UTC)

The move that I did was not "half-assed" (incomplete or badly done); it was complete within the unit of Portuguese. There is no requirement that all languages need to be moved out from /translations in one fell swoop. --Dan Polansky (talk) 07:36, 5 January 2023 (UTC)

I restored it because you had effectively vandalised the entry by removing the vast majority of the content with no justification, and had made no effort to adequately replace the info that you had removed. You then edit-warred over this, and even after realising your own mistake, you have the temerity to indirectly blame me anyway? Theknightwho (talk) 07:53, 6 January 2023 (UTC)

As for Portuguese, I moved the content from one place to another, improving the state of affairs. As for other languages, one can justly charge me with removing content, to which my defense is that the loss was very little and that the lists are often unsubstantiated by mainspace and probably unverifiable inventions, as a cursory look discovers.

Be that as it may, I can restore the links to Thesaurus:vagina#Portuguese from mainspace myself, provided the above editors promise not to undo such a correction; I hate to labor in vain and have my efforts thwarted by meddlers with no thesaurus contribution. --Dan Polansky (talk) 13:16, 6 January 2023 (UTC)

Are you intentionally trying to be rude? Because talking about people you are responding to in the third person gives the impression that you are. Theknightwho (talk) 14:09, 6 January 2023 (UTC)

I will trust the rudeness hypothesis after the above programmer traces the hypothesis to an authoritative source. I don't consider it to be rude at all, just keeping some distance from the dear honorable gentlemen. --Dan Polansky (talk) 14:17, 6 January 2023 (UTC)

Appendix:Russian Adverbs - Frequency List?

Hiya. Do you happen to have access to Russian adverb lists as well? Not sure if that short discussion helps: User_talk:Benwing2/2012-2019#Russian_adjectives_-_frequency_list?. Anatoli T. (обсудить/вклад) 23:56, 17 January 2023 (UTC)

@Atitarev I generated the adjective list based on a frequency list of words of all parts of speech, so it should be possible to generate an adverb list as well; let me see what I can come up with. Benwing2 (talk) 00:58, 18 January 2023 (UTC)

@Atitarev I generated a list in Appendix:Russian Adverbs - Frequency List based on my largest (32,600-word) frequency list. I have a couple of other frequency lists that might give slightly different results but this should get you started. Benwing2 (talk) 01:54, 18 January 2023 (UTC)

Great, thank you! Anatoli T. (обсудить/вклад) 02:05, 18 January 2023 (UTC)

Italian heads

Thanks for cleaning up those plurals in Italian headwords. Could you also clean up the heads of pages like numero di telefono? They're redundant but were presumably missed by your bot because not every word was linked; I see lots of pages like this with just the prepositions not linked. Also, something went wrong at capsa. Ultimateria (talk) 20:55, 25 January 2023 (UTC)

@Ultimateria Thanks. I will do a run to handle missing preposition links. What happened with capsa is that I thought I had already generalized the # support so that a # anywhere in the plural stands for the lemma, but I hadn't; I have now done that. Benwing2 (talk) 22:27, 25 January 2023 (UTC)

@Ultimateria Done. Benwing2 (talk) 08:31, 27 January 2023 (UTC)

Weird bug

This edit (diff) seems to have introduced text inadvertently. Gabbe (talk) 09:21, 4 February 2023 (UTC)

Negativizzare and negativizzarsi

(@Catonif, too) Negativizzare must be quite recent; I can't find it in any of the major Italian dictionaries, i.e. Zingarelli, Devoto Oli, De Mauro, Treccani... On the other hand, they all have negativizzarsi, apparently attested since 1986 (Zingarelli). Technically speaking, negativizzarsi is the "reflexive" of negativizzare, but all evidence points to negativizzarsi existing prior to negativizzare. I feel we should make this clear in those entries, somehow... — Sartma 【𒁾𒁉 ● 𒊭 𒌑𒊑𒀉𒁲】 09:10, 6 February 2023 (UTC)

@Sartma Thanks! I've seen several cases where terms that were created after the 1970s or so are missing in Treccani and Hoepli but are present in Internazionale (=De Mauro?), which seems to have better coverage of such terms. For terms like this I often check context.reverso.net for attestation; it is automatically scraped, so it's somewhat spotty in quality, but I've found it a good check for recent usage. Benwing2 (talk) 18:15, 6 February 2023 (UTC)

пересекший vs пересёкший

You moved пересёкший to пересекший, with a comment that it was a misspelling. I believe this to be incorrect.

Google Translate pronounces it пересёкший regardless of whether the ё is there or not. Additionally, a native speaker I talked to was only aware of пересёкший in modern speech.

Here is a Russian Stack Exchange post where someone cites some dictionaries that state that both are acceptable, although the form with ё is considered the less preferable choice.[1] 190.194.221.97 03:04, 9 February 2023 (UTC)

The second-to-last discussion section of Module talk:ru-verb touches on this concerning засе́чь (zaséčʹ), which is a very similar verb. In fact, Zaliznyak's grammar says both meanings of пересе́чь (pereséčʹ) are conjugated like the two meanings of засе́чь (zaséčʹ), and says both meanings have засе́кший. Since apparently засёкший and пересёкший are also used in modern Russian, we can create these two terms as alternative forms of засе́кший/пересе́кший. I'll fix the module to generate both forms, but there are lots of other similar verbs, and I'm not sure if all of them allow for -сёкший participles: насе́чь (naséčʹ), обсе́чь (obséčʹ), надсе́чь (nadséčʹ), подсе́чь (podséčʹ), пресе́чь (preséčʹ), осе́чь (oséčʹ), посе́чь (poséčʹ), просе́чь (proséčʹ), ссе́чь (sséčʹ), рассе́чь (rasséčʹ), иссе́чь (isséčʹ), отсе́чь (otséčʹ), усе́чь (uséčʹ), вы́сечь (výsečʹ). User:Atitarev, can you comment on this as a native speaker? Do all these verbs allow for past active participles in -сёкший? Benwing2 (talk) 03:30, 9 February 2023 (UTC)

@Benwing2, 190.194.221.97: Hi. (Just back from a long Pacific cruise.) Without much thought or checking I would immediately say "пересёкший" sounds more correct and modern (as I said in that discussion), but since resources say that "пересе́кший" is the correct form, I think we should allow both. Confirmed by gramota.ru (Орфографический словарь, "Spelling Dictionary"). --Anatoli T. (обсудить/вклад) 05:16, 9 February 2023 (UTC)

@Atitarev Thanks and glad you had a good, hopefully relaxing trip! Presumably all the other verbs mentioned above are the same? Benwing2 (talk) 06:04, 9 February 2023 (UTC)

@Benwing2: Thanks! Yes, treat them the same way. I checked some and they all have the same feel. Anatoli T. (обсудить/вклад) 06:06, 9 February 2023 (UTC)

@Benwing2@Atitarev I believe вы́сечь (výsečʹ) would be the exception because the stress is on `вы`, so ё doesn't make sense. 190.194.221.97 01:24, 10 February 2023 (UTC)

@190.194.221.97, Benwing2: Good pickup. Yes, definitely. I hope Benwing2 will figure that out. Anatoli T. (обсудить/вклад) 01:27, 10 February 2023 (UTC)

@Atitarev Yes, definitely. Benwing2 (talk) 01:39, 10 February 2023 (UTC)

@Benwing2: Re: diff: I couldn't find anything about forms with "-сёкши(й)" being non-standard or proscribed. As far as I can tell, they are equally correct and common: not just attestable but included in dictionaries, modern ones at least. Anatoli T. (обсудить/вклад) 09:36, 10 February 2023 (UTC)
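The rule the thread settles on (both -се́кший and -сёкший for the -се́чь verbs, but only -секший for вы́сечь, whose stress falls on вы-) can be sketched in Python as below. This is a hypothetical illustration, not the actual Module:ru-verb code; the function name is made up, and it assumes infinitives are written with U+0301 COMBINING ACUTE ACCENT marking stress.

```python
ACUTE = "\u0301"  # combining acute accent, used here to mark stress


def past_active_participles(infinitive):
    """Hypothetical helper: past active participles for verbs in -сечь.

    If the stress falls on the -се́чь ending, both the -се́кший and the
    -сёкший forms are returned; otherwise (e.g. вы́сечь) only -секший.
    """
    stressed_ending = "се" + ACUTE + "чь"
    if infinitive.endswith(stressed_ending):
        stem = infinitive[: -len(stressed_ending)]
        # ё is always stressed, so it needs no accent mark
        return [stem + "се" + ACUTE + "кший", stem + "сёкший"]
    if infinitive.endswith("сечь"):
        return [infinitive[: -len("чь")] + "кший"]
    raise ValueError("not a -сечь verb: " + infinitive)


for verb in ["пересе" + ACUTE + "чь", "засе" + ACUTE + "чь", "вы" + ACUTE + "сечь"]:
    print(verb, past_active_participles(verb))
```

Keying the choice on where the accent sits, rather than on a list of prefixes, is what makes вы́сечь fall out as the exception automatically.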

`|translator=`

I find that the quotation templates render this parameter in a way that makes it completely unattractive to use. In my opinion it should behave the same as |author=: appear before the title, just with the specification ‘(translator)’. As it is now, not even the examples on the documentation page (the Andersen and Milne quotations) use it.

Or am I mistaken and the present rendering of translator information is the standard? ―Biolongvistul (talk) 21:42, 13 February 2023 (UTC)

There was some discussion about this in the past which led to the current configuration. If it is to be changed, a discussion at the Beer Parlour is probably required. — Sgconlaw (talk) 22:19, 13 February 2023 (UTC)

@Biolongvistul I have no particular opinions on this; please do start a Beer Parlour discussion about changing this if you think the current display is wrong. Benwing2 (talk) 23:41, 13 February 2023 (UTC)

Replacement of unnecessary redirects and templates

Hi, could you please carry out the following replacements?

{{RQ:Butler Way of All Flesh}} → {{RQ:S. Butler Way of All Flesh}}
Uses in the form:

#* {{RQ:Barrow Sermon|The Consideration of our Latter End}}
#*: No virtue is '''acquired''' in an instant, but by degrees, step by step.

to:

#* {{RQ:Barrow Works|sermonname=The Consideration of Our Latter End|passage=No virtue is '''acquired''' in an instant, but by degrees, step by step.}}

Thank you! — Sgconlaw (talk) 19:53, 16 February 2023 (UTC)

@Sgconlaw I went to implement this, but then I suspected you meant to convert all sermons in {{RQ:Barrow Sermon}} to {{RQ:Barrow Works}}, not just 'The Consideration of Our Latter End'; is that right? Benwing2 (talk) 04:35, 20 February 2023 (UTC)

Yes, as {{RQ:Barrow Sermon}} is a really bad quotation template. It doesn’t even actually refer to any published work, but merely asserts “this is a sermon by Isaac Barrow”. — Sgconlaw (talk) 10:50, 20 February 2023 (UTC)
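A conversion of the kind requested above can be sketched with a Python regex. In it, positional parameter 1 of {{RQ:Barrow Sermon}} is carried over to sermonname=, and the following #*: line becomes passage=. This is only an illustration under simple assumptions (one positional parameter, passage on the very next line, no | inside the passage), not the bot script actually used. It copies the sermon name verbatim, so the capitalization change in the example (our to Our) is not reproduced.

```python
import re

# Hypothetical sketch: "#* {{RQ:Barrow Sermon|NAME}}" followed by "#*: PASSAGE"
# becomes a single "#* {{RQ:Barrow Works|sermonname=NAME|passage=PASSAGE}}" line.
# Assumes exactly one positional parameter and that the passage contains no "|".
OLD_QUOTE = re.compile(
    r"^#\* \{\{RQ:Barrow Sermon\|([^|}]*)\}\}\n#\*: (.*)$",
    re.MULTILINE,
)


def convert_barrow(wikitext):
    return OLD_QUOTE.sub(
        lambda m: "#* {{RQ:Barrow Works|sermonname=%s|passage=%s}}"
        % (m.group(1), m.group(2)),
        wikitext,
    )


old = (
    "#* {{RQ:Barrow Sermon|The Consideration of our Latter End}}\n"
    "#*: No virtue is '''acquired''' in an instant, but by degrees, step by step."
)
print(convert_barrow(old))
```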

Arabic spellings in Northern Kurdish headword templates

Hi, I've recently taken an interest in adding Northern Kurdish entries with the aid of the Ferhenga Birûskî: Kurmanji–English Dictionary, wherein I often find entries with multiple Arabic spellings (e.g. the noun aciz, spelled as either ئاجز (aciz)‎ or عاجز ('aciz)), but the templates seem to only allow one entry for the ar= parameter.
Since you created the template, I thought I'd ask you. Is there a way to enter multiple spellings that I'm not aware of? — GianWiki (talk) 19:57, 18 February 2023 (UTC)

@GianWiki Hi. Use |ar2= for the second Arabic spelling, |ar3= for the third, etc. Benwing2 (talk) 20:21, 18 February 2023 (UTC)

Thanks! GianWiki (talk) 21:05, 18 February 2023 (UTC)

Updated vowel length in nū̆ptum: bot help?

Hello, I'm wondering if you could help me out with a Latin vowel length update. I recently found out that a number of sources show a short vowel in the first syllable of nuptiae and all other words built on the supine stem of nūbō. On the other hand, Lewis 1891 shows the vowel as long. There are potential arguments based on etymology or analogy for either quantity. So I updated the main pages of nubo, nuptus, nuptiae to mark the vowel as ū̆, but I don't want to have to go through all the other inflected and derived forms manually. Would this be something you could carry out with your bot? Here's a list of words I think would be affected: all forms of nū̆pta, nū̆ptia, nū̆ptiae, nū̆ptiālis, nū̆pturiō, nū̆ptus; supine forms of prefixed derived verbs such as innūbō, transnūbō, obnūbō. Urszag (talk) 22:15, 18 February 2023 (UTC)

{{ping|Urszag} Yes, I have a script to do this, let me run it. Benwing2 (talk) 22:25, 18 February 2023 (UTC)

@Urszag Oops ... Benwing2 (talk) 22:25, 18 February 2023 (UTC)

@Urszag BTW Bennett [2] with corrections by Michelson and Allen has long nūptus; this seems to be primarily by analogy with nūbō (compare scrīptus from scrībō, where the length is clear, e.g. from Italian scritto). Benwing2 (talk) 22:31, 18 February 2023 (UTC)

Yes, I mentioned in the note I added to nūbō that a long vowel can be supported by analogy to scrībō, scrīptum; but a paradigm with a short vowel in just this verb part is also theoretically possible, as in dūcō, dūxī, ductum. (Lachmann's law isn't expected to apply here since the stem-final consonant is an original aspirate.) Given this, together with sources that do mark the vowel as ŭ (Gaffiot 2016, version V. M. Komarov, gives "nŭptum" in its entry for nūbō) and De Vaan, who implicitly describes the vowel as short by listing its forms as nūbō, nūpsī, nuptum, I think it's appropriate for us to use ū̆ to mark uncertainty in vowel length.--Urszag (talk) 22:39, 18 February 2023 (UTC)

@Urszag Agreed. Benwing2 (talk) 22:50, 18 February 2023 (UTC)
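The replacement requested here can be sketched as a simple string substitution on the supine stem nūpt-. This is a hypothetical illustration, not Benwing2's actual script, and it assumes NFC-normalized text. Note that ū̆ has no precomposed Unicode character, so it is stored as ū (U+016B) followed by U+0306 COMBINING BREVE. The substitution is naturally idempotent, because nū̆pt- no longer contains the sequence nūpt.

```python
import unicodedata

LONG_U = "\u016b"             # ū
UNCERTAIN_U = "\u016b\u0306"  # ū̆ = ū + U+0306 COMBINING BREVE (no precomposed form)


def mark_uncertain_supine(form):
    """Hypothetical helper: rewrite the supine stem nūpt- as nū̆pt-.

    Other forms of nūbō (nūbō itself, nūpsī, ...) are left alone because
    they do not contain the sequence n + ū + pt.
    """
    form = unicodedata.normalize("NFC", form)  # compose u + U+0304 into ū first
    return form.replace("n" + LONG_U + "pt", "n" + UNCERTAIN_U + "pt")


for form in ["n\u016bptiae", "inn\u016bptum", "n\u016bb\u014d", "n\u016bps\u012b"]:
    print(form, "->", mark_uncertain_supine(form))
```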

References > Further reading

Hi, can I ask why you changed 'References' sections into 'Further reading' ones? Catonif (talk) 19:36, 28 February 2023 (UTC)

There was a discussion in the Beer Parlour a while ago about what goes in 'References' vs. 'Further reading'. WT:ELE isn't very clear on this. I and some others (e.g. Rua) argued that only footnotes should go in 'References', and other references tied to the page as a whole rather than a specific piece of text should go under 'Further reading'. There was no consistency in the Italian entries about what went where, and IMO it looks bad if both footnotes and other references go into the same section, so I standardized on putting non-footnote references under 'Further reading'. Benwing2 (talk) 23:33, 28 February 2023 (UTC)

I see. At least for me, there was a reason for which header the links went under: "Further reading" = "works consulted" and "References" = "works cited", as Ultimateria says in the discussion. Does this mean I should start using References only for footnotes now? And should WT:ELE be amended? I also see some other changes, like changing all occurrences of {{ng}} to {{n-g}}; are we deprecating the shortcut {{ng}}? Should I not use it? Catonif (talk) 15:37, 2 March 2023 (UTC)

@Catonif The issue with {{ng}} vs. {{n-g}} etc. is that there are a bunch of equivalent redirects (see [3]; there appear to be 6 aliases of {{non-gloss definition}}), and this was getting in the way of some searching-and-replacing I was doing, so I standardized them all on {{n-g}}, without prejudice towards any of the other aliases (although I'd like to get rid of some of them eventually). As for the References vs. Further reading, essentially what I did is standardize on "works actually cited using <ref>" in ==References== and all others in ==Further reading==, which is apparently similar to what you've been doing, except that you've been putting works in ==References== that you cited conceptually without actually citing them using <ref>, right? (If not, let me know where I'm confused ...) I don't really see how the idea of "conceptually citing" a work can be implemented in practice, and I'm almost positive other Italian editors haven't been observing this, but just randomly putting references under one or the other header. For this reason I'd recommend going with the practice I've established, but this is just a recommendation. As for amending WT:ELE, that's a whole can of worms; I think it would be nice to standardize the "==References== only for footnotes" practice, but there may be objections since the Beer Parlour discussion you cited had no real consensus. Benwing2 (talk) 07:51, 3 March 2023 (UTC)

Yes, I've been using Refs/FR how you described; sorry for not being clear, and thanks for clarifying the {{ng}} thing. I understand that bureaucracy is stressful, and I'm fine with editors having their editing preferences (I'm not Thadh, lol), but per WT:BOT, there should be some consensus before changing everything. We want to make a change that involves how two of our most used headers are used? That's cool, but in that case it should be done site-wide and by updating our written guides (like WT:EL), instead of removing it only from Italian entries, which looks like we're trying to be sneaky about it. Catonif (talk) 19:00, 3 March 2023 (UTC)

regex help?

I was wondering how to limit a pattern match from the search box to a single heading, e.g. L2 or L3/4. My regex and general programming knowledge is very elementary. Can you point to any examples of good regexes that do what I want, or to a kind of regex capability that might work? In the simple cases I've tried, I seem to have run afoul of the fact that the regex search is "greedy".

I am optimistic that I could accomplish what I want using Perl on the XML dump, but that is less convenient for many purposes. DCDuring (talk) 18:12, 6 March 2023 (UTC)

@DCDuring I am very familiar with regexes but unfortunately not so much with the limitations of the Cirrus search box because I don't normally use it. (I typically use regex searching through the contents of either a category or all references to a given template, and if that won't work, I search through the dump.) I can definitely help you with Perl or Python regexes applied to the dump file and might be able to help you with the search box if you give me some more details: What exact pattern were you using, what did you expect to happen and what actually happened? Benwing2 (talk) 08:01, 7 March 2023 (UTC)

I've had some trouble interpreting the various fragments of documentation at mediawiki.

I was trying to show how easy it was to find HTML comments using the search box, using "insource=" and 'filters'. I thought it would be handy to show a search focused on one L2 and one L3/4.

My search line is "Pronunciation incategory:"English nouns" insource:/[=]+Pronunciation[=]+[^<]+\<!--.+--\>/"

The regex pattern is what follows "insource". I tried many variations. This search finds any HTML comment in an entry that is in Cat:English nouns and has a pronunciation section, but the match is not limited to that section. DCDuring (talk) 13:51, 7 March 2023 (UTC)
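For the dump-based route, one way around greedy matching is to split a page into sections by header line first, and only then search inside the wanted section. The following is a minimal Python sketch of that approach, not an existing tool. The function names are hypothetical, and it assumes ordinary ==Header== lines that are not themselves inside HTML comments. It also uses a non-greedy <!--.*?--> so that two comments in one section are matched separately rather than as one long span.

```python
import re

# A header line such as "===Pronunciation===": group 1 is the run of "=",
# group 2 the title. [ \t] rather than \s so a match never crosses a newline.
HEADER_RE = re.compile(r"^(=+)[ \t]*([^=\n]+?)[ \t]*\1[ \t]*$", re.MULTILINE)
# Non-greedy, and DOTALL so a comment may span several lines.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def sections(wikitext):
    """Yield (level, title, body) for each header; body runs to the next header."""
    headers = list(HEADER_RE.finditer(wikitext))
    for i, m in enumerate(headers):
        end = headers[i + 1].start() if i + 1 < len(headers) else len(wikitext)
        yield len(m.group(1)), m.group(2), wikitext[m.end():end]


def comments_in(wikitext, language, title):
    """HTML comments inside sections named `title` under the L2 header `language`."""
    found = []
    current_language = None
    for level, name, body in sections(wikitext):
        if level == 2:
            current_language = name
        elif current_language == language and name == title:
            found.extend(COMMENT_RE.findall(body))
    return found


PAGE = """==English==

===Etymology===
From Old English. <!-- etymology note -->

===Pronunciation===
* {{IPA|en|/test/}} <!-- check this -->

===Noun===
{{en-noun}}

==French==

===Pronunciation===
* {{IPA|fr|/test/}} <!-- French comment -->
"""

print(comments_in(PAGE, "English", "Pronunciation"))  # ['<!-- check this -->']
```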

@DCDuring: I tried to come up with a regex to find "bor" only within an Etymology section (insource:"Etymology" incategory:"Greek lemmas" insource:/Etymology *[0-9]* *=+((?![^ -􏿿]=).)*bor/) and it didn't work. I guess the negative lookahead syntax ((?!)) is disabled in insource:// though it exists in PHP regex. Negative lookahead is the only way I know to really restrict the search to be within a section. (All this assumes the headers are not commented out and don't have HTML comments interspersed in them, which is legal MediaWiki syntax but not allowed by the style guide.) [^ -􏿿] matches ASCII control characters U+0000-U+001F, which is only newline (U+000A) and tab (U+0009) in wiki pages because all other ASCII control characters are replaced with a replacement character (�). (\n matches a literal n in insource://.) So I think it's impossible to match text only within a section with CirrusSearch. — Eru·tuon 22:51, 7 March 2023 (UTC)
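Erutuon's character-class trick, and the lookahead pattern that insource:// rejects, can both be checked offline in Python's re module, which does support (?!...). This is only a sketch for testing the logic (e.g. before running it over a dump); it says nothing about what CirrusSearch accepts. Because '.' in Python does not match a newline by default, re.DOTALL is added, on the assumption that the original pattern is meant to run across lines.

```python
import re

# [^ -\U0010FFFF] excludes everything from U+0020 (space) up to U+10FFFF,
# the last code point, so it can only match U+0000-U+001F.
CONTROL = re.compile("[^ -\U0010FFFF]")
controls = [cp for cp in range(0x110000) if CONTROL.fullmatch(chr(cp))]
print(len(controls), hex(controls[0]), hex(controls[-1]))  # 32 0x0 0x1f

# Erutuon's pattern: consume one character at a time, but never step over a
# control character (in practice a newline) that is directly followed by "=",
# i.e. the start of the next header line.
ETYM_BOR = re.compile(
    "Etymology *[0-9]* *=+((?![^ -\U0010FFFF]=).)*bor", re.DOTALL
)

inside = "===Etymology===\nFrom {{bor|en|fr|test}}.\n\n===Noun===\n"
outside = "===Etymology===\nFrom Latin.\n\n===Noun===\nSee {{bor|en|fr|x}}.\n"
print(bool(ETYM_BOR.search(inside)), bool(ETYM_BOR.search(outside)))  # True False
```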

What I feared, not what I hoped, but it's wonderful to be able to stop wasting time. Thanks. DCDuring (talk) 01:47, 8 March 2023 (UTC)

@DCDuring, Erutuon It seems to me it should be possible without negative lookahead. When I created a Python regex to find 'confer' within an Etymology section, I wrote this:

Etymology( [0-9]+)?==*\n((?!=).*\n){0,20}.*[Cc]onfer\b.*

which uses negative lookahead, but you should be able to rewrite it without the negative lookahead like this:

Etymology( [0-9]+)?==*\n([^=\n].*\n|\n){0,20}.*[Cc]onfer\b.*

That is, I'm searching for 0 through 20 occurrences of a line inside an Etymology section, which consists of either (a) a character that's not an equal sign or newline followed by any number of non-newlines followed by a newline, or (b) just a newline. You don't necessarily need the {0,20}; you can use * if it doesn't choke. You have to figure out how to avoid the use of \n, but it seems like you've figured that out. Benwing2 (talk) 07:02, 8 March 2023 (UTC)
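The two Python patterns above can be compared against a small sample page. This is a hypothetical test harness, not part of any existing script, and the sample page text is made up for illustration. Both patterns should accept "confer" within the Etymology section and reject it once a later header line intervenes.

```python
import re

# Benwing2's two patterns, compiled as Python regexes. Both allow up to 20
# lines after an Etymology header, and neither will consume a line that
# starts with "=" (the next header).
WITH_LOOKAHEAD = re.compile(
    r"Etymology( [0-9]+)?==*\n((?!=).*\n){0,20}.*[Cc]onfer\b.*"
)
WITHOUT_LOOKAHEAD = re.compile(
    r"Etymology( [0-9]+)?==*\n([^=\n].*\n|\n){0,20}.*[Cc]onfer\b.*"
)

IN_ETYMOLOGY = """==English==

===Etymology===
From Latin. Compare confer.

===Noun===
{{en-noun}}
"""

OUTSIDE_ETYMOLOGY = """==English==

===Etymology===
From Latin.

===Verb===
# To confer with someone.
"""

for pattern in (WITH_LOOKAHEAD, WITHOUT_LOOKAHEAD):
    print(bool(pattern.search(IN_ETYMOLOGY)), bool(pattern.search(OUTSIDE_ETYMOLOGY)))
# True False
# True False
```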

Thanks, I hope. I'll try it when I can. DCDuring (talk) 15:14, 8 March 2023 (UTC)

Changes to Module:languages

Flagging this so you're aware of what I'm doing: I'm making a minor change to Module:languages and Module:languages/data, which should allow me to eliminate much of the additional code in Module:links which you flagged with your comment. This also has the advantage of keeping everything self-contained in the transliterate method, which then subdivides the text where necessary itself.

For context: Module:languages/data/patterns has a series of patterns, which are used to find things like formatting, URLs etc. so that they can be converted into PUA characters (i.e. text which we definitely don't want to be changed). These are stored in a table, and reconverted at the end. However, to avoid feeding PUA characters through a bunch of modules which may be unequipped to handle them, the text is then subdivided using mw.text.split and fed through in chunks. Conveniently, this is also a useful model for page-scraping modules like (the now-consolidated) Module:zh-translit, which makes it possible to feed through terms with embedded links without requiring a page for the whole term. For example, 香港語言學學會粵語拼音方案／香港语言学学会粤语拼音方案 (Xiānggǎng yǔyánxué xuéhuì yuèyǔ pīnyīn fāng'àn).

I haven't documented this yet, because the ultimate aim is to have the functionality of {{zh-x}}, which uses spaces to achieve the same effect (but isn't capable of handling links). I'll explain this in more detail in my userspace. Theknightwho (talk) 21:28, 6 March 2023 (UTC)
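The protect/split/restore idea described above can be sketched in Python as follows. This is a hypothetical illustration, not the code in Module:languages. The "protected" patterns (wikilinks and bare URLs), the choice of Supplementary Private Use Area-A (from U+F0000) for placeholders, and the function names are all assumptions. It also assumes the input contains no PUA characters of its own.

```python
import re

# Hypothetical stand-ins for the entries in Module:languages/data/patterns.
PROTECTED = re.compile(r"\[\[[^\]]*\]\]|https?://\S+")
PUA_FIRST = 0xF0000  # start of Supplementary Private Use Area-A
PUA_CHAR = re.compile("([\U000F0000-\U000FFFFD])")  # capturing, so split keeps it


def transliterate_in_chunks(text, translit):
    """Swap protected spans for PUA placeholders, run `translit` only on the
    unprotected chunks between them, then put the original spans back."""
    saved = []

    def to_placeholder(match):
        saved.append(match.group(0))
        return chr(PUA_FIRST + len(saved) - 1)

    masked = PROTECTED.sub(to_placeholder, text)
    out = []
    for chunk in PUA_CHAR.split(masked):
        if PUA_CHAR.fullmatch(chunk):
            out.append(saved[ord(chunk) - PUA_FIRST])  # restore unchanged
        elif chunk:
            out.append(translit(chunk))  # translit never sees a PUA character
    return "".join(out)


print(transliterate_in_chunks("abc [[link]] def https://example.org ghi", str.upper))
# ABC [[link]] DEF https://example.org GHI
```

Splitting on the placeholders, rather than passing the masked string whole, is what keeps the downstream function from ever receiving a PUA character.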

@Theknightwho Great, thank you for the message! Apologies, some RL stuff came up today and I haven't had a chance to look through Wiktionary pings or discussions. Will do that tomorrow (Tuesday). Definitely have some questions about the use of PUA chars and such, maybe your userspace docs will clarify this. Benwing2 (talk) 07:56, 7 March 2023 (UTC)

@Benwing2 Sorry for the delay - a combination of real life stuff and procrastination. Will have things in my userspace this weekend. Theknightwho (talk) 18:17, 9 March 2023 (UTC)

suízhe

Reporting bug: currently displays "隨著／随著, 随着". ---> Tooironic (talk) 21:54, 7 March 2023 (UTC)

@Theknightwho: Character 著 has 着 as a variant or as a simplified form. What module needs updating?

The display should be 隨著／随着 as @Tooironic pointed out. Anatoli T. (обсудить/вклад) 22:08, 7 March 2023 (UTC)

@Tooironic @Atitarev This isn't a bug - you can do a manual override using // as a divider (e.g. 隨著//随着), or alternatively make sure the traditional/simplified correspondence is correct in Module:zh/data/ts. Someone has been making a lot of changes to that module recently, which may explain the error. Theknightwho (talk) 22:12, 7 March 2023 (UTC)

@Theknightwho, @Tooironic: Thanks. I made a manual edit. I forgot about the // trick. Anatoli T. (обсудить/вклад) 22:15, 7 March 2023 (UTC)

Thanks everyone. ---> Tooironic (talk) 22:16, 7 March 2023 (UTC)

On Northern Kurdish

Hi, there.
Since you seem to be an active enough user on the subject of Northern Kurdish, I wanted to try and ask you: do you have any idea if and how I can get answers to the questions I had here and here on the subject?
Thanks in advance for your time. — GianWiki (talk) 16:00, 10 March 2023 (UTC)

Plus template conversion

Hey, I think Old Polish, Kashubian, and Silesian should use plus templates; could you run your script on them? Also, I think some will have {{l|en|to [[define]]}} or {{l|en|definition}}; those should be bare links. And can we templatize categories (using {{C}} and {{cln}}) and quotes? Any femeq's should be applied to Silesian and Kashubian. Basically all of the changes you first proposed to Polish entries :) Vininn126 (talk) 23:51, 13 March 2023 (UTC)

@Vininn126 Sure, I can do that. Benwing2 (talk) 00:00, 14 March 2023 (UTC)

@Vininn126 I have done this for Old Polish; the others are still to come. Benwing2 (talk) 07:37, 18 March 2023 (UTC)

Quotation template replacements

Hi, could you please do a bot run to carry out the following replacements?

{{RQ:Holder EOS}} → {{RQ:Holder Speech}}
{{RQ:Lindsay Age of Consent}}: change the parameter |publisher=Ure Smith to |year=1962 wherever it has been used.

Thank you. — Sgconlaw (talk) 18:59, 15 March 2023 (UTC)

@Sgconlaw This is done, and so is your request from Feb 19/20. Apologies for the delay: I had to rewrite the script that handles these requests so that it supports placeholder values in from-params that get copied to to-params (in that case, 1= got copied to sermonname=). Benwing2 (talk) 09:28, 17 March 2023 (UTC)

Thank you! — Sgconlaw (talk) 15:14, 17 March 2023 (UTC)

Macrons in Classical Persian transliteration

Hi, it’s not appropriate to change macrons to circumflexes in Classical Persian romanizations. It might be right for Dari (since circumflexes suggest a qualitative rather than quantitative vowel difference), but macrons are both correct and absolutely standard for Classical Persian.—Saranamd (talk) 08:52, 17 March 2023 (UTC)

Also @Atitarev.—Saranamd (talk) 08:53, 17 March 2023 (UTC)

@Atitarev I discussed this and lots of other issues with User:Atitarev, where it was agreed to use circumflexes. I can switch the circumflexes to macrons specifically for Classical Persian if there's consensus to do so. Benwing2 (talk) 08:55, 17 March 2023 (UTC)

Macrons are used in both Iranianist sources (e.g. Cheong, Etymological Dictionary of the Iranian Verb) and in the IJMES system, which is the standard “Orientalist” (for lack of a better word) transcription scheme used by English-language academic journals.

This is not universal, e.g. Thackston uses a circumflex in his Introduction to Persian and Millennium of Classical Persian Poetry, but it is certainly the minority practice for Classical transliterations, and the decision should not have been made cursorily.—Saranamd (talk) 09:05, 17 March 2023 (UTC)

Additionally, in cases such as (anqa), I consider the transliteration of initial  undesirable because Persian has initial glottal stop for all vowel-initial words, not just Arabic loans. We know that this has been the case since Early New Persian.

If we are including this Arabic orthographic feature not reflected in any stage of Persian phonology, why not go the full way and use e.g. <ż> for ?—Saranamd (talk) 09:15, 17 March 2023 (UTC)

@Atitarev, Saranamd Let's see what Anatoli says. It's a little late to make wholesale changes like you're suggesting to the translits, since I just did a big run trying to clean them up and Anatoli is in the middle of handling the cases that couldn't be done automatically; it would have been nice if you had taken part in the long discussion that Anatoli and I had before making the changes, see User talk:Atitarev#Persian questions. Benwing2 (talk) 09:26, 17 March 2023 (UTC)

I wasn’t aware of the discussion (not having been pinged), my apologies. It does feel a little bad, since I have been the main (only?) regular contributor to Persian over the past year.—Saranamd (talk) 09:28, 17 March 2023 (UTC)

@Saranamd My apologies, I didn't realize that you have been contributing. I have added you to the Persian workgroup data so future workgroup pings will reach you. See also the Beer Parlour discussion at Wiktionary:Beer parlour/2023/March#Cleaning up Persian templates. Benwing2 (talk) 16:14, 17 March 2023 (UTC)

@Saranamd, Atitarev BTW I have thought about it and I believe that all quotes and other terms from Classical Persian should use the etymology code fa-cls in place of fa. (Note, Dari also has an etymology code prs.) In general it's not sustainable to have two translit schemes for a given language, so separating by etymology language is a way forward. We will need to modify the Module:links and translit code so that etymology-only languages can have their own translit modules (or at least, so that the etymology-only variant gets retained and passed into the language's translit module as an additional param rather than canonicalizing all etymology-only codes to their parent code; attention User:Theknightwho).

@Saranamd, Atitarev Fuck, forgot to sign so ping won't go through. Benwing2 (talk) 16:20, 17 March 2023 (UTC)

@Theknightwho See just above. Benwing2 (talk) 16:21, 17 March 2023 (UTC)

@Benwing2 This actually fits neatly with a broader idea I had about etymology-only languages: essentially, replacing them with “variants”. From a technical perspective, a variant object would basically start off as a clone of the parent language object, but we could use data modules to optionally vary them however we see fit. That would make things like this really straightforward (see also the Prakrits etc). It would also make conversions from language to variant/variant to language simpler, and would introduce opportunities for using them in other ways (e.g. it feels silly to have BrEng/AmEng as “etymology-only” languages, as it’s rarely relevant, but it might be useful to have variants to handle differing spellings or pronunciations). Theknightwho (talk) 17:24, 17 March 2023 (UTC)
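The "variant" idea can be sketched in Python as a language object that starts as a clone of its parent and overrides only selected fields. Everything here is hypothetical: the Language class, its fields, and make_variant are illustrative names, not the actual Module:languages API. The toy transliterators only stand in for the circumflex-vs-macron difference discussed in this thread; they merely rewrite a placeholder letter A.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Language:
    code: str
    name: str
    translit: Callable[[str], str]
    parent: Optional["Language"] = None


def make_variant(parent, code, name, **overrides):
    """Start from a copy of the parent language; change only what is overridden."""
    return replace(parent, code=code, name=name, parent=parent, **overrides)


# Toy transliterators: "A" stands for a long a, written â or ā.
persian = Language("fa", "Persian", translit=lambda s: s.replace("A", "\u00e2"))
dari = make_variant(persian, "prs", "Dari")  # inherits the parent's translit
classical = make_variant(
    persian, "fa-cls", "Classical Persian",
    translit=lambda s: s.replace("A", "\u0101"),
)

for lang in (persian, dari, classical):
    print(lang.code, lang.translit("bAzAr"))
# fa bâzâr
# prs bâzâr
# fa-cls bāzār
```

Because a variant keeps a reference to its parent, converting from variant back to language is a single attribute lookup, one of the conveniences mentioned above.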

@Theknightwho Sounds good, can you write up a little more on your thoughts about this? I feel we should hash out how variants work before diving into an implementation. BTW didn't even realize British and American English have etymology codes. Benwing2 (talk) 18:10, 17 March 2023 (UTC)

@Benwing2 Yep - will do that this evening. On a side note, I’d like us to get rid of nonstandard etym-only codes like "LL." if at all possible (or at least bar new ones), but I don’t know if that’ll be very popular. Theknightwho (talk) 18:13, 17 March 2023 (UTC)

@Theknightwho Yeah people seem to like those codes but definitely at the very least we should prohibit new ones. Benwing2 (talk) 18:14, 17 March 2023 (UTC)

In the case of Classical Persian, it might be justifiable to use modern Iranian translit since that is how Iranians pronounce Classical texts, much like how Mandarin speakers pronounce Classical Chinese with Mandarin readings.

But a dedicated Classical translit would be more desirable for many reasons:

Iranian pronunciation should not be overprivileged over Dari and other regional pronunciations, especially when Dari is closer to the actual pronunciations that Rumi or Sa’di would have used.
It is usually trivial to get Iranian pronunciation from a Classical translit, but not vice versa due to phonemic mergers in Iranian.
Some Classical rhymes do not rhyme in Iranian (e.g. (xward/xord, “he ate”) and (dard, “pain”)), and some non-rhymes are now rhymes. Classical texts, both prose and verse, tend to be heavily rhymed.

Saranamd (talk) 05:49, 18 March 2023 (UTC)
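The asymmetry in the second point can be shown with a deliberately simplified Python sketch that models only the majhūl mergers (ē > ī, ō > ū) and ignores every other difference, such as the xward/xord pair above. The function names are illustrative, and the sample word is the āmadē/āmadī pair mentioned further down this thread.

```python
# Deliberately simplified: only the majhul mergers (ē > ī, ō > ū) are modelled.
CLASSICAL_TO_IRANIAN = {"ē": "ī", "ī": "ī", "ō": "ū", "ū": "ū"}


def to_iranian(classical):
    """Classical -> Iranian: a character-by-character mapping is enough."""
    return "".join(CLASSICAL_TO_IRANIAN.get(ch, ch) for ch in classical)


def classical_candidates(iranian):
    """Iranian -> Classical: each merged vowel has two possible sources."""
    sources = {}
    for cls_ch, ir_ch in CLASSICAL_TO_IRANIAN.items():
        sources.setdefault(ir_ch, []).append(cls_ch)
    results = [""]
    for ch in iranian:
        results = [prefix + opt for prefix in results for opt in sources.get(ch, [ch])]
    return results
```

The forward function is a plain lookup, while the reverse one has to enumerate every combination of possible sources, which is why a Classical translit cannot be recovered from an Iranian one without extra information.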

@Benwing2, Saranamd Thank you both! I've been "out of action" since Friday - real life got in the way and it's going to be a busy week. I'll try to catch up on pings and outstanding things. --Anatoli T. (обсудить/вклад) 22:31, 19 March 2023 (UTC)

@Benwing2, Saranamd We're now in a position to create a Classical Persian transliteration module, as etymology-only languages now support full customisation. Theknightwho (talk) 03:42, 1 April 2023 (UTC)

@Atitarev, Theknightwho Great! This can go on the list of Persian-related things to do (of which there are a lot :) ... the current code is messy and incomplete). Benwing2 (talk) 04:15, 1 April 2023 (UTC)

@Theknightwho, @Benwing2 I do not think it is possible to automate Classical transliterations without making up our own vowel marks for the majhul vowels ē and ō.

In Classical/premodern dictionaries, pronunciations are indicated by poetic quotations where the meter and rhyme indicate the pronunciation by reference to a more widely known word, or by explicit designations (e.g. the dictionary might say that a given word has واوِ فارسی “the Persian و” for ō, etc.).

So I don’t think it is possible to automatically transliterate CP, even when fully marked with diacritics.—Saranamd (talk) 04:54, 1 April 2023 (UTC)

In late Indian sources, word-final majhul ē is distinguished from ī in the same way as modern Urdu, so that the critical difference between (āmadī, “you came”) and (āmadē, “he/she was coming”) can be represented. But I don’t think this is generally the case in premodern manuscripts.—Saranamd (talk) 05:07, 1 April 2023 (UTC)

@Saranamd There are tricks that could be used in these cases. For example, we could implement special chars that are inserted after the majhūl vowels, which are passed on to the translit but removed before displaying the text or creating links. This exact thing is currently done for Korean with hyphens to mark prefixes, suffixes, etc.; the hyphens show up in translit but I'm pretty sure not elsewhere. I implemented this at the behest of User:Tibidibi. When I did this it was done in a very bespoke fashion for efficiency purposes, but it's possible that User:Theknightwho has generalized this mechanism by now. Benwing2 (talk) 05:30, 1 April 2023 (UTC)

@Benwing2 @Saranamd Yep - you can make sure they're hidden with makeDisplayText in Module:languages. Theknightwho (talk) 05:46, 1 April 2023 (UTC)
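A minimal Python sketch of that trick, assuming an invisible marker character (U+2063 here, an arbitrary choice) is typed after a majhūl ی or و. The toy letter table covers only the letters in the example, šēr "lion" vs. šīr "milk" (both spelled شیر), and ignores short vowels. `translit` and `make_display_text` are Python stand-ins for illustration, not the Lua code in Module:languages.

```python
MAJHUL_MARK = "\u2063"  # hypothetical marker; any character unused in real text would do

# Toy letter table (sheen, reh, Farsi yeh, waw): just enough for the example.
LETTERS = {"\u0634": "š", "\u0631": "r", "\u06cc": "ī", "\u0648": "ū"}
# Readings of yeh/waw when they are followed by the marker.
MAJHUL = {"\u06cc": "ē", "\u0648": "ō"}


def make_display_text(text):
    """Remove markers before the text is shown or used as a link target."""
    return text.replace(MAJHUL_MARK, "")


def translit(text):
    """Read a marked yeh/waw as majhul ē/ō; unmarked ones stay ī/ū."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == MAJHUL_MARK:
            i += 1  # stray marker not attached to yeh/waw: ignore it
            continue
        marked = i + 1 < len(text) and text[i + 1] == MAJHUL_MARK
        if marked and ch in MAJHUL:
            out.append(MAJHUL[ch])
            i += 2  # consume the letter and its marker
        else:
            out.append(LETTERS.get(ch, ch))
            i += 1
    return "".join(out)


milk = "\u0634\u06cc\u0631"                     # šīr "milk"
lion = "\u0634\u06cc" + MAJHUL_MARK + "\u0631"  # šēr "lion": same letters plus marker
```

Both words display identically, but only the marked one transliterates with ē.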

Listing of Periodic XML dump runs? etc.

I was going to ask you about a specific kind of error in our entries (mismatch between inflection-line template and L2 header).
But that got me thinking that it would be handy for each such run to be listed with a link to its code, the date of the run, and the date of the dump on which it was run, as well as a description comprehensible to BP readers, rather than GP readers.
Relatedly, at one time years ago, either Ullman or Visviva did runs of L2 sections that were not linked-from, but had links-to other entries with the same L2. (I think it was called "Linkeration".) I don't remember exactly how it treated alternative and inflected forms, but such a run among lemmas has value and may lead to reduction of the number of orphaned entries.
Also, several of the special pages are no longer useful, probably beyond saving. Occasional runs duplicating the original intent of such pages (probably with more selectivity, eg, excluding user pages, or talk pages, or all spaces other than NS0) might be useful, even if the runs are infrequent (ie, annual, quarterly). Maybe you already do some of these things or have reasons for not doing them, but that gets back to my second point above. DCDuring (talk) 22:38, 17 March 2023 (UTC)

@DCDuring Sure. It is pretty easy for me to write scripts to do various things with the XML dump. Can you clarify what some of the things above mean, with examples? E.g. which sorts of "runs" are you referring to in #2? By #3 you mean a given term in a given language that no other term links to (an orphan)? What is the purpose of restricting to only such terms that link to other same-language terms? In #4 by "special pages" you mean things in the Appendix and Wiktionary space, or my userspace pages, or ...? I agree there is probably a lot of junk there in any case, although how would we identify such junk other than maybe by looking at the date of the last non-bot commit? Benwing2 (talk) 23:40, 17 March 2023 (UTC)

2. Any periodical processing of the XML dumps that identified recurring entry defects, whether formatting problems, misapplied templates, mismatches between Language and templates, labels, categories, etc.

3. AFAICR, in "Linkeration", considering only L2s in the same language, if entry A had a link to an entry B, but B did not have a link to A, that state of affairs appeared in a linkeration listing.

4. The Special pages include numerous lists that display a maximum of 5,000 items, though there may actually be hundreds of thousands of pages that have the relevant characteristics. For example, when I try to use such pages to find "wanted" items that are "Translingual" or "mul", I usually find that there are huge numbers of inflected forms that fill the displayed pages. Since that situation hasn't changed in 15 years, I believe it would be useful to have more selective replacements for such special pages, limited to, say, lemma pages, grouped by language in descending order of, say, "want".

HTH. DCDuring (talk) 00:05, 18 March 2023 (UTC)

@DCDuring I actually don't normally do such periodic runs. The only things I do on a regular basis are create the {{auto cat}}-able categories in Special:WantedCategories every time it's refreshed (every 3 days) and periodically delete the empty categories in Category:Empty categories. I have done various runs fixing problems (e.g. a few months ago I did a run over almost all languages fixing misindented headers, and twice now I've done runs to remove unnecessary Unicode BIDI chars, particularly U+200E LEFT-TO-RIGHT MARK) but on a one-off basis. I don't currently have any infrastructure set up to automate recurring runs, although it's probably a good idea to do so, using toolforge or wmflabs or whatever.

As for Linkeration, looking for cases where "entry A had a link to an entry B, but B did not have a link to A" is very different from looking for orphaned pages; do you have an example of the output of any such run?
For #4, I see, and I think you've mentioned this before. I agree that a lot of the current Special: pages are filled with crap; it's too bad we don't have the ability to set custom filters to weed out that crap. I thought maybe there's an editor from Germany who produces the lists you're looking for? If not, and you can create a detailed requirements doc outlining what you are looking for, I can see about implementing it for a one-time run and later maybe automating it. Benwing2 (talk) 00:41, 18 March 2023 (UTC)
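A minimal Python sketch of the kind of selective listing described in #4, assuming the wanted titles have already been extracted from a dump together with a language code, a lemma/non-lemma flag and a "want" count (for example the number of pages linking to the title). How those fields are computed is left open; the `Wanted` record, the function name and the sample rows are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Wanted(NamedTuple):
    title: str
    lang: str       # language code of the linking entries, e.g. "mul"
    is_lemma: bool  # assumed precomputed, e.g. not linked only from inflection tables
    want: int       # e.g. number of distinct pages linking to the missing title


def wanted_lemmas_by_language(rows, top_n=5000):
    """Drop non-lemmas, group by language, sort each group by descending want."""
    groups = defaultdict(list)
    for row in rows:
        if row.is_lemma:
            groups[row.lang].append(row)
    return {
        lang: sorted(items, key=lambda r: (-r.want, r.title))[:top_n]
        for lang, items in sorted(groups.items())
    }


rows = [
    Wanted("Felis", "mul", True, 30),
    Wanted("cats", "en", False, 40),  # inflected form: filtered out
    Wanted("Homo", "mul", True, 12),
    Wanted("dog", "en", True, 7),
]
result = wanted_lemmas_by_language(rows)
```

The inflected form with the highest count never reaches the output, which is exactly the noise the current special pages cannot filter.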

2. Periodic runs would have value when applied to new or recently revised L2 sections. I doubt that filters and patrols can prevent various outrages from being inflicted on our entries, though maybe I underestimate our defenses.

3. Ullman is dead, Visviva has been inactive since 2022, but might be reachable. I hope he was the guy who did "Linkeration".

4. I can do that in a month or two.

Thanks. DCDuring (talk) 00:54, 18 March 2023 (UTC)

3. See User:Visviva/Linkeration. DCDuring (talk) 02:03, 18 March 2023 (UTC)

@DCDuring I see. It is looking for situations where A links to B using a relationship that "ought" to be reciprocated but isn't (synonym, antonym, related term, derived term reciprocated as related term?, homophone). I guess my question is, how important is this to worry about, vs. e.g. finding the non-junky "wanted" pages in a given language or finding orphaned pages? I can certainly implement this but each script takes some work and I'd like to do this in a priority order. Benwing2 (talk) 02:10, 18 March 2023 (UTC)
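A minimal Python sketch of that check, assuming the relevant links (synonyms, antonyms etc. within one L2 language) have already been extracted from the dump into a mapping from each entry to the set of entries it links to. Orphan detection, for comparison, needs only the same mapping. The function names and sample data are illustrative.

```python
def unreciprocated_links(links):
    """(A, B) pairs where entry A links to entry B but B does not link back to A."""
    pairs = []
    for source in sorted(links):
        for target in sorted(links[source]):
            # Only existing entries count; links to missing pages are skipped.
            if target in links and source not in links[target]:
                pairs.append((source, target))
    return pairs


def orphans(links):
    """Entries that no other entry links to."""
    linked_to = {t for src, targets in links.items() for t in targets if t != src}
    return sorted(page for page in links if page not in linked_to)


# Toy data: each English entry mapped to the English entries it links to.
links = {
    "big": {"large"},
    "large": {"big", "huge"},
    "huge": set(),
    "enormous": {"huge"},
}
```

Here huge is linked from large and enormous without linking back, while enormous is an orphan.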

Priorities are good, but very hard to decide on. Is quantity more important than quality? Are some errors more important than others? One class of priority items are those that have fundamental defects like inflection-line templates for a language other than that of the L2 in which they appear. Such defects lead to miscategorization, which makes them hard to find using searchbox searches to discover other defects, which are all too likely if there is a fundamental error. Generally, searching for and manually correcting picky errors leads one to entries that have multiple errors or, at least, weaknesses. In the English backbone, quality is now more important than quantity. Many quality problems need manual correction, using searches for defects likely to co-occur with multiple other, hard-to-detect defects. If defects need to be found using regex searches, then the "filters" (basic language and PoS categories, inflection-line templates) needed to allow them to run to completion are fundamentals that need priority. Sorry if this seems like a TLDR rant and isn't clear. DCDuring (talk) 08:34, 18 March 2023 (UTC)

@DCDuring I do understand what you're saying and I agree esp. for English (and several other well-covered languages) we don't need more entries for ever-more-obscure words, but we need to improve the quality of existing entries. One thing I've been particularly meaning to focus on with English is pronunciation, which is often missing or inconsistent; but that takes a good deal of work, either to create a pronunciation module (which will require a lot of thought given how inconsistent English spelling is) or to use some free source for English pronunciations (when I looked a few years ago, there were two of them and both had significant issues). As for things like wrong-language inflection templates, these are actually easy to find by bot (and I suspect many of the other fundamental defects you're thinking of can also be found pretty easily by bot). So if you want I can start with this and generate a list of such wrong-language entries. Benwing2 (talk) 19:45, 18 March 2023 (UTC)
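A rough Python sketch of such a check, assuming headword templates follow the `{{CODE-pos}}` naming pattern (e.g. {{en-noun}}) and using a tiny hypothetical L2-name-to-code table. A real run would load the full language data and also check `{{head|CODE|...}}`; the function name and sample page are illustrative.

```python
import re

# Tiny illustrative subset; a real run would use the full language-name table.
LANG_CODES = {"English": "en", "Latin": "la", "German": "de"}

L2_RE = re.compile(r"^==[ \t]*([^=\n]+?)[ \t]*==[ \t]*$", re.M)
HEADWORD_RE = re.compile(r"\{\{([a-z-]+?)-(noun|verb|adj|adv)\b")


def wrong_language_headwords(wikitext):
    """Return (L2 header, template) pairs whose template code disagrees with the header."""
    problems = []
    headers = list(L2_RE.finditer(wikitext))
    for i, header in enumerate(headers):
        # The L2 section runs until the next L2 header or the end of the page.
        end = headers[i + 1].start() if i + 1 < len(headers) else len(wikitext)
        expected = LANG_CODES.get(header.group(1))
        if expected is None:
            continue  # language not in the toy table
        for m in HEADWORD_RE.finditer(wikitext, header.end(), end):
            if m.group(1) != expected:
                problems.append((header.group(1), m.group(1) + "-" + m.group(2)))
    return problems


page = "==English==\n===Noun===\n{{en-noun}}\n\n==Latin==\n===Noun===\n{{en-noun}}\n"
```

On the sample page the second {{en-noun}} is reported because it sits under ==Latin==; L3 headers such as ===Noun=== are not mistaken for L2 headers.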

Can't help on pronunciation. I'd like to see whether we have very many of those wrong-language inflection templates. I've stumbled across ten or so lately, which I've corrected. Maybe there aren't as many as I fear. If they are few, we can wait a year or two before doing another run. Also, there are some untemplated "form of"-type definitions, which are easy to find if they have literally "form of" in the text. With all searches for defective entries I usually start with correcting taxonomic entries and English organism-name entries before attacking English as a whole. DCDuring (talk) 21:04, 18 March 2023 (UTC)

Implementation of removing horizontal rule separators

Hi, I am an editor from Chinese Wiktionary. As the community in English Wiktionary decided to remove the ---- separator, we want to follow up as well. I tried accomplishing this task with AWB, but it won't give the whole list of Special:AllPages, nor will it generate the list of entries with the separator. There are other more powerful tools (like Pywikibot), but I am not familiar with them. Would you mind sharing your code, or the approach your bot used? --TongcyDai (talk) 19:09, 18 March 2023 (UTC)

@TongcyDai Hi. What I did essentially is to download the English Wiktionary dump from dumps.mediawiki.org and use some existing scripts I've written that make use of Pywikibot. My scripts are here: [4]. The first thing I did was this:

bzcat enwiktionary-20230301-pages-articles.xml.bz2 | python find_regex.py --stdin -e '(^.*\n)?^--+$(\n.*)?' --all --namespaces 0 > find_regex.enwiktionary-20230301-pages-articles.xml.bz2.separator.out.1

This looks through the dump file for pages matching the given regular expression.

Then I used this:

python rewrite.py --pagefile <(extract_pagename.sh < find_regex.enwiktionary-20230301-pages-articles.xml.bz2.separator.out.1) --from '\n+---+\n+==' --to '\n\n==' --from '\n+---+\n*\Z' --to '' --diff --comment 'remove horizontal rule separators per [[Wiktionary:Votes/2023-02/Removing the horizontal rule]]'

This does the actual change. There were about 478,000 pages needing changing so I actually did ten separate invocations of this command like this:

#!/bin/zsh

SAVE="--save"
cmd="/opt/local/bin/python rewrite.py --pagefile <(extract_pagename.sh < find_regex.enwiktionary-20230301-pages-articles.xml.bz2.separator.out.1) --from '\n+---+\n+==' --to '\n\n==' --from '\n+---+\n*\Z' --to '' --diff --comment 'remove horizontal rule separators per [[Wiktionary:Votes/2023-02/Removing the horizontal rule]]'"
sleep 0 && eval "$cmd $SAVE 1 25000 > rewrite.remove-horizontal-rule.1-25000.out.2" &
sleep 3 && eval "$cmd $SAVE 25001 75000 > rewrite.remove-horizontal-rule.25001-75000.out.1" &
sleep 6 && eval "$cmd $SAVE 75001 125000 > rewrite.remove-horizontal-rule.75001-125000.out.1" &
sleep 9 && eval "$cmd $SAVE 125001 175000 > rewrite.remove-horizontal-rule.125001-175000.out.1" &
sleep 12 && eval "$cmd $SAVE 175001 225000 > rewrite.remove-horizontal-rule.175001-225000.out.1" &
sleep 15 && eval "$cmd $SAVE 225001 275000 > rewrite.remove-horizontal-rule.225001-275000.out.1" &
sleep 18 && eval "$cmd $SAVE 275001 325000 > rewrite.remove-horizontal-rule.275001-325000.out.1" &
sleep 21 && eval "$cmd $SAVE 325001 375000 > rewrite.remove-horizontal-rule.325001-375000.out.1" &
sleep 24 && eval "$cmd $SAVE 375001 425000 > rewrite.remove-horizontal-rule.375001-425000.out.1" &
sleep 27 && eval "$cmd $SAVE 425001 478000 > rewrite.remove-horizontal-rule.425001-478000.out.1" &

wait

This may not be needed for the Chinese Wiktionary, which I assume is smaller. My scripts still use Python 2.7 and a somewhat old version of Pywikibot so you might have a bit of difficulty getting them running; you should also be able to write your own Pywikibot script, and I'm pretty sure Pywikibot already comes with a built-in script to do regex substitutions like my rewrite.py script, so you just have to figure out how to use it.

Benwing2 (talk) 20:43, 18 March 2023 (UTC)
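For anyone reimplementing this outside those scripts, the detection regex and the two `--from`/`--to` substitutions above can be reproduced with Python's standard `re` module, as in this sketch. It assumes find_regex.py matches in multiline mode (the `^` in the middle of its pattern suggests so) and does not cover fetching or saving pages, which Pywikibot or a similar framework would handle. `remove_separators` is a hypothetical helper name.

```python
import re

# Detection: a line consisting only of two or more hyphens.
# MULTILINE makes ^ and $ match at line boundaries, not just the string ends.
SEPARATOR_LINE = re.compile(r"^--+$", re.M)

# The two substitutions passed to rewrite.py via --from/--to.
BETWEEN_SECTIONS = re.compile(r"\n+---+\n+==")  # separator before the next L2 header
AT_END = re.compile(r"\n+---+\n*\Z")            # stray separator at the end of the page


def remove_separators(text):
    text = BETWEEN_SECTIONS.sub("\n\n==", text)
    return AT_END.sub("", text)
```

Running `SEPARATOR_LINE.search()` over each page of a dump gives the page list, and `remove_separators()` gives the new page text to save.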

Some horizontal rule separators are still there

Did your bot complete the removal of horizontal rule separators already? I'm asking because I still see some, e.g. iBhayi, i.p.v., monoloog, Baai, toepassing, knorrig, pretpark, du Plessis, tikfout, inkus, speelkaart. Thanks! tbm (talk) 07:31, 19 March 2023 (UTC)

@tbm That's because my bot used the Mar 1 2023 dump to figure out which pages need fixing. Any pages where the separators were only added afterwards will still have them. When the Mar 20 dump comes out (in two days or so), I will do another run, which should eliminate the remaining ones. Benwing2 (talk) 07:32, 19 March 2023 (UTC)

Thanks for the explanation! tbm (talk) 07:33, 19 March 2023 (UTC)

Module:cel-verbs

Thanks for reminding me to use the preview-page-with-template button. Since you intervened in my vain attempts to fix a problem in this module, I'll explain.

What I was attempting to do (and obviously struggled with) is to get the suffixal ablaut in the present stems to work properly. The issue is that Rua hard-coded the thematic e/o-conjugation by default via having "stem_e" and "stem_o" variables, which obviously doesn't work when the nasal infix on laryngeal-final roots gets involved. Most of my troubles have been attempting to work within this hardcoding instead of removing it entirely. I have since un-hardcoded this after you intervened (but this effort added around 2000 chars to the code). — Ceso femmuin mbolgaig mbung, mellohi! (投稿) 23:28, 19 March 2023 (UTC)

@Mellohi! I see, thanks for the message. Yeah you should work on removing the hard-coding, although it will probably take some work; Proto-Celtic verbs, like those of other old Indo-European languages, are very complicated. I have created several conjugation modules and they can run to several thousand lines when all variants are supported, even with clever coding. Benwing2 (talk) 23:37, 19 March 2023 (UTC)

BTW I wouldn't worry about adding lines to the code; this is necessary in any case and we can work later on eliminating the redundancy. Benwing2 (talk) 23:38, 19 March 2023 (UTC)

Context objects

Hiya - just as an FYI, I've created Module:User:Theknightwho/contexts for "context objects", which provide a straightforward way to manage complex combinations of context flags. In essence, it's a dressed-up version of bitwise operations, with a few convenience measures thrown in. Once you've given it a list of context names, they can be toggled/checked in various different ways. It also handles aggregate contexts (i.e. contexts as sets), and has a mechanism for adding further contexts/removing those which are no longer needed. There's also a way to save/load the context state, which remains compatible throughout the lifetime of the context object (even if certain contexts get added/removed in the meantime).

This was developed as part of my work on the wikitext parser, but I feel like it could probably come in handy elsewhere. Theknightwho (talk) 02:47, 21 March 2023 (UTC)
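The general idea (named flags packed into the bits of one integer, with set-like aggregate operations and a save format that survives adding or removing contexts) can be sketched in Python as follows. This is an illustration under those assumptions, not a port of Module:User:Theknightwho/contexts; the class, method and context names are hypothetical.

```python
class Contexts:
    """Named on/off context flags stored as bits of a single integer."""

    def __init__(self, names):
        self._bits = {}     # context name -> bit position
        self._next_bit = 0  # bit positions are never reused after removal
        self.state = 0
        for name in names:
            self.add(name)

    def add(self, name):
        if name not in self._bits:
            self._bits[name] = self._next_bit
            self._next_bit += 1

    def remove(self, name):
        bit = self._bits.pop(name)
        self.state &= ~(1 << bit)  # clear the flag so it cannot linger

    def _mask(self, names):
        """Accept one name or any iterable of names (an aggregate context)."""
        if isinstance(names, str):
            names = [names]
        mask = 0
        for name in names:
            mask |= 1 << self._bits[name]
        return mask

    def set(self, names, on=True):
        if on:
            self.state |= self._mask(names)
        else:
            self.state &= ~self._mask(names)

    def toggle(self, names):
        self.state ^= self._mask(names)

    def has_any(self, names):
        return (self.state & self._mask(names)) != 0

    def has_all(self, names):
        mask = self._mask(names)
        return (self.state & mask) == mask

    def save(self):
        # Saved by name, so a snapshot can still be loaded after contexts
        # have been added or removed in the meantime.
        return frozenset(n for n, b in self._bits.items() if (self.state >> b) & 1)

    def load(self, saved):
        self.state = self._mask([n for n in saved if n in self._bits])
```

Saving by name rather than as a raw integer is one simple way to keep snapshots valid across additions and removals; never reusing bit positions keeps a newly added context from colliding with an existing one.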

@Theknightwho Cool, I will take a look. BTW are you up early or late? :-) Seems like it's around 3am in England now. As for parsing, I didn't respond to your Grease Pit comment from a few days ago but I agree that the path forward is probably to write a parser that punts the edge cases to frame:preprocess(). As long as the edge case detector is robust (which shouldn't be too hard to implement), this should work quite well, as the edge cases (such as nested templates and mismatched/misplaced brackets or braces) should occur rarely. Benwing2 (talk) 03:18, 21 March 2023 (UTC)

@Benwing2 I mostly work from home and have a lot of control over my own schedule, so I guess it's very late! To be honest, I get most of my best work done at night, as there's nothing to distract me.

Re the nested templates, I think there are actually quite a lot of them lurking in other templates, which is where things get a bit tricky. I've made some decent progress on transpiling a general purpose parser, but I'm honestly unsure what the performance is going to be like. However, it'll be easier to know what I can cut out once it's more complete. The very basic one I made a couple of months ago showed massive memory gains very quickly, but that's also because it was very compact (and fragile): on a page with a few link and head templates, the memory usage went down from 8.32MB to 5.86MB with loading time increasing from 0.266 to 0.281 seconds, going from what I wrote down at the time. I also noticed that the memory savings were accelerating as the page got more complex, while the time increases were decelerating. Theknightwho (talk) 04:06, 21 March 2023 (UTC)
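For scale, the figures quoted above amount to roughly a 30% memory saving for about a 6% increase in loading time (computed only from those two measurements):

```latex
\[
\frac{8.32 - 5.86}{8.32} = \frac{2.46}{8.32} \approx 0.296 \quad (\text{about } 29.6\% \text{ less memory})
\]
\[
\frac{0.281 - 0.266}{0.266} = \frac{0.015}{0.266} \approx 0.056 \quad (\text{about } 5.6\% \text{ more loading time})
\]
```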

@Theknightwho Yeah I am also a night owl, although more recently I've tried to avoid staying up past 2-3am. It's great that even a simple parser showed significant memory gains. I still think, though, that before you go down the path of implementing nested templates, you should see how often these actually occur; and even if you handle them, I wouldn't bother trying to handle more than one level. Because of wiki syntax weirdness and a wiki parser where (AFAIK) the "spec" is simply how the code runs, things can get really complex really quick when you have nesting. Benwing2 (talk) 04:16, 21 March 2023 (UTC)
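The "fast path with fallback" approach discussed above can be sketched in Python: count template braces, handle only balanced text within a depth limit using the cheap parser, and punt everything else to the full preprocessor (frame:preprocess() in the real setting, a plain callback here). The sketch deliberately ignores {{{parameters}}}, links, nowiki and the like; the function names are hypothetical, and `max_depth=2` would correspond to allowing one level of nesting.

```python
import re

TOKEN = re.compile(r"\{\{|\}\}")


def max_template_depth(text):
    """Deepest {{...}} nesting level, or None if the braces are unbalanced."""
    depth = deepest = 0
    for tok in TOKEN.findall(text):
        depth += 1 if tok == "{{" else -1
        if depth < 0:
            return None  # a closing }} with no matching {{
        deepest = max(deepest, depth)
    return deepest if depth == 0 else None


def parse_or_punt(text, simple_parse, preprocess, max_depth=1):
    """Use the cheap parser for the common case; punt edge cases to the full preprocessor."""
    depth = max_template_depth(text)
    if depth is None or depth > max_depth:
        return preprocess(text)
    return simple_parse(text)
```

The cheap check runs over the whole text once, so the fallback costs extra only for the rare pages that actually need it.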

Bad bot edit

Probably a one-off, but just FYI: https://en.wiktionary.org/w/index.php?title=tusk&diff=prev&oldid=72178172 JeffDoozan (talk) 14:45, 22 March 2023 (UTC)

@JeffDoozan Thanks, yeah my script to move synonyms inline apparently doesn't work perfectly in the presence of HTML comments. Fixed manually. Benwing2 (talk) 19:17, 22 March 2023 (UTC)

Updating module message.

The things I don't know:

Logic of module
Where message is displayed
Why we need taxon "rank" (or other type) in the message
How to pass a parameter from template to message

With this level of ignorance I can't responsibly edit the module. DCDuring (talk) 11:55, 28 March 2023 (UTC)

@DCDuring OK, my apologies and thanks for the message. An example is here: Category:Entries using missing taxonomic name (group). The message has two parts, a "description" (the first line) and an "additional" (the second line). Can you simply write out what you think the text of the category should be, including the taxon type if you think it should be present? For example, maybe the message should read this:

Entries that link to wikispecies because there is no corresponding Wiktionary entry for the taxonomic name group in the template {{taxlink}}.

instead of this:

Entries that link to wikispecies because there is no corresponding Wiktionary entry for the taxonomic name in the template {{taxlink}}.

Benwing2 (talk) 18:16, 28 March 2023 (UTC)

Entries that link to Wikispecies because there is no corresponding Wiktionary definition for it as a taxonomic group.

The underlining highlights how my wording differs from yours. DCDuring (talk) 18:28, 28 March 2023 (UTC)

Polish Adjective Module

I'd like to add an obsolete form to the instrumental plural (feminine/neuter nouns could take -emi in hard stems or -iemi in soft), but I'm not sure what is needed to do that. (I'd also like to eventually get acceleration on adjectives and nouns, but the code is so spaghetti-like I'm not sure that'd be easy.) Up to the task? Vininn126 (talk) 21:26, 11 April 2023 (UTC)

I also realized that our changes to olddat= should be bot-changed to 1 instead of the absorbed form (some of the masculine virile nominative forms are set as the old dative). Vininn126 (talk) 21:26, 11 April 2023 (UTC)

@Vininn126 I can definitely take a look after some more work on Czech nouns. Probably we should just rewrite the adjective module; the Czech adjective module only took a couple of days to write and hopefully Polish isn't much more complex. Benwing2 (talk) 21:39, 11 April 2023 (UTC)

Thanks. Plus the module is... done in a way, at least. Maybe that will speed things up. We can probably even rename it to something simpler. Vininn126 (talk) 21:40, 11 April 2023 (UTC)

Declension of Russian ха́нец

Hiya - Russian ха́нец (xánec) doesn't seem to have its declension accounted for by {{ru-noun-table}}, so at the moment it all has to be entered manually. It's a reducible stem with an emergent ь. Also pinging @Atitarev. Theknightwho (talk) 23:09, 17 April 2023 (UTC)

@Theknightwho: This word is irregular, in terms of it having the "ь" in the stem. By Palladius System rules 漢／汉 (hàn) is transliterated as хань (xanʹ), hence ха́ньцы (xánʹcy). The singular form must be back-formed from the plural. @Benwing2: I think a manual input is required, at least for some forms, not sure. I don't know another word with such declension.

Entry created by @Tetromino. Anatoli T. (обсудить/вклад) 23:35, 17 April 2023 (UTC)

Actually, тайва́нец (tajvánec, “Taiwanese male person”) belongs in the same boat. E.g. nominative plural can be either тайва́нцы (tajváncy) or тайва́ньцы (tajvánʹcy). The reason is the same. 臺灣／台湾 (Táiwān) is Тайва́нь (Tajvánʹ) per the Palladius Cyrillisation system. Anatoli T. (обсудить/вклад) 23:40, 17 April 2023 (UTC)

@Atitarev Thanks. If this is down to Palladius rules, it would apply to (edit: words backformed from) any Mandarin borrowing ending in (pinyin) "n", which includes a lot of place names such as Сиань (Sianʹ, “Xi'an”). I know the system, as I added Palladius to {{zh-pron}}. Theknightwho (talk) 23:43, 17 April 2023 (UTC)

@Theknightwho: Oh, thanks. I didn't pay attention. Now, I can see that Module:cmn-pron uses Palladius.

тайва́нец (tajvánec) can be created regularly (with -нц-) but for cases with "ь" (before ц), a manual override would currently be required. Anatoli T. (обсудить/вклад) 23:49, 17 April 2023 (UTC)

@Atitarev, Theknightwho This issue came up before and I remember either implementing something for it or saying I would implement something. Let me see whether I actually did anything. Benwing2 (talk) 02:04, 18 April 2023 (UTC)

@Benwing2 Thanks for implementing this - it works great on ха́нец (xánec). Should we add a category for these? It's weird/interesting enough that it's probably worth it, especially given that it's not predictable. Theknightwho (talk) 23:26, 18 April 2023 (UTC)
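For reference, a minimal Python sketch of what such an emergent-ь option does for this class of nouns, assuming an animate, stem-stressed masculine in -ец (so the accusative equals the genitive). The `soft_sign` flag and the function names are hypothetical and do not reflect {{ru-noun-table}} syntax. Since the ь is not predictable, the sketch takes it as an explicit input.

```python
# Endings for an animate, stem-stressed masculine noun in -ец (stem ending in ц).
# Accusative = genitive because the noun is animate.
ENDINGS = {
    ("gen", "sg"): "а", ("dat", "sg"): "у", ("acc", "sg"): "а",
    ("ins", "sg"): "ем", ("pre", "sg"): "е",
    ("nom", "pl"): "ы", ("gen", "pl"): "ев", ("dat", "pl"): "ам",
    ("acc", "pl"): "ев", ("ins", "pl"): "ами", ("pre", "pl"): "ах",
}


def reduced_stem(lemma, soft_sign):
    """Drop the fleeting е of -ец, optionally inserting the emergent ь before ц."""
    if not lemma.endswith("ец"):
        raise ValueError("this sketch only handles nouns in -ец")
    return lemma[:-2] + ("ь" if soft_sign else "") + "ц"


def decline(lemma, soft_sign=False):
    stem = reduced_stem(lemma, soft_sign)
    forms = {("nom", "sg"): lemma}
    forms.update({key: stem + ending for key, ending in ENDINGS.items()})
    return forms
```

With the flag set, ха́нец gives ха́ньца, ха́ньцы etc.; тайва́нец yields either тайва́нцы or тайва́ньцы depending on the flag, matching the variation described above.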

Replacement of unnecessary redirects and templates

Here is another batch:

{{RQ:Pope Odyssey}} → {{RQ:Homer Pope et al Odyssey}}
{{RQ:Tillotson Advantages}} → {{RQ:Tillotson Works|sermon=IV}}
{{RQ:Tillotson Folly}} → {{RQ:Tillotson Works|sermon=II}}
{{RQ:Tillotson Wisdom}} → {{RQ:Tillotson Works|sermon=I}}

Thank you. — Sgconlaw (talk) 13:04, 22 April 2023 (UTC)
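A minimal Python sketch of how a batch like this can be applied to raw wikitext with the standard `re` module, assuming the templates are written with exactly these names (no underscore or capitalization variants). `RENAMES` and `apply_renames` are hypothetical; the bot's actual code is not shown here.

```python
import re

# Old template name -> replacement (new name plus any fixed parameters).
RENAMES = {
    "RQ:Pope Odyssey": "RQ:Homer Pope et al Odyssey",
    "RQ:Tillotson Advantages": "RQ:Tillotson Works|sermon=IV",
    "RQ:Tillotson Folly": "RQ:Tillotson Works|sermon=II",
    "RQ:Tillotson Wisdom": "RQ:Tillotson Works|sermon=I",
}


def apply_renames(wikitext):
    for old, new in RENAMES.items():
        # Require "|" or "}}" right after the name, so longer names that merely
        # start with it are left alone; any existing parameters are preserved.
        pattern = re.compile(r"\{\{\s*" + re.escape(old) + r"\s*(?=\||\}\})")
        replacement = "{{" + new
        wikitext = pattern.sub(lambda _m: replacement, wikitext)
    return wikitext
```

Using a function as the replacement avoids any backslash or group-reference interpretation of the replacement text.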

@Sgconlaw Done, apologies for the delay as I was overseas. Benwing2 (talk) 17:10, 27 April 2023 (UTC)

No worries at all. Thanks! — Sgconlaw (talk) 17:18, 27 April 2023 (UTC)

`{{fa-IPA}}`

Hello, have there been any updates to the template? Thanks in advance.—Saranamd (talk) 06:10, 24 April 2023 (UTC)

@Saranamd Not recently; I got partway through and then shifted to Czech. I will get back to Persian soon. Benwing2 (talk) 06:40, 24 April 2023 (UTC)

More Latin vowel length changes

Hi, I made a few more changes to some Latin vowel lengths in words with "hidden" quantities: hirtus, hirsutus, luxus, luctor, but I still need to get all inflected forms and derivatives updated. I'd appreciate any help you can spare. Also, I saw your comment to Brutal Russian about Bennett and changes from that source. I think while it's a great place to start, there is definitely more recent work that has been done that in some cases contradicts Bennett's conclusions; I've added citations to all of the above entries saying which authors give which lengths. Even if it isn't immediately clear why, Brutal Russian probably had some reason for each edit, so I'd be cautious when changing any entries back; I think it would be a good idea to first check De Vaan, and if there's an entry for it, Wartburg's Französisches Etymologisches Wörterbuch and Buchi and Schweickard's Dictionnaire Étymologique Roman as well to get more recent perspectives on vowel lengths. Urszag (talk) 15:09, 29 April 2023 (UTC)

@Urszag Thanks, I'll work on this soon. I haven't changed any entries back but I would definitely like to hear from User:Brutal Russian; when there is any question about hidden length it's important to add sources so I'm glad you are doing this. Benwing2 (talk) 15:15, 29 April 2023 (UTC)

@Urszag You pinged me on ullus about footnotes. There's no current way of specifying footnote symbols or numbers in template arguments. Probably the easiest way of adding this is to use Module:table tools when displaying the forms; I'll see about implementing it. Benwing2 (talk) 21:53, 30 April 2023 (UTC)

Oh, thank you!--Urszag (talk) 23:49, 30 April 2023 (UTC)

Replacement of unnecessary redirects and templates

Hi, a new batch:

{{RQ:Fitzgerald Great Gatsby}} → {{RQ:Fitzgerald Great Gatsby|year=1953}} – the 1st edition of The Great Gatsby is now available at the Internet Archive, so I have updated the template to use it. Unfortunately it has a different pagination from the 1953 version originally used by the template, so |year=1953 needs to be added to current uses of the template (except at aquaplane, but I'll fix that manually).
{{RQ:Lewis Magician's Nephew}} → {{RQ:C. S. Lewis Magician's Nephew}}
{{RQ:Wollstonecraft Vindication}} → {{RQ:Wollstonecraft Vindication Women}}

Thank you. — Sgconlaw (talk) 15:51, 7 May 2023 (UTC)
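The Great Gatsby item differs from a plain rename because existing calls gain a parameter. A minimal Python sketch, assuming calls contain no nested templates (those are left unchanged for manual review) and that a call which already specifies |year= should be left alone; `pin_year`, `CALL` and `SKIP_PAGES` are hypothetical.

```python
import re

# A call with no nested templates inside; group 1 holds its parameters (if any).
CALL = re.compile(r"\{\{\s*RQ:Fitzgerald Great Gatsby\s*((?:\|[^{}]*)?)\}\}")
YEAR_PARAM = re.compile(r"\|\s*year\s*=")
SKIP_PAGES = {"aquaplane"}  # to be fixed by hand, as noted in the request


def pin_year(title, wikitext, year="1953"):
    """Add |year=... to each call on the page that does not already give a year."""
    if title in SKIP_PAGES:
        return wikitext

    def add_year(m):
        params = m.group(1)
        if YEAR_PARAM.search(params):
            return m.group(0)  # already pinned to an edition: leave unchanged
        return "{{RQ:Fitzgerald Great Gatsby|year=" + year + params + "}}"

    return CALL.sub(add_year, wikitext)
```

Checking for an existing |year= makes the edit safe to re-run on pages that were already converted.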

@Sgconlaw Should be done. I also renamed {{RQ:Fitzgerald Gatsby}} → {{RQ:Fitzgerald Great Gatsby|year=1953}}. Apologies for the delay. Benwing2 (talk) 21:59, 13 May 2023 (UTC)

No worries. Thanks! — Sgconlaw (talk) 22:16, 13 May 2023 (UTC)

Russian transliteration scraper

Also tagging @Atitarev. I've developed a Russian transliteration scraper which is able to grab manual transliterations from headword templates, meaning that they only need to be entered in one place. That means, for example, that {{l|ru|атеисти́ческий}} would output атеисти́ческий (atɛistíčeskij) instead of атеисти́ческий (ateistíčeskij). It's currently in Module:User:Theknightwho/ru-translit as export.scrape_tr. Here is my idea of how it should work:

It checks for the exact stress pattern, and ignores anything else, which reduces the chance of ambiguity if there are multiple headwords with different stress patterns available.
If no stress is given and the term includes ё, then we treat it like a stress accent and check for an exact match only.
Otherwise, if no stress is given, then we grab all available possibilities and remove the stress accents, and then list them. That avoids terms like атеистический falling back on automatic transliteration, which would be wrong, because the entry shows that atɛističeskij is the only valid transliteration. However, in other cases where multiple transliterations are available, then either that's correct (e.g. terms like бел (bɛl, bel) where both are valid), or it's a prompt that the user should be more specific.
In some cases, transliteration is ambiguous even when the stress is given. I suggest that we give multiple transliterations as a comma-separated list in those situations, too: e.g. гэ́канье (hɛ́kanʹje, gɛ́kanʹje), for the exact same reason.
However, I think there should be an exception for alternative spellings where е is substituted for ё. Unless it's the only one given, they shouldn't be taken into account: e.g. берет (beret) and not берет (berjót), even though the latter is the only unstressed headword on the page. The reason being that these are marginal terms that rarely need to be linked to.

I've done my best to keep things as efficient as possible, and I haven't noticed a significant performance hit. For comparison, some of the Chinese entries scrape over 2,000 pages in a similar way. Do let me know your thoughts. Theknightwho (talk) 00:48, 8 May 2023 (UTC)

@Theknightwho: Sounds good.

Will it work on inflected forms as well or only on exact same forms? E.g. атеисти́ческое (nom. and acc. neuter, sg.) is currently undefined.

The transliterations of гэ́канье differ by sense: "hɛ́kanʹje" and "gɛ́kanʹje" (or the verb it is derived from) are actually opposite perspectives on the pronunciation of the letter г (g) in Russian (standard/regional). Overall, I am not sure whether pairs like |head2= and |tr2= are better than comma-separated transliterations, but the former looks neater.

It's alright for the transliteration to fail if ambiguity exists. Anatoli T. (обсудить/вклад) 01:00, 8 May 2023 (UTC)

@Theknightwho Overall this sounds good but I need to think about some of the special cases. For example, how are multiword terms handled? Also, the current transliteration module has special cases to handle ё occurring along with a stress (трёхэта́жный (trjoxɛtážnyj)) and even occurrences of two ё in a single word (трёхколёсный (trjoxkoljósnyj)). Do you handle these correctly? I will take a look at your code when I have a chance. Benwing2 (talk) 01:06, 8 May 2023 (UTC)

@Atitarev It'll work if the page has been created, but otherwise it's forced to fall back on automatic transliteration. In theory it might be possible, but it would involve reversing the inflection algorithm, and I doubt it's possible to do that unambiguously in a way that's efficient enough. Inflections aren't linked very often, though, and if you notice that the scrape isn't working then it's simple to just create the inflection page.

I would rather avoid transliteration fails, as unlike Chinese they will be rare, meaning they're much more likely to confuse the user. Giving multiple possibilities feels like a better hint to the user that they should clarify, and in some cases it's simply better to show that more than one transliteration is possible. We should probably cap it at two, though, if we do take that approach.

@Benwing2 Multiword terms are handled differently depending on the headword template. For {{ru-verb}}, {{head}} etc., it's simply a case of grabbing what's in the transliteration parameter. If there are multiple possibilities, they're already given as a comma-separated list. This is cross-checked against the relevant head parameter first, so that only the correct stress pattern gets used. For {{ru-noun+}} and {{ru-proper noun+}}, the template parses the list of arguments, creating a list of all possible combinations (for the cases where multiple transliterations are provided). Manual transliterations are substituted into the text, and then that text is fed through the main transliteration function. This works on the assumption that it's safe to input Latin text to the translit function, but I need to confirm with you that that's actually the case. As an example, {{m|ru|арифмети́ческая прогре́ссия}} would give арифмети́ческая прогре́ссия (arifmetíčeskaja progréssija, arifmetíčeskaja progrɛ́ssija) from the input

{{ru-noun+|[[арифметический|арифмети́ческая]]|+|_|[[прогре́ссия]]|or|[[прогре́ссия]]//progrɛ́ssija}}

This matches what's shown on the headword line.
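The combination step described for {{ru-noun+}} can be sketched roughly as follows. Everything here is hypothetical: `combine_translits` is not the module's function, and `translit` stands in for the main transliteration function, assumed (as confirmed below in the thread) to pass Latin text through unchanged.

```python
from itertools import product


def combine_translits(slots, translit):
    """Sketch: transliterate every combination of per-word alternatives.

    slots: one list per word; each alternative is (cyrillic_form, manual_tr or None).
    translit: stand-in for the main transliterator; assumed to pass Latin through.
    """
    results = []
    for choice in product(*slots):
        # Substitute manual transliterations into the text, leave the rest in Cyrillic.
        text = " ".join(tr if tr is not None else form for form, tr in choice)
        out = translit(text)
        if out not in results:  # drop duplicates, keep first-seen order
            results.append(out)
    return results
```

For арифмети́ческая прогре́ссия, the first slot has one alternative and the second has two (automatic, and the manual progrɛ́ssija), giving the two transliterations shown above.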

For cases like трёхэта́жный (trjoxɛtážnyj), I want to double-check how these should be handled. Using it as an example:

If трёхэта́жный and трёхэтажный were separate headwords on the page, then not including the stress mark would output the transliteration for трёхэтажный, and it would not be treated as a stressless version of трёхэта́жный. e.g. {{m|ru|трёхэта́жный}}: трёхэта́жный (trjoxɛtážnyj); {{m|ru|трёхэтажный}}: трёхэтажный (trjóxɛtažnyj).
However, if трёхэта́жный is the only headword (as is the case), then not entering the stress mark means it should be treated as a stressless version of трёхэта́жный: {{m|ru|трёхэтажный}}: трёхэтажный (trjoxɛtažnyj). This avoids misleading the user with the wrong stress.

The overall point is making sure that the output matches the transliterations as given on the page, while making sure to include/exclude stress marks depending on whether the user has included them.

I still need to implement the removal of stress marks for unstressed input, as I wanted to check with both of you first as to how it should be handled. I'll let you know once that's done (which will probably be tomorrow). Theknightwho (talk) 01:35, 8 May 2023 (UTC)

@Theknightwho Can you explain more why you think showing no transliteration ("transliteration fail") is confusing? It seems to me it may be better to show nothing if there's ambiguity (not simply a single headword with two possibilities, but two separate headwords with different stresses), or show some explicit indication of ambiguity, not just put two translits. In any case the page should be added to a tracking category.

As for {{head}}, there may be multiple transliterations in |tr2=, |tr3=, etc., so we need to handle that.

As for Latin in the translit function, yes, that should be safe.

It sounds like you're doing the right thing for трёхэта́жный (trjoxɛtážnyj). In any case, I think only words with the prefixes трёх- and четырёх- work this way, although I'm not completely sure. Benwing2 (talk) 02:33, 8 May 2023 (UTC)

@Benwing2 At the moment it cross-checks head2 with tr2 and so on. I've realised I haven't accounted for when the number of tr params exceeds the number of head params, so I'll need to add something to handle that. I'd be very surprised if more than a couple of pages actually have that (as it could only happen on pages with ё), but there's no harm in handling it.

I think showing no transliteration would be confusing because it's not something that Russian editors are used to, and it would only happen for a very small number of terms. I think it's something that we could do, but only when two identical headwords have different (sets of) transliterations, as opposed to where two transliterations are given for the same headword (where we can just show both): the former changes depending on the sense, while the latter doesn't. Because of its rarity, I think my preference would be for it to show [transliteration needed], in the same way headword templates do. Theknightwho (talk) 03:02, 8 May 2023 (UTC)

@Theknightwho I think showing [transliteration needed] in place of the transliterations is fine, or maybe [ambiguous transliterations; manual transliteration needed] or something. Benwing2 (talk) 08:37, 8 May 2023 (UTC)

Good idea. I've also found a couple of terms with four transliterations for the same headword: конгрессме́н (kongressmén, kongressmɛ́n, kongrɛssmén, kongrɛssmɛ́n) and сенеша́ль (senešálʹ, sɛnɛšálʹ, sɛnešálʹ, senɛšálʹ), which both seem a bit excessive. These are both single-word terms, but things can quickly get out of hand if a multiword term has two or three words like this, as the headword template gives every combination. Should we introduce a cap for the scraper? Theknightwho (talk) 13:29, 8 May 2023 (UTC)

@Theknightwho For inflections I have the concept of a "variant code" that helps ensure that e.g. if the instrumental singular of a noun has either -ой or -ою and the corresponding adjective has the same variation, you get the -ой's lining up and the -ою's lining up rather than all combinations; but I'm not sure if that's feasible for transliterations. But normally you shouldn't see cases like this; it would seem extremely rare to have two such multi-translit terms in the same expression. However, just in case, it might make sense to have a cap of 4. Benwing2 (talk) 19:50, 8 May 2023 (UTC)
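A rough sketch of variant-code alignment plus a cap of 4, applied to combinations of alternatives. This is not the inflection module's implementation; the (text, code) pairing and `aligned_combinations` are hypothetical, and a code could just as well tag e- versus ɛ-transliterations. Note that the cap here simply keeps the first results in product order.

```python
from itertools import islice, product


def aligned_combinations(slots, cap=4):
    """Sketch: combine per-word alternatives, lining up variants by code, capped.

    slots: one list per word of (text, variant_code) pairs; a code of None
    means the alternative combines freely with anything.
    """
    def consistent(choice):
        codes = {code for _, code in choice if code is not None}
        return len(codes) <= 1  # all coded alternatives must share one code

    combos = (" ".join(text for text, _ in choice)
              for choice in product(*slots) if consistent(choice))
    return list(islice(combos, cap))  # keep at most `cap` combinations
```

With новой/новою tagged "a"/"b" and книгой/книгою tagged the same way, only новой книгой and новою книгою come out, rather than all four pairings.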

That makes sense. I’ve had the thought that this is more likely when you account for usage examples as well, so we should probably do something to prevent things spiralling. It should be relatively straightforward to match alternation between e and ɛ, which will catch most instances of exponential growth in outputs.

In practical terms, I will work on the test cases today, as it would be good to have a lot of them before rolling this out. Theknightwho (talk) 14:37, 13 May 2023 (UTC)

@Theknightwho That sounds like a good idea. Benwing2 (talk) 21:44, 13 May 2023 (UTC)

autoelegirse


autoeligió is in CAT:E with an unfixable module error, but that seems to be just a symptom of the main problem: acceleration fails for a large part of the paradigm at autoelegirse: pretty much all of the finite forms in the inflection table are redlinks. elegir doesn't have this problem. My only guess is that it might have something to do with the lack of an entry for autoelegir (I'm not sure if we would want one, since the object is always the same as the subject, but I don't normally edit Spanish entries).

Not that this very narrow edge case needs to be fixed immediately, but deleting autoeligió will take it off the radar, and I wanted to make someone aware of the issue before I do that. Chuck Entz (talk) 16:00, 13 May 2023 (UTC)

@Chuck Entz It looks like the module needed a little help to know how to conjugate the verb (it doesn't look up conjugations on other pages). Benwing2 (talk) 19:33, 13 May 2023 (UTC)

Reflexive verbs


Could you also run that script on Old Polish, modern Polish, Silesian, and Kashubian to switch reflexive to reflexive-PRONOUN? I've updated their respective labels data modules to include it. Vininn126 (talk) 11:25, 14 May 2023 (UTC)

@Vininn126 How do I know which pronoun to use? (And can you enumerate the possible pronouns for each language?) Benwing2 (talk) 17:30, 14 May 2023 (UTC)

It should be in the module, hopefully. There should only be one, since these pronouns are more like unchanging particles. Vininn126 (talk) 17:39, 14 May 2023 (UTC)

@Vininn126 I guess what I mean is, e.g. Czech has both se (accusative) and si (dative); similarly for Bulgarian. Does any such thing exist for any of the Lechitic languages? Benwing2 (talk) 18:06, 14 May 2023 (UTC)

Yes, however these are generally lemmatized separately; see wmawiać sobie or radzić sobie. This is how WSJP does it. I think there might be one or two with (reflexive, with sobie) or some variant thereof. This should only be in Polish, no other Lechitic language. Vininn126 (talk) 18:15, 14 May 2023 (UTC)

@Vininn126 I am currently pushing changes for Polish; will do the others shortly. Benwing2 (talk) 04:38, 18 May 2023 (UTC)

Changes to the headword module.


Hello. Some edit to Module:headword is probably causing the pagename to be reproduced over and over again when a page has more than one transliteration. Case in point: राज्य and पक्ति, where |tr2= and |tr3= are used, causing the headword to appear multiple times. It was of course not like this before. Could you look into this issue? Thanks. -- 𝓑𝓱𝓪𝓰𝓪 𝓭𝓪𝓽𝓽𝓪 (𝓽𝓪𝓵𝓴) 04:33, 15 May 2023 (UTC)

@Bhagadatta Do you know how long this has been like this? Benwing2 (talk) 04:35, 15 May 2023 (UTC)

For at least a month, as I noticed it in April. -- 𝓑𝓱𝓪𝓰𝓪 𝓭𝓪𝓽𝓽𝓪 (𝓽𝓪𝓵𝓴) 04:46, 15 May 2023 (UTC)

@Bhagadatta OK, this may be due to my headword rewrite in March, which generally aligns headwords and the corresponding translit. I'll add a special case so if there's only one headword and multiple translits, the headword will display once. But I'm not sure what to do if e.g. there are 2 distinct headwords and 3 translits, as if I only display two headwords, it will be ambiguous which translit goes with which headword. Benwing2 (talk) 05:24, 15 May 2023 (UTC)

Okay, as I understand it, in case of multiple headwords, |tr1 takes care of headword 1, |tr2 takes care of headword 2 and so on, with some uncertainty as to what the module will display if there is a third translit but no third headword. It looks like this issue was discussed as early as 2014.[5]

Could a parameter be added so that a user can enter (for instance) |h1tr1, |h1tr2 and so on for all transliterations of the first headword and |h2tr1 and so on for the second headword? Is such a change feasible? -- 𝓑𝓱𝓪𝓰𝓪 𝓭𝓪𝓽𝓽𝓪 (𝓽𝓪𝓵𝓴) 05:45, 15 May 2023 (UTC)

Alternatively, users could manually enter |headN=[TERM] corresponding to the transliterations in case the headwords are distinct. Svartava (talk) 05:48, 15 May 2023 (UTC)

@Bhagadatta I don't think these extra params are needed, because the module can look for adjacent headwords that are the same and collapse them. The issue however is how to display these; currently we display all headwords together followed by all translits, which makes for difficulties in the situation described above. We'd have to totally restructure the display and somehow show headwords along with corresponding translits; this is not conceptually hard to do but would require some discussion and consensus. Benwing2 (talk) 05:49, 15 May 2023 (UTC)
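Collapsing adjacent identical headwords might look like this in outline. It assumes the (headword, translit) pairs have already been aligned from the headN/trN parameters; how a bare |tr2= with no |head2= gets paired is left open here, and `collapse_heads` is a hypothetical name.

```python
from itertools import groupby


def collapse_heads(pairs):
    """Sketch: merge adjacent identical headwords, keeping each one's translits.

    pairs: (headword, translit) in parameter order (head/tr, head2/tr2, ...).
    Only *adjacent* repeats are merged, as groupby only groups consecutive items.
    """
    return [(head, [tr for _, tr in group])
            for head, group in groupby(pairs, key=lambda pair: pair[0])]
```

For конгрессме́н, four pairs sharing the same headword collapse into one headword carrying all four transliterations.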

Addressing just Benwing2's display concern: does differential punctuation help? E.g., semicolons between the sets of transliterations for different headwords, commas within the sets. When I do this kind of thing for hyponyms of taxonomic names with taxa of two ranks (e.g., tribe and genus) being displayed, attempting to show the membership of the genera in their tribes, I find that the punctuation difference is a little hard to notice. Could it be emboldened? DCDuring (talk) 17:08, 15 May 2023 (UTC)
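The semicolon/comma proposal amounts to a two-level join. A minimal sketch, taking already-grouped (headword, [translits]) data; `format_groups` is hypothetical, and using wikitext bold (`'''`) for a more visible outer separator is only an illustration.

```python
def format_groups(groups, outer="; ", inner=", "):
    """Sketch: render [(headword, [translit, ...]), ...] as one headword line.

    outer separates headword groups; inner separates translits within a group.
    Either can be swapped for something more visible, e.g. a bolded separator.
    """
    return outer.join(f"{head} ({inner.join(trs)})" for head, trs in groups)
```

Passing `outer=" '''/''' "` would give a bold slash between groups instead, along the lines of the oversize-slash idea raised later in the thread.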

@DCDuring Yes, we can change the fonts and punctuation of the headword line. If you have the energy, maybe you could create a few mockups, and then I'll bring this to the Beer Parlour. Benwing2 (talk) 04:37, 18 May 2023 (UTC)

Can you point me to an example of two headwords and three or more transliterations? DCDuring (talk) 17:58, 18 May 2023 (UTC)

To illustrate "enhanced" semi-colons to group items, see under Hyponyms at Amygdaloideae. The semicolons are both emboldened and embiggened. Alternative approaches would have something like " - genera in Sorbarieae" after each group of members of a tribe and/or put each group of genera on a separate line, or place parentheses or other bracketing punctuation around each group.

In my case, I could also put the type genus for each tribe first in each grouping of genera, the tribe name almost always being derived from the type genus. But, in this case, the type genus of Amygdaleae is its sole genus Prunus. DCDuring (talk) 18:33, 18 May 2023 (UTC)

@DCDuring Hmm. The boldfaced and upsized semicolons aren't so noticeable to me and when I look closely at them they look a bit strange; I wonder if it would be better to use oversize slashes with a space on each side. As for headword lines with multiple headwords, each with multiple translits, I can't think of an example but I'm almost sure there are Russian examples with two or more possible stresses, each of which can have a palatalized or nonpalatalized consonant before /e/. User:Atitarev, can you think of such an example? Benwing2 (talk) 19:39, 18 May 2023 (UTC)

деоккупация, деактивированный, деактивировать. Different parts of speech are implemented differently, and the input is somewhat different.

The term роженица has multiple stresses.

Also check Arabic بروليتاريا, which looks different, but there are words with multiple vocalisations, which look different still. I can't think of any at the moment. Anatoli T. (обсудить/вклад) 23:14, 18 May 2023 (UTC)

I think you're after something like роженица in this discussion. Anatoli T. (обсудить/вклад) 23:21, 18 May 2023 (UTC)

It looks like there is at present a one-to-one relationship of stressed-headword-form to transliteration in the case of the entry for роженица (roženica). I thought we were looking for how to handle the situation where there are multiple transliterations for the same stressed-headword-form. The inflection lines already seem too complicated. Is there evidence that one of the stress patterns is significantly more common? Perhaps only that one could appear on the inflection line, with the others concealed under a show-hide bar? With a simpler inflection line, the complication of the palatalized/nonpalatalized consonant might not be too much for the inflection line. DCDuring (talk) 00:44, 19 May 2023 (UTC)

@DCDuring: деоккупа́ция (deokkupácija) is an example of the same stressed-headword-form. The transliterations are comma-separated, which may be imperfect, but what's better?

What kind of evidence do you require for роженица (roženica)? The most common pronunciation ро́женица (róženica) is proscribed. Anatoli T. (обсудить/вклад) 01:13, 19 May 2023 (UTC)

I'm losing track of what the problem is. DCDuring (talk) 01:35, 19 May 2023 (UTC)

@DCDuring: To be honest, me too. @Bhagadatta complained about the display issues. Currently, I see no issues at राज्य (rājya). Multiple transliterations look fine to me. I actually prefer "or" to other delimiters. Anatoli T. (обсудить/вклад) 02:23, 19 May 2023 (UTC)

The biggest problem from my non-Russian perspective is that some Russian inflection lines seem too busy, but what do I know? The problem I started out thinking I was addressing seems relatively rare, but is made worse by the busy-inflection-line problem. DCDuring (talk) 12:15, 19 May 2023 (UTC)

I was looking for a minimal change. I have not implemented what I did at Amygdaloideae elsewhere, because I was not happy with it. It is asking a lot for users to notice and ascribe meaning to this kind of unconventional extension of the interpretation of a semicolon vs. a comma. People have enough trouble with the simplest nesting: lists, separated by semicolons, of lists of comma-separated items, even where the hierarchical structure does not require reference to a list on a different line. We can forget about my problem. I'll try to find some way to exploit the nature of the content to resolve it, or I'll stop trying to overload the entry with data. DCDuring (talk) 00:56, 19 May 2023 (UTC)

@DCDuring Those semicolons don't look great to me, and they also make the wikitext absolutely awful. Are you really sure that's the best approach? Theknightwho (talk) 17:01, 19 May 2023 (UTC)

We could clean up the wikitext with a template (or by other means?). I assume the HTML would still be a mess, unless CSS could save the day. If I really were sure it was a good idea, I'd have implemented it. DCDuring (talk) 14:25, 20 May 2023 (UTC)

See Scombrinae#Hyponyms for another approach to the two-level grouping problem. I'm not sure about this one either. DCDuring (talk) 18:17, 21 May 2023 (UTC)

@DCDuring Seems a bit better. Note that I implemented two-level grouping in synonyms, antonyms and the like using semicolons for the outer grouping and commas for the inner grouping; but in that case I think it might be a bit clearer. Benwing2 (talk) 19:12, 21 May 2023 (UTC)

In the end it is a tradeoff between space and user convenience. It would be easy to argue that the two-level hyponym content overloads the user, the formatting problem just highlighting the overload problem. I would need to come up with some criterion about Hyponyms to decide whether to have two levels or one and, if one, which one. I think I have more degrees of freedom than you do. DCDuring (talk) 21:15, 21 May 2023 (UTC)

@DCDuring Yes, agreed. The reason I implemented the semicolon in synonyms etc. is that people often listed a whole load of synonyms of different sorts in the same line, and I wanted to make it easier to logically separate them and in particular to allow people to specify a single qualifier for all synonyms in the group and make it fairly clear that the qualifier applied to the whole group. An alternative in the case of synonyms is to list the different groups on different lines; maybe you can do that with Hyponyms as well. Benwing2 (talk) 22:30, 21 May 2023 (UTC)

As we speak, I am working on the entry for Brassicaceae, with many important genera. I am just separating the wheat from the chaff at the moment, but obviously the two-level hyponyms are a problem. I may only present the tribes on the same line as the genera therein. I should have come up with that before, but I've no one who takes an interest, not even ChuckE. DCDuring (talk) 22:39, 21 May 2023 (UTC)

@DCDuring Hmm, definitely I would break up the long Hyponyms line in Brassicaceae into several. Using multiple lines with different indents seems natural for phylogenetic data because of the tree structure. Benwing2 (talk) 22:41, 21 May 2023 (UTC)

I don't like the tree structure for our entries because it takes up even more space, but you may be right. Probably by some time next week, I'll have this done as yet another alternative presentation structure. Then I will do the tree structure on another, similar entry. Of all the comprehensive taxonomic databases, none includes all the ranks, especially below family level. We aren't and can't be comprehensive, but we try to include such ranks. I need to reduce the clutter they cause, without eliminating them. DCDuring (talk) 22:53, 21 May 2023 (UTC)

@DCDuring: I took a stab at a different formatting for Brassicaceae. I've only done the first few tribes as a proof of concept. I used the Wikispecies version of the tribes because no one has gotten around to the Wikipedia version: the tribes are listed and the genera are listed, but there's no information in most cases as to which genera are in which tribes. There are also a few genera that don't have Wikipedia articles yet. For instance, Irania is a redlink on a disambiguation page.

At any rate, when you have a huge list like that, I think you're better off doing something to visually mark the groups so you don't have to scan through dozens and dozens of identically formatted taxonomic names looking for them. Chuck Entz (talk) 01:15, 22 May 2023 (UTC)

I believe the best source for Brassicaceae taxonomy is Brassibase at Heidelberg. {{R:Brassibase}} is how I plan to reference the links to the data item, the most important feature of which is tribe membership for species and genera. DCDuring (talk) 02:54, 22 May 2023 (UTC)

Wingerbot edit - one full stop too many


In this edit, Wingerbot added an extraneous full stop after {{pedia}}. (Only within the etymology; later usages were correctly ignored.) — Pingku (dimmi) 11:45, 19 May 2023 (UTC)

@Pingku Thanks. My script has a long list of templates that include a period in their output, but it missed this one; I have added it. Benwing2 (talk) 00:06, 21 May 2023 (UTC)
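The check described here can be sketched as: skip the added full stop when the text already ends in a period or in a template known to emit one. Wingerbot's actual script and template list aren't shown, so `PERIOD_TEMPLATES` below holds only {{pedia}}, the template named in the report; `add_final_period` is hypothetical, and nested templates are not handled.

```python
import re

# Hypothetical, incomplete list of templates assumed to end their output with a period.
PERIOD_TEMPLATES = {"pedia"}

# The last {{name|...}} on the line (no nested braces), plus trailing whitespace.
TRAILING_TEMPLATE = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\|[^{}]*)?\}\}\s*$")


def add_final_period(text):
    """Append '.' unless the text already ends with one or with a period-emitting template."""
    stripped = text.rstrip()
    if stripped.endswith("."):
        return stripped
    match = TRAILING_TEMPLATE.search(stripped)
    if match and match.group(1) in PERIOD_TEMPLATES:
        return stripped
    return stripped + "."
```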

Thanks. — Pingku (dimmi) 12:10, 21 May 2023 (UTC)

Serbocroatian femeqs


@Anarhistička Maca @Stujul I propose we use {{femeq}} on Serbo-Croatian entries. Vininn126 (talk) 20:21, 23 May 2023 (UTC)

@Vininn126 This is fine with me. Benwing2 (talk) 23:26, 23 May 2023 (UTC)

Yes, I agree, as long as the definitions are still shown. Stujul (talk) 12:48, 24 May 2023 (UTC)

Yes. I also like how @Stujul has started to separate diminutive senses from the others, like on verižica; it seems a good practice. Anarhistička Maca (talk) 07:28, 25 May 2023 (UTC)

@Stujul, Anarhistička Maca I am confused about that particular entry. Why are the three definitions indented under the "diminutive of" definition? If this is because the base word has three senses, and you can form a diminutive of each, the definitions at verižica should make this clear, e.g. "small pothook" rather than just "pothook". If these senses aren't simply diminutives of the base word senses, but have taken on a life of their own, they shouldn't be indented under the "diminutive of" line. Benwing2 (talk) 07:47, 25 May 2023 (UTC)

As for including the definition of {{femeq}} forms, we do that for Russian as well. Benwing2 (talk) 07:48, 25 May 2023 (UTC)

@Benwing2 I guess they should be described more fully with qualifiers. Notwithstanding, I like how it appears visually. I think it's a good use of subsenses. Anarhistička Maca (talk) 07:51, 25 May 2023 (UTC)

I think most cases won't need that, since most will only have one definition, in which case they will look like Czech entries, with a |t= parameter providing a definition. Vininn126 (talk) 08:49, 25 May 2023 (UTC)

I wanted to use the {{diminutive of}} template in the definition for categorisation, but I don't like to use it multiple times. I thought this would be the best alternative. I've seen other people use "small" in front of the definition as you said, but then what's the point of the template? That would just be giving the same information twice. Stujul (talk) 12:24, 25 May 2023 (UTC)

@Stujul The thing is, you're already duplicating the entire set of definitions of the base term at the diminutive. Generally we want to avoid such duplication, so I'd recommend the same approach as User:Vininn126, which is to use the |t= param on the {{diminutive of}} template to summarize the definitions of the base term (use semicolons to separate the different definitions). Benwing2 (talk) 23:39, 25 May 2023 (UTC)

Unicode Collation Algorithm - ideas?


Hiya - I've created a reasonably efficient implementation of the Unicode Collation Algorithm: I took the data in Unicode's allkeys.txt, stored it in a human-readable form in Module:User:Theknightwho/sortkey, and then serialised it in Module:User:Theknightwho/sortkey/serialized. That last module is read by Module:User:Theknightwho/sort, which is able to sort tables of inputs. This seems like a decent solution for the column template, and it's relatively straightforward to add language-specific tweaks.

Unfortunately, category sort is a bit more tricky, because categories are divided into sections by first letter: the sortkeys produced by the UCA are completely arbitrary from that perspective (e.g. the sortkey for "dictionary" is Ồήẽ⃚ή‡´ẉ⁸⅓ plus a bunch of secondary + tertiary weighting). As far as I can tell, our options are:

Putting up with this, so we'd have a bunch of perfectly sorted categories with arbitrary section headers.
Coming up with a bunch of "default" start letters: e.g. the primary weight for ᴅ is 20C3, and the next lowest standard Latin letter is D (with a primary weight of 20BF). We could then add D before the sortkey if a term starts with ᴅ. This is still imperfect, and would likely end up being really inefficient.
Getting $wgCategoryCollation changed to uca-default in the site's LocalSettings.php file, but I don't know how we'd do that. Possibly a Phabricator ticket? If so, they'd probably want to see some kind of vote in favour first.
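The "default start letter" option above reduces to finding the base letter with the highest primary weight not above a term's first primary weight, which a binary search can do. A minimal sketch using only weights quoted in this thread (A and D, plus the letters of "word"); `section_letter` and `category_sortkey` are hypothetical helpers, and whether this settles the efficiency worry is untested.

```python
from bisect import bisect_right

# Primary weights quoted in this thread: A and D, plus the letters of "word".
# A real table would list every base letter for the language, sorted by weight.
BASE_LETTERS = [(0x2075, "A"), (0x20BF, "D"), (0x221D, "O"), (0x2275, "R"), (0x2343, "W")]
PRIMARIES = [weight for weight, _ in BASE_LETTERS]


def section_letter(primary):
    """Base letter with the highest primary weight not above `primary`, or None."""
    i = bisect_right(PRIMARIES, primary)
    return BASE_LETTERS[i - 1][1] if i else None


def category_sortkey(primary, uca_key):
    """Prefix the UCA-derived key with its section letter (hypothetical helper)."""
    return (section_letter(primary) or "") + uca_key
```

A term starting with ᴅ (primary 20C3) lands under D, as in the example above.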

Just wondering if you have any thoughts on this. Also tagging @Erutuon and @Surjection. Theknightwho (talk) 18:11, 25 May 2023 (UTC)

@Theknightwho This is somewhat technical; can you clarify what the difference is between the UCA and the current sorting algorithm(s), and what implementing the UCA gets us? As for category sorting, if we want the UCA to apply there then IMO changing LocalSettings.php is definitely the way to go. But don't we already have language-specific sorting keys, which are what category sorting uses? Benwing2 (talk) 23:37, 25 May 2023 (UTC)

@Benwing2 The current sorting algorithm uses the Unicode codepoint, so is pretty arbitrary. The UCA is systematically designed to provide a sophisticated default sort order that is (reasonably) language-neutral, which can then be tailored on a language-by-language basis where further changes need to be made. It's far superior, but obviously requires considerably more work. There are features like secondary and tertiary weightings (used as tiebreakers), and it also provides a baseline for sorting nonstandard characters within a given language.

We do already have language-specific sortkeys, but they're of variable quality, because they all have to create a "fake" sortkey using codepoint tricks to fool the MW software's algorithm into doing the correct sort order. They're also a lot cruder, because they generally don't handle nonstandard characters for the language well (e.g. diacritics in English are just stripped, which means sort order is not predictable). In some cases like Tibetan they've become very complex due to the need to produce a MW-compatible sortkey, whereas Module:Mymr-sortkey doesn't even try, so it is restricted to use in columns. Theknightwho (talk) 23:46, 25 May 2023 (UTC)

By the way - just as a comparison:

Codepoint order:

Ѐ, Ё, Ђ, Ѓ, Є, Ѕ, І, Ї, Ј, Љ, Њ, Ћ, Ќ, Ѝ, Ў, Џ, А, Б, В, Г, Д, Е, Ж, З, И, Й, К, Л, М, Н, О, П, Р, С, Т, У, Ф, Х, Ц, Ч, Ш, Щ, Ъ, Ы, Ь, Э, Ю, Я, а, б, в, г, д, е, ж, з, и, й, к, л, м, н, о, п, р, с, т, у, ф, х, ц, ч, ш, щ, ъ, ы, ь, э, ю, я, ѐ, ё, ђ, ѓ, є, ѕ, і, ї, ј, љ, њ, ћ, ќ, ѝ, ў, џ, Ѡ, ѡ, Ѣ, ѣ, Ѥ, ѥ, Ѧ, ѧ, Ѩ, ѩ, Ѫ, ѫ, Ѭ, ѭ, Ѯ, ѯ, Ѱ, ѱ, Ѳ, ѳ, Ѵ, ѵ, Ѷ, ѷ, Ѹ, ѹ, Ѻ, ѻ, Ѽ, ѽ, Ѿ, ѿ, Ҁ, ҁ, ҂, Ҋ, ҋ, Ҍ, ҍ, Ҏ, ҏ, Ґ, ґ, Ғ, ғ, Ҕ, ҕ, Җ, җ, Ҙ, ҙ, Қ, қ, Ҝ, ҝ, Ҟ, ҟ, Ҡ, ҡ, Ң, ң, Ҥ, ҥ, Ҧ, ҧ, Ҩ, ҩ, Ҫ, ҫ, Ҭ, ҭ, Ү, ү, Ұ, ұ, Ҳ, ҳ, Ҵ, ҵ, Ҷ, ҷ, Ҹ, ҹ, Һ, һ, Ҽ, ҽ, Ҿ, ҿ, Ӏ, Ӂ, ӂ, Ӄ, ӄ, Ӆ, ӆ, Ӈ, ӈ, Ӊ, ӊ, Ӌ, ӌ, Ӎ, ӎ, ӏ, Ӑ, ӑ, Ӓ, ӓ, Ӕ, ӕ, Ӗ, ӗ, Ә, ә, Ӛ, ӛ, Ӝ, ӝ, Ӟ, ӟ, Ӡ, ӡ, Ӣ, ӣ, Ӥ, ӥ, Ӧ, ӧ, Ө, ө, Ӫ, ӫ, Ӭ, ӭ, Ӯ, ӯ, Ӱ, ӱ, Ӳ, ӳ, Ӵ, ӵ, Ӷ, ӷ, Ӹ, ӹ, Ӻ, ӻ, Ӽ, ӽ, Ӿ, ӿ, Ԁ, ԁ, Ԃ, ԃ, Ԅ, ԅ, Ԇ, ԇ, Ԉ, ԉ, Ԋ, ԋ, Ԍ, ԍ, Ԏ, ԏ, Ԑ, ԑ, Ԓ, ԓ, Ԕ, ԕ, Ԗ, ԗ, Ԙ, ԙ, Ԛ, ԛ, Ԝ, ԝ, Ԟ, ԟ, Ԡ, ԡ, Ԣ, ԣ, Ԥ, ԥ, Ԧ, ԧ, Ԩ, ԩ, Ԫ, ԫ, Ԭ, ԭ, Ԯ, ԯ, ᲀ, ᲁ, ᲂ, ᲃ, ᲄ, ᲅ, ᲆ, ᲇ, ᲈ, ᴫ, ᵸ, Ꙁ, ꙁ, Ꙃ, ꙃ, Ꙅ, ꙅ, Ꙇ, ꙇ, Ꙉ, ꙉ, Ꙋ, ꙋ, Ꙍ, ꙍ, Ꙏ, ꙏ, Ꙑ, ꙑ, Ꙓ, ꙓ, Ꙕ, ꙕ, Ꙗ, ꙗ, Ꙙ, ꙙ, Ꙛ, ꙛ, Ꙝ, ꙝ, Ꙟ, ꙟ, Ꙡ, ꙡ, Ꙣ, ꙣ, Ꙥ, ꙥ, Ꙧ, ꙧ, Ꙩ, ꙩ, Ꙫ, ꙫ, Ꙭ, ꙭ, ꙮ, ꙳, ꙾, ꙿ, Ꚁ, ꚁ, Ꚃ, ꚃ, Ꚅ, ꚅ, Ꚇ, ꚇ, Ꚉ, ꚉ, Ꚋ, ꚋ, Ꚍ, ꚍ, Ꚏ, ꚏ, Ꚑ, ꚑ, Ꚓ, ꚓ, Ꚕ, ꚕ, Ꚗ, ꚗ, Ꚙ, ꚙ, Ꚛ, ꚛ, ꚜ, ꚝ

UCA order:

꙳, ꙾, ҂, а, А, ӑ, Ӑ, ӓ, Ӓ, ә, Ә, ӛ, Ӛ, ӕ, Ӕ, б, Б, в, ᲀ, В, г, Г, ѓ, Ѓ, ґ, Ґ, ғ, Ғ, ӻ, Ӻ, ҕ, Ҕ, ӷ, Ӷ, д, ᲁ, Д, ԁ, Ԁ, ꚁ, Ꚁ, ђ, Ђ, ꙣ, Ꙣ, ԃ, Ԃ, ҙ, Ҙ, е, Е, ѐ, Ѐ, ӗ, Ӗ, ё, Ё, є, Є, ж, Ж, ӂ, Ӂ, ӝ, Ӝ, ԫ, Ԫ, ꚅ, Ꚅ, җ, Җ, з, З, ӟ, Ӟ, ꙁ, Ꙁ, ԅ, Ԅ, ԑ, Ԑ, ꙃ, Ꙃ, ѕ, Ѕ, ꙅ, Ꙅ, ӡ, Ӡ, ꚉ, Ꚉ, ԇ, Ԇ, ꚃ, Ꚃ, и, И, ѝ, Ѝ, ӥ, Ӥ, ӣ, Ӣ, ҋ, Ҋ, і, І, ї, Ї, ꙇ, Ꙇ, й, Й, ј, Ј, ꙉ, Ꙉ, к, К, ќ, Ќ, қ, Қ, ӄ, Ӄ, ҡ, Ҡ, ҟ, Ҟ, ҝ, Ҝ, ԟ, Ԟ, ԛ, Ԛ, л, Л, ᴫ, ӆ, Ӆ, ԯ, Ԯ, ԓ, Ԓ, ԡ, Ԡ, љ, Љ, ꙥ, Ꙥ, ԉ, Ԉ, ԕ, Ԕ, м, М, ӎ, Ӎ, ꙧ, Ꙧ, н, Н, ᵸ, ԩ, Ԩ, ӊ, Ӊ, ң, Ң, ӈ, Ӈ, ԣ, Ԣ, ҥ, Ҥ, њ, Њ, ԋ, Ԋ, о, ꙭ, ꚙ, ꙮ, ꚛ, ꙫ, ꙩ, ᲂ, О, Ꙫ, Ꚙ, Ꙩ, Ꚛ, Ꙭ, ӧ, Ӧ, ө, Ө, ӫ, Ӫ, п, П, ԥ, Ԥ, ҧ, Ҧ, ҁ, Ҁ, р, Р, ҏ, Ҏ, ԗ, Ԗ, с, ᲃ, С, ԍ, Ԍ, ҫ, Ҫ, т, ᲅ, ᲄ, Т, ꚍ, Ꚍ, ԏ, Ԏ, ҭ, Ҭ, ꚋ, Ꚋ, ћ, Ћ, у, У, ў, Ў, ӱ, Ӱ, ӳ, Ӳ, ӯ, Ӯ, ү, Ү, ұ, Ұ, ꙋ, ᲈ, Ꙋ, ѹ, Ѹ, ф, Ф, х, Х, ӽ, Ӽ, ӿ, Ӿ, ҳ, Ҳ, һ, Һ, ԧ, Ԧ, ꚕ, Ꚕ, ѡ, Ѡ, ѿ, Ѿ, ꙍ, Ꙍ, ѽ, Ѽ, ѻ, Ѻ, ц, Ц, ꙡ, Ꙡ, ꚏ, Ꚏ, ҵ, Ҵ, ꚑ, Ꚑ, ч, Ч, ӵ, Ӵ, ԭ, Ԭ, ꚓ, Ꚓ, ҷ, Ҷ, ӌ, Ӌ, ҹ, Ҹ, ꚇ, Ꚇ, ҽ, Ҽ, ҿ, Ҿ, џ, Џ, ш, Ш, ꚗ, Ꚗ, щ, Щ, ꙏ, Ꙏ, ꙿ, ъ, ᲆ, Ъ, ꚜ, ꙑ, Ꙑ, ы, Ы, ӹ, Ӹ, ь, Ь, ꚝ, ҍ, Ҍ, ѣ, ᲇ, Ѣ, ꙓ, Ꙓ, э, Э, ӭ, Ӭ, ю, Ю, ꙕ, Ꙕ, ꙗ, Ꙗ, я, Я, ԙ, Ԙ, ѥ, Ѥ, ѧ, Ѧ, ꙙ, Ꙙ, ѫ, Ѫ, ꙛ, Ꙛ, ѩ, Ѩ, ꙝ, Ꙝ, ѭ, Ѭ, ѯ, Ѯ, ѱ, Ѱ, ѳ, Ѳ, ѵ, Ѵ, ѷ, Ѷ, ꙟ, Ꙟ, ҩ, Ҩ, ԝ, Ԝ, ӏ, Ӏ

Theknightwho (talk) 00:21, 26 May 2023 (UTC)

@Theknightwho I see. Overall this sounds good but I'd like to see a bit of info presented on how much work it will be to maintain the code (and in particular the language-specific stuff) once written vs. the benefit that comes from it (esp. since I haven't seen anyone else complain about the current sorting mechanism). Benwing2 (talk) 06:49, 26 May 2023 (UTC)

@Benwing2 The code as I've written it is very easy to maintain, as it just relies on plugging in the Default Unicode Collation Element Table (DUCET) data, which is published by Unicode at [6].

To explain: each codepoint is given one or more weightings in square brackets, each corresponding to one character. For example, Latin "A" has the weighting [.2075.0020.0008], which gives its primary, secondary and tertiary weights. The initial . means it has a fixed weight (i.e. it's not context-specific), while some characters have *, which means they have a variable weight: this generally applies to punctuation, where it may be desirable to downgrade the weighting in some contexts.

In simple terms, sortkeys are calculated by concatenating all the primary weights, then all the secondary weights, then all the tertiary weights, while disregarding any zero weights (so "word" would be 2343 221D 2275 20BF 0020 0020 0020 0020 0002 0002 0002 0002). Diacritics are usually only given secondary weights (i.e. a primary weight of 0000), while tertiary differences tend to only come into play for things like capitalisation or equivalent hiragana/katakana. Some characters have multi-character weights (i.e. several collation elements): these usually correspond to their decomposed forms, but not always (e.g. ﷺ). There's a lot more info about this here, and the algorithm can optionally use even lower-order weightings if needed.
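To make the level ordering concrete, here is a minimal Python sketch (not the Lua in Module:User:Theknightwho/sort) that builds a sort key from per-character collation elements, skipping zero weights. The table is a hand-made, hypothetical stand-in for DUCET data: the weights for "word" are the ones quoted above, while the combining acute entry is illustrative only. Contractions (multi-codepoint keys) are not handled. The optional level separator follows UTS #10, which inserts 0000 between levels; the "word" example above omits it, so it is off by default.

```python
from typing import Dict, List, Tuple

# One collation element: (primary, secondary, tertiary) weight.
Element = Tuple[int, int, int]


def sort_key(text: str, table: Dict[str, List[Element]],
             level_separator: bool = False) -> str:
    """Build a simplified UCA sort key for `text`.

    All non-zero primary weights come first, then all non-zero secondary
    weights, then all non-zero tertiary weights. Only single-codepoint
    lookups are handled (no contractions).
    """
    elements: List[Element] = []
    for ch in text:
        # A character may map to several elements (e.g. U+FDFA).
        elements.extend(table[ch])
    levels: List[List[int]] = [[], [], []]
    for element in elements:
        for level, weight in enumerate(element):
            if weight != 0:  # zero weights are skipped at that level
                levels[level].append(weight)
    parts: List[str] = []
    for i, weights in enumerate(levels):
        if level_separator and i > 0:
            parts.append("0000")  # UTS #10 separates levels with 0000
        parts.extend(f"{w:04X}" for w in weights)
    return " ".join(parts)


# Weights for "word" as quoted in the discussion; the combining acute
# (U+0301) entry is illustrative only and not checked against DUCET.
TABLE: Dict[str, List[Element]] = {
    "w": [(0x2343, 0x0020, 0x0002)],
    "o": [(0x221D, 0x0020, 0x0002)],
    "r": [(0x2275, 0x0020, 0x0002)],
    "d": [(0x20BF, 0x0020, 0x0002)],
    "\u0301": [(0x0000, 0x0024, 0x0002)],
}

print(sort_key("word", TABLE))
# 2343 221D 2275 20BF 0020 0020 0020 0020 0002 0002 0002 0002
```

Adding the accent to "word" changes only the secondary level, which is why accented and unaccented forms sort together and are separated only by the later levels.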

Module:User:Theknightwho/sortkey is a table, which is a lightly rearranged version of the DUCET data that took me about 25 minutes to create (and could probably be automated):

Key [1] is the codepoint as a string. Occasionally, strings of more than one character are assigned a specific weight: sometimes where decomposable parts are equivalent to a composed character (e.g. U+0418 + U+0306 Й ≡ U+0419 Й), and sometimes not (e.g. U+0E40 U+0E2E เฮ). Where that occurs, key [1] is a table of codepoints.
Key [2] is "." or "*".
Keys [3], [4] and [5] are the primary, secondary and tertiary weights, respectively. Where there are multi-character weights, keys [6] to [9], [10] to [13] etc. hold the marker and the three weights for the second, third etc. collation elements.

The serialize function iterates down the list, and generates a string in the following format (taking "A" as an example):

"A" .. "\254" .. "₩\20\2" .. "\255"

The character(s).
"\254" (= ".") or "\253" (= "*").
Three characters, one per weight, each being the character whose codepoint equals that weight (we could concatenate the weights as 5-character strings with leading zeroes, but this keeps the length down).
For multi-character weightings, any further "\253" or "\254" followed by the three weights.
Final "\255", marking end of sequence.
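The format above can be sketched in Python as follows. This is an illustration, not the actual serialize function; the example weights in the tests are arbitrary, and the "surrogatepass" handling of weights in the D800-DFFF range is an assumption about how such weights should be treated.

```python
from typing import List, Tuple

FIXED = b"\xfe"     # byte 254: element marked "." in DUCET
VARIABLE = b"\xfd"  # byte 253: element marked "*" (variable weight)
END = b"\xff"       # byte 255: end of entry

# (is_variable, primary, secondary, tertiary)
Element = Tuple[bool, int, int, int]


def encode_weight(weight: int) -> bytes:
    """Encode a weight as the UTF-8 bytes of the character with that codepoint."""
    # "surrogatepass" lets weights in D800-DFFF be encoded too (an assumption).
    return chr(weight).encode("utf-8", "surrogatepass")


def serialize_entry(chars: str, elements: List[Element]) -> bytes:
    """Characters, then marker + three weights per element, then END."""
    out = bytearray(chars.encode("utf-8"))
    for is_variable, primary, secondary, tertiary in elements:
        out += VARIABLE if is_variable else FIXED
        for weight in (primary, secondary, tertiary):
            out += encode_weight(weight)
    out += END
    return bytes(out)


def serialize(entries: List[Tuple[str, List[Element]]]) -> bytes:
    """Concatenate all entries into one byte string."""
    return b"".join(serialize_entry(chars, elements) for chars, elements in entries)
```

Bytes 253, 254 and 255 never occur in well-formed UTF-8, so the markers cannot collide with the characters or the encoded weights.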

These strings are then concatenated, and every byte is converted into \XXX format for the sake of convenience, as the output string needs to be pasted into Module:User:Theknightwho/sortkey/serialized. So the above example becomes \65\254\226\130\169\20\2\255.
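The escaping step might look like this in Python (a sketch; the function name is hypothetical). It reproduces the byte string quoted above from the raw bytes of the "A" example.

```python
def to_lua_escapes(data: bytes) -> str:
    """Render every byte as a Lua decimal escape ("\\ddd")."""
    # Escaping every byte means each escape is followed by another backslash
    # (or the end of the string), so "\2" can never swallow a following digit.
    return "".join(f"\\{byte}" for byte in data)


# The entry for "A" from the example above, as raw bytes.
entry = b"A" + b"\xfe" + "\u20a9".encode("utf-8") + b"\x14\x02" + b"\xff"
print(to_lua_escapes(entry))  # \65\254\226\130\169\20\2\255
```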

The sortkey algorithm in Module:User:Theknightwho/sort takes the serialized data and matches "\255(" .. char .. "[^\255]+)\255" (i.e. end + character + everything until the next end). It then iterates over the matched string with gmatch, matching "([\253\254])(" .. UTF8_char .. ")(" .. UTF8_char .. ")(" .. UTF8_char .. ")". Weights are collected in primary, secondary and tertiary tables, which are then concatenated as a hexadecimal string at the end. The sort function memoizes these as it goes, as it's fast and keeps memory use down. This is quite a crude implementation, as it doesn't yet take into account some of the more sophisticated things that are possible with the UCA, but it's certainly a major improvement over what we have at the moment.
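A rough Python equivalent of the lookup, under a few assumptions: the data starts with a leading 255 so the first entry is delimited too, only single characters are looked up, and a marker byte is required right after the character (a small change from the pattern above, so that "A" cannot match an "AB" entry). Names and weights are illustrative.

```python
import re
from functools import lru_cache
from typing import Dict, List, Tuple

# One UTF-8 encoded character: a lead byte plus any continuation bytes.
UTF8_CHAR = rb"[\x00-\x7f\xc2-\xf4][\x80-\xbf]*"
# Marker byte (253 or 254) followed by three weight characters.
ELEMENT_RE = re.compile(rb"([\xfd\xfe])(" + UTF8_CHAR + rb")(" + UTF8_CHAR + rb")(" + UTF8_CHAR + rb")")


def build_data(entries: Dict[str, List[Tuple[bool, int, int, int]]]) -> bytes:
    """Serialize entries, with a leading 255 so the first entry is delimited too."""
    out = bytearray(b"\xff")
    for chars, elements in entries.items():
        out += chars.encode("utf-8")
        for is_variable, *weights in elements:
            out += b"\xfd" if is_variable else b"\xfe"
            for weight in weights:
                out += chr(weight).encode("utf-8", "surrogatepass")
        out += b"\xff"
    return bytes(out)


DATA = build_data({
    "w": [(False, 0x2343, 0x0020, 0x0002)],
    "o": [(False, 0x221D, 0x0020, 0x0002)],
    "r": [(False, 0x2275, 0x0020, 0x0002)],
    "d": [(False, 0x20BF, 0x0020, 0x0002)],
})


@lru_cache(maxsize=None)
def elements_for(char: str) -> Tuple[Tuple[int, ...], ...]:
    """Find `char`'s entry in DATA and decode its weights (memoized per character)."""
    # Requiring a marker byte right after the character stops "A" matching an "AB" entry.
    pattern = rb"\xff" + re.escape(char.encode("utf-8")) + rb"([\xfd\xfe][^\xff]*)\xff"
    match = re.search(pattern, DATA)
    if match is None:
        raise KeyError(char)
    return tuple(
        tuple(ord(w.decode("utf-8", "surrogatepass")) for w in weights)
        for _marker, *weights in ELEMENT_RE.findall(match.group(1))
    )


def sort_key(text: str) -> str:
    """Concatenate non-zero primary, then secondary, then tertiary weights as hex."""
    levels: Tuple[List[str], List[str], List[str]] = ([], [], [])
    for char in text:
        for element in elements_for(char):
            for level, weight in enumerate(element):
                if weight:
                    levels[level].append(f"{weight:04X}")
    return "".join(levels[0] + levels[1] + levels[2])


print(sorted(["word", "rod", "dow"], key=sort_key))  # ['dow', 'rod', 'word']
```

Comparing these hex strings directly only works because every weight fits in four hex digits and secondary weights are lower than primary ones; the full UCA inserts a 0000 level separator instead of relying on that.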

In terms of maintenance, we'd just need to update it whenever the DUCET changes (about once a year).

In terms of language-specific needs, Unicode publish a big list of language-specific tailorings. These are often quite complex (particularly Arabic and East Asian scripts), so they would need some work to implement - especially as they follow a different format to the DUCET. However, they have the major advantage of being produced by experts, and there is a Lua implementation of the UCA that we could probably use as a starting point. There's also nothing stopping us producing our own tailorings. Theknightwho (talk) 15:09, 26 May 2023 (UTC)

If there should be a shortage of those willing and able to do the maintenance or if users in a particular language are unhappy, how hard is it to opt out and fall back to something generic, possibly just until someone willing and able does the maintenance or improvement work? DCDuring (talk) 15:41, 26 May 2023 (UTC)

@DCDuring The UCA is meant to be the generic fallback (and ideally it's what we'd use as the default), so language-specific stuff goes on top of it. If language users are unhappy, we'd just need to change whatever is needed. It wouldn't be hard to turn it off, but I can't really envision any situations where that would be necessary, as specific problems should be dealt with on a case-by-case basis. The current default is a much more arbitrary order, as codepoint order isn't designed to be used for sorting.
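The layering described here (language-specific tailoring on top of a UCA default, with a per-language switch back to codepoint order) could be organised roughly as below. This is a hypothetical Python sketch: the class, the language codes and the stand-in key functions are illustrative and do not reflect any existing Wiktionary module.

```python
from typing import Callable, Dict, Set

SortKeyFn = Callable[[str], str]


def codepoint_key(text: str) -> str:
    """Current-style fallback: raw codepoint order, as fixed-width hex."""
    return "".join(f"{ord(ch):06X}" for ch in text)


class SortKeyRegistry:
    """Hypothetical per-language dispatch: tailoring > UCA default, with an opt-out."""

    def __init__(self, uca_default: SortKeyFn) -> None:
        self.uca_default = uca_default
        self.tailorings: Dict[str, SortKeyFn] = {}
        self.opted_out: Set[str] = set()

    def key_for(self, lang: str) -> SortKeyFn:
        if lang in self.opted_out:
            return codepoint_key  # language switched back to codepoint sorting
        return self.tailorings.get(lang, self.uca_default)


registry = SortKeyRegistry(uca_default=str.casefold)  # stand-in for a real UCA key
registry.tailorings["xx"] = lambda s: s[::-1]          # hypothetical tailoring
registry.opted_out.add("yy")
print(registry.key_for("yy")("Ab"))  # 000041000062
```

Keeping the opt-out as data rather than code means turning UCA sorting off for one language is a one-line change that does not touch the other languages.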

@Benwing2 I've just written Module:User:Theknightwho/DUCET, which can generate the serialized data directly from Unicode's text file (stored at User:Theknightwho/DUCET). I removed the character names for the sake of space, but it doesn't make a difference either way. Theknightwho (talk) 16:21, 26 May 2023 (UTC)

What is not easy to deal with on a case-by-case basis is the absence of people willing and able to address problems or dissatisfaction. DCDuring (talk) 16:24, 26 May 2023 (UTC)

@DCDuring Sure, but that goes for whatever method we choose. Like I said: it would not be difficult to turn on or off. Theknightwho (talk) 16:26, 26 May 2023 (UTC)

@Theknightwho Thank you for writing the automation module Module:User:Theknightwho/DUCET. IMO, what we need now is a good plan describing how to implement language-specific sorting on top of this, how to make sure we can transition bit-by-bit from using codepoint sorting as the fallback to using UCA sorting as the fallback, and how to switch off UCA sorting for a given language if for whatever reason this is deemed necessary (e.g. the editors of a given language don't like the results or find it too complex to implement or maintain the UCA version of the sorting). Then, we need to write how-to documentation on this plan (e.g. how to write a language-specific UCA sorting module and how to turn off language-specific UCA sorting and go back to codepoint sorting), along with how-to documentation on how to run Module:User:Theknightwho/DUCET when a new DUCET version comes out. Essentially, you want to future-proof Wiktionary against the situation where you end up leaving the project or don't have the time to help with implementing or maintaining a given language-specific sorting module. (I am putting on my "tech company software engineer" hat here.) We also need to think about what happens if the change to $wgCategoryCollation in LocalSettings.php goes through; how does this impact language-specific sorting and how can we make sure it works correctly (or at least passably well) with language-specific sort keys written with codepoint sorting in mind? Benwing2 (talk) 04:04, 27 May 2023 (UTC)

@Theknightwho Another thing, very important: how will adopting UCA sorting affect overall memory usage? I see around 23 entries now in CAT:E due to memory errors. The memory errors keep appearing and I assume it's due to functionality you keep adding, as AFAIK no one else is making changes to core modules. Are we ever going to be able to reduce the usage of {{*-lite}} templates and do you have a long-term strategy here? Benwing2 (talk) 08:48, 27 May 2023 (UTC)

Hyphens in Catalan

Headword linking in porto-riqueny and costa-riqueny does not make sense: the hyphen there is an orthographic convention used instead of rr. Compare novaiorquès, which is not hyphenated. I know it can be avoided with a parameter. Is there any way to find "-r" or "-s" in Catalan headwords? Vriullop (talk) 06:45, 2 June 2023 (UTC)

@Vriullop Hmm. I can turn off the hyphenation by default but there are a lot of words where the hyphenation does make sense. Using |nolinkhead=1 turns off linking of hyphenated components. I can look for words with -r or -s following a hyphen but if I turn off hyphenation in just those cases I suspect it will also turn off cases that should be hyphenated. Let me take a look at how many words occur where hyphenation makes sense vs. when it doesn't. Benwing2 (talk) 06:48, 2 June 2023 (UTC)

OK, there are 447 lemmas with hyphens in them (not counting prefixes and suffixes), of which 171 have a hyphen followed by r or s. The vast majority of these do need hyphenation, e.g. penya-segat, barba-serrat, quaranta-set (and lots of other numbers), mata-rates, porta-revistes, quaranta-sis (and again lots of other numbers), cap-roig, etc. The only other one I can find that is somewhat like porto-riqueny and costa-riqueny is mont-realès. So I think the current solution along with |nolinkhead=1 is best. Benwing2 (talk) 06:59, 2 June 2023 (UTC)
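A search like the one described can be sketched with a regular expression over a list of lemmas. This is a Python illustration; where the lemma list comes from (e.g. a dump or category listing) is an assumption, and the script only finds candidates; deciding which ones should keep the hyphen link remains a manual check.

```python
import re
from typing import Iterable, List

# A letter, a hyphen, then r or s.
HYPHEN_RS = re.compile(r"\w-[rs]")


def hyphen_rs_lemmas(lemmas: Iterable[str]) -> List[str]:
    """Return lemmas with an internal hyphen followed by r or s, skipping affixes."""
    return [
        lemma for lemma in lemmas
        if not lemma.startswith("-") and not lemma.endswith("-")  # skip suffixes/prefixes
        and HYPHEN_RS.search(lemma)
    ]


print(hyphen_rs_lemmas(["porto-riqueny", "novaiorquès", "cap-roig", "-ra"]))
# ['porto-riqueny', 'cap-roig']
```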

Thanks for checking it. Vriullop (talk) 12:25, 6 June 2023 (UTC)

User:Conrad.Irwin/creation.js/intro

You deleted User:Conrad.Irwin/creation.js/intro, but the MediaWiki:Gadget-AcceleratedFormCreation.js gadget still contains a link to that page. It comes from the edit link in the notice that says "Please ensure that the information is both complete and correct before clicking Publish changes. If you don’t speak this language, then be exceedingly careful not to propagate mistakes that exist in the source entry—a redlink is better than a wrong entry! edit". That broken link should be fixed. Daniel.z.tg (talk) 00:47, 9 June 2023 (UTC)

@Daniel.z.tg Are you sure? I changed the code of that gadget in March to refer to the new location. Benwing2 (talk) 01:14, 9 June 2023 (UTC)

@Benwing2: I checked again and it's still there. It affects both the old and the visual editor. In the old editor, the edit link shows the outdated URL on hover and its HTML source contains the following:


::		<div id="mw-content-text" class="mw-body-content"><div class="mw-editintro"><big>Please ensure that the information is both <b>complete</b> and <b>correct</b> before clicking Publish changes.</big>
::<p>If you don’t speak this language, then be exceedingly careful not to <a href="/wiki/propagate" title="propagate">propagate</a> mistakes that exist in the source entry—a redlink is better than a wrong entry!
::</p>
::<span style="font-size:85%;"><small class="editlink"><a class="external text" href="https://en.wiktionary.org/w/index.php?title=User:Conrad.Irwin/creation.js/intro&action=edit">edit</a></small></span></div><div id="wikiPreview" class="ontop" style="display: none;"><div lang="en" dir="ltr" class="mw-content-ltr"></div></div><form class="mw-editform" id="editform" name="editform" method="post" action="/w/index.php?title=pomeridianae&action=submit" enctype="multipart/form-data"><input type="hidden" value="ℳ𝒲♥𝓊𝓃𝒾𝒸ℴ𝒹ℯ" name="wpUnicodeCheck"><div id="antispam-container" style="display: none;"><label for="wpAntispam">Anti-spam check.
::

Daniel.z.tg (talk) 01:23, 9 June 2023 (UTC)

@Daniel.z.tg OK I found the issue; the page I moved has a link to itself in it that wasn't fixed. Please check again; you might need to reload the page. Benwing2 (talk) 20:44, 9 June 2023 (UTC)

@Benwing2: It's fixed now. Thanks! Daniel.z.tg (talk) 21:30, 9 June 2023 (UTC)

Belarusian + Templates

I think we can also safely use + templates in Belarusian; I don't think we're gonna piss anyone off. Vininn126 (talk) 10:28, 9 June 2023 (UTC)

Yes you are!!! P U C – 11:40, 9 June 2023 (UTC)

@PUC Uh oh... I just think the Slavic languages should resemble each other where they can (are you actually voting against this or...?) Vininn126 (talk) 11:43, 9 June 2023 (UTC)

No, that's ok. I don't like these templates but won't kick up a fuss. P U C – 11:47, 9 June 2023 (UTC)

@Vininn126, PUC OK I will get to this at some point. Benwing2 (talk) 20:38, 9 June 2023 (UTC)

Manual conversion of Dari and Classical

Hi @Benwing2 ,

I have begun manually converting terms specific to Dari and Classical Persian (i.e. terms not applicable to Iranian Persian) to their respective transliteration at Persian transliteration/Dari. However, before I continue to go all out and convert all Dari & Classical terms, I just want to ask whether you think it's best to wait until the modules are complete or if it's fine to continue now. For example, if you plan to use a bot to check transliterations again, it may cause some issues (though Dari and Classical-specific terms already use a different transliteration, so those issues would probably happen regardless). Do you think it's a good idea to add the Classical-Dari transliteration on non-Iranian terms now? Or would it be best to wait until the support for multiple transliterations in one entry is ready, and add both transliteration schemes to every page (where possible)?

Let me know what you think the best course of action is, and if you still have questions about the formatting/modules atitarev and I will try to help you wherever we can.

If you're fine with me manually converting them prematurely, is there some way you want me to mark the transliterations to prevent issues? سَمِیر | sameer (talk) 22:59, 9 June 2023 (UTC)

Also, I should clarify that I don't intend to rush you; I understand you are very busy with other projects. I just want to make sure my conversions don't make it harder for you later on. سَمِیر | sameer (talk) 23:00, 9 June 2023 (UTC)

@Sameerhameedy It is probably OK because my script was already ignoring Dari and Classical terms whenever it could identify them as such. But links to these terms will have to be updated as well; and such links should definitely use the appropriate etymology language codes and not just fa (specifically, prs for Dari, fa-cls for Classical). User:Atitarev do you have any comments? Benwing2 (talk) 05:46, 10 June 2023 (UTC)

@Benwing2 Could we also have any clarification on phonetic IPA for Iranian Persian? Thanks in advance.--Saranamd (talk) 07:14, 10 June 2023 (UTC)

@Benwing2 In that case I'll continue converting Classical and Dari entries, but maybe I'll hold off on adding more Dari and Classical translations to English entries. The translations are currently only marked with "fa", since "fa-cls" and "prs" are not compatible with the translation template. It would probably cause problems if a bunch of translations were unmarked. سَمِیر | sameer (talk) 09:01, 10 June 2023 (UTC)

@Benwing2, @Sameerhameedy: It's fine as you suggest. I wonder if the Persian headword can also use labels (parameters) to mark and categorise varieties without making completely new language codes for Persian entries and usage examples. Please see User_talk:ZxxZxxZ#Option_3 (it's duplicated in a few places) for pronunciation, headword and usage examples.

(dowlat) [defaults to Iranian Persian if unlabelled]

(dawlat) (Classical Persian, Dari Persian)

Copying usage examples from Sameerhameedy's posts:

هِنْدوسْتانی را می‌فَهْمی [Iranian Persian]
hendustani râ mi-fahmi?

"Can you understand Hindustani?"

آیا زَبان هِنْدُوسْتانِی‌را خوش دارِی؟ [Dari]
āyā zabān-i hindūstānī-rā xōš dārī?

"Do you like the Hindustani language?"

Anatoli T. (обсудить/вклад) 09:27, 10 June 2023 (UTC)

@Atitarev, Sameerhameedy When you say the translations don't support etymology-only codes, do you mean {{t}}? If so I think it's better to fix the module code to support these codes instead of using a special way of marking the Dari/Classical uses. Benwing2 (talk) 19:17, 10 June 2023 (UTC)

Although maybe the special way can be used as well when it's needed to make the distinction visible. Benwing2 (talk) 19:18, 10 June 2023 (UTC)

@Saranamd I'll focus next on Persian stuff. Benwing2 (talk) 21:50, 10 June 2023 (UTC)

@Benwing2 No, the special labeling is needed for Persian entries, not English translations. I've been writing translations similarly to how they're written for Chinese, so the translation for "fan" would be written as

Persian:
- Classical Persian: پنکه (panka)
- Dari Persian: پکه (paka)
- Iranian Persian: فن (fan)

But because the {{t}} template does not support the codes fa-cls or prs, these are all marked as "fa", so a bot cannot properly check these transliterations. So the {{t}} template doesn't need to show the transliteration; it just needs support for the other language codes, since the label will be written manually (as it is for other languages).

The need to show special markings is for example sentences and headers (depending on what format you choose to show transliterations, one of the other options was to move all transliterations to the pronunciation section). سَمِیر | sameer (talk) 22:48, 10 June 2023 (UTC)

@Benwing2 Also, is there any way to activate automatic transliteration for Classical terms using the module {{fa-cls-translit}}? I get that automatic transliteration in entries is not feasible yet, but in etymologies it can be useful, since most terms borrowed from Persian are from Classical Persian. سَمِیر | sameer (talk) 05:40, 13 July 2023 (UTC)

@Sameerhameedy Yes, as long as the etymology code fa-cls is used, this should be possible I think; User:Theknightwho can comment on how ready this is for prime time as they wrote the foundational code in question, but I think it's ready. If so, I'll look into it; I've been meaning to get back to doing some coding for Persian and it should happen in a few days. Benwing2 (talk) 05:58, 13 July 2023 (UTC)

@Benwing2 Yep - it's ready. You can simply set the transliteration module in Module:etymology languages/data as you would in regular language data modules. Theknightwho (talk) 06:09, 13 July 2023 (UTC)

@Benwing2 Sounds great, thank you! And thank you @Theknightwho. سَمِیر | sameer (talk) 20:36, 13 July 2023 (UTC)

And just to let you know, all example entries for Persian referenced in this discussion have been moved to this page, for consistency. So instead of the examples being repeated on multiple pages, they're all on one page. سَمِیر | sameer (talk) 02:29, 30 July 2023 (UTC)

Edit at ty vogo

Hello. Your edit here was not useful. Thyself be knowne (talk) 14:10, 11 June 2023 (UTC)

@Thyself be knowne Fixed. Benwing2 (talk) 16:39, 12 June 2023 (UTC)

@Benwing2 There are more! See the page User:Jberkel/lists/wanted/20230601/cs under the sections "See" (39 instances) and "also" (29). These aren't true synonyms; they have different shades of meaning. Thyself be knowne (talk) 08:12, 14 June 2023 (UTC)

@Thyself be knowne Well, that is Dan Polansky's mistake. Synonyms shouldn't be formatted like that; no other language uses "See" or "also" notes under Synonyms. (And in any case it's common for listed synonyms to express different shades of meaning; Dan should have put qualifiers in that case but I don't think he bothered.) However, I'll take a look at the lists you mention. Benwing2 (talk) 08:14, 14 June 2023 (UTC)

@Thyself be knowne In the examples I looked at, the term listed after the "see" or "see also" appears to be a true synonym, even if the synonyms listed under that term aren't. So I think it's safe to just remove the "see" or "see also" wording, as I did above for ty vogo. Benwing2 (talk) 08:55, 14 June 2023 (UTC)

Request for cleanup

Hi
I stumbled upon the page preliare, which I had edited some time ago, and found a Request for cleanup banner (put there by WingerBot (talk • contribs), which is why I'm contacting you):
The definition(s) may be wrong or misleading, and important senses may be missing. The specified auxiliary may also be wrong. The remainder of the conjugation is probably correct for -are verbs but may be wrong in some particulars for -ire verbs (especially the present participle).
I'm not exactly sure what the problem is here. Could you please shed some light on that? — GianWiki (talk) 11:18, 12 June 2023 (UTC)

@GianWiki I wanted to eliminate {{it-verb-old}} and make the conjugation argument mandatory for {{it-verb}}. Many existing verbs had missing or incorrect conjugations, definitions and/or auxiliaries, and what I did for these verbs was look up and verify only the vowel quality and position for -are verbs, and for -ire verbs I only checked whether they took the -isc infix. I didn't verify the auxiliaries or definitions or the -ire present participle (which is often irregularly -iente instead of -ente) to the standard I would like, because there were too many verbs to do that, so instead I left a cleanup banner. If you're confident that the auxiliary and definitions are correct and complete, you can remove the banner. Benwing2 (talk) 16:57, 12 June 2023 (UTC)

@Benwing2 I see. Thank you very much for clarifying. — GianWiki (talk) 19:47, 12 June 2023 (UTC)

French etymologies

Hey, just wanted to draw your attention to this bot edit, which accidentally mangled the formatting of an {{etydate}} by splitting it into a separate sentence without turning off nocap, and by adding an unnecessary full stop. Not sure if any other edits in that batch might have problems; I fixed this one manually. (Few weeks old but only just saw it since I'm not very active at the moment, sorry!) —Al-Muqanna المقنع (talk) 12:36, 17 June 2023 (UTC)

repetition repitition

Hi. Can you clean up these instances of edition edition and page page? No hago griego (talk) 17:56, 17 June 2023 (UTC)

Functions to get the current page section from a module

Hi - I've seen your new comments on my page and will get to them shortly. I just wanted to share something I've been working on: a pair of functions that can each calculate the current page section of the calling template from within the module (i.e. they know where on the page they've been invoked from). This is particularly useful for Japanese, where terms are sorted by how they're read, which means that sortkeys for templates like {{lb}} need to change automatically based on where they are on the page (unless we add tens of thousands of sort= parameters). However, I imagine they could have many other uses, too. See Module:User:Theknightwho/get header.

I've put detailed notes, as they're both very hacky, and both have drawbacks: the first is likely to be patched out by the devs at some point, and the second is pretty cumbersome (but may actually be workable in the medium-term, pending possible performance/maintenance issues). This exact functionality has been requested in the Community Wishlist Survey 2023, so hopefully it'll only be a stopgap. Theknightwho (talk) 09:35, 19 June 2023 (UTC)

@Theknightwho Thank you for writing this code. You know a lot more about the internals of MediaWiki than I do. I think we should rely on the strip markers for now, both because this method is more efficient and because the strip markers actually seem less likely to me to change than the other method: They've been around quite a while and I think a fair amount of code relies on them. I also think it would be worth reaching out to the MediaWiki developers if possible to see what their plans are, i.e. if and when they are planning to implement the above requested functionality, and if they have any plans to change the implementation of strip markers. I have definitely heard bad things about trying to contact the MediaWiki devs but I think if you can figure out who the relevant people are and reach out to them personally rather than through Phabricator or whatever, you might get better results. Benwing2 (talk) 00:20, 21 June 2023 (UTC)

@Benwing2 Thanks - the first method was first written by Huhu9001, but I'm less optimistic than you are about it as they patched out something very similar involving the Cite extension, which is the reason it's only possible to unstrip nowiki tags and not any other strip markers.

If we do decide to go ahead with this, one use could be to deprecate {{l-self}}, {{m-self}} and so on. Page parsing is memory-cheap (because it's stored as a string), and the best way to do it is to have a gmatch call in Module:links/data which builds a table of the L2 indexes for each language on the page, which can then be accessed by any link template via mw.loadData. That way, it only needs to be done once. Theknightwho (talk) 14:59, 22 June 2023 (UTC)
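The idea of building the L2 index table once and letting every link template look up its own section can be sketched in Python as follows. This is only an illustration of the technique, not the proposed Lua in Module:links/data; the function names are hypothetical and the heading regex ignores edge cases such as headings inside comments or nowiki tags.

```python
import bisect
import re
from typing import List, Optional, Tuple

# A level-2 heading such as "==English==" on its own line (not "===Noun===").
L2_RE = re.compile(r"^==([^=].*?)==[ \t]*$", re.MULTILINE)


def l2_index(wikitext: str) -> List[Tuple[int, str]]:
    """(start offset, language name) for every L2 heading, in page order."""
    return [(m.start(), m.group(1).strip()) for m in L2_RE.finditer(wikitext)]


def l2_at(index: List[Tuple[int, str]], offset: int) -> Optional[str]:
    """Name of the L2 section containing `offset`, or None if before the first."""
    starts = [start for start, _ in index]
    i = bisect.bisect_right(starts, offset) - 1
    return index[i][1] if i >= 0 else None


page = "==English==\n===Noun===\n{{l|en|foo}}\n==Portuguese==\n{{l|pt|foo}}\n"
print(l2_at(l2_index(page), page.index("{{l|pt")))  # Portuguese
```

The index is computed once per page and each lookup is a binary search, which matches the "only needs to be done once" point above.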

@Theknightwho I see. Since we have two ways of doing things we can always use the strip marker functionality and switch over if/when they break it. Benwing2 (talk) 18:28, 22 June 2023 (UTC)

@Benwing2 Cool - that works. We should probably discuss it, but I did a quick test of self-links in {{l}} and it works as expected. Based on the current setup, I feel we should use strong for self-links if: (a) it's under the language's L2 and (b) no id= is given.

Because it's useful for any module that does page scraping on the current page, I've also added parsed headword info for the page to Module:headword/data (which seems to work fine - no new additions to CAT:E). With a bit of work it could be turned into something quite sophisticated, but for now it just gives the first and last indexes for strings and headings for each L2 on the page.

Module:headword/data is becoming a bit of a repository for general info about the current page, as it's a convenient way to make sure things like this are only done once per parse. We might want to split it out, though. Theknightwho (talk) 19:14, 22 June 2023 (UTC)

@Theknightwho Can you explain what you mean by "once per parse"? Also your algorithm for using bold in place of a link seems fine to me. Benwing2 (talk) 19:47, 22 June 2023 (UTC)

@Benwing2 It's because Module:headword/data should be accessed via mw.loadData, which means it's only run once for the whole page. Edit: my mistake - I wrote "parse" not "page".

I'll do some performance testing on the self-links to see what the overhead is. Theknightwho (talk) 19:52, 22 June 2023 (UTC)

@Theknightwho Right, sorry, I was thinking this wasn't possible because of the code in Module:headword/data, but I realize that's only the case when the data file contains closures in the returned data. BTW my experience with the {{place}} data has made me realize that loadData() doesn't always decrease memory usage; in the case of {{place}} it actually increased it, probably due to all the table wrapping. Benwing2 (talk) 19:56, 22 June 2023 (UTC)

@Benwing2 I think it's because (a) the parser adds metatables to every subtable, and (b) it'll never get garbage collected. The things I've added are mostly expensive calculations that we need the headword template to handle, which means large pages might be accessing it tens or hundreds of times at a minimum.

In theory, we could pre-parse (e.g.) every link template or headword on the page and store the output as strings, which means the "real" calls just need to access the data module to get their output. This could be a more flexible way to get the benefits of {{multitrans}}, with the added benefit of not needing to modify the wikitext. Theknightwho (talk) 20:12, 22 June 2023 (UTC)

@Benwing2 I may have found a viable long-term solution to the memory issue - Module:User:Theknightwho/preparser is an implementation of the pre-parsing idea I mentioned above: it uses Module:User:Theknightwho/preparser/core to build a memoised table of template outputs, which can then be accessed by each template call via mw.loadData.

To adapt a module, it needs to be added to the templates table in the core module, which lists the module name, function name and any arguments. A small piece of code also needs to be added to the module itself, which is used to pull down any precalculated outputs.

The bulk of the code is actually for modified frame objects, which means they can be reused hundreds of times by simply swapping out the arguments tables. child_frame also carries a secret argument that tells a module whether it's being called by the core module, which is necessary to prevent an infinite loop. It's deleted immediately after the check, so shouldn't have any impact. By using pcall, we can simply fail a given memoisation if it throws an error, meaning that the error message remains localised to the template in question (unlike multitrans, where an error takes down the whole section).
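The memoise-then-look-up pattern, including pcall-style isolation of failures, can be sketched in Python as below. Everything here is hypothetical: the handlers are toy stand-ins for module entry points (the "t" output is not what {{t}} really produces), and the frame reuse and the "secret argument" that prevents recursion in the real preparser are omitted, since these plain functions never call back into the cache.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Args = Tuple[Tuple[str, str], ...]          # hashable, ordered template arguments
Handler = Callable[[Dict[str, str]], str]   # stand-in for a module entry point
Cache = Dict[Tuple[str, Args], Optional[str]]


def preparse(calls: Iterable[Tuple[str, Args]], handlers: Dict[str, Handler]) -> Cache:
    """Run every template call once up front and memoize its output.

    A call that raises is stored as None, pcall-style, so the failure stays
    local to that call instead of taking down the whole batch.
    """
    cache: Cache = {}
    for name, args in calls:
        key = (name, args)
        if key in cache:
            continue  # identical calls are only computed once
        try:
            cache[key] = handlers[name](dict(args))
        except Exception:
            cache[key] = None
    return cache


def render(name: str, args: Args, cache: Cache, handlers: Dict[str, Handler]) -> str:
    """What each template call does: reuse the precomputed output if present."""
    cached = cache.get((name, args))
    if cached is not None:
        return cached
    return handlers[name](dict(args))  # live call; any error surfaces here only


def t_handler(args: Dict[str, str]) -> str:
    return f"{args['2']} ({args['1']})"  # toy output, not the real {{t}}


def broken_handler(args: Dict[str, str]) -> str:
    raise ValueError("bad argument")


HANDLERS: Dict[str, Handler] = {"t": t_handler, "broken": broken_handler}
FR = (("1", "fr"), ("2", "professeur"))
cache = preparse([("t", FR), ("broken", ()), ("t", FR)], HANDLERS)
print(render("t", FR, cache, HANDLERS))  # professeur (fr)
```

Only the failing call re-runs live and raises its own error, which mirrors the point that an error stays localised to the template in question.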

Currently I've only put {{t}} and {{t+}} in, and a quick test shows it to be about 30% as efficient as multitrans - however, (a) it can be generalised to the whole page, and (b) it can be combined with multitrans anyway. Theknightwho (talk) 00:55, 23 June 2023 (UTC)

It also occurs to me that you can use this method to pass arbitrary info between invokes: you just store whatever you need in the modified frame object, and access it again from the later call. The "real" invocation might not be able to see it, but that doesn't matter because it's just copy-pasting the result from the preparse. Theknightwho (talk) 01:26, 23 June 2023 (UTC)

@Theknightwho Cool, I will take a look. BTW what is the best way to parse a page and figure out what section a given template is in? I want to replace {{zh-syn-list}} and {{zh-ant-list}} with {{col3}}, which will work fine as long as I can figure out which section (usually called Synonyms or Antonyms) the template is in. Module:templateparser doesn't provide this functionality. Is there an existing way of doing this or do I have to roll my own? Benwing2 (talk) 02:13, 23 June 2023 (UTC)

@Benwing2 There isn't one at the moment, but it would be useful to have. Thanks for all the work you've been doing. Theknightwho (talk) 02:17, 23 June 2023 (UTC)
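A roll-your-own approach could record the offset of every heading and, for each occurrence of the template, take the nearest heading before it. Here is a minimal Python sketch of that idea; the function name is hypothetical, and it ignores complications such as comments, nowiki, and templates nested inside other templates.

```python
import re
from typing import List, Optional, Tuple

# Any balanced heading from level 2 ("==X==") to level 6 ("======X======").
HEADING_RE = re.compile(r"^(={2,6})[ \t]*([^=].*?)[ \t]*\1[ \t]*$", re.MULTILINE)


def templates_with_sections(wikitext: str, template: str) -> List[Tuple[int, Optional[str]]]:
    """For each {{template ...}} occurrence, return (offset, nearest preceding heading)."""
    headings = [(m.start(), m.group(2)) for m in HEADING_RE.finditer(wikitext)]
    call_re = re.compile(r"\{\{[ \t]*" + re.escape(template) + r"[ \t]*[|}]")
    results: List[Tuple[int, Optional[str]]] = []
    for call in call_re.finditer(wikitext):
        section: Optional[str] = None
        for start, title in headings:
            if start > call.start():
                break
            section = title
        results.append((call.start(), section))
    return results


page = "==Chinese==\n===Noun===\n====Synonyms====\n{{zh-syn-list|a}}\n"
print(templates_with_sections(page, "zh-syn-list"))  # [(39, 'Synonyms')]
```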

@Theknightwho You're welcome. BTW I hacked up an implementation of {{saurus}}; you can see examples in User:Benwing2/test-saurus. This should make it possible to replace {{zh-syn-list}} and {{zh-ant-list}} (redirects to {{zh-der}}) as well as {{zh-syn-saurus}} and {{zh-ant-saurus}} (which call {{zh-der}} internally). Benwing2 (talk) 03:23, 23 June 2023 (UTC)

@Benwing2 Fab - thanks for that. By the way - I've just linked Module:translations into Module:preparser as a trial, and I'm seeing results that are about 50% of the effectiveness of multitrans (e.g. it just brought teacher down from memory errors to 40MB, while multitrans manages about 30MB). With some optimisations, I'm sure we can do a lot better than that. Theknightwho (talk) 03:28, 23 June 2023 (UTC)

@Theknightwho Sounds good. You might want to try this on something other than translations, since translations are more or less "solved"; e.g. on some of the templates where we currently have to use "lite" versions. Benwing2 (talk) 03:34, 23 June 2023 (UTC)

@Theknightwho Ignoring Module:translations/multi-nowiki, which has its own problem; Template:hu-conj-ok, which is a template-include-size error due to its documentation subpage calling it too many times; the appendixes, which are Lua timeout errors; and the two Etymology scriptorium pages that are due to the preparser having issues with hyperlinks in |pos= parameters in other instances of {{m}} in other parts of those pages, that still leaves 39 pages with memory errors. It seems like just a few days ago we were down to about half of that. Chuck Entz (talk) 22:04, 23 June 2023 (UTC)

Template sl-pr

Hi,

I just want to ask if you have found some time to start updating the {{sl-IPA}} template we talked about in January. I am planning to become a bit more active during the summer break and it would be great to have it ready. Perhaps I was requesting too much, but please at least make it work for Standard Slovene (you can forget about the Natisone Valley and Resian standards if you have a lot of work). It would, however, be pretty important to include all those distinctions and the SNPT pronunciation (only phonologists use IPA in Slovenia; even other linguists use SNPT). It would also be great if it automatically generated rhymes and hyphenation. For rhymes, /ɪ/ is the only allophone included, since it is an allophone of several phonemes; otherwise the phonemic IPA is used. For hyphenation, if the number of consonants between two syllables is odd, the following syllable should take the additional one (e.g. ses∙tra), except that the sequences lj and nj should stick together if followed by a consonant. For the noun-inflection templates, I have already made a 70 kB template, {{sl-infl-noun}}, which declines nouns more or less automatically. Thank you again for helping. Garygo golob (talk) 15:31, 19 June 2023 (UTC)
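The hyphenation rule described here could be sketched in Python as follows. This is not the code of {{sl-IPA}}, and it rests on assumptions: only the plain vowels a, e, i, o, u count as syllable nuclei (syllabic r, as in "prst", and accented spellings are not handled), and "stick together" is read as treating lj/nj as a single consonant unit when a consonant follows, so the split never falls between the l/n and the j. Syllables are joined with "." rather than "∙".

```python
from typing import List

VOWELS = set("aeiou")  # assumption: plain vowels only; syllabic r is not handled


def tokenize(word: str) -> List[str]:
    """Split a word into letters, keeping lj/nj as one unit before a consonant."""
    tokens: List[str] = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        following = word[i + 2:i + 3]
        if pair in ("lj", "nj") and following and following not in VOWELS:
            tokens.append(pair)
            i += 2
        else:
            tokens.append(word[i])
            i += 1
    return tokens


def hyphenate(word: str) -> str:
    """Split consonants between vowels evenly; an odd one goes to the next syllable."""
    tokens = tokenize(word.lower())
    nuclei = [i for i, tok in enumerate(tokens) if tok in VOWELS]
    if len(nuclei) < 2:
        return "".join(tokens)
    syllables: List[str] = []
    start = 0
    for left, right in zip(nuclei, nuclei[1:]):
        between = right - left - 1     # consonant units between the two vowels
        cut = left + 1 + between // 2  # odd count: the extra one goes right
        syllables.append("".join(tokens[start:cut]))
        start = cut
    syllables.append("".join(tokens[start:]))
    return ".".join(syllables)


print(hyphenate("sestra"))  # ses.tra
```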

@Garygo golob Not sure I will have time to work on this, but I will take a look at the prior discussion; can you point me to where it is? Benwing2 (talk) 02:16, 23 June 2023 (UTC)Reply

@Benwing2 It can be found at this link. Thank you. Garygo golob (talk) 07:11, 26 June 2023 (UTC)Reply

Spanish plural issue

Hiya - I noticed this diff by WingerBot broke the plural of Spanish teacher, as the plural displays as "#s". Theknightwho (talk) 19:58, 23 June 2023 (UTC)Reply

@Theknightwho Thanks. I meant to fix Module:es-headword to account for that usage; will fix now. Benwing2 (talk) 20:02, 23 June 2023 (UTC)Reply

Tagalog pronunciation module sandbox

This has been months in the making, but can you look at the Tagalog pronunciation module sandbox? I am adding functionality based on the Spanish template you started and expanded, but I have a hard time excising unnecessary code such as the handling of pronunciation styles. I removed the pronunciation styles as not relevant for Tagalog, but the module should still handle the occasional dialectal pronunciations. You may have received my ping on the module talk page, but that was at midnight. TagaSanPedroAko (talk) 20:55, 23 June 2023 (UTC)Reply

@TagaSanPedroAko Hi. I don't know much about Tagalog pronunciation, so I'm not sure I can help you here. You might start with a version that doesn't deal with any dialectal variations and add that later once you have the standard pronunciation working. You definitely should create some test cases; see Module:es-pronunc/testcases for how to do this. Benwing2 (talk) 23:57, 24 June 2023 (UTC)Reply

For info on Tagalog pronunciation, there's an appendix on Tagalog pronunciation, and there's also Help:IPA/Tagalog on Wikipedia. I already have test cases for the sandbox version. TagaSanPedroAko (talk) 19:47, 25 June 2023 (UTC)Reply

Can this be followed up? The test cases for the expanded module, which adds support for automatic syllabification and rhymes, currently give an error (table, no pronunciation generated). It's been months, and I'm having a hard time making it work.

TagaSanPedroAko (talk) 21:21, 27 June 2023 (UTC)Reply

@TagaSanPedroAko I think you're asking me to fix the module for you. Writing these modules is a lot of work and I don't have a lot of time right now; I have my hands full with existing requests. Benwing2 (talk) 21:23, 27 June 2023 (UTC)Reply

I understand. Also I'm editing from mobile. It's the complex nature of the original Spanish pronunciation module that makes it hard to adapt to Tagalog: there's this support for pronunciation styles. I stripped the adaptation of style support but the original is far too complex to begin with. TagaSanPedroAko (talk) 21:26, 27 June 2023 (UTC)Reply

I've just created an experimental Tagalog pronunciation template run by the sandbox module and was able to trace some errors due to omissions regarding temporary substitutions for syllable dividers (which I just fixed). Hopefully it works now, but other issues lie deep in the module.

TagaSanPedroAko (talk) 21:51, 27 June 2023 (UTC)Reply

The sandbox version is nearly working (I'm removing what remains of the code for handling Spanish pronunciation styles), but it's still far from stable. There are still issues, and they show up in the use cases for the experimental {{tl-pr}} I just created.

@TagaSanPedroAko Sounds good, I would keep going in the same vein. Benwing2 (talk) 03:27, 28 June 2023 (UTC)

Appreciated. Thank you! TagaSanPedroAko (talk) 03:29, 28 June 2023 (UTC)Reply

Just noticed this es-pron derivative for Basque: Module:eu-pron/new. It may hold the key to the fixes needed for the Tagalog pronunciation template. TagaSanPedroAko (talk) 19:55, 29 June 2023 (UTC)Reply

Unfortunately, the Basque one has some dialectal considerations of its own, so I'm holding off on this. I tried removing the code that triggers the error messages, but that only made the problem worse by rendering {{tl-pr}} useless. Please fix, thank you. TagaSanPedroAko (talk) 20:42, 29 June 2023 (UTC)Reply

@TagaSanPedroAko You can't just remove error messages; that only masks the underlying problem and doesn't fix it. Benwing2 (talk) 20:48, 29 June 2023 (UTC)Reply

I mean cutting out the code triggering the error.

TagaSanPedroAko (talk) 20:53, 29 June 2023 (UTC)Reply

@TagaSanPedroAko In general this leads to the same problem. Benwing2 (talk) 20:56, 29 June 2023 (UTC)Reply

Ah ok, holding off from editing the module directly.

Most of the module works fine already, but the problem is at line 1654; I think the cause lies somewhere around there. I've stripped out more of the original Spanish code for handling regional pronunciations, but there could be more to do, such as whatever will be used to implement {{tl-IPA}}, the template currently used to generate Tagalog pronunciation.

TagaSanPedroAko (talk) 21:03, 29 June 2023 (UTC)Reply

Changes made to Chinese thesaurus

Do you know what's going on with the Chinese thesaurus at the moment? The pinyin is not displaying (randomly it seems) for certain terms, e.g. at Thesaurus:了解, 清楚, 知道, 覺, etc. Also, I note that the head word is now showing up on the synonyms/antonyms list in the respective entries. Can this be disabled? ---> Tooironic (talk) 23:41, 24 June 2023 (UTC)Reply

@Tooironic I switched the Thesaurus to use generic templates instead of Chinese-specific ones. I'm not sure why the pinyin isn't showing up for certain terms, maybe User:Wpi and/or User:Theknightwho can debug this as I'm not sure how that code works. Also, what do you mean by "the head word is now showing up on the synonyms/antonyms list in the respective entries"? Can you give an example? Benwing2 (talk) 23:44, 24 June 2023 (UTC)Reply

For example, at 提供, the word itself (提供) appears in the synonyms list. This was not happening before the changes were made. ---> Tooironic (talk) 23:50, 24 June 2023 (UTC)Reply

Ahh right, yeah that is a bug; I'll look into fixing it. Benwing2 (talk) 23:51, 24 June 2023 (UTC)Reply

Thank you. ---> Tooironic (talk) 00:07, 25 June 2023 (UTC)Reply

@Tooironic OK, this issue should be fixed. Benwing2 (talk) 00:23, 25 June 2023 (UTC)Reply

Thanks again! ---> Tooironic (talk) 00:45, 25 June 2023 (UTC)Reply

@Tooironic All the terms where the pinyin isn't showing up seem to have two pinyins listed on their respective pages, which is probably what's going on. However, in most cases where there are two pinyins, one of them is just a variant of the other with neutral tone for the second syllable, so maybe we can handle these specially. Benwing2 (talk) 23:47, 24 June 2023 (UTC)Reply
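The neutral-tone special case could be detected roughly as follows. This is a hypothetical Python sketch, not the actual Module:zh-translit logic: it assumes readings are tone-marked pinyin strings and treats two readings as a neutral-tone pair when one equals the other with exactly one tone mark removed (on any syllable, not only the second). What to display in that case is deliberately left open.

```python
import unicodedata

# Combining tone marks used in pinyin: tones 1-4 (macron, acute, caron, grave).
# The diaeresis on ü (U+0308) is deliberately not included.
TONE_MARKS = {"\u0304", "\u0301", "\u030C", "\u0300"}


def is_neutral_tone_variant(full, variant):
    """True if `variant` is `full` with exactly one tone mark removed,
    i.e. one syllable read in the neutral tone."""
    f = unicodedata.normalize("NFD", full)
    v = unicodedata.normalize("NFD", variant)
    return any(f[:i] + f[i + 1:] == v for i, ch in enumerate(f) if ch in TONE_MARKS)


def classify_readings(readings):
    """Hypothetical triage of the pinyin readings listed on an entry."""
    if len(readings) == 1:
        return "single"
    if len(readings) == 2:
        a, b = readings
        if is_neutral_tone_variant(a, b) or is_neutral_tone_variant(b, a):
            return "neutral-tone variant"
    return "ambiguous"
```

Decomposing to NFD first means precomposed letters such as ī or ǔ are compared as base letter plus combining mark, so only the tone mark itself has to be removed.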

@Tooironic previously the zh-specific templates used Module:zh/extract, which returns the first translit listed on the page (not always the correct one) and contains some weird and inefficient code. It's now handled by Module:zh-translit, which shows nothing when there are multiple readings. You can find a list of links in this situation at Special:WhatLinksHere/Template:tracking/links/manual-tr/zh.

The reasoning behind this is that always displaying a pinyin encourages people not to check it, which leads to incorrect information. We're waiting for User:Theknightwho to add functionality to handle this more smartly by indicating the desired reading on the entry, but we haven't gone into the details yet. TBH I think a better way would be using {{etymid}}s, but the discussion and progress seem to be going nowhere at the moment. – Wpi (talk) 07:47, 25 June 2023 (UTC)Reply

I think it's fine to display the most common pinyin reading by default. Careful editors like me will always check and modify readings where necessary. ---> Tooironic (talk) 08:11, 25 June 2023 (UTC)Reply

@Tooironic: yes, there are careful editors, but there are also people who don't speak Mandarin fluently (e.g. myself), or who don't know Chinese at all (e.g. people who need to mention Chinese terms in the etymologies of other languages); these groups of people are, and will continue to be, the source of such errors. – Wpi (talk) 18:38, 26 June 2023 (UTC)Reply

@Tooironic, Wpi I am inclined to agree with Wpi; it seems for every careful editor out there, there are 10 non-careful editors who either won't notice the error or won't care about fixing it. Benwing2 (talk) 22:24, 26 June 2023 (UTC)Reply

But there are soooo many 多音字. It will just create more hassle with little benefit. ---> Tooironic (talk) 00:20, 27 June 2023 (UTC)Reply

If there are really two readings, what good does it do to pick one at random when it may be wrong? IMO better to show both than pick one. Benwing2 (talk) 01:32, 27 June 2023 (UTC)Reply

Incorrect form of German noun

Hello, could you please delete the page “Nerden” (see talk page)? I have already requested the deletion of the page here a while ago. --Latisc (talk) 10:30, 25 June 2023 (UTC)Reply

Duplicate Swedish participles

Hi! First of all: thanks for the recent changes to the Swedish participles. The new format makes so much more sense, and the old system has been bugging me forever. However, I see that in some instances, the bot seems to have created duplicate "Participles"-headers. See this edit. I suspect this is a relatively easy fix. For all pages that have two identical ===Participle=== {{head|sv|past participle}} # {{past participle of|sv|...}}, just remove the second one. Again, thanks! Gabbe (talk) 11:50, 26 June 2023 (UTC)Reply
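The cleanup requested above ("just remove the second one") could look like this. It is a hypothetical Python sketch over raw wikitext, not WingerBot's actual code; it assumes a section runs until the next heading of any level and only drops a section whose full text is identical to an earlier one.

```python
import re

# Zero-width split point before every heading line, e.g. "===Participle===".
HEADING = re.compile(r"(?m)^(?==+[^=\n]+=+[ \t]*$)")


def drop_duplicate_sections(wikitext, heading="===Participle==="):
    """Remove any `heading` section whose full text repeats an earlier one.
    Assumes a section runs until the next heading of any level."""
    seen = set()
    kept = []
    for chunk in HEADING.split(wikitext):
        first_line = chunk.split("\n", 1)[0].strip()
        if first_line == heading:
            key = chunk.strip()
            if key in seen:
                continue  # identical copy of an earlier section: drop it
            seen.add(key)
        kept.append(chunk)
    return "".join(kept)
```

Comparing whole sections rather than headings keeps legitimately different ===Participle=== sections (for example a past and a present participle) intact.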

@Gabbe Should be fixed. Thanks for your well wishes; the old situation was an utter mess and I'm glad you appreciate the changes. In the case of vederbörande, it had entries for both past and present participle, which seems wrong, so I deleted the past participle entry. Benwing2 (talk) 22:22, 26 June 2023 (UTC)Reply

Slovak and Old Czech modules

Hello, I would like to thank you for your work on the Czech modules. But Slovak has no noun or adjective declension modules, so I'd like to ask whether you're going to do something there. If not, I could possibly try to create “Module:sk-adjective” or “Module:sk-noun” the same way the Czech ones are made. I will probably give up soon, but in case you aren't going to create them, can I try to create those two modules using your Czech modules as a pattern? And Old Czech has literally no modules. Can I try to create them using the existing Czech modules and just adjust them? I am worried about your and other authors' copyrights and I don't even know whether I have the right to create modules. Zhnka (talk) 07:09, 27 June 2023 (UTC)Reply

@Zhnka I don't have any current plans to work on Slovak inflection modules. I still need to finish the Czech verb module, which takes precedence. So go ahead and work on creating Slovak declension modules. Nouns are much more complex than adjectives, so I would start with adjectives. Yes, you have the right to create modules, and you don't need to worry about copyright issues when copying Wiktionary module code into new module code. All Wiktionary module code is automatically released under two licenses: (1) the CC BY-SA 4.0 license (i.e. you can use the code as you see fit as long as you keep the attribution and the existing licensing conditions) and (2) the GFDL (GNU Free Documentation License, which is generally considered a pretty awful license, so you can more or less ignore it). You can actually see these licenses mentioned just above the "Publish changes" button. It says (for me at least):

By clicking the “Publish changes” button, you are agreeing to the Terms of Use and the Privacy Policy, and you irrevocably agree to release your contribution under the CC BY-SA 4.0 License and the GFDL. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.

Benwing2 (talk) 07:52, 27 June 2023 (UTC)Reply

I only added missing pronoun forms to the template cs-noun. I just don't know how to put two footnotes on "něho" at once. I hope I helped. Zhnka (talk) 15:16, 6 July 2023 (UTC)Reply

Spanish gender-neutral terms

At amigue and related entries, I've noticed that WingerBot has replaced the n (neuter/neutral) marker with "mf by sense", which isn't really accurate for how the terms inherently work. This change should be reverted; I'd revert it myself, but I'm unsure of how wide-reaching this change was. AG202 (talk) 04:16, 5 July 2023 (UTC)Reply

@AG202 I made the change on all such gender-neutral terms, which is probably around 10 or so. Can you explain why this isn't accurate? It seems to me that a term like amigue can be used to refer to both male and female friends, hence el amigue or la amigue, very similar to a term like terrorista that does not have an inherent gender and can be either male or female depending on the reference. The issue I have with n is that (a) there's not really any neuter gender in Spanish, and (b) to the extent there is, it occurs with terms like ello that are very different conceptually from amigue. Benwing2 (talk) 04:22, 5 July 2023 (UTC)Reply

@Benwing2 "Amigue", by nature of being a gender-neutral neologism, can't and shouldn't be used with the gendered el or la but instead with the gender-neutral neologism article le, so le amigue or une amigue simpátique. It doesn't follow the norms of "standard" Spanish grammar to begin with. It is not the same thing as words like terrorista or artista. As for the usage of "neuter", if we don't feel comfortable using n since it's used for things like ello, then it shouldn't have a gender marker at all (or maybe a new one). AG202 (talk) 04:40, 5 July 2023 (UTC)Reply

@AG202 I see, so maybe in this case we should use "c" = common gender. This is normally used for languages such as Danish and Dutch where masculine and feminine are merged, but it could be co-opted for this use. Benwing2 (talk) 04:51, 5 July 2023 (UTC)Reply

@Benwing2 Hmmm I'm still not sure if that'd be best. I'll ping the other Spanish editors: (Notifying Ungoliant MMDCCLXIV, Metaknowledge, Ultimateria, Koavf): AG202 (talk) 20:11, 5 July 2023 (UTC)Reply

@AG202 What is your preferred solution? Benwing2 (talk) 20:34, 5 July 2023 (UTC)Reply

To the extent that I've seen (e.g.) latine in the wild, it's been with neo-articles, not el/la articles, so in my head as a non-native, it seems like it's neuter, not bigender. —Justin (koavf)❤T☮C☺M☯ 21:15, 5 July 2023 (UTC)Reply

Maybe then we should create a new gender, gender-neutral; using n will auto-classify the nouns into CAT:Spanish neuter nouns, which feels wrong to me. With a new gender, we'd get CAT:Spanish gender-neutral nouns. Benwing2 (talk) 21:19, 5 July 2023 (UTC)Reply

That works for me honestly. AG202 (talk) 02:07, 6 July 2023 (UTC)Reply

Sounds good. Neuter or common seem out of place here. Ultimateria (talk) 20:02, 6 July 2023 (UTC)Reply

@AG202, Koavf, Ultimateria OK, I added gneut as a new gender abbreviation for gender-neutral, which displays as gender-neutral. Currently the tooltip just says gender-neutral as well, but maybe we can change it to something more explanatory. You can see an example of this in amigue. Let me know if this abbreviation works, and I'll fix up the remaining pages. (Note also, currently amigue is getting added both to CAT:Spanish gender-neutral terms and the new category CAT:Spanish gender-neutral nouns; I'll clean this up.) Benwing2 (talk) 22:09, 6 July 2023 (UTC)Reply
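The reason a dedicated code matters here is that each gender code feeds directly into a category name. A minimal, hypothetical Python model of that lookup follows; Wiktionary's real gender data is structured differently, and only the codes m, f, n, c and gneut mentioned in this thread are included.

```python
# Hypothetical lookup: gender code -> label used in display text and category names.
GENDER_LABELS = {
    "m": "masculine",
    "f": "feminine",
    "n": "neuter",
    "c": "common",
    "gneut": "gender-neutral",
}


def gender_category(lang, code, pos="nouns"):
    """Build a category name such as 'Spanish gender-neutral nouns'."""
    return f"{lang} {GENDER_LABELS[code]} {pos}"
```

In this model gender_category("Spanish", "n") yields "Spanish neuter nouns", the categorization objected to above, while "gneut" yields "Spanish gender-neutral nouns".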

I was thinking about this thread and remembered that we had this discussion a couple of years back: Wiktionary:Beer_parlour/2021/November#Gender-neutral_Spanish_neologisms_(amigx,_maestrx,_etc.). Not sure if this is just redundant, but I figured I would alert/remind you. —Justin (koavf)❤T☮C☺M☯ 07:22, 18 July 2023 (UTC)Reply

@Koavf Thanks. I was waiting for comments/go ahead from one of you, which no one gave, but I'll assume y'all are OK with it. I'll take a look at the BP thread tomorrow; now it's sleepy time. Benwing2 (talk) 07:25, 18 July 2023 (UTC)Reply

Oh yeah, I have no objection: silence is approval here. Sweet dreams. —Justin (koavf)❤T☮C☺M☯ 07:27, 18 July 2023 (UTC)Reply

Need your input on a policy impacting gadgets and UserJS

Dear interface administrator,

This is Samuel from the Security team and I hope my message finds you well.

There is an ongoing discussion on a proposed policy governing the use of external resources in gadgets and UserJS. The proposed Third-party resources policy aims at making the UserJS and Gadgets landscape a bit safer by encouraging best practices around external resources. After an initial non-public conversation with a small number of interface admins and staff, we've launched a much larger, public consultation to get a wider pool of feedback for improving the policy proposal. Based on the ideas received so far, the proposed policy now includes some of the risks related to user scripts and gadgets loading third-party resources, best practices for gadgets and UserJS developers, and exemption requirements such as code transparency and inspectability.

As an interface administrator, your feedback and suggestions are warmly welcome until July 17, 2023 on the policy talk page.

Have a great day!

Samuel (WMF), on behalf of the Foundation's Security team 23:02, 7 July 2023 (UTC)Reply

Just to save you some time:

The module error at Wiktionary:Grease_pit/2019/April is caused by this edit. It's hard to find because it's literally hidden at the very bottom of the page. Even after I narrowed it down to the section by previewing one section at a time, it took a while to spot the "Click to show or hide list" markup that hides the error message. I don't think you could have hidden it better if you tried. Chuck Entz (talk) 00:52, 8 July 2023 (UTC)Reply

The error at Template:pi-nr-inflection of/documentation is caused by the same thing. Chuck Entz (talk) 00:56, 8 July 2023 (UTC)Reply

@Chuck Entz Apologies, not my intention to make it difficult to find. Benwing2 (talk) 01:13, 8 July 2023 (UTC)Reply

No need to apologize. I found it amusing, which is the only reason I mentioned what I went through to find it. Thanks for cleaning it up. Chuck Entz (talk) 01:46, 8 July 2023 (UTC)Reply

Early Medieval Latin (EML.)

Any chance that this could be added as an etymology-only language, similar to LL. or VL.? Nicodene (talk) 12:54, 12 July 2023 (UTC)Reply

@Nicodene We should definitely have Early Medieval Latin, but we're generally moving away from nonstandard etym-only codes as they add unnecessary complexity. Theknightwho (talk) 14:16, 12 July 2023 (UTC)Reply

@Nicodene, Theknightwho I am not opposed to adding an etymology-only language but I agree it would be better to just use la-eme or similar as the code. Benwing2 (talk) 18:13, 12 July 2023 (UTC)Reply

I'm fine with any code. Nicodene (talk) 18:44, 12 July 2023 (UTC)Reply

@Nicodene I added Early Medieval Latin with code la-eme; but hold off for a bit in using it until User:Theknightwho weighs in because they might prefer a different code (a more canonical code might be la-med-ear but that's four more chars to type). Benwing2 (talk) 19:13, 12 July 2023 (UTC)Reply

@Benwing2 No objections from me - I prefer being practical about it. Theknightwho (talk) 19:15, 12 July 2023 (UTC)Reply

Thanks. Nicodene (talk) 19:15, 12 July 2023 (UTC)Reply

Deletion tags

Why did you remove the deletion tags without resolving the problem?

For example, you now claim that a letter which may only exist in Aiton is "translingual". Why intentionally add what appears to be false information?

In other cases, you claim that letters which are not confirmed to exist in any language are [mul] translingual rather than [und] undetermined. kwami (talk) 04:19, 15 July 2023 (UTC)Reply

@Kwamikagami I asked you a few days ago to convert the deletion tags to {{rfd}} tags. Did you not get that ping? You can't just tag things with a speedy deletion tag when there is any controversy at all about whether to delete them. Furthermore, in many cases you deleted all the content and then tagged the page with a speedy deletion tag with the message "no content", which is just bizarre. Benwing2 (talk) 04:26, 15 July 2023 (UTC)Reply

In general you need to start a discussion on these issues *BEFORE* simply going and making all the changes. Benwing2 (talk) 04:27, 15 July 2023 (UTC)Reply

Also I see in some cases other admins reverted your changes and then you restored them (sometimes more than once). Please don't edit war any more. Benwing2 (talk) 04:30, 15 July 2023 (UTC)Reply

I need to confirm it's acceptable to correct a particular error before I'm allowed to fix the error? And I need to request permission to tag a spurious article for deletion before tagging it for deletion? Could you point me to where it says that in our policy? Because that would mean I can create all sorts of bogus content on Wiktionary, and it would take weeks to months for people to get rid of it. kwami (talk) 04:35, 15 July 2023 (UTC)Reply

I didn't delete any content. I deleted nonsense, and without the nonsense, there was nothing left. But even if I'd left the nonsense in, there would still not be any dictionary content, so the deleting reason would still be valid. kwami (talk) 04:48, 15 July 2023 (UTC)Reply

@Kwamikagami Speedy deletion tags are for non-controversially bogus content, which isn't the case here; many of these articles have been around for years before you came along. As for "correcting an error" and "nonsense", these are your opinions and not clearly the case. Benwing2 (talk) 04:51, 15 July 2023 (UTC)Reply

Calling Burmese "translingual" is clearly nonsense. That's not just my opinion, but the opinion expressed at the Beer Parlor -- language entries should be under the appropriate language header, and translingual sections should not be about specific languages. (Though a list of applicable languages would of course be acceptable.)

Also, there has been consensus that, when someone creates an entry for a Unicode character and blindly copies the Unicode name as the "definition" of the character, the article has no content and should be deleted. Unicode names are often inaccurate, after all, and in any case a name is not a definition.

If someone comes across a Unicode character and wonders what it means or what it's used for, sees that Wiktionary has an article on that character and comes here for clarification, and all we do is repeat the Unicode name as if that meant something, then the entry is useless. kwami (talk) 04:58, 15 July 2023 (UTC)Reply

@Kwamikagami I haven't been following the BP discussion closely so I can't say whether there is consensus but I do see some people disagreeing with you. Furthermore many of your changes made the entries ill-formatted and generally worse off, and you still seem to be missing the fundamental point that speedy deletion is inappropriate in these cases. Benwing2 (talk) 05:03, 15 July 2023 (UTC)Reply

Okay, but in many cases you're still purposefully restoring bogus material that I had fixed (at least provisionally) and not just deleting the tags.

As for empty articles, I'll take an example from a Zoom call I had today. What if the only "content" for the article ɿ was "a reversed r with a fishhook"? Considering that the letter is not a reversed r and does not have a fishhook, should that really be acceptable as a Wiktionary definition?

As for tagging these all for more substantial discussion, there are hundreds of such articles. Do we really want to clog up the discussion board with them, and have to wait weeks to months for resolution, when there are admins who are willing to delete articles that have no actual content without wasting everyone's time? kwami (talk) 05:09, 15 July 2023 (UTC)Reply

P.S. I see no indication that User:RichardW57m is an admin. Is he really? He sure doesn't act like it. kwami (talk) 05:17, 15 July 2023 (UTC)Reply

Take Ʈ. Can you honestly tell me that this letter is used across multiple languages? Because I've looked, and haven't been able to find a single one. Not saying they don't exist, but isn't it a bit premature to call it "translingual"? kwami (talk) 05:19, 15 July 2023 (UTC)Reply

@Kwamikagami OK, you're writing a lot. To respond to your points one by one:

Your attitude is often hostile and confrontational; I ask you to tone it down.
I restored the old material because you made a lot of ill-advised changes (which are often even contradictory to your arguments above), such as changing Translingual to Undetermined, deleting *all* the content in many cases, tagging with speedy deletion rather than trying to add explanations (if that's what you believe is lacking) and disingenuously tagging with "no content" after deleting the content. I don't have a strong opinion about whether these articles should be Translingual, but these sorts of changes are unwelcome regardless.
Yes, this needs to go through RFD; it's not wasting people's time, and you can RFD all the articles together if you prefer. How many times do I have to tell you that speedy deletion is only intended for entirely uncontroversial changes? This *IS* a policy, also on Wikipedia, so surely you know it.
RichardW57 is not an admin but I never said he was, and I didn't see any evidence he reverted your changes; it appears to be admins who reverted you.

Benwing2 (talk) 06:06, 15 July 2023 (UTC)Reply

It was mostly RichardW57 who reverted me, insisting that the Burmese Army makes Burmese superior to other languages, so Burmese must be "translingual" (and that it's a "ban-worthy" edit to deny that), while all other languages of Burma are properly pronounced as Burmese. In other words, he's an admitted bigot, and yes, that does make me rather hostile. kwami (talk) 06:18, 15 July 2023 (UTC)Reply

@Kwamikagami OK, I missed that, my apologies. I saw reverts by Surjection, Thadh and TheDaveRoss, all of whom are admins. Honestly no one seems to like RichardW57 much so don't get upset about his garbage. Benwing2 (talk) 06:46, 15 July 2023 (UTC)Reply

All right, I'll try to cool down.

At first I thought maybe he didn't recognize how 'script' and 'alphabet' are used on the WP pages he linked to, but no, he clarified that he means them in exactly that sense, and actually said that it's the army that makes Burmese superior to other languages. kwami (talk) 06:50, 15 July 2023 (UTC)Reply

Bigotry is not tolerated around here. I have not had a chance to read the thread in question but bigotry is a blockable offense. Benwing2 (talk) 07:52, 15 July 2023 (UTC)Reply

He even deleted entries for Burmese, arguing that they're redundant because they duplicate the translingual entry, which is specifically for Burmese, with the sorting order of the Burmese alphabet (and a link to the WP article on the Burmese alphabet), IPA transcription as Burmese, audio recording in Burmese, etc., and this is deliberate rather than due to misunderstanding of the scope of "Burmese" in this context (ethnic Burmese, not the country of Burma), and that it's justified by the Burmese army. He clarified this when I tried fixing it to the Mon-Burmese script (with a link to that article) or did anything else to remove specifically Burmese elements from a supposed translingual section. kwami (talk) 08:28, 15 July 2023 (UTC)Reply

@Benwing2: What is the Wiktionary part of the process for extracting the £20,000 (includes prompt payment discount) compensation from user Kwamikagami for these libellous statements? --RichardW57 (talk) 02:02, 29 July 2023 (UTC)Reply

@RichardW57 there isn't any, and anything that can be construed as a legal threat is always a bad idea. Chuck Entz (talk) 02:28, 29 July 2023 (UTC)Reply

Replacement of unnecessary redirects and templates

Hello, here is another batch of redirects and templates to replace:

{{RQ:Guardian Online}} → {{RQ:Guardian}}
{{RQ:LstWkTnt}} → {{RQ:Last Week Tonight}}
{{RQ:Moxon ME}} → {{RQ:Moxon Mechanick Exercises}}
{{RQ:WaPo Online}} → {{RQ:WaPo}}
{{RQ:Wiseman SCT}} → {{RQ:Wiseman Chirurgicall Treatises}}
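A replacement pass like the one requested above might be sketched like this. It is a hypothetical Python helper over raw wikitext, not the bot's actual code. It assumes template names are written exactly as listed (it ignores MediaWiki's first-letter case-insensitivity and the equivalence of underscores and spaces) and rewrites only the name, leaving parameters untouched.

```python
import re

# Redirect/template name -> canonical name, as listed in the request.
REPLACEMENTS = {
    "RQ:Guardian Online": "RQ:Guardian",
    "RQ:LstWkTnt": "RQ:Last Week Tonight",
    "RQ:Moxon ME": "RQ:Moxon Mechanick Exercises",
    "RQ:WaPo Online": "RQ:WaPo",
    "RQ:Wiseman SCT": "RQ:Wiseman Chirurgicall Treatises",
}

# Match "{{" plus an old name that is followed by "|" or "}", so that
# longer names sharing a prefix are left alone.
_OLD_NAME = re.compile(
    r"\{\{\s*(" + "|".join(re.escape(name) for name in REPLACEMENTS) + r")\s*(?=[|}])"
)


def replace_templates(wikitext):
    """Rewrite only the template name; parameters are left untouched."""
    return _OLD_NAME.sub(lambda m: "{{" + REPLACEMENTS[m.group(1)], wikitext)
```

The lookahead on `|` or `}` is what prevents a partial match, so an already-canonical call such as {{RQ:Guardian|...}} passes through unchanged.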

Thank you. — Sgconlaw (talk) 16:13, 15 July 2023 (UTC)Reply

@Sgconlaw Should be done. Benwing2 (talk) 03:57, 17 July 2023 (UTC)Reply

Thanks very much! — Sgconlaw (talk) 17:25, 17 July 2023 (UTC)Reply

Customising the Lua error function to mitigate errors

Hiya - while working on the parser, I was thinking that it would be good to have a customised Lua error function so that we can catch errors in very large Lua invokes without causing the whole thing to grind to a halt. As an example, imagine if {{multitrans}} was able to catch errors caused by individual translations, and to display them as though they were in their own separate invocation. This would greatly reduce the risk of whole-section or even whole-page failures. It's probably possible using a combo of xpcall and debug.traceback, but I just wanted to get your insight. Theknightwho (talk) 15:41, 18 July 2023 (UTC)Reply
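The per-item isolation idea can be illustrated in Python, where try/except plays the role that xpcall with an error handler plays in Lua. This is a hypothetical sketch, not Scribunto code: `render_all`, the `render` callback and the error markup are all assumptions, and it says nothing about memory overhead.

```python
def render_all(items, render):
    """Render each item independently (hypothetical helper).

    An exception raised while rendering one item is turned into an inline
    error message for that item only, instead of aborting the whole batch;
    this mirrors wrapping each call in Lua's xpcall.
    """
    out = []
    for item in items:
        try:
            out.append(render(item))
        except Exception as exc:
            # Assumed markup for an inline error; the real modules may differ.
            out.append(f'<strong class="error">{type(exc).__name__}: {exc}</strong>')
    return out
```

With this approach a single bad translation turns into one inline error while its neighbours still render, which is the whole-section failure the proposal aims to avoid.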

@Theknightwho I think it depends on whether adding a bunch of xpcall or similar function invocations will increase memory usage. I found when I implemented language-specific form-of data that I needed to have a list of the languages with modules to avoid having to pcall() the module invocation, which seemed to dramatically increase memory usage (at least when no module existed for a given language). Benwing2 (talk) 18:59, 18 July 2023 (UTC)Reply

@Benwing2 I think it should be okay - the wikitext parser makes tens of thousands of calls of xpcall, because when it sees (e.g.) {{{ it assumes it's an argument opening, and then uses errors to unwind the stack and fail out if it encounters a failure condition (meaning it'll then try parsing it as { followed by a template opening, and so on). Because everything between the layer start and the failure trigger needs to be unwound and rebuilt, the same bit of text can be reparsed many times under different sets of assumptions, each layer of which involves a call into xpcall. Caching massively speeds up this process.

I've just done a test on teacher, which is 20,506 bytes, and parsing it uses xpcall 34,943 times and uses 13.5MB of memory.

I'm not sure why pcall causes memory problems with failed module invocations. Does it happen if you try it with xpcall as well? Theknightwho (talk) 19:40, 18 July 2023 (UTC)Reply

@Theknightwho Haven't tried it, might try it at some point. Benwing2 (talk) 19:47, 18 July 2023 (UTC)Reply

Deprecated templates in Accelerated forms

Hi, you recently deleted {{Template:past tense of}} which has broken the accelerated forms for Danish. Presumably it's not too difficult to switch it over to instead use the current {{Template:inflection of}} but I don't know how, would you mind giving some pointers as to how I might go about it? Helrasincke (talk) 10:28, 19 July 2023 (UTC)Reply

@Helrasincke Sorry I will fix it. All you really have to do is specify the inflection tags in the accelerator form value in the link, and delete any special code in MOD:accel/da that uses {{past tense of}}. Benwing2 (talk) 18:25, 19 July 2023 (UTC)Reply

@Benwing2 Thank you - I don't yet understand enough to really contribute much to the module programming. For some reason it is now working on some verbs, e.g. oplyse, and not others, e.g. tale (verb) (missing forms talende and talen). You need the OrangeLinks gadget enabled to notice, though. On a side note, it also doesn't seem to work at all on nouns (see also tale (noun), missing form talen) - how simple would this be to get working? Do you need any extra info to get this working or am I better off asking at the grease pit? P.S. I've now given pønse a workover - I presume that's what you were after? Helrasincke (talk) 20:50, 19 July 2023 (UTC)Reply

@Helrasincke This shouldn't be hard to get working and I can fix it, just let me know what isn't working. I tried clicking on talende and talen and it seems to work for me. Benwing2 (talk) 22:02, 19 July 2023 (UTC)Reply

BTW the only difference I can see is that the forms talende and talen are already defined in another language. In cases like that you have to have the Orange Links gadget enabled I think, otherwise the acceleration won't work. Benwing2 (talk) 22:04, 19 July 2023 (UTC)Reply

@Benwing2 Thanks for that. I'm not sure what's going on with the OrangeLinks - I have the gadget enabled but perhaps there's something else going on on my end because I've definitely been able to use the accelerated generation in similar situations before, for instance with Ukrainian links where there was already a Russian entry present. As to your changes, it's now working properly for most forms but I'm still getting the wrong templates generated for a few forms (from what I can see it's mostly participles), for example at oplyse, the present and past participles are still using the old templates. Helrasincke (talk) 08:15, 20 July 2023 (UTC)Reply

@Helrasincke I actually made that change intentionally, to use the more specific {{present participle of}} and {{past participle of}}, on the theory that those two templates aren't being deprecated. However, I'm not sure the wisdom of it, maybe we should just use {{infl of}} like for everything else. What do you think? Benwing2 (talk) 19:13, 20 July 2023 (UTC)Reply

I see, that makes more sense. Yeah I think it'd be best to use {{infl of}} for everything other than the lemma entry. Helrasincke (talk) 19:17, 20 July 2023 (UTC)Reply

@Helrasincke OK I removed the special use of {{present participle of}} and {{past participle of}} and switched things to use {{infl of}} instead of {{inflection of}}. Benwing2 (talk) 19:28, 20 July 2023 (UTC)Reply

Module "bg-pronunciation" needs several fixes

Hi Benwing2,

I've noticed that the "bg-pronunciation" module deviates in several places from canonical treatments of Bulgarian phonology. As a native Bulgarian speaker and Wiktionary contributor, the number of resultant entries with incorrect IPA pronunciation truly bothers me.

What is the best way to point out issues and encourage their speedy resolution? The module's talk page doesn't seem to be getting much traction, and I don't know off the top of my head who the approved module admins would be, so that I could escalate to them. I'd really appreciate your help!

Thanks,

Chernorizets (talk) 02:43, 21 July 2023 (UTC)

@Chernorizets Thanks for pinging me. Apologies, I don't always follow the talk pages of the various modules and templates, so in general I need to be pinged. Can you make a list of what needs to be fixed? Also maybe we can engage User:Kiril kovachev, who is also a native Bulgarian speaker. Benwing2 (talk) 02:51, 21 July 2023 (UTC)

@Benwing2 No worries, and I'd be happy to try to come up with a good list. Where would it be best for me to provide that list?

Thanks,

Chernorizets (talk) 02:55, 21 July 2023 (UTC)

@Chernorizets You can put it on the module talk page, but make sure to ping me. Benwing2 (talk) 03:20, 21 July 2023 (UTC)

Done. Thanks! Chernorizets (talk) 04:45, 21 July 2023 (UTC)

@Benwing2 I believe @Kiril kovachev is done with the set of IPA fixes I had identified in the thread on the module talk page. I know the discussion got long and unwieldy - sorry about that - but, per Kiril's summary, the scope has been brought back to IPA improvements, and anything else we touched upon will get its own discussion when the time is right. What's next for getting Kiril's changes applied to the module? Those wrong pronunciations keep poking my eyes, esp. now that I'm adding entries more regularly :-) No rush, by the way; I just don't want this to fall by the wayside in view of newer Bulgarian-related discussions on the BP, such as anagrams, the Banat dialect, etc. Thanks! Chernorizets (talk) 09:08, 1 August 2023 (UTC)

@Chernorizets Sorry, I will respond on the module talk page. Benwing2 (talk) 19:45, 1 August 2023 (UTC)

Something broke

Seems like something broke: Reconstruction:Proto-Indo-European/plew-. --{{victar|talk}} 06:30, 21 July 2023 (UTC)

@Victar Is it still broken? Benwing2 (talk) 06:31, 21 July 2023 (UTC)

I think this is because I made the Latinx -> Latnx change slightly out of sync. It should be all fixed now and I'm clearing the errors in CAT:E. Benwing2 (talk) 06:36, 21 July 2023 (UTC)

I figured it was related to that. Looks to be loading now, thanks. --{{victar|talk}} 06:37, 21 July 2023 (UTC)

New abuse filter - WingerBot help?

Hi - I noticed that WingerBot ran into the new abuse filter, which is supposed to prevent any language nesting under Malay or Indonesian in translation tables (with a couple of carve-outs for Jawi and Rumi, as they're legitimate script subheadings for Malay). We have a perennial problem of (mostly IPs) nesting Indonesian under Malay or vice-versa, or in some cases simply grouping any languages spoken in either country under the one heading.

Annoyingly, it's not possible to only run it on the newly added text (as we do with most filters), because if someone moves language X under language Y, only the moved lines are considered "added lines", and so the filter has to be run on the whole of the saved text instead. This means if a page already has the problem and someone edits an unrelated part of it, it still triggers the filter.

Would it please be possible for WingerBot to fix all of the affected pages? It would help to prevent random editors from unfairly running into the filter. Pinging @Chuck Entz, who may be interested to read this as well. Theknightwho (talk) 07:11, 21 July 2023 (UTC)

@Theknightwho This should be possible although I need to look at the filter to see exactly how it works. In the meantime you might want to make it a non-blocking filter, i.e. the second time someone saves the page, it's allowed. As is, if there's an existing problem of this nature anywhere on the page, you can't save the page at all without fixing it, and the abuse filter name isn't so clear on exactly what needs to be done. Benwing2 (talk) 07:27, 21 July 2023 (UTC)

@Benwing2 Thanks. I've changed it to only warn, and have created a specific notice that explains the issue (MediaWiki:Abusefilter-Malayic). Theknightwho (talk) 08:08, 21 July 2023 (UTC)
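Whether a nested translation line is a problem depends on which top-level language line it sits under, and that parent line is usually not among the added lines, which is why the filter has to look at the whole saved text. A rough Python sketch of such a whole-page check follows. It is not the actual abuse filter (which uses the AbuseFilter rule language), and it assumes top-level lines of the form `* Malay: ...` and nested subheadings of the form `*: Jawi: ...`; the function name is hypothetical.

```python
import re

TOP = re.compile(r"^\*\s*([^:*][^:]*):")   # e.g. "* Malay: {{t|ms|...}}"
NESTED = re.compile(r"^\*:\s*([^:]+):")     # e.g. "*: Jawi: {{t|ms|...}}"
PARENTS = {"Malay", "Indonesian"}
ALLOWED = {"Jawi", "Rumi"}                  # legitimate script subheadings


def bad_nestings(wikitext):
    """Return (line number, parent, nested label) for each disallowed nesting."""
    problems = []
    parent = None
    for lineno, line in enumerate(wikitext.splitlines(), 1):
        nested = NESTED.match(line)
        if nested:
            label = nested.group(1).strip()
            if parent in PARENTS and label not in ALLOWED:
                problems.append((lineno, parent, label))
            continue
        # Any other line resets the parent (None if it is not a language line).
        top = TOP.match(line)
        parent = top.group(1).strip() if top else None
    return problems
```

Because the parent is tracked across the whole page, a pre-existing bad nesting is reported even when an unrelated part of the page was edited, which matches the behaviour described above.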

attesting to Ʈ

Answered on my talk page. I finally found attestation of use. I don't see how correcting errors (e.g. you claimed Ʈ is used in Serer -- AFAICT it's not -- and that the capital form is Ƭ, also an obvious error) or providing an instance of actual use (I can give you the citation if you like) would count as "messing" with an article, or how it could possibly be a blockable offense. kwami (talk) 07:11, 22 July 2023 (UTC)

This discussion should happen in one place, either here or on your talk page, but not both. You keep saying *I* claim such and such, e.g. that Ʈ is used in Serer. By reverting your changes to the preceding stable version I'm not making any particular claims, but simply undoing the damage you caused. Benwing2 (talk) 07:18, 22 July 2023 (UTC)

Should I revert myself? kwami (talk) 07:28, 22 July 2023 (UTC)

Do you mean should you revert your changes to Ʈ? In this case someone else already did. Benwing2 (talk) 07:29, 22 July 2023 (UTC)

Descriptions

I'm not trying to be disingenuous, but is it okay to move descriptions of letters to a 'Description' section? kwami (talk) 08:29, 22 July 2023 (UTC)

@Kwamikagami There's no Description header here, see WT:ELE. But there's a 'Usage notes' section that you can use for arbitrary comments such as what the purpose of a letter is, which languages use which letters and how, etc. Benwing2 (talk) 19:46, 22 July 2023 (UTC)

Hmm, I guess there is in fact such a header per WT:ELE, although I haven't seen it used commonly. I would put such text under 'Usage notes'; see ʍ for an example. Benwing2 (talk) 19:48, 22 July 2023 (UTC)

OK, I guess you put those usage notes there, but I'm fine with it. Benwing2 (talk) 19:51, 22 July 2023 (UTC)

BTW it would be much better IMO for you to engage in discussions in the BP than to ask me what to do and then do it without discussion. Benwing2 (talk) 20:11, 22 July 2023 (UTC)

Hani sortkey - mass data module deletion request

Hi - since the data for Module:Hani-sortkey was serialized, there's no point in having it split between 196 different modules anymore, which makes it totally unwieldy. Instead, I've consolidated it in a single (huge) table in Module:Hani-sortkey/data, which can be edited as necessary, and pointed the serialiser at that instead. This doesn't matter from a memory perspective, because it's not directly accessed. It might be that we decide it's better to split the massive table up (e.g. by Unicode block), but that still wouldn't necessitate hundreds of data modules like we have now.

Would it please be possible for you to mass delete Module:Hani-sortkey/data/001 to Module:Hani-sortkey/data/196 and their associated documentation pages? Theknightwho (talk) 14:47, 24 July 2023 (UTC)

@Theknightwho Sure. BTW in general, for cases like this where a set of related data modules are split up (e.g. Module:languages/data/3/LETTER and Module:languages/data/3/LETTER/extra), we shouldn't be creating identical doc pages for each one; instead, add an entry to Module:documentation to auto-generate doc pages for all of them. Benwing2 (talk) 19:34, 24 July 2023 (UTC)

@Benwing2 Cheers - thanks. I think these were created by Erutuon and pre-date that, but I'm not sure. Theknightwho (talk) 19:38, 24 July 2023 (UTC)

These are deleted. Benwing2 (talk) 07:16, 25 July 2023 (UTC)

Great - thanks. Theknightwho (talk) 11:33, 25 July 2023 (UTC)

Some issues with sorting prefix categories

Hi Ben - I've noticed that Module:affix sorts "X terms prefixed with Y" categories by ignoring the prefix. On one level this makes sense, as the assumption is that all the terms start with the same prefix, so ignoring it means that you can group terms under different headings depending on what comes after the prefix. However, this leads to a few problems, one language-specific and one general:

1. Japonic languages sort by reading, instead of by orthography. Ignoring the prefix means that 雌(めん)鳥(どり) (mendori) (雌 + 鳥) is sorted as とり instead of めんどり, which is confusing and unintuitive (note the missing voicing mark on the stem). This is because Module:affix is ultimately telling Module:Jpan-sortkey that it wants to sort 鳥, not 雌鳥. In some cases this could be much worse, if the first kanji reading on the page happens to be totally unrelated.
2. This is also a lesser problem for any other languages which scrape sortkeys, if the stem happens to be a redlink. Currently I think this only applies to Japonic languages, but I bet there are other languages where this would come in useful.
3. More generally, it creates problems if we want to (e.g.) group together English anthrop- and anthropo-, because ignoring the epenthetic -o- in sorting is likely just going to lead to confusion. Even if English doesn't, I know that some languages do group together affixes like this (e.g. Mongolian groups vowel-harmonic variants together, though I can't think of any prefixes which vary by vowel harmony off the top of my head).
4. If you look at a category like Category:English terms prefixed with ab-, you can see that some terms are sorted by including the prefix (e.g. abenteric), which has happened because the page has {{af|en|ab-||t1=away from}}. This inconsistency is especially confusing.

While I completely understand why this was implemented, I just don't think the minor benefit (i.e. category headings) outweighs the issues this causes, and given points 3 and 4 I don't think this can be solved by simply exempting certain languages. Theknightwho (talk) 15:15, 24 July 2023 (UTC)

@Theknightwho Hello. I don't think I actually implemented this particular hack, although I'd have to look at the history to make sure. As a result this probably needs a wider discussion. However, I do think it may be possible to make this work. For issues #1 and #2, we could treat languages that scrape sortkeys differently (e.g. by not ignoring the prefix or whatever). More generally, I recently implemented language-specific affix mappings so that you can (and should) specify the surface form of the prefix or suffix and it will categorize it according to the "base" form (for example, currently for Latin, il-, im- and ir- are mapped to in-, so that a word like illēgālis can be written as {{af|la|il-|lēgālis}} and will be categorized under Category:Latin terms prefixed with in-). The same could be done with Mongolian vowel-harmonic prefixes, if any exist. If we have the affix mappings in hand, it should be possible to do sensible things with variants like anthrop- vs. anthropo- (although you'll need to explain more what the issue is here). As for (4), such specifications shouldn't exist, but if they do, we can subtract the prefix from the headword to get the remainder. I understand your concerns and this is definitely a hack, but the alternative is to have all words sorted under the same letter, which doesn't seem super helpful. Benwing2 (talk) 19:31, 24 July 2023 (UTC)
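A minimal Python sketch of the approach outlined above, under stated assumptions: map the surface prefix to its base form for the category, subtract the surface prefix from the term to get the sort key, and let languages that scrape sortkeys opt out of stripping. The function name, the `strip_prefix` flag and the mapping table are hypothetical, not the real interface of Module:affix; only the il-/im-/ir- to in- mapping comes from the Latin example. Removing diacritics via Unicode NFD decomposition is also an assumption about how macrons should sort.

```python
import unicodedata

# Hypothetical per-language mapping from surface prefix to base prefix,
# modeled on the Latin example (il-, im-, ir- are categorized under in-).
LATIN_PREFIX_MAP = {"il-": "in-", "im-": "in-", "ir-": "in-"}


def strip_diacritics(text):
    """Drop combining marks after NFD decomposition (e.g. ē -> e)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def prefix_category_and_sortkey(term, surface_prefix, mapping, strip_prefix=True):
    """Return (category suffix, sort key) for a term formed with a prefix."""
    base = mapping.get(surface_prefix, surface_prefix)
    category = f"terms prefixed with {base}"
    stem = surface_prefix.rstrip("-")
    if strip_prefix and term.startswith(stem):
        # Subtract the surface prefix so terms group by what follows it.
        key = term[len(stem):]
    else:
        # Languages that scrape sortkeys (or unexpected inputs) keep the full term.
        key = term
    return category, strip_diacritics(key)
```

With this sketch, `prefix_category_and_sortkey("illēgālis", "il-", LATIN_PREFIX_MAP)` gives the category "terms prefixed with in-" and the sort key "legalis", and the same key results whether or not the page passes the prefix explicitly, which addresses the inconsistency in point 4.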

odd Unicode 'macron'

Hi Benwing.

This redirect is problematic. For one thing, it's not covered at the target. For another, I'm not sure it should be: although called "macron" in Unicode, it's not a macron in the normal sense of the word and won't be used/defined the same way. I think it should probably link to the same place as the characters it joins to on either side (forming one long macron/overline diacritic) and defined according to how that diacritic is used. I don't know what that is, or I would have done it myself. But linking it to the macron article can only confuse people. kwami (talk) 09:09, 26 July 2023 (UTC)

@Kwamikagami Which char is this? I can't extract it from the link above. BTW I'm trying to stay out of the discussions involving these chars; as with other chars, I merely restored what was there before. You need to engage the relevant people, e.g. User:RichardW57 and whoever created this page, and get consensus with them, not with me. Benwing2 (talk) 22:26, 26 July 2023 (UTC)

Sorry, it should link now, if you can stop it from automatically redirecting. kwami (talk) 04:09, 27 July 2023 (UTC)

It's U+FE26. It forms a diacritic with FE24 and FE25; it means nothing on its own. If anything, it should link to ◌͞◌, as it is similar but used for 3+ base characters, but according to Unicode it's intended specifically for Coptic. kwami (talk) 04:14, 27 July 2023 (UTC)

@Kwamikagami Yeah I see your point, it is some special diacritic used for Coptic and should not be conflated with the regular Unicode macron. BTW I would strongly recommend once the block expires that you not make any changes to character entries (which includes adding new ones) until you've outlined in the BP what you want to change and make sure that other people are in agreement. The problem is that you have strong views on how these pages should look that are not what anyone else seems to feel, so if you just make changes without getting consensus to make them, you're going to run into trouble and probably get blocked again. Does this make sense? (I feel like I've told you this about 10 times now and you're not getting it.) I would also strongly suggest you think of this as a "spirit of the law" thing; your prior actions show that you act according to the "letter of the law" and do whatever you feel you can get away with within this, which is not going to work. Benwing2 (talk) 04:36, 27 July 2023 (UTC)

Whoop whoop pull up created that RD, so I'll contact them.

It's not "what I feel I can get away with"; I'm simply trying to improve entries and cut the amount of nonsense. If you could point out where you see me trying to game the system against the spirit of the law, I'd appreciate it, because I don't see it.

As for it being "not what anyone else seems to feel", it's what quite a few editors seem to feel. There are multiple editors who agree that articles with no content should be deleted (indeed, in the edit history you'll sometimes see multiple deletions with edit summaries such as "no content apart from the Unicode definition", which is where I got that wording from), that translingual entries should be translingual and not about a specific language (with the attitude that of course they should, why are we even asking about it), that the information we provide should be verified and, if we cannot verify it, should be deleted, etc. None of these are controversial ideas, apart from Richard, who wants to delete individual language entries and to promote Burmese above other languages, and a couple of editors who are under the impression that the Unicode Registry is an adequate source for dictionary entries.

Today hasn't been a good day for starting a new BP discussion on the recreated/unverified character articles, but I'll try to get to it tomorrow. kwami (talk) 05:07, 27 July 2023 (UTC)

@Kwamikagami OK, sounds good. What I mean by "letter of the law" is sometimes you've interpreted what I've said very literally, e.g. when I said there should be a moratorium on changes to single-char entries, at first you didn't respect that and then you respected it only for changes to existing entries and not additions of new entries of the same sort, which logically should be part of "changes". When I said you should discuss your changes and get consensus on the BP, you engaged only with the "Kwami block" discussion and not with the other discussions related to this issue. Also when you assert that none of your ideas are controversial apart from Richard, that's clearly belied by several people stating (in the "Kwami block" section, e.g. User:Theknightwho, User:AG202, User:Sameerhameedy) that you are going against consensus and not properly engaging the relevant editing communities before making far-reaching changes. I don't want to get in an argument over whether your ideas have consensus; they clearly don't at this point, and you need to establish that before further changes. Benwing2 (talk) 05:52, 27 July 2023 (UTC)

Part of not commenting in other threads was having the time to do it; part was whether I thought I had anything to add. kwami (talk) 06:23, 27 July 2023 (UTC)

@Kwamikagami If you don't have time right now, that is fine, but in that case changes should wait until you have the time to carry out the discussion. Benwing2 (talk) 06:27, 27 July 2023 (UTC)

I will need to travel out of town for an indefinite period sometime soon because a friend is having medical problems, and I have financial and other stuff to take care of before I leave, so I don't know how much time I'll have in the near future. It might be a matter of getting done what I need to and then seeing if I have spare time for WT before the procedures are scheduled, or maybe waiting until I'm up there and things have settled down. kwami (talk) 05:41, 28 July 2023 (UTC)

@Kwamikagami Wow, I am sorry to hear that. I hope your friend's medical procedures turn out well and everything goes according to plan. Take all the time you need for your friend and yourself; one's health and well being (including financial matters) always come first. Benwing2 (talk) 06:12, 28 July 2023 (UTC)

@Kwami, Benwing2, RichardW57m: U+FE26 may be peculiar to the Coptic script, though I wouldn't be amazed to find it used on Latin script letters, but it might not be peculiar to the Coptic language. It might be used in Old Nubian - have you checked? Typing hurriedly, I think the correct approach would be to raise an {{rfc}} on the macron page. Would discussing the matter in U+FE26's talk page meet the spirit of the moratorium? A link there in the Beer Parlour might not be inappropriate, perhaps under 'Moratorium Avoidance'. The macron's page completely omits the Semiticists' use as a fricativisation marker, which should be familiar to students of Biblical Hebrew. --RichardW57 (talk) 08:10, 27 July 2023 (UTC)

At least typing [[talk:︦]] works to access the talk page; to access the page itself I had to use a URL ending "%ef%b8%a6?redirect=no", and I couldn't work out how to link to it from wikitext. Also, the redirect between OVERLINE and COMBINING OVERLINE is the wrong way round. We will need a privileged person to do the swap so that attribution is not lost. (This is a legal requirement.) Once that is sorted out, I think U+FE26 should redirect to the page for OVERLINE. The left and right half overlines (U+FE24 COMBINING MACRON LEFT HALF and U+FE25 COMBINING MACRON RIGHT HALF) should, I think, also redirect there. 'LEFT' and 'RIGHT' refer to their positions in the combined overline, not to the positions of their inking! RichardW57m (talk) 11:42, 27 July 2023 (UTC)

@RichardW57 Sure, discussions don't have to be on the BP, although if you put them on some random talk page they might be lost in the future. Might be good to have them centralized somewhere. Benwing2 (talk) 19:21, 27 July 2023 (UTC)

For discussions about a page, the talk page is the centralised place. Discussions on the BP are filed under some random month and can be hard to find later. I'll go link to this discussion from the talk page. --RichardW57 (talk) 19:35, 27 July 2023 (UTC)

BTW if you need some pages swapped like this and there are no complaints about it, let me know and I can do it. Benwing2 (talk) 19:22, 27 July 2023 (UTC)

OK, can you please swap OVERLINE and COMBINING OVERLINE round so that the entry for COMBINING OVERLINE is a hard redirect to OVERLINE, in accordance with WT:CFI#Combining characters? --RichardW57 (talk) 19:30, 27 July 2023 (UTC)

The problem with following CFI for the combining overline is that the supposedly non-combining character is also a combining overline in many fonts. If a reader uses such a font, they're going to have a hard time clicking on links to the titular character, just as people here have been having a hard time getting to the rd. With the official combining character, it can at least be combined with a null carrier so that a link to it can be clicked on, which isn't really an option for the supposedly non-combining character. kwami (talk) 02:54, 28 July 2023 (UTC)

But after a swap, clicking on the combining line will take you to the nominally spacing character, if only via a hard redirect. I think we need a navigation tool to get to the entries of

Unidentified combining characters.
Characters by codepoint.

The first is the more important. That is something to discuss on the Grease Pit. We might already have the needed bits, and just need to add or publicise the links. --RichardW57m (talk) 08:44, 28 July 2023 (UTC)

I don't know enough about Coptic use to know if it should be rd to the overline. But Greek also has this convention, the two are essentially the same script, and Greek use is covered at ◌̅. If that's the correct place for it, then yes, I think the Coptic characters should rd there too. We should add info boxes for them, though. kwami (talk) 02:59, 28 July 2023 (UTC)

Actually, no, it's not the same as Greek, but your conception justifies treating them together. Overline for abbreviation should be handled by combining overline as in Greek. Overlines for names start and end at the middles of consonants, so doing them by character encoding requires a left half at the start and a right half at the end. I'm not sure why there's a separate CONJOINING MACRON. Perhaps it works at a different height above the characters, or perhaps the two can be used on the same bit of word and one needs to know which overline to join the ends to. It seems fairly horrendous, but I've seen Andrew Glass do quadrates in cartouches, which he says is as complicated as it seems. TUS recommends mark-up for getting the effect, which means it was seen as hard work for poor dumb 'smart fonts'. Incidentally, the Unicode Standard puts Coptic, surely an African script, in the middle of the first chapter of European scripts. --RichardW57m (talk) 08:57, 28 July 2023 (UTC)

The left and right are for the ends of the overline, or for two-letter abbreviations. The conjoining macron is for the middle letters where there are three or more. kwami (talk) 11:31, 28 July 2023 (UTC)

@Kwamikagami: For two-letter abbreviations, that's not what we read in TUS (Section 7.3, Supralineation) or see in Lesson 19 at https://www.suscopts.org/ssc/wp-content/uploads/2021/04/CopticLessons.pdf. (They clearly had font trouble preparing the viewgraphs.) Capital letters have undocumented special treatment. The mid-letter to mid-letter overlines are also reportedly used 'to distinguish words' - I couldn't find any examples of them in that quick introduction or in Wiktionary. --RichardW57m (talk) 13:42, 28 July 2023 (UTC)

Could you link to the TUS section?

That is what we see in the Unicode proposal that was accepted.[7] The difference is that they expected the left and right characters to join with the combining overline; it looks like the UTC changed this by adding a dedicated conjoining macron.

For example, R + left-joining + N + right-joining is a two-letter abbreviation, whereas R + overline + N + overline is gematria (the number 140).

I don't know if with the addition of the conjoining macron, it or the overline is intended for use with gematria. kwami (talk) 22:19, 28 July 2023 (UTC)

Presumably this is a message for User:RichardW57? Benwing2 (talk) 22:33, 28 July 2023 (UTC)

@Kwamikagami: I don't know how to link to the subsections of TUS. I navigate to TUS by going to https://unicode.org/main.html , clicking on 'Latest Version', and selecting the chapter from the list on the left. For the current latest edition, the chapter link is https://www.unicode.org/versions/Unicode15.0.0/ch07.pdf . I then look for the table of contents on the left, click on the relevant script section, and scroll down, in this case looking for the in text heading 'Supralineation'.

From the text there, I believe the use of letters as numbers is flagged by the use of the combining overline. A line above whole letters is encoded by combining overline. A line extending from the middle of one letter to the middle of another is encoded by the half macrons at the ends and the conjoining macron in the middle.

You've misread 'M' as 'N'. Weird as it may seem, ⲣ︤ⲙ︥ is apparently a vowelless prefix, probably easy for a Pole to pronounce, not an abbreviation, and appearing in Coptic ⲣⲙⲃⲉⲕⲉ (rmbeke). I'm afraid I'm not sure that our Coptic supralineation is reliable - I can't find any. --RichardW57 (talk) 00:19, 29 July 2023 (UTC)

@Kwamikagami: Supralineation gets stripped from links, which is why I couldn't find any in the categories. I still haven't found any, though. --RichardW57 (talk) 00:32, 29 July 2023 (UTC)

It would seem we need to keep the conjoining macrons distinct from the overline, then. Possibly it could be conflated with the generic macron, but it may be best to keep a distinct article. kwami (talk) 04:41, 31 July 2023 (UTC)

@Kwamikagami: I think not:

Spacing and non-spacing diacritics are to be put in the same entry, so I don't think we can keep conjoining macrons in a separate article from combining overline. Usage notes should probably labour the differences.
At the human level, they're almost the same thing, with just the stopping places differing.

I'd put the half 'macrons' in the same article as the conjoining macron. Can't do it immediately because of the moratorium.

I see different rendering systems have the left and right halves different ways round. The Emacs I use couldn't get them to join either way round - I think there may be some rather complex shaping behaviour that needs them to be in the same rendering run, which doesn't work well with the Emacs line-wrapping algorithm. U+FE24 COMBINING MACRON LEFT HALF should go above the right half and vice versa (source: https://www.unicode.org/charts/PDF/UFE20.pdf), so "ⲣ︤ⲙ︥" above is the correct encoding. --RichardW57m (talk) 09:59, 31 July 2023 (UTC)
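Since these characters are hard to see, link to, or tell apart, it can help to list the code points a string actually contains. This small Python sketch uses the standard `unicodedata` module to show the code point, character name and general category of each character in the Coptic example ⲣ︤ⲙ︥, plus the conjoining macron, combining overline and spacing overline discussed above. The `describe` helper is a hypothetical name; the data comes from the Unicode Character Database bundled with Python.

```python
import unicodedata


def describe(text):
    """Return (code point, name, general category) for each character."""
    return [
        (f"U+{ord(c):04X}", unicodedata.name(c, "<unnamed>"), unicodedata.category(c))
        for c in text
    ]


# Coptic ro + left half + mi + right half, written with escapes so the
# combining marks survive copy-and-paste.
coptic_example = "\u2ca3\ufe24\u2c99\ufe25"

# Related characters: conjoining macron, combining overline, spacing overline.
related = "\ufe26\u0305\u203e"

for row in describe(coptic_example) + describe(related):
    print(*row)
```

The general category makes the spacing/non-spacing distinction explicit: the half macrons, the conjoining macron and the combining overline are nonspacing marks (Mn), while U+203E OVERLINE is punctuation (Po), even if some fonts render it as if it combined.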

More WF audios

Hey. Here is another WF account under which they uploaded a lot of audio files labeled as "RP". Might wanna have the bot change them to "Southern England". lattermint (talk) 16:57, 26 July 2023 (UTC)

As someone who speaks RP, I think RP is probably okay for them, but I’ve not listened to them in large numbers. Theknightwho (talk) 10:55, 27 July 2023 (UTC)

@Theknightwho OK, thanks. Maybe User:Sgconlaw or User:Equinox can comment. The issue with the Vealhurl audios was especially with those that claimed to be from a specific region, it seems. (As an American I can't make these fine judgments about English accents; I need some British people to help.) Benwing2 (talk) 19:19, 27 July 2023 (UTC)

@Benwing2, Theknightwho: I think if the audio files were indicated as RP it’s fine to leave them unchanged. Personally all the WF files sound like RP to me, but at some stage WF claimed he did not speak with RP so we settled on “Southern England” as a compromise. — Sgconlaw (talk) 01:30, 28 July 2023 (UTC)

Template:new es demonym

Hey. Can you recreate Template:new es demonym? It's just as relevant as Template:new en noun, methinks. BeirutGirlXX (talk) 21:55, 26 July 2023 (UTC)

@BeirutGirlXX OK, dude who is not from Beirut and not a girl, I have restored {{new es demonym}} using {{demonym-adj}} and {{demonym-noun}} for use with gendered demonyms, and also added {{new es demonym mf}} for use with non-gendered demonyms (i.e. male and female are the same, such as terms in -ense). I know you hate templates but you'll have to learn to use {{demonym-adj}} and {{demonym-noun}}, which are not hard to learn. Benwing2 (talk) 22:22, 26 July 2023 (UTC)

Thanks a lot! Good luck with the 'crat vote. I'll try to vote. 6letter acronym (talk) 22:38, 26 July 2023 (UTC)

Well done on the 'crat vote. Don't work too hard, now. Common surname (talk) 00:02, 23 August 2023 (UTC)

URL quotation fixes

Your ‘correct errors in {{quote-*}} templates (manually assisted)’ bot edits have created errors where there were none (example): parameters after |url= were shifted to the left, leaving the translation parameter void. Perhaps this is my fault for using an atypically condensed quotation format. ―⁠Biolongvistul (talk) 13:31, 30 July 2023 (UTC)

@Biolongvistul Shit, you are right. I should have only added the url param when there was an equal sign. Will fix. Benwing2 (talk) 16:46, 30 July 2023 (UTC)

@Biolongvistul OK, everything should be fixed. Benwing2 (talk) 18:44, 30 July 2023 (UTC)

Italian multiword verbs headword

I'm assuming you changed t:it-verb with the "a/@" parameter to show conjugations. Is there a particular reason (e.g., a prior discussion) for this for the multiword Italian verbs? Imetsia (talk) 16:02, 2 August 2023 (UTC)

@Imetsia Yeah, I did this. It parallels how we do things with Spanish and English verbal expressions. In Spanish, in particular, we tend to put the multiword principal parts in the headword but not include a full conjugation table (although this is definitely doable). I think it's useful because it shows how the words of the verbal conjugation interact with the remaining words of the expression. Benwing2 (talk) 08:32, 3 August 2023 (UTC)

Quotation templates again


Sorry for the second inquiry this week. What is the reason for adding named parameters to quotation templates? Is there any instability with usage of unnamed ones? Or is it to make the syntax clearer? I agree that it may be a little opaque at first, especially when there are empty parameters, but the practice can be easily learnt and it is a feature for a reason. Also, economy helps. ―⁠Biolongvistul (talk) 21:27, 4 August 2023 (UTC)

@Biolongvistul Hi. There was a discussion recently concerning this, in the BP or GP (I forget which one). The problem with numbered params is that (a) they're opaque as you mention (someone not familiar with the template will have a problem interpreting them), but even more, (b) they're very fragile, esp. when there are a large number of them, as with many of these quote templates. (For example, {{quote-hansard}} had 10 numbered params, and no one was actually using this functionality.) In addition, (c) each quotation template has its own interpretation of the numbered params, different from {{m}}, {{bor}}, etc., which follow one or two standard patterns; this increases the fragility even more. There are lots of mistakes I've found where people have messed up the numbered params, and lots more where people insert a value that has an embedded unescaped vertical bar in it (typically either in a URL or a title, but sometimes in the quoted text, the author, etc.), which accidentally results in part of the param being interpreted as a numbered param. (User:JeffDoozan and I worked to fix such issues with URLs recently, and found over 1,000 such instances.) If there were no numbered params, such errors could be caught by implementing checking for unrecognized params (which I'd like to do eventually), but this isn't possible if numbered params are allowed. The replacement named params aren't that long (see the discussion I cited; very few are longer than 6 characters), and if this is an issue, we can shorten the common named params to make them easier to type. My actual plan is to replace and deprecate/eliminate the functionality for the templates where it isn't used that much, and leave it for the few templates where it's in common use (probably {{quote-book}}, {{quote-journal}}, maybe {{quote-web}}).
Quotation templates take significant work to enter in any case since there are typically a lot of params to put in, and the extra effort of using named params seems very small in comparison, unlike e.g. with {{m}}, {{bor}}, etc. Benwing2 (talk) 21:59, 4 August 2023 (UTC)
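To illustrate why the unrecognized-param check only works once numbered params are gone, here is a minimal Python sketch (the real modules are Lua). It assumes the template arguments have already been split into a dict, with positional params keyed "1", "2", …; the allowed-name set and the function name `find_unrecognized` are hypothetical, not the actual {{quote-book}} parameter list.

```python
def find_unrecognized(args, allowed_named, allow_numbered=False):
    """Return the sorted names of params that a quote-style template would reject.

    args: dict of param name -> value; positional params are keyed "1", "2", ...
    allowed_named: set of named params the template accepts (hypothetical here).
    allow_numbered: whether numbered params beyond "1" (the language code) are legal.
    """
    bad = []
    for name in args:
        if name.isdigit():
            # "1" is the language code and stays positional in every case.
            if name != "1" and not allow_numbered:
                bad.append(name)
        elif name not in allowed_named:
            bad.append(name)
    return sorted(bad)


# An unescaped "|" inside title= leaves a stray fragment that becomes param "2".
args = {"1": "en", "year": "1999", "title": "Foo ", "2": " Bar"}
print(find_unrecognized(args, {"year", "title", "author"}))        # ['2']
print(find_unrecognized(args, {"year", "title", "author"}, True))  # []
```

When numbered params are legal, the stray fragment is indistinguishable from a deliberate positional value, so the second call silently accepts it.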

Well, if {{quote-book}} and {{quote-journal}} stay the way they are, everything’s fine by me. Thank you for the reply. ―⁠Biolongvistul (talk) 22:04, 4 August 2023 (UTC)

If we can't deprecate numbered params completely, is it possible to have the template throw an error if there are both numbered params and named params? For example {{quote-book|en|1999|author|title}} would be valid, but {{quote-book|en|year=1999|author=homer|title=This title has ||bars||}} would throw a warning? That would allow the "purely numbered" params to keep working while hopefully generating a warning on any unexpected use. JeffDoozan (talk) 23:27, 4 August 2023 (UTC)

@JeffDoozan Do you mean if there is an occurrence of a given numbered param and the corresponding named param? We could throw an error (a warning is unlikely to be noticed). That will catch most but not all cases, e.g. not if someone uses purely numbered params and accidentally puts an unescaped vertical bar in a numbered param. Benwing2 (talk) 00:08, 5 August 2023 (UTC)

I'd throw an error if |2= is defined (even with an empty value) plus any other parameter except |3= through |8= (or whatever the maximum is for the given template). That would allow simple usage of numbered parameters, but if a quote is too complex to fit the numbered parameters, then it should use only named parameters. JeffDoozan (talk) 13:45, 5 August 2023 (UTC)
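The rule proposed above can be sketched roughly as follows (Python for illustration, not Lua module code). Assumptions: |1= is the language code and is exempt, the maximum numbered param is 8 as in the example, and `check_numbered_only` is a hypothetical name.

```python
def check_numbered_only(args, max_numbered=8):
    """Raise ValueError if numbered quotation params are mixed with other params.

    Sketch of the proposed rule: once |2= is present (even if empty), the
    only other params allowed are |1= (the language code, assumed exempt) and
    |3= through |max_numbered=. The default of 8 is illustrative only.
    """
    if "2" not in args:
        return  # purely named usage; nothing to check under this rule
    allowed = {str(n) for n in range(1, max_numbered + 1)}
    extra = sorted(name for name in args if name not in allowed)
    if extra:
        raise ValueError("numbered params mixed with: " + ", ".join(extra))


# {{quote-book|en|1999|author|title}}: purely numbered, passes.
check_numbered_only({"1": "en", "2": "1999", "3": "author", "4": "title"})

# {{quote-book|en|year=1999|author=homer|title=This title has ||bars||}}:
# the stray bars yield an empty |2=, plus |3=bars, |4= and |5=, so this raises.
try:
    check_numbered_only({"1": "en", "year": "1999", "author": "homer",
                         "title": "This title has", "2": "", "3": "bars",
                         "4": "", "5": ""})
except ValueError as err:
    print(err)  # numbered params mixed with: author, title, year
```

Note that the doubled bars in the second example are exactly what produce the empty |2=, which is why keying the rule on the presence of |2= (rather than its value) catches this case.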

@Biolongvistul Just FYI, there are at least 1,557 cases of errors in numbered params in {{quote-book}} and {{quote-journal}} alone. These are just those that a script of mine was able to catch by looking for places where a numbered param and its corresponding named param both exist. This is out of 13,000 or so total uses. Benwing2 (talk) 04:59, 10 August 2023 (UTC)
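A check of the kind described above (a numbered param and its named counterpart both set) might look roughly like this. This is a Python sketch, not the actual script; the numbered-to-named mapping below is purely illustrative, and the real {{quote-book}} and {{quote-journal}} mappings may differ.

```python
# Illustrative numbered -> named correspondence; the real quote-book and
# quote-journal mappings may differ (and also accept aliases).
NUMBERED_TO_NAMED = {"2": "year", "3": "author", "4": "title"}


def find_conflicts(args, mapping=NUMBERED_TO_NAMED):
    """Return (numbered, named) pairs where both spellings of a param are set."""
    return [(num, named) for num, named in sorted(mapping.items())
            if num in args and named in args]


# A "|" inside title= pushes " part two" into |2=, colliding with year=.
args = {"1": "en", "year": "1999", "author": "A. N. Author",
        "title": "Part one ", "2": " part two"}
print(find_conflicts(args))  # [('2', 'year')]
```

As Benwing2 notes, this only finds collisions; a stray fragment that lands in a numbered slot whose named twin is unused goes unnoticed.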

Updates on fa-IPA and transliterations


Edit: I shortened this because it was too much reading.

Since Saranamd asked you about adding phonetic transcriptions to {{fa-IPA}} a while ago, I offered to handle the basic character mapping (now finished), but I would still need your help exporting it.
I think you should consider this layout, which uses {{fa-IPA}} to generate transliterations. This would merge your work on fa-IPA with the transliteration work and, since entering non-standard characters would not be possible, solve issues with inconsistent transliterations.
- As for the "phonetic Persian" spelling, if you think it's too much, we can cut it. I could make a transliteration module for it, but I can't export it, so it's entirely up to you.

سَمِیر | sameer (contributions • chat with me) 22:09, 5 August 2023 (UTC)

@Sameerhameedy Hi, sorry for not responding yet; I will respond tomorrow as it's my bedtime. Benwing2 (talk) 08:21, 6 August 2023 (UTC)

It's fine. I'll try to do as much as I can and ask you about the rest later. سَمِیر | sameer (contributions • chat with me) 00:08, 9 August 2023 (UTC)

Hi, Ben. I managed to complete all of the changes to fa-IPA that Saranamd and I requested from you. Whenever you finish your current projects and have free time, could you fix the export of Classical Persian so it exports as phonetic [] rather than phonemic //? Thank you, سَمِیر | sameer (contributions • chat with me) 05:51, 22 August 2023 (UTC)

<syntaxhighlight> tag


Hi, why did you replace <syntaxhighlight> tags with <pre>? Wikitext with highlighted syntax is much easier to comprehend. JWBTH (talk) 02:53, 6 August 2023 (UTC)

@JWBTH It didn't seem to me to matter much, and it was a pain in the ass to maintain the syntax highlighting tags when changing all the text around. Benwing2 (talk) 05:09, 6 August 2023 (UTC)

Well, it's just another tag with an extra attribute lang="wikitext". As for "a pain in the ass", Wiktionary could make use of a template like w:Template:Demo or w:Template:Nowiki template demo. That way, what we write as

 <pre># ({{plural of|eo|kiu}}) [[which]] {{gloss|relative}}
#: {{ux|eo|La elementoj '''kiuj''' troviĝis sur piratflagoj estis skeletkapo, la simbolo por la morto, skeleto indikis turmentan morton, sablohorloĝo indikis ke la tempo venis, sanganta koro malrapida kaj dolora morto kaj ponardo aŭ maĉeto signifis impulso por batali.
|The elements '''which''' were found on pirate flags were: skull, the symbol for death, skeleton indicated torturous death, hourglass indicated that the time came, bleeding heart &rarr; slow and painful death and dagger or machete signified impulse for battle.
|ref=<sup>[//eo.wikipedia.org/wiki/Jolly_Roger]</sup>}}
</pre> gives

# ({{plural of|eo|kiu}}) [[which]] {{gloss|relative}}
#: {{ux|eo|La elementoj '''kiuj''' troviĝis sur piratflagoj estis skeletkapo, la simbolo por la morto, skeleto indikis turmentan morton, sablohorloĝo indikis ke la tempo venis, sanganta koro malrapida kaj dolora morto kaj ponardo aŭ maĉeto signifis impulso por batali.
|The elements '''which''' were found on pirate flags were: skull, the symbol for death, skeleton indicated torturous death, hourglass indicated that the time came, bleeding heart &rarr; slow and painful death and dagger or machete signified impulse for battle.
|ref=<sup>[//eo.wikipedia.org/wiki/Jolly_Roger]</sup>}}

could be expressed in half the length:

{{demo|sep=gives|<nowiki># ({{plural of|eo|kiu}}) [[which]] {{gloss|relative}}
#: {{ux|eo|La elementoj '''kiuj''' troviĝis sur piratflagoj estis skeletkapo, la simbolo por la morto, skeleto indikis turmentan morton, sablohorloĝo indikis ke la tempo venis, sanganta koro malrapida kaj dolora morto kaj ponardo aŭ maĉeto signifis impulso por batali.
|The elements '''which''' were found on pirate flags were: skull, the symbol for death, skeleton indicated torturous death, hourglass indicated that the time came, bleeding heart &rarr; slow and painful death and dagger or machete signified impulse for battle.
|ref=<sup>[//eo.wikipedia.org/wiki/Jolly_Roger]</sup>}}</nowiki>}}

The template will produce the code snippet and its rendering itself. JWBTH (talk) 05:21, 6 August 2023 (UTC)

@JWBTH We already have {{temp demo}} for this purpose, but as the documentation for the Wikipedia equivalents shows, you have to wrap the example template code in <nowiki>...</nowiki>, which adds a lot of verbiage. I'd rather not have to type all that; you're welcome to fix the documentation to use syntax highlighting tags if you want (but please do NOT import any templates from Wikipedia as we already have {{temp demo}} for this purpose). Benwing2 (talk) 06:24, 6 August 2023 (UTC)

Cool that you have {{temp demo}}. This template has several flaws though:

It can only show the output of a template, not of generic code. So, if there is some code around the template code (like in Template:ux/documentation), we can't use it.
It renders the markup (like links, bolding, and other templates) inside the code snippet and messes up the order of parameters. For example,
```
{{temp demo|quote-book|fr|year=1973|author={{w|Claude Simon}}|title=Tryptique|publisher=Éditions de Minuit|page=12|passage=Les sons de la cloche '''égrenant''' les quarts, les demies et les heures {{...}}|translation=The sounds of the clock '''rattling off''' the quarters, the halves and the hours {{...}}}}
```
shows
{{quote-book|fr|author=Claude Simon|page=12|passage=Les sons de la cloche égrenant les quarts, les demies et les heures […] |publisher=Éditions de Minuit|title=Tryptique|translation=The sounds of the clock rattling off the quarters, the halves and the hours […] |year=1973}} ⇒
1973, Claude Simon, Tryptique, Éditions de Minuit, page 12:
Les sons de la cloche égrenant les quarts, les demies et les heures […]
The sounds of the clock rattling off the quarters, the halves and the hours […]
And no markup highlighting, of course.

Several years ago, other people and I developed an analogue of {{temp demo}}, w:ru:Template:Example. It allows adding prefixes and postfixes, alleviating problem 1 (but not solving it), and allows using {{=}} instead of = to keep the parameter order. This is inferior to w:Template:Demo in terms of overall capabilities, but still much better than {{temp demo}}.
So, I would still advise importing w:Template:Demo or w:Template:Nowiki template demo as the most straightforward solution that lacks the issues that other solutions have. JWBTH (talk) 00:22, 7 August 2023 (UTC)

@JWBTH The problem is, unless you know Lua well and also know the Wiktionary modules well, you'll end up importing a bunch of crappy dependencies that the code depends on in Wikipedia but which are duplicative of code that already exists here. That's why I'm opposed to importing stuff from Wikipedia; we end up with all this extra garbage that we then have to maintain. So I would only tolerate importing this if you rewrite the dependency invocations to use the equivalents here. People (probably including you) have done this in the past, and I've had to spend a bunch of time cleaning up after them, which I don't want to do. Benwing2 (talk) 00:46, 7 August 2023 (UTC)

To me, as an interface admin at Russian Wikipedia, this is a familiar concern. We've been cleaning up our codebase for a long time, but in many cases we dropped our solutions in favor of English Wikipedia's and others', because, as a rule, they are better maintained, if only because there are more people there to maintain them. The fact that Wikimedia projects are so fragmented and unsynchronized is actually a huge problem that eats up a lot of people's effort, and measures have been suggested and implemented to address this. See mw:Multilingual Templates and Modules, for example.
The fact that people can't use templates that they've got used to in other Wikimedia projects (even if with local specifics) isn't contributing to their user experience either. First of all, I'm talking about utility templates that have no project-related specifics, like {{uses lua}}.
That said, I have no problem with integrating imported modules/templates with Wiktionary's codebase as long as this codebase is well-supported or has some unique features of Wiktionary, so that it makes sense to support it, and not just a poorly supported duplicate, often with extremely limited functionality.
> probably including you
I only got involved in English Wiktionary this year, so it's unlikely. JWBTH (talk) 02:21, 7 August 2023 (UTC)

@JWBTH My apologies; several people have been copying modules from Wikipedia, and I can't keep track of them all. I would say the situation with Wiktionary is not like the Russian Wikipedia. In many ways Wiktionary is fundamentally different from Wikipedia since it's much more templatized. This means many things have to be done in a fundamentally different way, which is why the infrastructure is different. If you are referring to my rename of the {{Lua}} template to {{uses lua}}, this was because we already have a template called {{lua}} and in general Wiktionary templates don't begin with a capital letter. (Wiktionary was around long before Lua came on the scene, and I suspect the use of an initial capital in templates on Wikipedia postdates Wiktionary's creation.) From what I've seen, many of Wiktionary's general modules do things better than English Wikipedia due to the need to support the more templatized wikicode. Benwing2 (talk) 03:39, 7 August 2023 (UTC)

> If you are referring to my rename of the {{Lua}} template to {{uses lua}}
No, that rename is OK. (But I thought using lowercase letters for templates has to do primarily with the case-sensitiveness of Wiktionary, so it shouldn't matter much which case the letters are in as long as their usage is consistent, and "Lua" is a proper name, so it could make some sense? But of course having a {{Lua}} – {{lua}} collision is undesirable.)
> From what I've seen, many of Wiktionary's general modules do things better than English Wikipedia due to the need to support the more templatized wikicode.
I've already noticed this, and was pleasantly surprised that Wiktionary has such a developed module architecture. But still, there are things that Wikipedia modules do better. Speaking of which, Wikipedia for one has a great architecture for template testing described at w:Wikipedia:Template sandbox and test cases, and I haven't seen anything like that here. Having such an architecture could really come in handy for template/module development, especially given that Wiktionary is so heavily templatized, as you say. JWBTH (talk) 09:05, 7 August 2023 (UTC)

@JWBTH Almost all templates use lowercase initial letters, so for consistency we should do the same here. As for template sandboxes and testing, yeah this would be good to have; I test by putting a copy of the relevant modules in my userspace, but maybe there's a better way. I know User:Erutuon uses some special functionality for this, but I don't know how it works. Benwing2 (talk) 01:40, 8 August 2023 (UTC)

So, after all, why don't we import w:Module:Demo? Its only dependency is Module:Yesno, and Wiktionary already has that. I like working with template documentation, and I'm not a fan of copy-pasting stuff. JWBTH (talk) 09:18, 9 August 2023 (UTC)

@JWBTH Module:Yesno is an example of a crappy copied module. Whoever copied it was unaware of Module:yesno, which already existed. You can copy w:Module:Demo if you want, but please rename it to use a lowercase letter (Module:demo), delete the p.module function (which is used to invoke other modules that don't exist in Wiktionary), make sure all non-exported functions are local, and fix it to use Module:yesno. Benwing2 (talk) 04:48, 10 August 2023 (UTC)

Also, the invoking template should be named {{demo}} with a lowercase initial letter. Benwing2 (talk) 04:50, 10 August 2023 (UTC)

OK.
> which is used to invoke other modules that don't exist in Wiktionary
Nah, it's actually used to reduce server load to show examples of module usage.
> Whoever copied it was unaware of Module:yesno, which already existed.
Could a filter for mw:Extension:AbuseFilter be created to prevent this from happening? It could at least warn users that a lowercase first letter is the convention. It could also check whether a page with a lowercase first letter exists and direct the editor there if it does.
The problem with Module:Yesno is that it is currently specified as the interlanguage link for w:Module:Yesno (it was even purposefully changed from Module:yesno, see the edit; @Kwamikagami hi there, we believe your action was wrong, see above). Unless there is some meaningful difference in the behavior of Module:yesno and Module:Yesno (I see only the treatment of t and f values which can easily be added to Module:yesno without plausibly breaking anything), I believe they could be merged. JWBTH (talk) 09:38, 10 August 2023 (UTC)
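For a sense of what a merged module would need to accept, here is a rough Python sketch (the real module is Lua). The value sets are my understanding of what w:Module:Yesno recognizes, including the t/f values mentioned above; treat them as an assumption and check both modules before merging. Trimming whitespace is also an assumption, and the Lua original's numeric handling (tonumber) is simplified to the strings "1" and "0".

```python
# Values assumed to be recognized by w:Module:Yesno; verify against the source.
TRUE_VALUES = {"yes", "y", "true", "t", "on", "1"}
FALSE_VALUES = {"no", "n", "false", "f", "off", "0"}


def yesno(val, default=None):
    """Interpret a template argument as a boolean, falling back to default."""
    if val is None:
        return None  # param absent: no value, regardless of default
    if isinstance(val, bool):
        return val
    s = str(val).strip().lower()
    if s in TRUE_VALUES:
        return True
    if s in FALSE_VALUES:
        return False
    return default  # unrecognized value, e.g. "maybe"
```

So `yesno("T")` gives True and `yesno("maybe", False)` falls back to False.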

Yes, I would think the two should be merged. And yes, if there were a warning that the opposite capitalization already exists, that might help. (Doesn't the mainspace warning work in template space?)

Currently our 'yesno' template has the comment "It works similarly to the template {yesno}," so that should presumably be removed.

14 other wikts have duplicate templates. These and the WD items should probably also be merged. kwami (talk) 16:14, 10 August 2023 (UTC)

Blocked from logging into the English Wikipedia


Hello Benwing. I hope I don't trouble you by asking for your help: I just tried logging into the English Wikipedia using this username (0DF) and my password, but it won't let me, seemingly because I'm subject to an IP-range block (instituted by w:User:Yamaguchi先生 and running until 2025). Any idea what I can do to fix this, please? 0DF (talk) 01:00, 8 August 2023 (UTC)

@0DF I thought IP range blocks don't apply to accounts, only to anonymous IPs, but I may be wrong; maybe it depends on how the block was instituted. I suspect User:Chuck Entz would have a better idea, although neither of us can do anything about Wikipedia; you'd presumably have to contact an admin there (although I don't know how you'd do that if you are blocked). Benwing2 (talk) 01:10, 8 August 2023 (UTC)

To the last question, I think the 'intended' route is to e-mail the admin who implemented the block. (If the block also disables sending e-mail, I guess you might try pinging, on your talk page on Wiktionary, the admin who implemented the block, and/or pinging someone like Thryduulf who's an admin and active user on both sites and could pass your situation along.) I don't actually spot where Yamaguchi has implemented any long blocks recently, though. (Are you being hit by the range block on 2a02:c7c::/30?) - -sche (discuss) 01:32, 8 August 2023 (UTC)

The block menu has an option to apply the block to logged-in users, which I almost never use except on very short-term blocks. The fact that this came to light during a login attempt might indicate it has something to do with account creation being blocked, in which case it would only apply to the first visit to Wikipedia with the account. As for what to do: I believe any Wikipedia admin can make an account IP-block exempt locally. I don't think anyone at Wikipedia would object, because it only applies while logged in to that one account and doesn't stop them from blocking the account itself in the event of any abuse. Chuck Entz (talk) 05:11, 8 August 2023 (UTC)

Thank you all for your responses.
@Benwing2: Yes, I tried to contact Yamaguchi and others on the English Wikipedia (via e-mail and automated forms), but without success.
@-sche: Yes, it's this range-block that's affecting me. Per your advice: @Yamaguchi先生, Thryduulf, could you help me with this issue, please?
@Chuck Entz: I think you're right. I was able to log into the German Wikipedia without issue. If I could only log into the English Wikipedia just once, I'm sure the block would no longer cause me problems.
— 0DF (talk) 15:52, 8 August 2023 (UTC)

@0DF I see there have been complaints about this block on Yamaguchi's talk page. I posted about the effect it's having on you; hopefully they will respond. Benwing2 (talk) 19:39, 8 August 2023 (UTC)

@Benwing2: I've just now been able to log in to the English Wikipedia. Thank you very much for your intervention. 0DF (talk) 23:38, 8 August 2023 (UTC)

WingerBot changing `{{bg-phrase}}` to `{{head|bg|phrase}}`


Hi,

I've just noticed you deleted {{bg-phrase}} - what was the issue with it? Should the following code from Module:bg-headword be removed too?

pos_functions["phrases"] = function(postype, def, args, data)
	local params = {
		[1] = {required = true, list = "head", default = def},
		["id"] = {},
	}

	local args = require("Module:parameters").process(args, params)

	data.heads = args[1]
	data.id = args.id
end

In the future, it would be nice to get a heads-up - not to me necessarily, but to some Bulgarian editor. E.g. I was planning on adding more phrases to the Bulgarian phrasebook, which is how I noticed the template was gone.

Thanks,

Chernorizets (talk) 06:13, 9 August 2023 (UTC)

@Chernorizets My apologies, I just rewrote Module:bg-headword to add various features and support comparatives and superlatives in adjectives. In the process I noticed {{bg-phrase}}; the general thinking now is to avoid having templates like this that are trivial wrappers around {{head}} (note for example we have no {{bg-interj}}, {{bg-con}} [for conjunctions] or the like). Here instead you can write {{head|bg|phrase|head=...}} or its shorter equivalent {{h|bg|phr|head=...}}. Most other Slavic (and non-Slavic) languages don't have a {{LANG-phrase}} template, preferring to use {{head}} directly, and for the ones that used to, I recently deleted them after converting their uses to {{head}}. The logic behind this is that having all these extra templates (each of which invariably ends up working a little differently from the others) adds up to a big maintenance headache in the aggregate. In the future I'll let you know if I'm going to make any such deletions. Benwing2 (talk) 06:18, 9 August 2023 (UTC)

@Benwing2 do your changes affect how {{bg-adj}} and {{bg-adv}} work? We already had comparatives and superlatives for both adjectives and adverbs, and those two templates are widely used.

As for {{bg-phrase}}, the reasoning makes sense. It didn't seem to be doing much beyond {{head}}. Thanks for explaining. And just to be clear, you don't have to ping me specifically for template or module edits - any Bulgarian editor would do. I'm sure you're thorough, but IMO at least one other person should be clued in just to reduce the surprise factor.

Thanks,

Chernorizets (talk) 06:33, 9 August 2023 (UTC)

@Chernorizets My changes affect headwords; {{bg-adj}} didn't support comparatives, while {{bg-adv}} did (and does). As for changes, see Template talk:bg-adecl where Kiril requested some changes, and I explained what changes I made. The most important one besides supporting comparatives for {{bg-adj}} is that {{bg-adv}} no longer defaults to displaying comparatives; instead you need to request them using |comp=+ or |2=+. The reason for this is that many people (esp. people who are more occasional contributors) won't expect that just writing e.g. {{bg-adv}} would generate a default comparative and you have to say {{bg-adv|HEAD|-}} to get no comparative, and would just go ahead and write {{bg-adv|HEAD}} by itself regardless of whether there's actually a comparative or superlative. So now, {{bg-adv}} by itself doesn't make any assumptions about comparatives; you have to request them explicitly as I mentioned above, or say {{bg-adv|-}} to specifically indicate that there's no comparative. Benwing2 (talk) 06:57, 9 August 2023 (UTC)
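The new comparative behavior described above can be summarized as a small decision function. This is only a Python sketch of the logic, not code from Module:bg-headword: it assumes |comp= takes precedence over |2= when both are given, and the function name and return labels are hypothetical.

```python
def adv_comparative(args):
    """Classify the comparative request in a (hypothetical) {{bg-adv}} call.

    Reads |comp= first, then |2=. Returns "unspecified" when neither is given
    (the new default: no comparative is assumed), "none" for "-", "generate"
    for "+" (build the default comparative), or the explicit form itself.
    """
    comp = args.get("comp") or args.get("2") or ""
    if comp == "":
        return "unspecified"
    if comp == "-":
        return "none"
    if comp == "+":
        return "generate"
    return comp
```

For example, a bare {{bg-adv|бързо}} now yields "unspecified" rather than an automatically generated comparative, while |comp=+ asks for one explicitly.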

@Benwing2 sorry, I guess I was thinking about {{bg-adecl}} but wrote {{bg-adj}}. Could either you or Kiril update the documentation of {{bg-adv}} and {{bg-adj}} to reflect the changes? Esp. for {{bg-adv}} since IIRC it used to show comparative & superlative by default.

Thanks for making improvements and the automated bot runs!

Chernorizets (talk) 07:16, 9 August 2023 (UTC)

@Chernorizets Yup, I'll update the docs. Benwing2 (talk) 07:42, 9 August 2023 (UTC)

@Benwing2 you may already be aware of this if you look at CAT:E, but your change inadvertently broke {{bg-part form}} by obsoleting its g parameter. This is evident even on the template's documentation page, and it affects ~1400 verb form articles. Chernorizets (talk) 07:44, 9 August 2023 (UTC)

@Chernorizets Oops, thanks for letting me know. I thought participle forms didn't use the g= param and should have checked (my excuse is it's late and time for bed :) ...). I fixed this and am running a purge bot job on Category:Pages with module errors. Benwing2 (talk) 07:53, 9 August 2023 (UTC)

@Benwing2 thanks for the quick turnaround! I'm on PST (WA) so I know what you mean :-) Chernorizets (talk) 08:10, 9 August 2023 (UTC)

Reporting unwelcome edits that don't quite rise to the level of vandalism


Hi @Benwing2, is there a mechanism for doing that? RE: Special:Contributions/Djkcel - as of 21:06 on 8/12/2023, their last two edits are to remove portions of Bulgarian etymology sections that mention the PIE roots for the respective lemmas. At least according to their user page, they're not a Bulgarian speaker. I've asked about гъдулка (gǎdulka) on their talk page, and then I noticed a similar edit on свиня (svinja).

Thanks,

Chernorizets (talk) 01:02, 13 August 2023 (UTC)

@Chernorizets Looks like this user has a long history of editing in languages they don't know and refusing to change. They have been blocked several times for this. Generally in cases like this you post on the user's talk page like you did, and if necessary bring it up in the Beer Parlour. It may be necessary to institute a longer block, depending on (e.g.) what the user says. Benwing2 (talk) 01:11, 13 August 2023 (UTC)

Mixed numbered/named parameters


I've only spotted this once so far, but it looks like there was a small problem replacing numbered parameters in quote-* when the template also used some named parameters.

skimpflation, August 7, 2023

It fixed four templates that had numbered parameters only, but skipped two that had named month= in addition to numbered year.

Happy editing, Cnilep (talk) 01:55, 14 August 2023 (UTC)

@Cnilep I haven't yet converted numbered params in {{quote-journal}} or {{quote-book}} (which are the only two templates still supporting numbered params). Benwing2 (talk) 01:56, 14 August 2023 (UTC)

One more


bit: Etymology 2: Adjective. Chuck Entz (talk) 02:32, 14 August 2023 (UTC)

@Chuck Entz If you see other such cases that you think I might miss, let me know. The quote templates are a mess because there are a zillion params and historically there hasn't been any param checking. I have a script to check for unrecognized params in the 522,000 or so quote-* templates; I'm down to about 400 cases of completely unrecognized params (from maybe 5,000 to begin with) but then there are all these other issues that appear when I start checking for duplicate aliased params and such. Benwing2 (talk) 03:27, 14 August 2023 (UTC)
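Checking for duplicate aliased params, as mentioned above, amounts to flagging any alias group in which more than one spelling was supplied. This is a rough Python sketch: the alias groups are illustrative only, since each quote-* template defines its own, and `find_duplicate_aliases` is a hypothetical name.

```python
# Illustrative alias groups; each quote-* template defines its own.
ALIAS_GROUPS = [("passage", "text"), ("translation", "t")]


def find_duplicate_aliases(args, groups=ALIAS_GROUPS):
    """Return each alias group in which more than one spelling was supplied."""
    dups = []
    for group in groups:
        present = tuple(name for name in group if name in args)
        if len(present) > 1:
            dups.append(present)
    return dups


print(find_duplicate_aliases({"passage": "a", "text": "b", "t": "c"}))
# [('passage', 'text')]
```

Unlike an unrecognized param, a duplicated alias is accepted by both names, so one value is silently dropped; that is why this needs a separate pass.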

blanc-manger and kuiperoid (in case you're still up). Chuck Entz (talk) 04:45, 14 August 2023 (UTC)

Brazilian nasalization of vowels before nasal consonants


The pronunciation module for Brazilian Portuguese by default makes all vowels before nasal consonants nasal as well, e.g. /ˈkɾẽ.mi/ instead of /ˈkɾe.mi/ for creme, and /ˈkɐ̃.nɐ/ instead of /ˈkɐ.nɐ/ for cana, but as a Brazilian I can confirm this is utterly wrong. I've never heard anyone pronounce those words that way. What is that based on? - Munmula (talk) 10:53, 14 August 2023 (UTC)

@Munmula It does this with stressed vowels but not unstressed vowels. What part of Brazil are you from? I have definitely heard stressed vowels pronounced nasal before a nasal consonant; this is why the vowel is raised/centered in cana and cama (otherwise, why would this pronunciation happen?). Granted, my experience with Brazilian Portuguese is mostly from Salvador, and maybe this is a Northeast thing. I am basing this off of Wikipedia, which says this:

Another difference between Northern/Northeastern dialects and Southern/Southeastern ones is the pattern of nasalization of vowels before ⟨m⟩ and ⟨n⟩. In all dialects and all syllables, orthographic ⟨m⟩ or ⟨n⟩ followed by another consonant represents nasalization of the preceding vowel. But when the ⟨m⟩ or ⟨n⟩ is syllable-initial (i.e. followed by a vowel), it represents nasalization only of a preceding stressed vowel in the South and Southeast, as compared to nasalization of any vowel, regardless of stress, in the Northeast and North. A famous example of this distinction is the word banana, which a Northeasterner would pronounce [bɐ̃ˈnɐ̃nɐ], while a Southerner would pronounce [baˈnɐ̃nɐ].

This pronunciation of banana rings true in my ears, but again my experience is Salvador in the late 1990s/early 2000s, and maybe things have changed since then, especially in the South/Southeast? Benwing2 (talk) 19:31, 14 August 2023 (UTC)
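The rule in the quoted Wikipedia passage can be written as a short function over syllables. This is a Python sketch of that quoted rule only, not the pt-IPA module's code. It assumes lowercase orthographic syllables and a known stress index, and it does not model nasals in coda position (banco, sombra), which the passage says always nasalize.

```python
VOWELS = set("aeiouáéíóúâêôãõ")


def nasalized_before_onset_nasal(syllables, stress, dialect):
    """Flag syllables whose vowel the quoted rule nasalizes before an onset m/n.

    syllables: lowercase orthographic syllables, e.g. ["ba", "na", "na"].
    stress: index of the stressed syllable.
    dialect: "south" (South/Southeast: stressed vowels only) or
             "northeast" (North/Northeast: any vowel).
    Only open syllables followed by a syllable starting with m or n are
    considered; nasals in coda position (banco, sombra) are not modeled.
    """
    if dialect not in ("south", "northeast"):
        raise ValueError("unknown dialect: " + dialect)
    flags = []
    for i, syl in enumerate(syllables):
        nxt = syllables[i + 1] if i + 1 < len(syllables) else ""
        open_syl = bool(syl) and syl[-1] in VOWELS
        before_nasal = nxt[:1] in ("m", "n")
        flags.append(open_syl and before_nasal
                     and (dialect == "northeast" or i == stress))
    return flags


print(nasalized_before_onset_nasal(["ba", "na", "na"], 1, "south"))      # [False, True, False]
print(nasalized_before_onset_nasal(["ba", "na", "na"], 1, "northeast"))  # [True, True, False]
```

The two calls reproduce [baˈnɐ̃nɐ] versus [bɐ̃ˈnɐ̃nɐ]. Under this rule creme (["cre", "me"], stress 0) gets a nasal stressed vowel in both groups, which is the very behavior being questioned in this thread; the sketch encodes the quoted claim, not a verdict on it.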

That passage is unsourced, so I wouldn't give it too much credibility. I'm from São Paulo city and, in my experience, nasalization happens when the vowel is followed by M or N and then another consonant (as in the words banco, sombra, tinta) or by another vowel but in a different word (as in the phrases bem alto and vim a pé, which are pronounced as if they were written benhalto and vinhapé respectively). But I've never seen nasalization when it is a vowel, then M or N, and then another vowel in the same word, as in creme, which is pronounced /ˈkɾe.mi/, not /ˈkɾẽ.mi/.

Maybe we should ask other Brazilian editors about their stance? Besides @Daniel Carrero and @Ungoliant MMDCCLXIV, both of whom I already know from this wiki, I'm pinging some recently active pt-N users who are or might be Brazilian: @Cpt.Guapo, @Bezwzględny, @Baudelairesantos, @Capmo, @Gmestanley, @Holodwig21, @Jesielt, @Junglk, @LearningFromTheCradleToTheGrave, @OweOwnAwe, @Protegmatic, @Psi-Lord, @Vocênãosabeenemeu - Munmula (talk) 12:07, 15 August 2023 (UTC)

I'm no expert in phonetics, but doesn't this sound nasalized? [8] Jesielt (talk) 12:58, 15 August 2023 (UTC)

I feel like /ˈkɾẽ.mi/ would fit northeastern Brazilian Portuguese, but I don't think that anyone south of São Paulo speaks like that. I personally pronounce it /ˈkɾe.mi/. Maybe the last vowel isn't quite [i] either, but I don't know how to properly represent it.

Northeastern Brazilian Portuguese is known for its nasal vowels. The Northeast is also a region made up of many different states, and a northeastern friend of mine told me that they have many different dialects, so I wouldn't be able to give you more details about it, as I don't live in that region. That's what I know. LearningFromTheCradleToTheGrave (talk) 13:15, 15 August 2023 (UTC)

Despite being from São Paulo State, I also nasalise the first syllable of banana, as well as cama and cana. But not creme, so I guess it only happens with the vowel A. Capmo (talk) 15:28, 15 August 2023 (UTC)Reply

Unfortunately, I do not know much about the situation, since I’m not a speaker of Brazilian Portuguese. I know that Brazilian Portuguese has dialects that differ in pronunciation; however, my knowledge of the language is not as extensive. From the little that I know, I haven’t heard any speaker of Brazilian Portuguese say creme and cana with nasalized vowels. 𐌷𐌻𐌿𐌳𐌰𐍅𐌹𐌲𐍃 𐌰𐌻𐌰𐍂𐌴𐌹𐌺𐌹𐌲𐌲𐍃 (talk) 16:58, 15 August 2023 (UTC)Reply

@Munmula OK let's see if we can gather some more info. We have the ability to generate multiple dialects and already do this in fact, but it's tricky because (a) there is insufficient info out there in many cases, (b) for the Northeast it seems there are a lot of different dialects so I'm not really sure how to handle them. Does anyone have any info on how the vowels are pronounced in creme, cama and banana in Rio? This is one of the dialects we specifically highlight. Benwing2 (talk) 19:41, 15 August 2023 (UTC)Reply

@Munmula BTW I take it from your statement about vim a pé being pronounced like vinhapé that both are pronounced /vĩj̃aˈpɛ/? This is approximately what we already generate with regards to written nh so I'm glad to get confirmation. Benwing2 (talk) 19:45, 15 August 2023 (UTC)Reply

Yes, that is the case. - Munmula (talk) 11:51, 18 August 2023 (UTC)Reply

I'm Brazilian, from São Paulo, and I do think that the nasalization of a vowel (especially of /e/, /i/, /o/ and /u/) in this case isn't necessarily phonemic. As for me, I would pronounce creme as /ˈkɾe.mi/; if there's any nasalization of /e/ in this case, at least in my accent, I wouldn't say it is that perceptible. I've just watched some videos of people pronouncing such words on [9] YouGlish, and it seems to me that /ɐ/ might be more often nasalized than other vowels in this case. It's also important to keep in mind that /ẽ, ĩ, õ, ũ/ are very often diphthongized ([ẽj̃, ĩj̃, õw̃, ũw̃]), which doesn't happen with /ɐ̃/; therefore, I think that the distinction of nasalization in the latter is less perceptible than in the former. (pente = [ˈpẽj̃.t͡ʃi], but pena = [ˈpe.nɐ]) OweOwnAwe (talk) 23:31, 15 August 2023 (UTC)Reply
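To illustrate the diphthongization mentioned above, here is a hedged Python sketch that turns phonemic /ẽ ĩ õ ũ/ into phonetic [ẽj̃ ĩj̃ õw̃ ũw̃] and leaves /ɐ̃/ alone. It applies the change everywhere, which overstates "very often"; the function name and the "skip if a glide already follows" check are assumptions, not the module's behavior.

```python
import unicodedata

# Nasal vowels and their diphthongized phonetic forms. /ɐ̃/ is deliberately
# absent: it is not diphthongized. Escapes keep the keys unambiguous
# precomposed characters.
NASAL_DIPHTHONGS = {
    "\u1ebd": "\u1ebdj\u0303",  # ẽ -> ẽj̃
    "\u0129": "\u0129j\u0303",  # ĩ -> ĩj̃
    "\u00f5": "\u00f5w\u0303",  # õ -> õw̃
    "\u0169": "\u0169w\u0303",  # ũ -> ũw̃
}


def diphthongize_nasals(ipa: str) -> str:
    """Add a nasal offglide to /ẽ ĩ õ ũ/ unless a glide already follows."""
    s = unicodedata.normalize("NFC", ipa)  # e + U+0303 becomes precomposed ẽ
    out = []
    for i, ch in enumerate(s):
        # Look past a syllable dot: in [kaˈmĩ.j̃ɐ] the glide starts the next
        # syllable and must not be doubled.
        following = s[i + 1:i + 3].lstrip(".")[:1]
        if ch in NASAL_DIPHTHONGS and following not in ("j", "w"):
            out.append(NASAL_DIPHTHONGS[ch])
        else:
            out.append(ch)
    return "".join(out)
```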

Thanks for your comments. The issue of phonemicity is complicated by minimal pairs like caminha "small bed" with /ɐ̃/ vs. caminha "he/she walks" with /a/ (BTW how do you speakers from São Paulo pronounce these two words?), so I'd rather not get too hung up on whether the nasalization is phonemic; it's more important IMO (at first at least) to figure out where it exists at all. Benwing2 (talk) 23:44, 15 August 2023 (UTC)Reply

Most people would pronounce them [kɐˈmĩ.j̃ɐ] and [kaˈmĩ.j̃ɐ], respectively. Pronouncing the second word as [kɐˈmĩ.j̃ɐ] is also possible, although I think it is less common in São Paulo. OweOwnAwe (talk) 01:00, 16 August 2023 (UTC)Reply

My two cents: the nasalization of vowels that precede nasal consonants is said to have been an African influence (Africanismos no português do Brasil, Maria do Socorro Silva de Aragão, Rev. de Letras - Vol. 30 - 1/4 - jan. 2010/dez. 2011, Universidade Federal do Ceará). This may explain the pronunciation in Salvador.

I was born in Bahia (Recôncavo region) and lived in Salvador for 8 years and I can confirm that this type of nasalization is very common. The same cannot be said of the State of Santa Catarina, where now I live. 2804:1790:83AC:D200:2C7E:5128:78AC:55FA 06:42, 24 August 2023 (UTC)Reply

Thank you! I will be fixing this module up soon, probably according to what User:Munmula said below. Benwing2 (talk) 07:12, 24 August 2023 (UTC)Reply

Thanks for your attention. If I'm not too late or off-topic, there are also other changes I would like to propose to the module to represent Brazilian pronunciations:

1 - The inclusion of the alveolar trill (/r/) for initial and double Rs (as in rádio and arroz) as an alternative or old-fashioned pronunciation in Brazil. I don't know about Portugal and other countries, but in Brazil some older people, or even younger people imitating a more "vintage" style, may still pronounce words that way, and it can be found in old videos. Take for example this subtitled 1951 speech by President Getúlio Vargas. This pronunciation can be found in material from the first half of the 20th century until about 1970.

2 - A similar old-fashioned pronunciation of Ls at the end of words as /l/ instead of /w/, such as /na.si.oˈnal/ instead of /na.si.oˈnaw/ for the word nacional. This can also be heard in the video linked above or in virtually any other old recording.

3 - The occasional pronunciation of que in some words as /kwe/ instead of /ke/. Former president Jair Bolsonaro, for example, is known for pronouncing the word questão as /kwesˈtɐ̃w̃/ instead of the more common /kesˈtɐ̃w̃/. This is also reminiscent of a dated pronunciation of "que".

4 - The Caipira dialect and the accent/dialect of Northeastern Brazil, two of the most distinguishable in the country, seem to be missing from the module, being present only in some words. People from the Northeast would pronounce disto as /ˈdiʃ.tu/, just like in European Portuguese, while Caipiras would pronounce words like amor and porta with a retroflex R (/ɹ/), similar to how dark is pronounced in American English. - Munmula (talk) 16:18, 24 August 2023 (UTC)Reply

Thank you everyone. Judging by what was said in this discussion, it seems to me that keeping the nasalization in As but removing it from other vowels would be a reasonable change. How does that sound to you all? Perhaps we could also include phonetic Brazilian pronunciations of "en" and "in" as /ẽj̃/ and /ĩj̃/ respectively. - Munmula (talk) 11:51, 18 August 2023 (UTC)Reply
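The proposal above (nasalize A before an intervocalic nasal, leave other vowels oral) could be prototyped roughly as follows. This is a hypothetical Python sketch over spelling only; the helper name and vowel set are assumptions. It covers only the m/n/nh + vowel context, not coda nasals as in banco.

```python
# Hypothetical sketch of the proposal: before m, n or nh followed by a vowel
# in the same word, only "a" is nasalized; e, i, o, u stay oral.
VOWELS = set("aeiouáéíóúâêôãõ")
NASAL_ONSETS = ("nh", "m", "n")  # "nh" first so it wins over plain "n"


def nasalized_a_positions(word: str) -> list[int]:
    w = word.lower()
    positions = []
    for i, ch in enumerate(w):
        if ch not in ("a", "â"):
            continue  # the proposal removes this nasalization for other vowels
        rest = w[i + 1:]
        for onset in NASAL_ONSETS:
            if rest.startswith(onset):
                nxt = rest[len(onset):len(onset) + 1]
                if nxt in VOWELS:  # "" (word end) is not in the set
                    positions.append(i)
                break
    return positions
```

Spelling alone cannot separate pairs like caminha "small bed" vs. caminha "he/she walks" discussed above, so a real implementation would still need some per-entry override.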

@Munmula What is the rule for pronunciation of en and in? Currently the module generates /ẽj̃/ for word-final -em but not word-internally. Should it also happen word-internally? Benwing2 (talk) 18:05, 18 August 2023 (UTC)Reply

I think /ẽj̃/ and /ĩj̃/ are more appropriate as phonetic transcriptions, but we should check a reliable source before changing it. - Munmula (talk) 22:02, 18 August 2023 (UTC)Reply

Section


Great work fixing all these quotations! diff raises a problem I have run into before. It shunts the material to the end of the cite, and I think what that means is that "Section" means something like "Sports Section" or "Local News Section", not 'the parts within the article'. No idea what a "solution" would be here, just noting that this is strange. --Geographyinitiative (talk) 11:46, 14 August 2023 (UTC)Reply

@Sgconlaw Can you help with this? I have also seen this before. The |section= param puts its value after the volume and especially the publisher, when it seems it ought to go before them. What does the APA say about this? Benwing2 (talk) 19:23, 14 August 2023 (UTC)Reply

The intent of |section= was to have a catch-all parameter that could be used to indicate various minor subdivisions of a work, such as parts, paragraphs, stanzas, etc. Where a journal article or chapter of a book has been divided into lettered or numbered sections, I have sometimes indicated such a section like this: |section=section 1.3 (Name of Section). However, if this type of section is not lettered or numbered, and I've felt it is useful to indicate the section, I have sometimes added it to the article title or book chapter name like this: "Article Title. [Section Name.]" I'm afraid I'm not familiar with the APA Style, so I don't know if what I have done is in accordance with that guide. — Sgconlaw (talk) 20:06, 14 August 2023 (UTC)Reply

@Sgconlaw OK thanks, I think I'm gonna move it after the title rather than put it at the end. Benwing2 (talk) 21:02, 14 August 2023 (UTC)Reply

FWIW, I use the |section= parameter in the same way @Sgconlaw describes. 0DF (talk) 22:52, 15 August 2023 (UTC)Reply

Well let me give you another example of how there's some kind of problem in this area: diff. Here, the section is Taiwan News (section of the newspaper). And the title is the title at the top of the page. The journal is the newspaper. But! There's a subsection of the article "Temperatures to rise" which is pretty darn relevant to the quotation, but yet I just omit it because there's no parameter for it. And then: on diff, I would add "section=Tung lay-up plans", but it would be strange to see that at the end of the citation (to my eye). --Geographyinitiative (talk) 10:53, 18 August 2023 (UTC)Reply

@Geographyinitiative: Do you have a different presentation in mind? 0DF (talk) 11:23, 18 August 2023 (UTC)Reply

Maybe there should be a "subheading" parameter for quote-journal? Or something like that? Idk. --Geographyinitiative (talk) 11:24, 18 August 2023 (UTC)Reply

@Geographyinitiative: That would be good. And maybe |subheadingn= for further subdivisions, if that were considered desirable. Personally, I'd also appreciate |footnote= and |endnote= parameters for quoting from foot- and endnotes. 0DF (talk) 12:53, 18 August 2023 (UTC)Reply

@0DF You can quote from footnotes and endnotes using something like |page=15 + |line_plain=footnote 3. Let me see what I can do about journal article sections; User:Sgconlaw do you have thoughts? Benwing2 (talk) 18:04, 18 August 2023 (UTC)Reply

OK, sure; I'll use |line_plain= for foot- and endnotes in future. 0DF (talk) 00:54, 19 August 2023 (UTC)Reply

@Benwing2: I have been indicating footnote numbers like this: |section=footnote †. As for journal article sections, see my comment above on 14 August 2023. — Sgconlaw (talk) 20:59, 18 August 2023 (UTC)Reply

@Sgconlaw The problem is you're overloading |section= to indicate lots of different things, which makes proper formatting very difficult. We need to use different parameters for semantically different information, and not do things like bundle the section name or number with the article title. That's why I recommend using |line_plain=, because a footnote is much like a line and not much like a chapter. In my view, |section= is for subdivisions of a work similar to chapters, if "chapter" doesn't make sense, e.g. "act II, scene 2"; this is what the documentation specifically says. For a journal article, |section= should be for sections of the article. If we need to indicate information such as sections of the overall collection (i.e. journal; for books the collection and work are the same), we need a separate param for that. Benwing2 (talk) 21:54, 18 August 2023 (UTC)Reply
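One way to picture the point about giving semantically different information its own parameter: a hypothetical Python record whose field names mirror |section= and |line_plain=, with a formatter that puts the section right after the article title. The output layout is illustrative only and is not Module:quote's actual formatting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JournalQuote:
    year: str
    author: str
    title: str
    journal: str
    section: Optional[str] = None     # subdivision of the article itself
    page: Optional[str] = None
    line_plain: Optional[str] = None  # e.g. "footnote 3"


def format_quote(q: JournalQuote) -> str:
    parts = [q.year, q.author, f'"{q.title}"']
    if q.section:
        parts.append(q.section)  # right after the title, not at the end
    parts.append(q.journal)
    if q.page:
        parts.append(f"page {q.page}")
    if q.line_plain:
        parts.append(q.line_plain)  # a footnote behaves like a line, not a chapter
    return ", ".join(parts)
```

Because each kind of information has its own field, the formatter can place it deliberately instead of guessing what an overloaded |section= value means.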

Related: [10] I plan to move Chinese author names like this; of course, both this version and my original version are abuses/work-arounds. Let me know if you have any solutions on this for me. --Geographyinitiative (talk) 22:11, 16 August 2023 (UTC)Reply

@Geographyinitiative You should use the inline modifier support that I recently added, as shown in the example. It doesn't work with {{lw}} because of the language tagging that {{lw}} adds; I'll come up with a workaround. Benwing2 (talk) 00:21, 17 August 2023 (UTC)Reply

Polish headword adjustments


Would it be possible to 1) add the relational adjective parameter to the Polish noun headword, 2) bot-convert pages with relational adjectives in the derived/related terms sections to the headword, and 3) add "indeclinable" as a categorizing parameter to {{pl-adj}}? Vininn126 (talk) 22:35, 14 August 2023 (UTC)Reply

Also perhaps we can do this for other Slavic languages, at least Czech, and maybe we can also remove the declension from the headword like we were discussing in that thread? Vininn126 (talk) 09:38, 15 August 2023 (UTC)Reply

@Vininn126 Yup, all in good time :) ... I can do (1) and (3) easily and have a script where I did (2) for Russian so I might be able to leverage this. For removing the declension from the headword I'd like to keep the declensions in irregular cases, so this will take a bit of work. Benwing2 (talk) 19:38, 15 August 2023 (UTC)Reply

Thanks a ton! Vininn126 (talk) 19:41, 15 August 2023 (UTC)Reply

I think the way (2) might have to be handled is to check etylines and also deflines of adjectives... But I'm afraid a lot are missing etylines... Vininn126 (talk) 22:03, 15 August 2023 (UTC)Reply

@Vininn126 I took a look at the Russian script I wrote. It attempts to add both relational adjectives and diminutives to noun headwords. For relational adjectives it looks at etymologies and assumes that if an adjective is defined as noun + suffix and has a relational label, it's a relational adjective of that noun; and for diminutives it looks for {{diminutive of}} and similar. But rather than just automatically convert such cases, it has a verification step: it outputs a list of potential cases (along with the definitions of the nouns and potential relational adjectives), which are manually edited to remove the bogus ones, and then in a separate pass it applies them, which involves adding them to the noun headword and removing the added term from ==Derived terms== or ==Related terms== of the base noun. Note that I also have a separate script for Russian that attempts to add etymologies for relational adjectives and such, essentially by trying, for each noun, to construct possible relational adjectives using certain suffixes (-ный, -ной, -ский, -ской, -овый, -евый, -ёвый, etc.) and palatalization rules, and seeing if such an adjective exists. If so, it's output for manual review (again with the definitions of noun and adjective included), and the etymologies that remain after editing are added in a separate step. (The script also handles adjective-noun derivations using -ство, verb-noun and noun-noun derivations using -ник, etc.) I think I ran the second script before the first one, so that the added etymologies could feed the relational adjective "snarfing". Unfortunately all of this takes time. You might be able to help me in the manual review steps; essentially I would give you a file containing possible cases and you would remove the lines that are bogus and leave the remainder. Benwing2 (talk) 00:04, 16 August 2023 (UTC)Reply
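The candidate-generation step of the relational-adjective script described above might look roughly like this Python sketch. The suffix list comes from the comment; everything else is assumed: `existing` stands in for the set of entry titles, and palatalization is not modelled, so книга/книжный is missed, which is exactly why the manual review pass is needed.

```python
from typing import Iterator

# Suffixes listed in the comment above; palatalization (e.g. г -> ж) and
# other stem alternations are NOT modelled, so some real pairs are missed.
SUFFIXES = ("ный", "ной", "ский", "ской", "овый", "евый", "ёвый")
FINAL_VOWELS = "аяоеь"  # crude stem extraction (an assumption)


def candidate_adjectives(noun: str, existing: set[str]) -> list[str]:
    """Build noun + suffix forms and keep those that exist as entries."""
    stem = noun[:-1] if noun and noun[-1] in FINAL_VOWELS else noun
    return [stem + suffix for suffix in SUFFIXES if stem + suffix in existing]


def review_lines(nouns: list[str], existing: set[str]) -> Iterator[str]:
    """Yield tab-separated noun/adjective pairs for manual review; bogus
    lines are deleted by hand before a separate pass applies the rest."""
    for noun in nouns:
        for adj in candidate_adjectives(noun, existing):
            yield f"{noun}\t{adj}"
```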

Thanks for the explanation. Of course I'd be willing to help with the manual steps. Vininn126 (talk) 08:55, 16 August 2023 (UTC)Reply

Turns out the parameter already exists, so it's just a matter of bot converting. Vininn126 (talk) 16:48, 17 August 2023 (UTC)Reply

Next WingerBot open source commit


The "out of date" part of User:WingerBot's "Source code (currently out of date) is available on github" has reached like half a decade now. I'm curious about what changed in the meantime. When will the next open source publication be? Daniel.z.tg (talk) 20:13, 19 August 2023 (UTC)Reply

@Daniel.z.tg See [11]. This is the latest code. It needs some cleanup and splitting into separate directories as it's over the github limit for # of files in the top-level dir. If you're seriously interested in looking at the code I'll put some time in to clean it up. Benwing2 (talk) 20:24, 20 August 2023 (UTC)Reply

Thank you. I didn't know the code was already available and pushed frequently. No, I've written ML code, so I don't need any cleanup to read it. I'm happy in the spirit of open source to just look at what's going on in the community around me. Wow, this new repository is 300 kLOC instead of the 30k in the old one. Thank you for sharing your knowledge and providing a comprehensive example of a Wiktionary bot! Daniel.z.tg (talk) 20:38, 20 August 2023 (UTC)Reply

@Daniel.z.tg You're welcome. Yeah there is quite a lot of code I've written over the years. Some of the scripts have very specific purposes and are more or less one-offs that I keep around because occasionally they can be recycled for some other purpose. Some of the scripts are very general and I use them all the time (e.g. find_regex.py lets you download sets of pages according to all sorts of criteria; often I download a set of pages, edit the resulting file using vim, and push the changes using push_find_regex_changes.py, which facilitates doing quick semi-manual changes across a large set of pages). It was only a few months ago that I finally got around to upgrading to Python 3; previously the code was using Python 2 and an old version of pywikibot. I was procrastinating because I thought it would be painful given all the Unicode stuff in the code, but it turns out it only took a few hours. Benwing2 (talk) 20:46, 20 August 2023 (UTC)Reply

Overemphasis of Other Language?


Hey, thanks for your work on quotations and thank you for reaching out to me. When I made my recent edits, I was definitely thinking about doing the in-line like you were talking about here diff. But I thought it was an overemphasis of Chinese characters to put them first and the English in brackets, since this book is totally in English except for the last page of the book. (1) Tech Question: Is there a new way to put English first for that book, with Chinese characters second? (I tried to do an in-line that would have this effect based on your model at Quwo, but I couldn't immediately figure out what I should do after a few failed attempts. I don't know much about coding.) (2) Policy Question: Whether or not that's possible under the new ways, which is more appropriate for this book: English first with Chinese characters in brackets, or the opposite? I'm not in favor of one over the other, but since the book is almost entirely in English, I didn't want to misrepresent what the reader would see if they went to find the source. The next question might be: what's the value of the Chinese characters? I would say that it's very valuable info for a bilingual reader to see and potentially explore if they are interested in this subject. Also, I want Chinese-speaking people who are looking for this book in an online search to be able to see the book on Wiktionary; they could thereby learn what the English words they are looking at refer to or similar. I am very happy that someone is taking an interest in cleaning up quotations and making things nice and professional. (I had done it for some quotations one by one.) There are all kinds of questions and ideas I have for quotations, but I don't want to overload you. (For instance: here I know the Chinese characters behind Shaodian and Wuying, and I want to show that to Wiktionary readers, but I don't know how. See also [12]. Also: check out "et al." on this page: Citations:Qibin. Of course I avoid et al., but sometimes I don't have access to the names of other authors. I guess those should be author2=?) Thanks again! --Geographyinitiative (talk) 11:23, 20 August 2023 (UTC) (Modified)Reply

@Geographyinitiative I would like to eliminate |author2= entirely; instead you should use semicolon-separated values in |author=. I will fix all the issues regarding "et al.", some others have also pointed them out and there's no reason to avoid it. As for whether to put the English or Chinese first, that's a good question. It isn't currently really possible to put the Chinese in brackets as the code assumes that transliterations and translations are in Latin characters. I can add support for this using a new inline modifier but before doing that we should start a BP discussion to see what people think should be done in this circumstance; I am genuinely not sure. Benwing2 (talk) 18:58, 20 August 2023 (UTC)Reply
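A hedged Python sketch of how a semicolon-separated |author= value might be rendered, treating a final "et al." as a stand-in for unknown co-authors. The joining rules ("and" before the last name) are an assumption for illustration, not the template's documented output.

```python
def format_authors(author: str) -> str:
    """Render a semicolon-separated |author= value (hypothetical rendering)."""
    names = [name.strip() for name in author.split(";") if name.strip()]
    if len(names) <= 1:
        return names[0] if names else ""
    if names[-1] == "et al.":
        # "et al." stands in for co-authors whose names are unknown
        return ", ".join(names[:-1]) + " et al."
    return ", ".join(names[:-1]) + " and " + names[-1]
```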

Just do your best; I am on board with your changes, and I'm glad these things are getting cleaned up. If you have to drop some Chinese characters or delete some work I did, no problem; I always worked on the assumption that a lot of what I was doing was probably going to be changed or deleted. I will adapt to the changes as fast as I can. --Geographyinitiative (talk) 19:25, 20 August 2023 (UTC)Reply

@Geographyinitiative All good. I did post to the BP and we'll see what the outcome is. Benwing2 (talk) 19:36, 20 August 2023 (UTC)Reply

@Benwing2: Re: "use the format with inline modifiers and a single author= arg instead of first=/last="

Are you going to coordinate this removal with Wikipedia? I sometimes copy and paste references between Wikipedia and Wiktionary, and I would like that to remain functional. Daniel.z.tg (talk) 20:15, 20 August 2023 (UTC)Reply

@Daniel.z.tg It's not realistic to do that. Wikipedia requires the use of last=/first= in all circumstances because they want to display the name in LAST, FIRST format, while we display the name in FIRST LAST format. In general we have a lot of differences in our {{quote-*}} templates from Wikipedia's {{cite *}} templates; they serve different purposes and it's not realistic to expect us to be forced to conform to however Wikipedia does things. I have no immediate plans to deprecate the first=/last= params but I reserve the right to deviate from Wikipedia's structure. In general, blindly copy/pasting between Wikipedia and Wiktionary is a bad idea. Benwing2 (talk) 20:23, 20 August 2023 (UTC)Reply
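For anyone who does move a Wikipedia-style citation over, this hypothetical Python helper merges last=/first=, last2=/first2=, ... into a single FIRST LAST value, semicolon-separated as suggested above for |author=. It is a sketch only: names normally written family name first (for example Chinese names) come out in the wrong order, so the result still needs a human check.

```python
def _name_part(params: dict[str, str], base: str, n: int) -> str:
    # Wikipedia accepts both last= and last1= for the first author.
    if n == 1:
        return params.get(base) or params.get(f"{base}1") or ""
    return params.get(f"{base}{n}") or ""


def author_from_first_last(params: dict[str, str]) -> str:
    """Merge last=/first=, last2=/first2=, ... into one author= value,
    in FIRST LAST order, separated by semicolons."""
    names = []
    n = 1
    while True:
        last = _name_part(params, "last", n)
        if not last:
            break
        first = _name_part(params, "first", n)
        names.append(f"{first} {last}" if first else last)
        n += 1
    return "; ".join(names)
```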

If our templates have already diverged from Wikipedia, then my original point is moot. I just noticed the cite button was added to Wiktionary's visual editor (or was I blind the whole time?). It nicely displays the author field and is visual, which is one of the things I want. Daniel.z.tg (talk) 20:41, 20 August 2023 (UTC)Reply

@Daniel.z.tg Hmm. I don't use the visual editor so I'm not sure how this cite button works. If there are problems with it, let me know and I'll see if it can be fixed. I wonder if it uses the TemplateData stuff that is stuck at the bottom of some doc pages (which I hate, BTW; it is horribly designed and not automatable). Benwing2 (talk) 20:49, 20 August 2023 (UTC)Reply

@Daniel.z.tg Yes, a lot of things have diverged from Wikipedia because there are fundamental differences in how the two projects work, which is necessitated by their different goals. Benwing2 (talk) 20:50, 20 August 2023 (UTC)Reply

Here is the result of me filling in the visual form to recreate the example in your diff:

1989, 车慕奇, 丝绸之路今昔, →ISBN, page 293:

Passing through Qira County on our way, we were asked to stay by Wang Yijun, Director of the Office of the County Party Committee. He said he was an amateur archaeologist and an old acquaintance of Li Yuchun’s. In 1978 the two men had gone together to the desert in northern Qira County to survey a buried ancient city.

It's good that it's automatically a quotation, not a citation. The other fields should show up if I select them in the left panel list view. The only thing I had to do was remove the <ref></ref> tags. Daniel.z.tg (talk) 20:58, 20 August 2023 (UTC)Reply

@Daniel.z.tg OK, the wikicode of that quotation looks fine to me. The only thing I could see being done is adding translations of the author and title using inline modifiers, something like this:

1989, 车慕奇 [Che Muqi], 丝绸之路今昔 [Silk Road, Past and Present], →ISBN, page 293:

Passing through Qira County on our way, we were asked to stay by Wang Yijun, Director of the Office of the County Party Committee. He said he was an amateur archaeologist and an old acquaintance of Li Yuchun’s. In 1978 the two men had gone together to the desert in northern Qira County to survey a buried ancient city.

Benwing2 (talk) 21:04, 20 August 2023 (UTC)Reply
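The bracketed translations above come from inline modifiers. The Python sketch below assumes the `value<t:translation>` syntax (e.g. `车慕奇<t:Che Muqi>`) and shows how a value can be split into base text and modifiers, then rendered; the regex and function names are illustrative, not Module:quote's code.

```python
import re

# Assumed syntax: value<code:text>, e.g. 车慕奇<t:Che Muqi>
MODIFIER_RE = re.compile(r"<(\w+):([^<>]*)>")


def parse_inline_modifiers(value: str) -> tuple[str, dict[str, str]]:
    """Split a parameter value into its base text and a modifier dict."""
    modifiers = dict(MODIFIER_RE.findall(value))
    base = MODIFIER_RE.sub("", value).strip()
    return base, modifiers


def render(value: str) -> str:
    """Show a translation modifier in brackets after the base text."""
    base, modifiers = parse_inline_modifiers(value)
    if "t" in modifiers:
        return f"{base} [{modifiers['t']}]"
    return base
```

Parsing each field on its own, before values are combined, matters: the {{quote-av}} bug reported further down arose because |actor= and |role= were merged into |section= before their modifiers were processed.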

Do check out diff if you have a chance. --Geographyinitiative (talk) 08:09, 2 September 2023 (UTC)Reply

Template:es-pr


Hi! There's an error in this with "gua" spellings. The /g/ should be silent, so guapo, guarra and Guatemala are like /wapo/, /warra/ and /watemala/, and agua like /awa/. Medved Karol (talk) 07:24, 21 August 2023 (UTC)Reply

@Medved Karol I think it's more complex than that. Normatively, the /g/ at the beginning of a word and after /n/ should be [g]. In agua, normatively it's a very soft approximant, which we indicate by [ˈa.ɣ̞wa]; the same applies when guapo, guarra and Guatemala occur after a vowel or any consonant other than /n/. I think informally what you're describing is correct. Requesting comments from User:AdrianAbdulBaha User:AugPi User:MiguelX413 User:Rodrigo5260 User:Ser be etre shi User:Vivaelcelta who may be somewhat active and are native speakers. Benwing2 (talk) 07:45, 21 August 2023 (UTC)Reply

I only pronounce the g in these words when I want to sound pedantic (or occasionally after an utterance [and always after an /n/] in the case of these words starting with /gw/), when I speak naturally I barely pronounce them. Rodrigo5260 (talk) 12:10, 21 August 2023 (UTC)Reply

Well, my dialect (like much of the rest of Central America, and much of Colombia) uses the plosive versions of /b d g/ after /l ɾ/ too, e.g. El Salvador [el.sal.baˈðoɾ] besides the more widespread [-l.β-], so for me there are more instances where this "g" is pronounced as a plosive: el guaro [elˈgwa.ɾo] 'the booze', traer guaro [tɾaˈeɾ ˈgwa.ɾo] 'to bring booze'. And after a nasal or a pause it's [g] too of course: guaro [ˈgwa.ɾo], traen guaro [ˈtɾa.eŋ ˈgwa.ɾo] 'they bring booze'. And as Benwing2 said, I do generally "pronounce the g" intervocalically, it just happens to be with the approximant allophone: verse guapo [ˈbeɾ.se ˈɣwa.po]. I can say [ˈbeɾ.se ˈwa.po] but that already has a very informal connotation.--Ser be être 是^talk/stalk 18:38, 21 August 2023 (UTC)Reply
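The allophony described in this thread can be summarized in a small Python function giving the realization of /g/ in a /gw/ onset. It is a sketch under stated assumptions: the preceding phoneme is already known (None meaning a pause), the Central American plosive-after-liquids pattern and informal elision are opt-in flags, and all names are hypothetical.

```python
from typing import Optional

NASALS = {"n", "ŋ"}           # /n/ assimilates to [ŋ] before /g/
LIQUIDS = {"l", "ɾ"}
APPROXIMANT = "\u0263\u031e"  # ɣ̞


def gw_realization(prev: Optional[str], *, plosive_after_liquids: bool = False,
                   informal_elision: bool = False) -> str:
    """Realization of /g/ in a /gw/ onset, given the preceding phoneme
    (None = after a pause)."""
    if prev is None or prev in NASALS:
        return "g"            # guaro, traen guaro
    if plosive_after_liquids and prev in LIQUIDS:
        return "g"            # el guaro, traer guaro in much of Central America
    if informal_elision:
        return ""             # [ˈa.wa], [ˈbeɾ.se ˈwa.po] in casual speech
    return APPROXIMANT        # agua [ˈa.ɣ̞wa], verse guapo
```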

For the people with silent /g/, does this mean there are no minimal pairs between words with gu and words with hu? Would huaca and guaca be perfect homophones? Does a faint /g/-like sound ever appear in words where it is not spelled? —Soap— 12:03, 23 August 2023 (UTC)Reply

I think w vs gu are allophonic in a dialect continuum in the Romance languages (see Guadal#Etymology for it changing across Spanish, Arabic, English, and French). W vs gu is probably mutually intelligible and has dialectal variation within Spanish, but please definitely do put the more common or standard pronunciation if you know it. Daniel.z.tg (talk) 23:07, 23 August 2023 (UTC)Reply

Hi there! Since you all are talking about es-pr and /ɡ/, let me mention that the Spanish pronunciation given by es-pr in the article for exactitud (and related words, e.g. exacto, exactamente) does not seem (to my ear) to sound quite right; it says /eɡsaɡtiˈtud/, whereas the corresponding page in es.wiktionary gives [ek.sak.tiˈtuð]. That would be the normative pronunciation in Spanish, whereas the pronunciation with /ɡ/ replacing /k/ sounds anglicized. You can also check this through, say, https://translate.google.ca/?sl=es&tl=en&text=exactitud&op=translate by clicking the Listen button, or by going to es.wikipedia's article for Exactitud and trying a Mac computer's Spanish system voice on its text. Also, /eɡsaɡtiˈtud/ might be theoretically less likely since /g/ is voiced whereas /s/ is unvoiced; /gz/ would seem more likely since they are both voiced, but /z/ does not belong to Spanish; and /ks/ would also seem more likely since both are unvoiced. AugPi 03:05, 23 August 2023 (UTC)Reply

@Ser be etre shi Can you comment on this? Wikipedia is very specific that x is /gs/ pronounced [ɣs], and this is not the first citation I've seen that says the same thing, yet multiple people seem to think that [ks] is correct. Is this a Latin America vs. Spain or an old vs. new thing? Benwing2 (talk) 03:18, 23 August 2023 (UTC)Reply

We seem to allow voiced stops in the coda only before another stop. Check, for example, óptimo, where even what is spelled as p is pronounced as a voiced approximant /β̞/. —Soap— 12:08, 23 August 2023 (UTC)Reply

q= and qq= in `{{quote-book}}`


Hi. These parameters are not working. Vahag (talk) 18:55, 21 August 2023 (UTC)Reply

@Vahagn Petrosyan They currently work as inline modifiers attached to authors, titles, etc. but not yet as parameters. The reason for this is I'm not sure where to put them. Suggestions? Should they go on the same line as the quotation, before and after respectively? Or on their own lines? Or somewhere else? Benwing2 (talk) 19:20, 21 August 2023 (UTC)Reply

I prefer on the same line as the quotation, after it, like in {{usex}}.

I intend to use q= for dialect labels. But maybe we should have a tag= instead, which will fetch data from label modules like in {{desc}}. Vahag (talk) 19:30, 21 August 2023 (UTC)Reply

@Vahagn Petrosyan OK sounds good, I can implement that. Benwing2 (talk) 19:31, 21 August 2023 (UTC)Reply

Template:quote-av season parameter not working


Hiya - see above. Could you please take a look? Theknightwho (talk) 20:33, 21 August 2023 (UTC)Reply

`[[Category:Sassarese terms derived from Classical Latin]]`


Hi. I've recently added the Sassarese entry abi, and I noticed the category Sassarese terms derived from Classical Latin has been deleted by you—if I understand correctly—on the grounds that it was empty, except for the {{autocat}} template. Do you think it would be a problem if I were to recreate it, so that the entry can link to it? Thanks in advance for your time. —— GianWiki (talk) 04:00, 23 August 2023 (UTC)Reply

There's no problem re-creating such categories. They get automatically deleted periodically when empty, that's all. Benwing2 (talk) 04:36, 23 August 2023 (UTC)Reply

Ok. Thank you very much for your time. —— GianWiki (talk) 02:08, 25 August 2023 (UTC)Reply

Congrats on your promotion ;-)


Pretty neat that it was unanimous, too. Good luck! Chernorizets (talk) 09:37, 24 August 2023 (UTC)Reply

Very cool! --Geographyinitiative (talk) 10:18, 24 August 2023 (UTC)Reply

@Chernorizets @Geographyinitiative Thank you both! Benwing2 (talk) 05:57, 25 August 2023 (UTC)Reply

Bug in modifiers in `{{quote-av}}`


Hi, it seems that the inline modifiers for |actor= and |role= in {{quote-av}} do not work as intended, since both are plugged into |section= of Module:quote, and only after that are the inline modifiers processed. See for example diff, where this issue occurs. – Wpi (talk) 15:06, 25 August 2023 (UTC)Reply

@Wpi Yup this is high on my list of things to fix; several templates have this issue. Benwing2 (talk) 18:23, 25 August 2023 (UTC)Reply

@Wpi This should work now. Benwing2 (talk) 18:07, 27 August 2023 (UTC)Reply

`{{quote-book}}` numbered parameter rightward shift


The seventh and eighth params, that is, text and translation, have somehow become 8 and 9 despite what the docu says. See călugăriță for a demonstration. Everything lines up if I add an empty numbered parameter between page and text. ―⁠Biolongvistul (talk) 22:23, 27 August 2023 (UTC)Reply

Just so you know:


There are 48 of these: [13]. And some of them also have |journal=. Chuck Entz (talk) 02:04, 28 August 2023 (UTC)Reply

@Chuck Entz Thanks. These were all created by User:Vininn126 and appear to have a lot of issues. I am trying to fix them but I'm hindered by not being able to read Polish. Benwing2 (talk) 02:06, 28 August 2023 (UTC)Reply

See pong for an example of "=" in Google URLs being taken as delimiting parameters. Chuck Entz (talk) 02:08, 28 August 2023 (UTC)Reply

Actually, it's pipes in the URL that are to blame. Chuck Entz (talk) 02:12, 28 August 2023 (UTC)Reply
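Since pipes inside a URL end the template argument, one fix is to percent-encode them (and braces, which could open or close a template) before the URL goes into |url=. An "=" in a named parameter is harmless, because only the first "=" separates the name from the value. The helper name below is hypothetical.

```python
def template_safe_url(url: str) -> str:
    """Percent-encode characters that would break a MediaWiki template argument.

    '|' would start a new parameter, and '{' / '}' could open or close a
    template; %7C, %7B and %7D are their standard percent-encodings.
    """
    return url.replace("|", "%7C").replace("{", "%7B").replace("}", "%7D")
```

For example, writing `f"|url={template_safe_url(link)}"` yields a parameter that MediaWiki parses as a single argument, and the encoded URL still resolves to the same page.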

I'll take a look; sorry everyone! Vininn126 (talk) 08:42, 28 August 2023 (UTC)Reply

@Vininn126 I tried to clean them all up. Mostly they need (a) review of the authors, all of whom I converted to |editor= or |editors=, to make sure this is correct; (b) English translations of the titles and journal names; (c) review of the pings I made to you. Benwing2 (talk) 08:45, 28 August 2023 (UTC)Reply

@Benwing2 Are there any others I need to take a look at aside from Chuck's list and the pings? Vininn126 (talk) 08:51, 28 August 2023 (UTC)

Also thank you for doing what you can; it seems you managed to make a lot of accurate corrections. Vininn126 (talk) 08:56, 28 August 2023 (UTC)

Finally, I deleted the unneeded templates from Chuck's list as well as one other template; any template without documentation is likely unneeded and was created in error. Please let me know what else I need to fix so that I may! Vininn126 (talk) 09:04, 28 August 2023 (UTC)

@Benwing2 If it's not botable, please make a list of all templates that need updating. Vininn126 (talk) 17:50, 28 August 2023 (UTC)

@Vininn126 I fixed all the ones in CAT:E. There may be others needing updating that aren't throwing errors; we'll see. Benwing2 (talk) 21:00, 28 August 2023 (UTC)

quote-book


Is it necessary for the normalisation parameter to be visible, or can it somehow be hidden and used only for the transliteration? نعم البدل (talk) 20:25, 29 August 2023 (UTC)

@نعم البدل There isn't any current way to hide the normalization. I'm not sure this is a good idea to implement. Can you point me to some of the pages in question? It seems to me the version marked up with vowel diacritics should be visible one way or another, either in the normalization or the usex itself. Benwing2 (talk) 20:44, 29 August 2023 (UTC)

@Benwing2: Examples are لَن٘گْھݨا (laṉghṇā) and other lemmas that previously used the quote-book-ur template. It just looks a bit weird and unorthodox. I'm not sure exactly what the normalisation parameter is meant to be used for, but essentially I'm looking for a parameter that lets me modify the transliteration while retaining the source formatting, without having to enter the transliteration manually. نعم البدل (talk) 20:52, 29 August 2023 (UTC)

@نعم البدل The purpose of the normalization param is to present a normalized version of text that may be written in a nonstandard form. Here it would seem reasonable to use it to present vocalized text. If you don't want to do that, an alternative is to call {{xlit}} yourself in the translit field on the vocalized text. This seems like a very niche use case, and I don't want to burden the code with an extra parameter just for this purpose. Benwing2 (talk) 20:57, 29 August 2023 (UTC)

@Benwing2: Perhaps it is; it's just that vocalised Urdu (Punjabi/Sindhi etc.) looks weird. I might just take your advice and use xlit, or simply use the tr parameter. نعم البدل (talk) 21:01, 29 August 2023 (UTC)

@نعم البدل FYI we are starting to present Persian text in vocalized form as well; cc User:Sameerhameedy, User:Atitarev. So maybe it isn't so weird after all. Benwing2 (talk) 21:03, 29 August 2023 (UTC)

@Benwing2: Vocalised Urdu is fine when it comes to dictionaries. Vocalised Urdu in actual usage, or in quotes, is quite unorthodox: at most you'll have one or maybe two diacritics on a word to clear up ambiguity, not diacritics on every single word. In any case, would it be possible for you to use one of your bots to convert the norm parameters on the Urdu lemmas with quotes, replacing them with the tr parameter containing the transliteration that |norm gives, or would it be better for me to do it? نعم البدل (talk) 21:07, 29 August 2023 (UTC)

Actually, never mind. I forgot Urdu doesn't have a transliteration module, and quote-book-ur used to call Module:pa-Arab-translit, so I'm going to have to render the transliteration manually anyway for the Urdu lemmas with the norm parameter. نعم البدل (talk) 21:13, 29 August 2023 (UTC)

@نعم البدل Are you sure? The normalization params are getting transliterated correctly, which means Urdu must have a translit module. Benwing2 (talk) 21:15, 29 August 2023 (UTC)

They're fine for Punjabi lemmas, but for Urdu lemmas the transliteration isn't there, only the normalised text. It's fine; I'll sort them out. I did ask Kutchkutch to add Module:pa-Arab-translit as the transliteration module for Urdu, but I think he's not been active for some time. Thanks for the list. نعم البدل (talk) 21:21, 29 August 2023 (UTC)

@نعم البدل Yes, we are a dictionary, that's why I think it's reasonable to make the fully vocalized text visible. If you want to convert the norm parameters, there were only 16 or so uses of {{quote-book-ur}}, which were the following:

So it shouldn't be hard to convert them by hand using {{xlit}}; I'd keep the vocalized text in the wikitext in case we change our mind about whether to show it. Benwing2 (talk) 21:14, 29 August 2023 (UTC)

@نعم البدل: vocalised Urdu, Persian, etc. are totally fine for a dictionary. https://rekhtadictionary.com/ uses vocalised Urdu quite extensively. Check also mod:ur-translit and new Urdu lemmas where vocalisations are used in the headword. Anatoli T. (обсудить/вклад) 22:58, 30 August 2023 (UTC)

author field wikipedia links

Thank you for cleaning up the often messy {{quote-book}} templates, some of which I admit are my own additions. I also notice that the bot is replacing some of the author names with links to their Wikipedia articles. I may worry too much, but it occurred to me that in just a tiny number of cases the bot might add a link to the wrong person. Suppose that there is a relatively obscure fantasy author named Kevin Harvick: would the bot see this and then add a link to Kevin Harvick, a racecar driver? Or does the bot know, perhaps through Wikidata, that in this case the person with the Wikipedia article is not a published author? Best regards, —Soap— 06:36, 30 August 2023 (UTC)

@Soap Thanks for the concern. The code to clean up quote templates has grown to over 1,500 lines and it's easily possible for there to be bugs in this code. In this case, the links to the Wikipedia article are being added only when there's an existing |authorlink= field, which is supposed to point to the appropriate Wikipedia article. If the author name and |authorlink= field are the same, you get an author that looks like w:Kevin Harvick; otherwise you get something like [[w:Kevin Harvick (author)|Kevin Harvick]], because the w: prefix doesn't (or didn't, until today) support two-part links following it. So hopefully it won't be adding any bad Wikipedia links. Benwing2 (talk) 06:48, 30 August 2023 (UTC)
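The linking rule described above can be sketched in a few lines. This is a hypothetical Python illustration, not the bot's actual code, and the function name is invented. It performs no lookup on Wikipedia or Wikidata: the link target is copied from the existing |authorlink= value. That is why, under this rule, a wrong link can only appear where |authorlink= was already wrong.

```python
def author_with_link(author, authorlink=None):
    """Hypothetical sketch of folding |authorlink= into the |author= value.

    No lookup is performed: the link target comes only from |authorlink=.
    """
    if not authorlink:
        # Without an existing |authorlink=, the name is left unlinked.
        return author
    if authorlink == author:
        # Same name and target: the short w: prefix form is enough.
        return f"w:{author}"
    # Different target: use a piped link so the displayed name is unchanged.
    return f"[[w:{authorlink}|{author}]]"


print(author_with_link("Kevin Harvick"))
# Kevin Harvick
print(author_with_link("Kevin Harvick", "Kevin Harvick"))
# w:Kevin Harvick
print(author_with_link("Kevin Harvick", "Kevin Harvick (author)"))
# [[w:Kevin Harvick (author)|Kevin Harvick]]
```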

Okay, thank you for the prompt reply. To be honest, I don't think I even knew about authorlink=, and I'm pretty sure I've never used it except when copy-pasting an existing citation from another entry where the quote happened to contain both words. I'd come to think of these entries as "mine" even though I didn't type out the templates for some of them. But this is good to know: the only cases in which there would be an incorrect link to Wikipedia are those where there was an incorrect link already, so what I'm worried about isn't going to happen. Again, thanks for all of your hard work. —Soap— 06:56, 30 August 2023 (UTC)

I have not noticed any problems so far, but I will let you know if I see something. --Geographyinitiative (talk) 10:09, 30 August 2023 (UTC)

New Polish pronunciation module


Hey Ben, the new module is ready. I'm writing to check the best course of action. I think the name will change to {{pl-pr}} and we'll still make {{pl-pronunciation}} the main one, but I suppose we have to orphan the old module first in order to reclaim the name. I suppose I could create {{pl-pr}}, write the documentation and give examples of how to bot-convert, and then we orphan the old code? Incidentally, could we make a few other easy bot changes? Vininn126 (talk) 12:51, 1 September 2023 (UTC)

@Vininn126 I am pretty much ready to work on this now. Where is the module code? Benwing2 (talk) 19:02, 1 September 2023 (UTC)

MOD:pl-szl-IPA. I'm aware that Polish and Silesian are merged; however, I had grand dreams of using similar code for tons of Slavic languages anyway :p Vininn126 (talk) 19:06, 1 September 2023 (UTC)

@Vininn126 @Catonif I'd like to switch the module code to use inline modifiers instead of the current way with separate params, similar to how {{es-pr}} and {{it-pr}} work. Do you mind if I make the change? We now have a library to assist with parsing inline modifiers so it's pretty easy to make use of them. Benwing2 (talk) 19:15, 1 September 2023 (UTC)

You mean for example using <> instead of a pipe? I have no strong feelings one way or the other. Vininn126 (talk) 19:16, 1 September 2023 (UTC)

@Vininn126 Right, so instead of entering |q3=foo along with the third pronunciation, you write <q:foo> after it. The way that {{es-pr}} works, you can specify multiple pronunciations either in a single comma-separated param or in multiple params; the difference is that all pronunciations given in a comma-separated param end up on the same line, whereas different params go on different lines. This lets you group sets of pronunciations that differ only slightly but separate those that differ more significantly. Benwing2 (talk) 19:23, 1 September 2023 (UTC)
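The sketch below approximates the grouping described above in Python. It is not the actual {{es-pr}} or module code. It assumes a simplified `<key:value>` modifier syntax, and the function names and placeholder respellings are hypothetical. Each numbered param becomes one output line, and comma-separated items within a param share that line. The comma split is naive, so a comma inside a modifier value would break it; a real parser would need to respect the angle brackets.

```python
import re

# Simplified inline-modifier syntax assumed for this sketch: "text<key:value>".
MODIFIER_RE = re.compile(r"<(\w+):([^<>]*)>")


def parse_item(item):
    """Turn 'respelling<q:note>' into {'pron': 'respelling', 'q': 'note'}."""
    modifiers = dict(MODIFIER_RE.findall(item))
    return {"pron": MODIFIER_RE.sub("", item).strip(), **modifiers}


def group_pronunciations(params):
    """One output line per numbered param; commas split items within a line."""
    return [[parse_item(item) for item in param.split(",")] for param in params]


def render_line(items):
    """Render one line, appending any <q:...> qualifier in parentheses."""
    parts = []
    for item in items:
        text = item["pron"]
        if "q" in item:
            text += f" ({item['q']})"
        parts.append(text)
    return ", ".join(parts)


# Equivalent of |1=word1,word2<q:informal>|2=word3<q:dialectal>
for line in group_pronunciations(["word1,word2<q:informal>", "word3<q:dialectal>"]):
    print(render_line(line))
# word1, word2 (informal)
# word3 (dialectal)
```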

I do find using <> more "intuitive" in many ways, as it reduces your need to "count" various inputs. I wonder if it'd be possible to have our cake and eat it, too, i.e. have both? If not, having inline params is fine with me. Vininn126 (talk) 19:26, 1 September 2023 (UTC)

@Vininn126 Having both is possible if you want that; this is how {{syn}} and {{alt}} work as well. Benwing2 (talk) 19:30, 1 September 2023 (UTC)
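The sketch below shows one way the two styles could coexist. In this Python illustration, both |N=...<q:...> and a separate |qN= set the qualifier of item N. The function name is hypothetical. Treating a clash (both forms given for the same item) as an error is also an assumption made for illustration; {{syn}} and {{alt}} may resolve it differently.

```python
import re

# Simplified inline-modifier syntax assumed for this sketch: "text<key:value>".
MODIFIER_RE = re.compile(r"<(\w+):([^<>]*)>")


def merge_qualifiers(args):
    """Collect numbered items with qualifiers from either <q:...> or |qN=.

    `args` maps template parameter names (as strings) to values, e.g.
    {"1": "a<q:foo>", "2": "b", "q2": "bar"}. Returns (text, qualifier) pairs.
    """
    items = []
    n = 1
    while str(n) in args:
        value = args[str(n)]
        inline = dict(MODIFIER_RE.findall(value))
        base = MODIFIER_RE.sub("", value).strip()
        separate = args.get(f"q{n}")
        if separate is not None and "q" in inline:
            # Assumption: giving both forms for the same item is an error.
            raise ValueError(f"item {n}: qualifier given both as <q:> and |q{n}=")
        items.append((base, inline.get("q", separate)))
        n += 1
    return items


print(merge_qualifiers({"1": "a<q:foo>", "2": "b", "q2": "bar", "3": "c"}))
# [('a', 'foo'), ('b', 'bar'), ('c', None)]
```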

I don't see why not. It gives editors some freedom if they prefer one over the other. Vininn126 (talk) 19:37, 1 September 2023 (UTC)

@Benwing2 Yes, I had also planned on implementing your angle-bracket syntax, but it's good that I can leave it to you. The affected parameters would be qual, q (its alias) and ref in both export.IPA (which handles {{szl-pr}} and will handle {{pl-pr}}) and export.mpl_IPA (which will handle {{zlw-mpl-IPA}}, i.e. the standalone template for Middle Polish-only words, not Middle Polish transcriptions in general, which are handled by {{pl-pr}}). I'm not sure whether you want to handle mp_q(ual) and mp_ref (params that will be given to {{pl-pr}}) as, e.g., |mpN=...<q:...>, rather than the equivalent |mpN=...|mp_qN=... that would be used otherwise. Catonif (talk) 19:50, 1 September 2023 (UTC)

adding Dingal language


I tried adding it, but there seem to be errors. Can you correct Module:labels/data/lang/inc-din and Category:Dingal data modules (this one shows an error)? कालमैत्री (talk) 14:24, 1 September 2023 (UTC)

@कालमैत्री You can't create new languages like this. In general, to create a new language you should post in the Beer parlour (WT:Beer parlour/2023/September) requesting that the language be added. Some people may have opinions on how this should be done. Benwing2 (talk) 19:04, 1 September 2023 (UTC)

@@ Line 1,258: / Line 1,258: @@
 :::::::: {{quote-book|en|author=车慕奇<t:Che Muqi>|title=丝绸之路今昔<t:Silk Road, Past and Present>|page=293|passage=Passing through '''Qira''' County on our way, we were asked to stay by Wang Yijun, Director of the Office of the County Party Committee. He said he was an amateur archaeologist and an old acquaintance of Li Yuchun’s. In 1978 the two men had gone together to the desert in northern '''Qira''' County to survey a buried ancient city.|year=1989|isbn=0-8351-2100-3}}
 ::::::: [[User:Benwing2|Benwing2]] ([[User talk:Benwing2|talk]]) 21:04, 20 August 2023 (UTC)
+::: Do check out {{diff|75958595}} if you have a chance. --[[User:Geographyinitiative|Geographyinitiative]] ([[User_talk:Geographyinitiative|talk]]) 08:09, 2 September 2023 (UTC)
 == [[Template:es-pr]] ==


Links to Thesaurus:vagina#Portuguese from mainspace

Appendix:Russian Adverbs - Frequency List?

Italian heads

Weird bug

Negativizzare and negativizzarsi

пересекший vs пересёкший

|translator=

Replacement of unnecessary redirects and templates

Arabic spellings in Northern Kurdish headword templates

Updated vowel length in nū̆ptum: bot help?

References > Further reading

regex help?

Changes to Module:languages

suízhe

On Northern Kurdish

Plus template conversion

Quotation template replacements

Macrons in Classical Persian transliteration

Listing of Periodic XML dump runs? etc.

Implementation of removing horizontal rule separators

Some horizontal rule separators are still there

Module:cel-verbs

Context objects

Bad bot edit

Updating module message.

Polish Adjective Module

Declension of Russian ха́нец

Replacement of unnecessary redirects and templates

{{fa-IPA}}

More Latin vowel length changes

Replacement of unnecessary redirects and templates

Russian transliteration scraper

autoelegirse

Reflexive verbs

Changes to the headword module.

Wingerbot edit - one full stop too many

Serbocroatian femeqs

Unicode Collation Algorithm - ideas?

Hyphens in Catalan

User:Conrad.Irwin/creation.js/intro

Belarusian + Templates

Manual conversion of Dari and Classical

Edit at ty vogo

Request for cleanup

French etymologies

repetition repitition

Functions to get the current page section from a module

Template sl-pr

Spanish plural issue

Tagalog pronunciation module sandbox

Changes made to Chinese thesaurus

Incorrect form of German noun

Duplicate Swedish participles

Slovak and Old Czech modules

Spanish gender-neutral terms

Need your input on a policy impacting gadgets and UserJS

Just to save you some time:

Early Medieval Latin (EML.)

Deletion tags

Replacement of unnecessary redirects and templates

Customising the Lua error function to mitigate errors

Deprecated templates in Accelerated forms

Module "bg-pronunciation" needs several fixes

Something broke

New abuse filter - WingerBot help?

attesting to Ʈ

Descriptions

Hani sortkey - mass data module deletion request

Some issues with sorting prefix categories

odd Unicode 'macron'

More WF audios

Template:new es demonym

URL quotation fixes

Italian multiword verbs headword

Quotation templates again

Updates on fa-IPA and transliterations

<syntaxhighlight> tag

`|translator=`

`{{fa-IPA}}`

WingerBot changing `{{bg-phrase}}` to `{{head|bg|phrase}}`

q= and qq= in `{{quote-book}}`

`[[Category:Sassarese terms derived from Classical Latin]]`

Bug in modifiers in `{{quote-av}}`

`{{quote-book}}` numbered parameter rightward shift