Wiktionary:Grease pit/2021/July

String alternatives in templates

I've just found myself writing the following, and I'm wondering if there's a better syntax:

{{{2|{{#if:{{xlit|pi|{{PAGENAME}}}}|{{xlit|pi|{{PAGENAME}}}}|unknown}}}}}

Its simple meaning is the first non-empty (or non-blank; I don't care about the difference) value from:

  • Parameter 2
  • The result of a template call that depends on Lua
  • The literal "unknown"

It bothers me because {{xlit}} is called twice if parameter 2 is empty, and I worry that doing so may increase memory consumption. Execution time could conceivably suffer too, though that looks unlikely. Fortunately, supplying parameter 2 is not very tedious and seems fairly resistant to typos, and the code above should not be invoked for a Roman-script page. --RichardW57 (talk) 11:45, 3 July 2021 (UTC)

@RichardW57: In this case I think using another template is appropriate. w:Template:If empty looks like it does what you want. (The module w:Module:If empty would need the weird backwards-compatibility stuff removed.) — Eru·tuon 00:22, 4 July 2021 (UTC)
@Erutuon: That does the trick for readability. Unfortunately, it requires yet another module, so it may use up even more of the Lua memory limit. At present, parameter 2 is normally provided, so with any luck {{xlit}} is not invoked at all. --RichardW57 (talk) 02:19, 4 July 2021 (UTC)
@RichardW57: Actually, now that I think of it, the {{if empty}} that you need (with only one fallback) could just be implemented as {{#ifeq:{{{1|}}}||{{{2|}}}|{{{1|}}}}}. The module is only needed to handle an unlimited number of parameters. (If parameter n is empty, show parameter n + 1, starting at n = 1, recursively.) — Eru·tuon 19:33, 26 July 2021 (UTC)
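The fallback rule Erutuon describes (use parameter n if it is non-blank, otherwise fall back to parameter n + 1) is the same "first non-blank value" pattern that RichardW57's {{{2|...}}} expression encodes. Below is a minimal sketch in Python rather than wikitext. `fake_xlit` is a hypothetical stand-in for {{xlit}}, and its one-entry lookup table is a toy. Passing each fallback as a zero-argument callable means the expensive call runs at most once, and not at all when parameter 2 is supplied.

```python
from typing import Callable, List, Optional, Union

# A candidate is either a plain value or a zero-argument function producing one.
Candidate = Union[Optional[str], Callable[[], Optional[str]]]


def first_nonblank(*candidates: Candidate) -> str:
    """Return the first candidate that is neither None nor whitespace-only.

    Callables are evaluated lazily, left to right, and evaluation stops at
    the first non-blank result, so each callable runs at most once.
    """
    for candidate in candidates:
        value = candidate() if callable(candidate) else candidate
        if value is not None and value.strip():
            return value
    return ""


# Hypothetical stand-in for {{xlit}}: records each call so they can be counted.
xlit_calls: List[str] = []
_TOY_TABLE = {"धम्म": "dhamma"}


def fake_xlit(term: str) -> str:
    xlit_calls.append(term)
    return _TOY_TABLE.get(term, "")


# Parameter 2 supplied: the transliterator is never called.
print(first_nonblank("dhamma", lambda: fake_xlit("धम्म"), "unknown"))
# Parameter 2 empty: the transliterator is called exactly once.
print(first_nonblank("", lambda: fake_xlit("धम्म"), "unknown"))
```

Unlike the {{#if:...}} version, the transliteration result is computed once and reused, not recomputed for the "then" branch.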

Further forms in default head template lack auto-translit

Why are the forms given in |4=, |6= and so on of {{head}} not auto-transliterated when everything else is auto-transliterated for that language? Examples are Geʿez, such as ሥጋ (śəga), which does not have its own headword modules, and, in another script, Sabaean 𐩱𐩨𐩬 (ʾbn). It has been like this for as long as I can remember, but it seems to be an error. If one transliterates with |fNtr=, the entry is categorized under “terms with redundant transliterations”, so why should one have to do it instead of it being auto-transliterated? Fay Freak (talk) 02:03, 4 July 2021 (UTC)

@Fay Freak Editors in some languages seem to want auto-transliteration of inflections while editors in other languages don't. I had this argument with User:Rua a while ago; I wanted auto-transliteration of inflections in Arabic. Greek and Russian editors specifically don't want this, it seems. There's actually support in Module:headword for auto-transliterating the inflections using the enable_auto_translit setting, which is set by the Arabic modules; so, for example, غُدّة (ḡudda) has its plurals transliterated. Note also that there's a comment in format_inflection_parts in Module:headword around line 363 that says:
-- Don't show a transliteration here, the consensus seems to be not to
-- show them in headword lines to avoid clutter.
Unfortunately, the enable_auto_translit setting isn't exposed to {{head}}, so you can't currently set this unless you call Module:headword from Lua, but I will fix this. Benwing2 (talk) 04:36, 6 July 2021 (UTC)
@Fay Freak Done; use |autotrinfl= to get all forms to include automatic transliteration, or |fNautotr= to get an individual set of forms to include automatic transliteration. See User:Benwing2/test-head-auto-translit for examples. Benwing2 (talk) 05:01, 6 July 2021 (UTC)
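The decision described in this thread can be summarized as a small function. By default no transliteration is shown for inflections. Automatic transliteration can be switched on for all forms (|autotrinfl=) or for one set of forms (|fNautotr=), and a manual value always takes precedence. This is a hypothetical Python sketch, not Module:headword's code. Reading the "redundant transliterations" category as "manual value equals the automatic one" is an assumption, and the one-word toy transliterator merely reuses ሥጋ from the question as sample input.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class InflectionTranslit:
    translit: Optional[str]  # what would be displayed, or None
    redundant: bool          # manual value identical to the automatic one


def inflection_translit(
    form: str,
    manual_tr: Optional[str],
    autotr_all: bool,
    autotr_slot: bool,
    transliterate: Callable[[str], Optional[str]],
) -> InflectionTranslit:
    """Decide the transliteration shown for one inflected form (hypothetical logic)."""
    if manual_tr:
        # A manual |fNtr=-style value always wins; flag it when it merely
        # repeats what automatic transliteration would have produced.
        return InflectionTranslit(manual_tr, manual_tr == transliterate(form))
    if autotr_all or autotr_slot:
        # Opt-in: one flag for every form, or one flag for a single set of forms.
        return InflectionTranslit(transliterate(form), False)
    # Default: no transliteration for inflections, to avoid clutter.
    return InflectionTranslit(None, False)


# Toy transliterator covering a single word, purely as sample input.
TOY = {"ሥጋ": "śəga"}
print(inflection_translit("ሥጋ", None, False, True, TOY.get))
```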

It could be useful to have Template:cite-letter to, err, cite letters. TVdinnerless (talk) 08:34, 5 July 2021 (UTC)

Template:given_name's "A=" parameter

I asked this last December on this template's talk page, but I got no response.

What is the point of the |A= parameter? The main thing I see it used for is to override the default capitalization, so that e.g. "A female given name" is instead "a female given name". If leading with "A" is the preferred style, why does the template not enforce it? If lowercase is the preferred style, why isn't it the template's default?

173.72.124.108 02:52, 6 July 2021 (UTC)

It's partly historical. Some people use |A= to stick additional qualifiers before the text. IMO this isn't good practice, but it's been this way for a long time. Benwing2 (talk) 05:03, 6 July 2021 (UTC)

Hi! Would it be possible to add an optional argument to this template to display the language? E.g. {{doublet|en|frog|showlang=1}} would then display "Doublet of English frog"? Leasnam (talk) 05:09, 6 July 2021 (UTC)

You can call the new argument anything else that you feel is more suitable and in line with current naming conventions. I just used 'showlang' as an example. Thanks! Leasnam (talk) 05:10, 6 July 2021 (UTC)
It's certainly possible, but can you describe the purpose for future reference? In what contexts do you see this as useful? — Eru·tuon 20:32, 6 July 2021 (UTC)
The entry that uses the template must already be in the same language as the doublets. Is it necessary to show the language name twice? --Octahedron80 (talk) 00:06, 8 July 2021 (UTC)
Well, sometimes I would like to specify that a word is a doublet of a dialectal word, so being able to manipulate the display text would be useful. For example, at frog I may want to show "Doublet of dialectal English frosh" or "Doublet of Scottish <term>", where 'Scottish' here means Scottish English, not the Scots language. Leasnam (talk) 13:44, 6 August 2021 (UTC)

Template problem

Hey, I'm really inexperienced at creating Lua templates and am unable to fix this issue. Can anybody modify this template so that the first-person singular row header displays properly?

Template:tli-inalienable-noun-inflection

Thanks!! Hk5183 (talk) 16:46, 9 July 2021 (UTC)

@Hk5183:
That template doesn't include any Lua...???
By way of fix, is this what you mean? ‑‑ Eiríkr Útlendi │Tala við mig 17:10, 9 July 2021 (UTC)
@Eirikr:
Shows what I know! Yes, that is what I meant! Thank you!! Hk5183 (talk) 17:53, 9 July 2021 (UTC)
@Hk5183: No worries, we all start somewhere! FWIW, Lua is the scripting language used in our module infrastructure, things like Module:nyms, for instance. Lua gets used in our templates by means of statements within the template like {{#invoke:nyms|nyms|synonym}}, as you can see over at the Template:synonyms template.
HTH! ‑‑ Eiríkr Útlendi │Tala við mig 17:58, 9 July 2021 (UTC)[reply]

Is there anyone who can help modify the code for plural forms? Perhaps it could be made automatic for single-word nouns that are not abbreviations. ―Rex Aurōrum Disputātiō 04:26, 10 July 2021 (UTC)

Link colouring

@SodhakSH, Octahedron80, Benwing2: Automatically generated inflection or 'transliteration' tables tend to link to words without checking whether they are recorded in Wiktionary for the relevant language. I am looking to fix the Pali tables so that a logged-out reader will not follow a blue link only to land on an entry for a different language. I consider such misdirecting links a blemish on Wiktionary. At present, by default, there are two colouring schemes:

  • red if the target page does not exist, blue if it does
  • black if the target page does not exist, blue if it does

How is the colour scheme selected at the Lua level? I've looked but haven't found it.

For logged-in users, there is a little-known customisation that substitutes orange if the target page exists but the section for the language is not there. Is there any legal reason why we should not make orange the default colour in this case? (There are threatening-looking accessibility requirements around that may mean our defaults are driven by the needs of the colour-blind.)

I am contemplating using Lua function calls like mw.title.makeTitle(0, 'deva', 'Pali').exists to check for the existence of the target within pages, but I would like to play nicely with the existing system. Perhaps this enhancement is something to be incorporated into Module:links. --RichardW57 (talk) 10:35, 10 July 2021 (UTC)

See MediaWiki:Common.css. It makes some "redlinks" black when they are inside an HTML element with the inflection-table class.
I guess the "little known customization" is MediaWiki:Gadget-OrangeLinks.js? That colors links by getting a list of their categories from the MediaWiki API and looking for any that start with the language name. It doesn't increase Lua memory use or execution time.
We certainly can't color all links everywhere in Lua because it would be too expensive in memory and time on certain pages. It would be somewhat similar implementation-wise to our redlink categories, which we disable on certain pages using Template:redlink category and are planning to replace with jberkel's wanted lists because they cause out-of-memory errors. Maybe it wouldn't be too expensive to do that in particular inflection templates, though. As a test of efficiency, one of us could extract the titles from an inflection table and write a module that looks for language headers on their pages. — Eru·tuon 04:47, 21 July 2021 (UTC)
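Erutuon's suggested test ("look for language headers on their pages") amounts to the three-way classification below: red for a missing page, orange for a page without a level-2 header for the language, blue otherwise. This is a Python sketch that assumes the target pages' wikitext has already been fetched into a dict. Inside a Scribunto module, that text would have to come from something like a title object's getContent(), which is exactly the wikitext parsing whose memory cost would need measuring. The page titles and contents are made up.

```python
import re
from typing import Dict, Optional


def has_language_section(wikitext: str, language: str) -> bool:
    """True if the wikitext contains a level-2 header such as '==Pali=='."""
    pattern = r"^==\s*" + re.escape(language) + r"\s*==\s*$"
    return re.search(pattern, wikitext, flags=re.MULTILINE) is not None


def link_colour(title: str, language: str, pages: Dict[str, Optional[str]]) -> str:
    """'red' if the page is missing, 'orange' if it exists but has no
    section for `language`, otherwise 'blue'."""
    wikitext = pages.get(title)
    if wikitext is None:
        return "red"
    return "blue" if has_language_section(wikitext, language) else "orange"


# Made-up page contents standing in for fetched wikitext.
pages = {
    "pageA": "==English==\n...\n==Sanskrit==\n...",
    "pageB": "==Pali==\n...",
}
for title in ("pageA", "pageB", "nosuchpage"):
    print(title, link_colour(title, "Pali", pages))
```

The regex deliberately ignores level-3 headers such as ===Pali===, because only L2 headers name the entry's language.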
Doesn't the use of JavaScript require a logged-in user's permission? I want the warning to be available to logged-out users. Logged-in users can already enable orange links. --RichardW57m (talk) 13:37, 22 July 2021 (UTC)
@Erutuon: I'm not quite sure what you mean by a test of efficiency. Are you suggesting something like doing the checks for three different invocations of {{sa-alt}}, which would check for the existence of specific fragments in about a hundred pages? I presume the efficiency metric would be how much of the Lua limits it consumed. The problem areas are {{pi-alt}} and {{sa-alt}}, which point to the mostly automatically generated forms of a word in other scripts. (Manual override is available.) They cause a problem because attesting the word in another script can be hard, but a word with the same spelling may exist in another language. The form in the other script probably usually exists, though there may be valid doubt for a recently adopted writing system, and predicting the spelling may be problematic even for some of the older writing systems. For example, when is the consonant under a repha geminated? The autogeneration I've seen assumes no automatic gemination. It's an old habit that is currently out of favour. For inflection tables, there is usually much less doubt. It is only where the inflection tables are themselves in doubt that friction is risked when one just creates the entry to satisfy a false blue link. --RichardW57m (talk) 13:37, 22 July 2021 (UTC)
@RichardW57m: Yes, I mean testing how much Lua memory or time this is going to use, and whether it is likely to push any pages beyond the Lua memory or time limits. It's an interesting idea, but we have been trying to avoid wikitext parsing in modules so as not to cause more out-of-memory errors. Errors aren't especially likely if this is only going to run on pages in non-Latin, non-Han scripts, but it is a good idea to test anyway. — Eru·tuon 20:14, 22 July 2021 (UTC)
Running the Orange Links gadget currently requires a user to be logged in, yes. We could make it run for all users, but that's inadvisable because it makes extra web requests and there would be no way to disable it if you weren't logged in. — Eru·tuon 20:22, 22 July 2021 (UTC)
Actually, it looks like logged-out users can enable Orange Links at Wiktionary:Preferences/V2. That seems to be something we made, because I don't see it on Wikipedia. — Eru·tuon 20:27, 22 July 2021 (UTC)

@Erutuon: I've just had a nasty thought. Would the content of page A depending on the content of page B, and the content of page B depending on the content of page A, automatically create a troublesome loop or perhaps disable page caching altogether? — This unsigned comment was added by RichardW57 (talk • contribs) at 14:01, 10 July 2021 (UTC).

@RichardW57: Loops like that already exist (for instance, +1s -> 加一秒 -> +1s, not to mention pages transcluding themselves), so apparently they don't cause the server any problems. This would include things like a page getting its own wikitext in Lua with mw.title.getCurrentTitle():getContent(). — Eru·tuon 04:47, 21 July 2021 (UTC)

Feature request...partial pattern or terminations options in search...

On English Wikisource I recently encountered a word that was partly illegible due to a clipped scan, the ending of the word being -erty.

The page concerned is s:Page:New winter evening's companion, of fun, mirth, and frolic.pdf/11

Using a third-party site calling itself a crossword dictionary, I was able to do a pattern search for possible candidates and determine a specific candidate for the illegible word, namely liberty; accordingly, I marked it using s:Template:reconstruct, which is used at English Wikisource.

I'd like to be able to do similar searches based on 'terminations', or partial patterns, for words in Wiktionary directly, on a Special or Project page if needed.

To some extent I could have used a regexp for this, but that requires some degree of technical expertise.

Would it be feasible to have a 'terminations' search option that generated a set of links to potential candidate words present in Wiktionary?

More generally, I'd also be interested in seeing additional 'word-solver' pages at Wiktionary, for example:

  • 'Wordsquare' solver: typically, finding a specific word in a row of other text (possibly reversed)
  • 'Anagram' finder

ShakespeareFan00 (talk) 08:31, 11 July 2021 (UTC)
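The three requested searches are small operations over a word list, sketched here in Python: crossword-style pattern matching (which covers "terminations" such as -erty), an anagram index, and a forward-or-reversed row check. The word list is a tiny hypothetical sample; a real tool would need a list of Wiktionary page titles, for example from a database dump. This is not an existing Wiktionary feature.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List


def pattern_search(words: Iterable[str], pattern: str) -> List[str]:
    """Crossword-style search: '?' is exactly one character, '*' is any run."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    compiled = re.compile("".join(parts))
    return sorted(w for w in words if compiled.fullmatch(w))


def build_anagram_index(words: Iterable[str]) -> Dict[str, List[str]]:
    """Group words by their sorted letters, so anagrams share a key."""
    index: Dict[str, List[str]] = defaultdict(list)
    for w in words:
        index["".join(sorted(w.lower()))].append(w)
    return index


def anagrams(index: Dict[str, List[str]], letters: str) -> List[str]:
    return sorted(index.get("".join(sorted(letters.lower())), []))


def in_row(row: str, word: str) -> bool:
    """Word-search check: does `word` occur in `row`, forwards or reversed?"""
    return word in row or word[::-1] in row


# A tiny hypothetical word list.
WORDS = ["liberty", "poverty", "property", "puberty", "party", "listen", "silent", "tinsel"]
print(pattern_search(WORDS, "*erty"))    # any word ending in -erty
print(pattern_search(WORDS, "???erty"))  # exactly seven characters, ending in -erty
```

When the number of missing letters is known from the scan, '?' narrows the candidates much more than '*' does.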

Links with ć

In trying to reach https://en.wiktionary.org/wiki/re%C4%87i from https://en.wiktionary.org/wiki/tell#Translations, I noticed I was instead taken to https://en.wiktionary.org/wiki/reci#Serbo-Croatian, a completely different word. I could not fix this by editing, and I wonder whether support for links with ć has been removed. While checking this, I noticed that the Translations section of "can" no longer leads to Serbo-Croatian "moći" but to "moci", again a completely different word. This needs to be fixed.

I believe this is due to a recent edit that @Benwing2 made to MOD:languages/data2. I'm reverting it for now, because each language should be checked before changing how diacritics are stripped. —Μετάknowledgediscuss/deeds 20:36, 15 July 2021 (UTC)
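The failure mode is easy to reproduce. After Unicode NFD decomposition, ć is c followed by a combining acute (U+0301), the same mark used for tonal accents on vowels. Stripping every acute therefore turns reći into reci. Below is a Python sketch of the problem and of one safer rule, which strips marks only when they sit on a vowel or syllabic r. The mark set and the "accentable" letters are assumptions for illustration; this is not the actual configuration in MOD:languages/data2. The accented input mòći is illustrative only.

```python
import unicodedata

# Hypothetical set of tone/length marks to strip from entry names
# (grave, acute, macron, tilde, breve, double grave, inverted breve).
TONE_MARKS = {"\u0300", "\u0301", "\u0304", "\u0303", "\u0306", "\u030f", "\u0311"}
ACCENTABLE = set("aeiouAEIOUrR")  # vowels plus syllabic r (assumption)


def naive_entry_name(text: str) -> str:
    """Strip every tone mark after decomposition; this also turns ć into c."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def safer_entry_name(text: str) -> str:
    """Strip tone marks only on a vowel or r, so consonants such as ć keep theirs."""
    out = []
    base = ""
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            if ch in TONE_MARKS and base in ACCENTABLE:
                continue  # drop the accent on a vowel or syllabic r
        else:
            base = ch
        out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))


for word in ("re\u0107i", "m\u00f2\u0107i"):  # reći, and an accented input mòći
    print(word, naive_entry_name(word), safer_entry_name(word))
```

Which marks are safe to strip is a per-language decision, which is why each language needs checking before such a change.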

Adding Wikisource functionality to Quiet Quentin

I strongly suggest that we take on the project of adding functionality to Quiet Quentin to search and cite Wikisource texts. Wikisource specifically increases the accessibility and searchability of public-domain (and otherwise freely licensed) texts. Many of the works presented there are also scan-backed, meaning that they are produced alongside scans with the page numbers available, so that readers may check the transcription against the original scanned text.

There's also growing interest there in producing film transcripts, which could theoretically be used for attestation of some words. There's been talk of including comic transcripts as well. Many of those would not otherwise have searchable transcripts online at all.

Any thoughts? PseudoSkull (talk) 21:41, 15 July 2021 (UTC)

This would be great, and it has been suggested a few times as a community wishlist project, but nothing ever came of it. What's missing for this to work? Does the search on Wikisource already provide all the necessary data? — This unsigned comment was added by Jberkel (talk • contribs) at 20:59, 22 July 2021 (UTC).

Asturian frequency list

Can anyone generate a frequency list for Asturian? The best place to farm words is probably ast.wikipedia.org. Emmett Lathrop Doc Brown (talk) 09:10, 16 July 2021 (UTC)

@Emmett Lathrop Doc Brown This is not so easy; to do a proper frequency list you need a morphological analyzer to convert words (especially verbs) to their lemma form. Just generating a raw frequency list won't be so helpful. Benwing2 (talk) 16:40, 18 July 2021 (UTC)
Plus, Wikipedias are an awful place to get usage. People routinely make up names for things so they can write about them, and there's no guarantee that anyone is a fluent speaker. On the whole, any wiki is a very artificial environment that should be the last place to look for evidence of usage patterns. Chuck Entz (talk) 19:22, 18 July 2021 (UTC)
Good replies, but I'd like one anyway, without having to learn coding to generate it :) Emmett Lathrop Doc Brown (talk) 22:04, 10 February 2023 (UTC)
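Benwing2's distinction between a raw list and a lemma-based list shows up in a few lines of Python. The sketch counts lowercased word tokens and, when given a form-to-lemma mapping, folds inflected forms onto their lemmas first. The `lemmas` dict is a hypothetical stand-in for the morphological analyzer he mentions, and the sample text is made up. A real run would need a large corpus and, given Chuck Entz's point, ideally one that is not a wiki.

```python
import re
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Runs of letters (any script); digits and underscores are excluded.
WORD_RE = re.compile(r"[^\W\d_]+")


def frequency_list(
    text: str,
    lemmas: Optional[Dict[str, str]] = None,
    top: Optional[int] = None,
) -> List[Tuple[str, int]]:
    """Count lowercased word tokens; optionally fold forms onto lemmas first."""
    tokens = [t.lower() for t in WORD_RE.findall(text)]
    if lemmas:
        tokens = [lemmas.get(t, t) for t in tokens]
    return Counter(tokens).most_common(top)


# Made-up sample text.
sample_text = "Ye una casa. Les cases son grandes; la casa ye nueva."
print(frequency_list(sample_text, top=3))
print(frequency_list(sample_text, lemmas={"cases": "casa"}, top=3))
```

Without the mapping, casa and cases are counted separately. This is the limitation of a raw list, especially for verbs with many inflected forms.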

Returning template-parsable text from modules

Whenever I return text from Lua that contains, for example, {{g|m}}, it renders simply as the text {{g|m}} instead of m (which is what I want). What can I do? Thanks a lot. Rishabhbhat (talk) 11:28, 17 July 2021 (UTC)

You need to invoke a special function for template expansion, namely frame:expandTemplate, where frame is a frame object. Documentation is available at https://www.mediawiki.org/wiki/Extension:Scribunto/Lua_reference_manual#Frame_object. I've written a module that uses it, Module:RQ:pi:Sai_Kam_Mong. It invokes an item-specific template to write the boilerplate for quoting from a durably archived item, and passes the selected text to it. --RichardW57 (talk)
Thanks a ton @RichardW57. Rishabhbhat (talk) 08:47, 18 July 2021 (UTC)

I have forgotten how to fix this category. SemperBlotto (talk) 13:03, 17 July 2021 (UTC)

@SemperBlotto: Fixed; the correct name is Category:unm:Animals. —Mahāgaja · talk 15:20, 17 July 2021 (UTC)

Lenape (Unami)

A relatively new user has been adding words in which the language is given as "Lenape (Unami)", but the words are in Category:Unami nouns &c. What language should be used? (An example is at mochipwis.) SemperBlotto (talk) 14:33, 17 July 2021 (UTC)

Unami (code unm). Lenape (code del) is a language family covering both Unami and Munsee (umu); per WT:LT, we only consider unm and umu canonical languages, not del. —Mahāgaja · talk 15:21, 17 July 2021 (UTC)
Exactly. [del] is a language family code for Lenape. @Andreas.b.olsson --{{victar|talk}} 04:34, 18 July 2021 (UTC)
"del" is not a valid language code. SemperBlotto (talk) 05:35, 18 July 2021 (UTC)
@SemperBlotto “del” may not be a valid language code, but Lenape is a valid language with dialect variations, just like French and its various patois. Declaring Lenape deprecated and raising Wënami and Munsee to canonical language status is a Wiktionary construct with potential political and cultural impact. Andreas.b.olsson (talk) 05:47, 29 July 2021 (UTC)

Please note that using Lenape (Unami) for the language title does not go against Wiktionary conventions about languages. If two “languages” cannot be distinguished by name, then adding the location in parentheses is the convention to use. Both Lenape groups – the Wënami now living in Oklahoma, USA, and the Munsee in Moraviantown, Canada – refer to their languages as Lenape. Andreas.b.olsson (talk) 06:43, 29 July 2021 (UTC) Also, I would not say “Munsee was once spoken in my hometown” (NYC). I would say “Lenape, an Algonquian language, was once spoken here”. Andreas.b.olsson (talk) 06:49, 29 July 2021 (UTC)

@Andreas.b.olsson: It does go against Wiktionary conventions, because it's important that we remain consistent in how we refer to languages. If you want to change an existing practice, there needs to be a discussion that leads to changing it everywhere, consistently. (But it seems your issue is with our splitting, not our naming... maybe @-sche can weigh in.) —Μετάknowledgediscuss/deeds 06:53, 29 July 2021 (UTC)
@Metaknowledge Quoting from Wiktionary:Languages:

If languages cannot be distinguished by alternative names, the place where each language is spoken is appended in parentheses after its name, as in the case of "Buli (Ghana)" (code: bwu) and "Buli (Indonesia)" (code: bzq). If languages go by the same name and are spoken in the same place, they can be disambiguated by their linguistic families. For example, "Mor (Austronesian)" (code: mhz) and "Mor (Papuan)" (code: moq), both of which are spoken in Indonesia.

Andreas.b.olsson (talk) 12:52, 29 July 2021 (UTC)
@Andreas.b.olsson: It's not the parentheses per se that go against our convention; it's using specifically ==Lenape (Unami)== as an L2 header, because the result of the previous discussion was to use "Unami" as the canonical language name, and L2 headers always use canonical language names. The quote above is about two completely different languages that happen to share a name. Your argument is that Lenape is a single language with two dialects. If the request to merge them that you made at WT:RFM is successful, the language will presumably be named simply Lenape here, so we would use ==Lenape== as the L2 header on entries. We would then distinguish the dialects by means of the {{lb}} tag, writing {{lb|del|Unami}} or {{lb|del|Munsee}} before the definition, and categorizing the words into categories named CAT:Unami Lenape and CAT:Munsee Lenape, which would be subcategories of CAT:Regional Lenape. That's how we treat dialects of other languages here, e.g. CAT:American English, CAT:Finland Swedish, CAT:Quebec French, CAT:Austrian German and so on. —Mahāgaja · talk 14:01, 29 July 2021 (UTC)
@Mahagaja This misses the point that all human categorizations, however useful, are in the end artificial. To distinguish between two languages is partly – and very importantly – a sociopolitical act. Are Wënami and Munsee separate languages or dialects of one language? Yes and no. You cannot academically and neatly separate the issues of “naming” and “categorizing”, and isolate them from their sociopolitical effects. Languages are not structures in reality we can investigate academically without causing a deep effect on what we are investigating and cataloging. This is of special concern to languages spoken by few and in the process of revitalization. For Lënape, it would be best to retain some ambiguity in Wiktionary. That means reversing the arbitrary declaration in 2013 to consider them canonical languages (regardless of whether some linguist or other would academically consider them separate languages or dialects). Or, retaining the language codes for Wënami and Munsee but L2’ing the header as Lenape (…). The latter is a perfectly acceptable solution that recognizes the ambiguity of whether they are languages or dialects. Having learned some Lënape based on the Wënami branch, I can tell you that I can begin to understand some Lënape as spoken by the Munsee tribe. Andreas.b.olsson (talk) 15:21, 29 July 2021 (UTC)
@Andreas.b.olsson: I'm aware that it isn't always obvious whether two lects are better considered different languages or dialects of the same language, and that any such decision has sociopolitical ramifications as well as sociolinguistic ones, but for the purposes of Wiktionary we do have to decide: Are we going to treat unm and umu as separate languages that belong to the family del, or are we going to treat del as one language with two dialects unm and umu (which we may or may not decide to treat as etymology-only codes, meaning that we can still separate CAT:English words derived from Unami from CAT:English words derived from Munsee even when treating them as dialects of the same language). We've made such decisions for other languages in the past, and sometimes people aren't happy with those decisions because they aren't clear-cut. For example, we treat Serbo-Croatian as a single language and consider Croatian, Serbian, Bosnian, and Montenegrin to be dialects of it. On the other hand, we treat Norwegian Bokmål and Norwegian Nynorsk as two separate languages. Both of those decisions are unpopular in certain quarters, but they're the decisions we reached by consensus. Likewise, 4 years ago we agreed to treat Unami and Munsee as two separate languages, and on your instigation we're having a discussion whether to treat them as a single language. I personally don't care one way or the other what we decide; I just want to make sure that our entries are formatted with correct codes and correct L2 headers that correspond to the current decision. If we change our minds over at WT:RFM, someone with a bot will change all the relevant entries and there will be a new status quo. That's fine with me too. —Mahāgaja · talk 19:37, 29 July 2021 (UTC)
Note here, @Mahagaja, that it’s listed as “Norwegian Bokmål”, not simply “Bokmål”. There is no reason Lënape could not be treated the same way: keep the canonical language distinction of unm and umu, but rename the headers to recognize that both languages should be called Lënape. I hope the Nation of Delaware Indians and the Munsee of Moraviantown could be consulted on this, since it should be recognized that I am not Lënape but merely an enthusiastic learner of their language and culture. Andreas.b.olsson (talk) 20:44, 29 July 2021 (UTC)

Descendants (pyriformis)

At first the section could only be emptied, not deleted, because this message appeared:

"Please check Special:WhatLinksHere/pyriformis for any desctree templates linking to Descendants sections you have removed, and fix them."

After the empty section was saved (diff), the section could be removed in a second edit (diff). That's stupid and should be fixed. --Macopre (talk) 07:55, 20 July 2021 (UTC)

That wasn't actually necessary. You could have saved the deletion of the section simply by hitting "Publish changes" a second time. The warning message doesn't prevent the edit; it just makes you press the button a second time after reading the message. —Mahāgaja · talk 09:42, 20 July 2021 (UTC)
Well, now that works. --Macopre (talk) 12:41, 20 July 2021 (UTC)

Template:taxlink giving weird stuff

For example, at thésium des Alpes, there's a random link to HOUSERATA. Queenofnortheast (talk) 12:19, 26 July 2021 (UTC)

OK, it was easily fixed by reverting. You probably want that template semi-protected, though, lest folks like random IPs and me should edit it. Queenofnortheast (talk) 12:21, 26 July 2021 (UTC)

DynamicPageList and category pages

It looks like DynamicPageList, which is used by Module:category_tree to produce "Recent additions to the category" on pages like Category:Japanese lemmas, might be removed soon because the tool is not well maintained and has caused major disruptions to servers: phab:T287380. (Note: Phabricator is for technical discussion. More general comments can be made here.) Whym (talk) 10:07, 27 July 2021 (UTC)

That's a shame, it's a useful feature; I like Category:English citations of undefined terms. No replacement, "just use a bot" :/ – Jberkel 10:19, 27 July 2021 (UTC)
In some cases, you might be able to use Special:RecentChangesLinked. It'll show every edit made to pages in a certain category. Edits that add entries to a category might get drowned out, though. — surjection?? 11:27, 27 July 2021 (UTC)
On the other hand, I can't blame them for wanting to remove unmaintained duct-tape code that can blow up at any moment. The other question is isolation: why do "local" deployments (in this case ru.wikinews) have such a disruptive effect on the whole WMF platform? – Jberkel 09:36, 28 July 2021 (UTC)
An alternative is an API query. Is it possible to format this with Lua? Vriullop (talk) 06:01, 29 July 2021 (UTC)
@Vriullop: Yes, but unfortunately Lua doesn't have access to the query API. For Lua to see the JSON, a bot would have to save it to a page periodically. That would require a lot of edits, because there are about 400,000 category pages that use Module:category tree (which probably generates the majority of the DynamicPageLists). — Eru·tuon 06:35, 29 July 2021 (UTC)
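The bot-side half of the API alternative could look roughly like this Python sketch. It asks the MediaWiki Action API for a category's newest members (list=categorymembers sorted by the time each page was added) and renders the result as a wikitext list, which a bot would then save to a page that Lua can read. The save step, the target page and the user-agent string are all left as assumptions. A real bot should identify itself per Wikimedia's User-Agent policy. The rendering is exercised offline on a hand-written response of the same shape.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict

API_URL = "https://en.wiktionary.org/w/api.php"


def recent_additions_params(category: str, limit: int = 10) -> Dict[str, str]:
    """Query parameters for the newest members of a category."""
    return {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmsort": "timestamp",   # sort by time added to the category
        "cmdir": "desc",         # newest first
        "cmprop": "title|timestamp",
        "cmlimit": str(limit),
        "format": "json",
        "formatversion": "2",
    }


def fetch(params: Dict[str, str]) -> dict:
    """Perform the request (needs network access; not run below)."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(url, headers={"User-Agent": "recent-additions-sketch/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def to_wikitext(response: dict) -> str:
    """Render the API result as a bulleted wikitext list of links."""
    members = response.get("query", {}).get("categorymembers", [])
    return "\n".join(f"* [[{m['title']}]]" for m in members)


# Offline example with a hand-written response of the same shape.
sample_response = {"query": {"categorymembers": [
    {"ns": 0, "title": "猫", "timestamp": "2021-07-27T00:00:00Z"},
    {"ns": 0, "title": "犬", "timestamp": "2021-07-26T00:00:00Z"},
]}}
print(to_wikitext(sample_response))
```

As Erutuon notes, the cost lies in the number of saved pages (one per category), not in the query itself.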

Anomalous Display of URL

Hey, I made this edit and you can see that there's a problem on the page. Is there anything I can do to fix this? Thanks for any help. --Geographyinitiative (talk) 12:35, 27 July 2021 (UTC)

Most punctuation in URLs needs to be properly encoded so the system knows that it's not wiki syntax. Chuck Entz (talk) 14:02, 27 July 2021 (UTC)
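As a concrete illustration of that advice, this Python sketch percent-encodes only characters that commonly collide with wikitext markup inside external links (spaces, quotes, angle brackets, square and curly brackets, and the pipe), and leaves the rest of the URL untouched. The character list is a conservative assumption, not an exhaustive list taken from the MediaWiki parser. For broader encoding, Python's urllib.parse.quote is the general-purpose tool.

```python
# Characters that commonly collide with wikitext markup inside external links
# (an assumption: conservative, not an exhaustive list from the parser).
WIKI_CLASHES = {
    " ": "%20", '"': "%22", "'": "%27", "<": "%3C", ">": "%3E",
    "[": "%5B", "]": "%5D", "{": "%7B", "|": "%7C", "}": "%7D",
}
_TABLE = str.maketrans(WIKI_CLASHES)


def wiki_safe_url(url: str) -> str:
    """Percent-encode only the characters listed above; everything else is kept."""
    return url.translate(_TABLE)


print(wiki_safe_url("https://example.org/a b[1]|x"))
```

Because existing %XX sequences and URL delimiters such as ? & = are left alone, the encoded URL still points at the same resource.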

Sanskrit template {{sa-decl-adj-mfn}} adding entries to "nouns" categories

(Original at template talk page; moved here for exposure, also pinging @Bhagadatta, RichardW57)

The template {{sa-decl-adj-mfn}} (which seems to use the {{sa-decl-noun-m/f/n}} templates under the hood) appears to add entries to categories such as "Sanskrit X-stem nouns" (where X is a, ā, i, etc., depending on the stem). This means pages such as सर्व (sarva) are in Category:Sanskrit a-stem nouns and Category:Sanskrit ā-stem nouns even though there's no "Noun" header for the term. I wonder whether this could be the intended behaviour at all. If not, is there a way to fix the underlying mechanism?

Another example can be seen with एतद् (etad); the page uses the {{sa-decl-noun-m}}/f/n templates and is added to the "a-stem nouns" and "ā-stem nouns" categories, even though the term is a pronoun.

Ideally the categorization should be decoupled from the engine that generates the table, or the engine should have some "situational awareness" of whether the template is invoked under a Noun header.

Can this be done? Thanks! --Frigoris (talk) 17:08, 27 July 2021 (UTC)

@Frigoris: Well, it is the intended behaviour of {{sa-decl-noun-m}} etc., but it's not beyond the wit of man to add a parameter |nocat= to suppress this behaviour. It looks as though it will get passed down easily enough. I'm not sure whether the code is deliberately obscurantist. I'm not sure whether the Sanskritists believe in adjectives; it's conceivable that they believe an adjective is three nouns, which would be consonant with their hostility to Anglo-Saxon users. (You're only allowed your script for Sanskrit if it's an Indic script.) An obstacle to doing it is that one ought to document the templates while one's at it, which makes much more work. --RichardW57 (talk) 19:46, 27 July 2021 (UTC)
@RichardW57: I think "nominals" is an umbrella term for adjectives, nouns, proper nouns, etc. (anything that is inflected like a noun). Dictionaries written by Sanskritists (such as Apte's Practical Sanskrit–English Dictionary) regularly use the term "adjective" (see external image; the page is full of the abbreviation a. for adjective).
I agree that in practice many adjectives can be used nominally, so the line may be blurred. Still, it feels strange and arguably makes the "nouns" categories less useful.
In addition, consider the current page for एतद् (etad). As I understand it, the page invokes the three noun-declension templates to automate the boring job of actually creating most of the inflected forms and putting them in a table. But by doing so, the pronoun is nevertheless added to the noun categories. To me this looks even more unsatisfactory than the adjective case.
As for the technical part, I have zero expertise with that, but the idea of an additional parameter sounds reasonable to me.
Overall I think the noun-declension templates are really brilliantly written and very useful. It's just a minor detail about how they interact with the other features that's creating surprises and confusion. --Frigoris (talk) 07:32, 28 July 2021 (UTC)
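Both ideas raised here can be shown in one small function: an explicit |nocat= switch (RichardW57's suggestion) and adding stem-class categories only when the table is used for a noun (Frigoris's "situational awareness"). This is a hypothetical Python sketch of the interface, not code from Module:sa-decl. The parameter names `pos` and `nocat` are illustrative.

```python
from typing import Iterable, List


def declension_categories(
    lang: str,
    stem_classes: Iterable[str],
    pos: str = "noun",
    nocat: bool = False,
) -> List[str]:
    """Stem-class categories a declension table would add (hypothetical interface).

    nocat=True suppresses them entirely; otherwise they are only added
    when the table is used for a noun.
    """
    if nocat or pos != "noun":
        return []
    return [f"{lang} {stem}-stem nouns" for stem in stem_classes]


print(declension_categories("Sanskrit", ["a", "ā"]))                  # noun table
print(declension_categories("Sanskrit", ["a", "ā"], pos="adjective"))  # e.g. सर्व
print(declension_categories("Sanskrit", ["a", "ā"], nocat=True))       # explicit opt-out
```

Keeping categorization in a separate function leaves the table-generating code unchanged, which is the decoupling suggested above.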

Another issue is that the template doesn't allow for adjectives in -a with feminines in -ĩ, or for overrides in case forms. Doing a proper job will take much more work than you might think.--RichardW57 (talk) 19:46, 27 July 2021 (UTC)[reply]

@RichardW57: I believe you can give the masculine, feminine and neuter stems separately as parameters, as in {{sa-decl-adj-mfn|m-stem|f-stem|n-stem}}. Each stem can be any form spelled out in full. --Frigoris (talk) 07:32, 28 July 2021 (UTC)
@RichardW57, @Frigoris: This template also doesn't have parameters like nom_s etc. for manually entering the declension, and these declension templates need more fixing. I got the message "Lua error in Module:sa-decl at line 218: No declension class could be detected. Please check the lemma form or specify the declension." when I tried using {{sa-decl-noun-m}} on the page भगवत्. Another thing: using different words altogether to avoid ‘no declension class detected’, as on the pages सम्पद् and एतद्, is not good. Can this ‘no declension class detected’ error be fixed? And declensions ending in -t and -d are straightforward, so why can't they be generated automatically? Svārtava 10:36, 28 July 2021 (UTC)
But brute force like this gets the job done. In the case of Sanskrit, it may even work smoothly for its 32 (or however many) scripts. It becomes a bit of a chore with Pali, whose inflection system I extended on the assumption that transliteration is inherently unreliable when dealing with multiple writing systems. I am contemplating automatically transliterating Latin-script form overrides for non-Latin stems, to simplify the chore of irregular case forms in Pali. --RichardW57 (talk) 19:36, 28 July 2021 (UTC)
I think the first step is to document what we have. The obvious way forward for adjectival irregularities is to have |m_loc_s= etc. at the template level, which then get used to derive |loc_s= at the next level. For example, one could have loc_s={{{m_loc_s|{{{mn_loc_s|{{{mfn_loc_s|}}}}}}}}}, which would pass down, in order of preference, |m_loc_s=, |mn_loc_s=, |mfn_loc_s= and nothing as the override for the locative singular of the masculine form. The prefixes to the parameter names would be m_, f_, n_, mn_, mf_ and mfn_. I'm not sure how to handle the case of two feminine stems; maybe the template would have to branch on whether there is a second feminine. I intend to do something like this for Pali adjectives, though that is complicated by there typically being multiple forms for each combination of gender, case and number. --RichardW57 (talk) 19:36, 28 July 2021 (UTC)
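For readers who find nested triple-brace defaults hard to follow, here is a minimal JavaScript sketch of the fallback order proposed above. The masculine chain follows the post (m_loc_s, then mn_loc_s, then mfn_loc_s); the feminine and neuter chains are an assumption by analogy, and resolveOverride is a hypothetical helper, not part of any existing module. Like {{{name|default}}} in wikitext, it only falls through when a parameter is absent, so a parameter that is present but empty stops the chain.

```javascript
// Hypothetical sketch of the gendered override fallback proposed above.
// The masculine chain mirrors loc_s={{{m_loc_s|{{{mn_loc_s|{{{mfn_loc_s|}}}}}}}}};
// the feminine and neuter chains are assumed by analogy.
const FALLBACK_PREFIXES = {
  m: ["m", "mn", "mfn"],
  f: ["f", "mf", "mfn"],
  n: ["n", "mn", "mfn"],
};

// Return the override for one gender and one case/number slot (e.g. "loc_s").
// As with {{{name|default}}}, a parameter that is present but empty is still
// used; only a missing parameter falls through to the next prefix.
function resolveOverride(args, gender, slot) {
  const prefixes = FALLBACK_PREFIXES[gender];
  if (!prefixes) throw new Error(`unknown gender: ${gender}`);
  for (const prefix of prefixes) {
    const name = `${prefix}_${slot}`;
    if (Object.prototype.hasOwnProperty.call(args, name)) return args[name];
  }
  return ""; // no override: the next level generates the form itself
}

// The masculine falls back to mn_loc_s, the feminine to mfn_loc_s.
const args = { mn_loc_s: "form-A", mfn_loc_s: "form-B" };
console.log(resolveOverride(args, "m", "loc_s")); // form-A
console.log(resolveOverride(args, "f", "loc_s")); // form-B
```

Whether an empty parameter should stop the chain (wikitext default semantics) or be skipped (as a #if-based version would do) is a design choice the template author would have to make.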
@Svartava: It seems the underlying Module:sa-decl/data has no handling for some of the consonant stems. It recognizes the vowel stems, vowel-s-stems, an-stems, in-stems, and c/j- and bh-stems, but nothing else. Since I have zero technical knowledge about properly coding it up, let me ping @Benwing2, SodhakSH, who have been working on it. --Frigoris (talk) 13:05, 28 July 2021 (UTC)
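To illustrate why a lemma like भगवत् (bhagavat) ends in the "No declension class could be detected" error, here is a hypothetical JavaScript sketch of suffix-based class detection over IAST transliteration. The class list follows the stems named above; the patterns and the detectDeclension name are assumptions, not the actual contents of Module:sa-decl/data.

```javascript
// Hypothetical suffix table, checked in order; the first match wins.
// It only covers the stem types listed above, so -at stems fall through.
const DECLENSION_PATTERNS = [
  { cls: "an-stem", re: /an$/ },
  { cls: "in-stem", re: /in$/ },
  { cls: "s-stem", re: /[aiu]s$/ },
  { cls: "c/j-stem", re: /[cj]$/ },
  { cls: "bh-stem", re: /bh$/ },
  { cls: "vowel stem", re: /[aāiīuūṛṝ]$/ },
];

function detectDeclension(lemma) {
  for (const { cls, re } of DECLENSION_PATTERNS) {
    if (re.test(lemma)) return cls;
  }
  // Nothing matched: this is where a lemma like bhagavat ends up.
  throw new Error(
    "No declension class could be detected. Please check the lemma form or specify the declension."
  );
}

console.log(detectDeclension("rājan")); // an-stem
console.log(detectDeclension("manas")); // s-stem
```

In a table like this, supporting the -vat/-mat nouns raised later in the thread would mean adding a pattern such as /[vm]at$/ together with the code that actually declines it.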
@RichardW57, Svartava, Frigoris: I don't think there's any deliberate attempt to be "anti-adjective" here, or any deliberate attempt to make the modules obscure. It's just that User:JohnC5, who wrote the original version of the modules, never managed to finish them, and later users have accreted various things onto them without necessarily thinking through the module design as a whole, so they're currently in an imperfect state. It should be easy enough to pass the part of speech down into the module. A couple of months ago I rewrote Module:sa-decl/data entirely to make it much shorter, and had plans to fix some of the missing declensions (e.g. consonant declensions), but I don't know Sanskrit well at all and got daunted by the massive number of exceptions and complexities described in Whitney's Sanskrit grammar. User:SodhakSH tried to hack in support for c/j/bh stems (which don't actually exist as such; they are just instances of consonant stems); this code doesn't work well, so I may remove it. Benwing2 (talk) 03:14, 29 July 2021 (UTC)
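Passing the part of speech down could look roughly like the following JavaScript sketch, which also honours the |nocat= idea from earlier in the thread. The category name pattern ("Sanskrit <class> <pos>") and the function name are assumptions for illustration; the real module is written in Lua and may build its categories differently.

```javascript
// Hypothetical: build the categories a declension table adds to its page.
// pos defaults to "nouns" so existing callers keep their current behaviour.
function declensionCategories({ declClass, pos = "nouns", nocat = false }) {
  if (nocat) return []; // e.g. a pronoun page borrowing the noun tables
  return [`Sanskrit ${declClass} ${pos}`];
}

console.log(declensionCategories({ declClass: "a-stem" }));
// [ 'Sanskrit a-stem nouns' ]
console.log(declensionCategories({ declClass: "a-stem", pos: "adjectives" }));
// [ 'Sanskrit a-stem adjectives' ]
console.log(declensionCategories({ declClass: "a-stem", nocat: true }));
// []
```

Defaulting pos to "nouns" keeps every existing page categorized as before, so adjective and pronoun pages could be migrated one at a time.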
@Benwing2, RichardW57, Svartava: Thanks for the input. I don't think we should worry about the irregulars or the perceived lack of Sanskrit knowledge. It's not as if any of us is a native Sanskrit speaker anyway.
The module already handles the vast majority of possible words (vowel stems, in-stems, an-stems, etc.). It's mostly just the at-stems that need to be tackled. It doesn't have to be perfect for everything. We can always manually override specific forms if something goes wrong, as I recently found out with श्वन् (śvan), where you can use |ins_s= etc. for the exceptions. --Frigoris (talk) 08:55, 29 July 2021 (UTC)
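The override mechanism mentioned above amounts to merging user-supplied parameters over the generated table. A minimal JavaScript sketch, assuming that an empty parameter means "no override" (the real module may treat empty values differently); applyOverrides is a hypothetical name, the "(generated)" value is a placeholder, and śunā is the instrumental singular of श्वन् (śvan).

```javascript
// Hypothetical sketch: overlay manual overrides such as |ins_s= onto generated forms.
// Assumption: an empty value means "not supplied", so the generated form is kept.
function applyOverrides(generated, overrides) {
  const result = { ...generated };
  for (const [slot, value] of Object.entries(overrides)) {
    // Reject unknown slot names so that a typo like |inss= is noticed.
    if (!Object.prototype.hasOwnProperty.call(generated, slot)) {
      throw new Error(`unknown slot: ${slot}`);
    }
    if (value !== "") result[slot] = value;
  }
  return result;
}

// "(generated)" stands in for whatever the regular an-stem rules would produce.
const generated = { nom_s: "śvā", ins_s: "(generated)", dat_s: "śune" };
const table = applyOverrides(generated, { ins_s: "śunā", dat_s: "" });
console.log(table.ins_s, table.dat_s); // śunā śune
```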
@Frigoris: It's not quite that simple. At the very least, the module needs to be able to handle the numerous sandhi rules for consonant stems, along with an extra param for cases where the underlying stem isn't completely derivable from the lemma (at least I think we need this), and definitely also an extra param to specify cases where the accent moves. If someone clearly enumerated all the rules and cases that need to be handled, it would make my life a lot easier; otherwise I have to parse through all the stuff in Whitney's grammar. I started to do that in the past but got lost in the details. I also need to figure out the existing code, which implements some but not all of the sandhi rules. Benwing2 (talk) 20:41, 31 July 2021 (UTC)
@Benwing2: I think in the case of the "-at" nouns, the stems are all derivable from the given lemma. There are two main categories: present active participles and "true" nouns. The participles need special treatment with accents etc., while the "true" nouns are simpler, with no shift of accent. For the nouns, the vast majority are formed with the "-vat"/"-mat" (possessive) affixes in the masculine and neuter. (The feminine is treated just like an ī-stem; the user can simply provide the correct feminine stem manually.) The relevant consonant sandhi is the change from -t to -d before bh-, in the instrumental, dative and ablative of the dual and plural. The rules are in this table of Whitney: table (here Whitney retains the final -s without resolving it into the visarga -ḥ). The Vedic forms on top of this include 1) Vedic -ā for -āu in the masculine dual, and 2) Vedic -as for -an in the masculine vocative singular. The rest of the Vedic exceptions on that page of Whitney aren't a priority.
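A rough JavaScript sketch of the classical masculine -vat/-mat declension described above, including the -t to -d change before bh- endings. It works on IAST transliteration and ignores accent, the Vedic variants, the neuter and the feminine. The function and slot names are hypothetical and do not reflect how Module:sa-decl is organised. The strong stem in -vant/-mant (accusative singular, nominative/accusative/vocative dual, nominative/vocative plural) and the nominative singular in -ān are standard grammar, not something spelled out in the post. The regex also cannot tell a possessive noun from a present participle whose root happens to end in v or m, which is exactly the distinction the post says needs separate handling.

```javascript
// Hypothetical sketch: classical masculine declension of -vat/-mat stems in IAST.
// Accent, Vedic variants and the neuter are deliberately left out.

// Slots that take the strong stem (-vant/-mant).
const STRONG_SLOTS = new Set(["acc_s", "nom_d", "acc_d", "voc_d", "nom_p", "voc_p"]);

// Case endings; nom_s and voc_s are built separately below.
const ENDINGS = {
  acc_s: "am", ins_s: "ā", dat_s: "e", abl_s: "aḥ", gen_s: "aḥ", loc_s: "i",
  nom_d: "au", acc_d: "au", voc_d: "au",
  ins_d: "bhyām", dat_d: "bhyām", abl_d: "bhyām", gen_d: "oḥ", loc_d: "oḥ",
  nom_p: "aḥ", voc_p: "aḥ", acc_p: "aḥ", ins_p: "bhiḥ",
  dat_p: "bhyaḥ", abl_p: "bhyaḥ", gen_p: "ām", loc_p: "su",
};

// Stem-final -t becomes -d before endings beginning with bh-.
function joinWithSandhi(stem, ending) {
  if (stem.endsWith("t") && ending.startsWith("bh")) {
    return stem.slice(0, -1) + "d" + ending;
  }
  return stem + ending;
}

function declineVatMasc(lemma) {
  const match = /^(.*[vm])at$/.exec(lemma);
  if (!match) throw new Error(`not a -vat/-mat stem: ${lemma}`);
  const base = match[1];        // "bhagav" for bhagavat
  const weak = base + "at";     // bhagavat-
  const strong = base + "ant";  // bhagavant-
  const forms = { nom_s: base + "ān", voc_s: base + "an" }; // classical vocative
  for (const [slot, ending] of Object.entries(ENDINGS)) {
    forms[slot] = joinWithSandhi(STRONG_SLOTS.has(slot) ? strong : weak, ending);
  }
  return forms;
}

const f = declineVatMasc("bhagavat");
console.log(f.nom_s, f.acc_s, f.ins_d, f.loc_p);
// bhagavān bhagavantam bhagavadbhyām bhagavatsu
```

Words like दत् (dat), mentioned just below, are rejected by the regex and would still need manual overrides.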
There are of course exceptions, such as the word for "tooth" दत् (dat), which is not formed with the "-vat"/"-mat" affix. These corner cases can always be treated specially by the user with manual overrides.
I understand this is much easier said than done; there are probably things we haven't thought of when referring to the table. It's also especially difficult to understand someone else's work and improve on it. So far my wish list is just the "-vat"/"-mat" nouns, which "should" be the lowest-hanging fruit, but I know it's easy to make wishes!
Then again, I can see how this more or less ad hoc treatment can become a problem in the long term. Ideally, what we need is the "clean" solution, which I think is what you had in mind, right? --Frigoris (talk) 15:08, 1 August 2021 (UTC)
@Frigoris: Yes, I was trying for a clean solution, as you put it; otherwise we end up with lots of longer-term problems. I'm also trying to handle all consonant stems, not just those in -t. Benwing2 (talk) 23:37, 1 August 2021 (UTC)

Lists of non-existent pages and entries with missing information

Hello. Is there any way to generate a list of:

1. All links to non-existent pages within Macedonian entries, regardless of whether the link is marked as a Macedonian word or not (since many will not be marked in any way), but not within homographic entries in other languages that are displayed on the same page

2. All Macedonian entries lacking a pronunciation (not only those for which specific requests have been made)

3. All Macedonian entries lacking an etymology (idem)

I would like to have the first list so that I can quickly create entries for all Macedonian words that are referenced in existing entries without having to open all 25000+ entries individually and check whether they contain any red links or not. For example, шмука (šmuka) links to non-existent "шмукне", лизга (lizga) links to non-existent "лизне" and цуца (cuca) links to non-existent "цуцне", but if I had not opened these entries individually today, I would not have known about the red links. Thus, I would like to see "шмукне", "лизне", "цуцне" and hundreds of others in the same place in order to start creating entries according to the list.

Similarly, I would like to have the other two lists in order to efficiently add pronunciations and etymologies where they are missing, without having to open all 25000+ entries to see whether this information has already been included. For etymologies, this is not really an issue at the moment, since very few Macedonian entries have one, and most of those need serious improvement. For pronunciations, however, about a third of entries already have an IPA transcription, so clicking through every entry in turn would be inefficient.
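As a rough idea of what such lists involve, here is a hypothetical JavaScript sketch that inspects one page's wikitext: it extracts the ==Macedonian== section, checks for Pronunciation and Etymology headers (numbered or not, at any level), and collects bare [[...]] link targets. All three function names are made up for illustration. Run over a dump, this would give candidate lists. It does not see links made with templates such as {{l|mk|...}}, and whether a link target exists still has to be checked against the page titles in the same dump.

```javascript
// Hypothetical sketch: inspect one page's wikitext for the three lists requested.

// Return the text of the ==Language== section, or null if the page has none.
function languageSection(wikitext, language) {
  const lines = [];
  let found = false;
  let inside = false;
  for (const line of wikitext.split("\n")) {
    const header = /^==\s*([^=].*?)\s*==\s*$/.exec(line); // level-2 headers only
    if (header) {
      inside = header[1] === language;
      found = found || inside;
      continue;
    }
    if (inside) lines.push(line);
  }
  return found ? lines.join("\n") : null;
}

// Check for Pronunciation / Etymology headers, numbered or not, at any level.
function missingSections(section) {
  const has = (name) => new RegExp(`^=+\\s*${name}(\\s+\\d+)?\\s*=+\\s*$`, "m").test(section);
  return { pronunciation: !has("Pronunciation"), etymology: !has("Etymology") };
}

// Collect targets of bare [[...]] links; namespaced ones like [[Category:...]] are skipped.
function bareLinkTargets(section) {
  const targets = new Set();
  for (const m of section.matchAll(/\[\[([^\]|#]+)/g)) {
    const target = m[1].trim();
    if (!target.includes(":")) targets.add(target);
  }
  return [...targets];
}

const sample = "==Macedonian==\n===Verb===\n# [[шмукне]]\n\n==Bulgarian==\n===Pronunciation===\n";
const mk = languageSection(sample, "Macedonian");
console.log(missingSections(mk)); // { pronunciation: true, etymology: true }
console.log(bareLinkTargets(mk)); // [ 'шмукне' ]
```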

I'm sorry if this has already been answered elsewhere: I searched for comparable queries on Google and found nothing relevant.

Thank you in advance. Martin123xyz (talk) 11:52, 28 July 2021 (UTC)

Links to nonexistent Macedonian entries are partly covered by User:Jberkel/lists/wanted/latest, though that only covers links in link templates, not bare links. (A Macedonian list is not currently generated, but the data is in the all.jsonl.bz2 file and I've added Macedonian to the list of language codes, so it will be generated next time.) — Eru·tuon 18:00, 28 July 2021 (UTC)
Thank you for the reply. Will the new list be generated on the 1st of August? Martin123xyz (talk) 06:26, 29 July 2021 (UTC)
Yes, but usually a few days later than that; it takes some time before the database dump is ready to be used. – Jberkel 08:10, 29 July 2021 (UTC)

Category:Pages using DynamicPageList and Muak Sa-aak orange links

@Erutuon: I notice that a category, Category:Pages using DynamicPageList, is now appearing on the vast majority of category pages (currently over 37000). Any idea where this is coming from? Also, I notice that links on pages like Category:Muak Sa-aak lemmas are appearing in orange even though the entries exist in this language. Any idea what the issue is? Benwing2 (talk) 02:12, 29 July 2021 (UTC)

@Benwing2: The category was apparently added by this commit or one like it listed in phab:T287380, in preparation for getting rid of DynamicPageLists. See also Wiktionary:Grease pit/2021/July § DynamicPageList and category pages.
The Muak Sa-aak problem is caused by line 147 of MediaWiki:Gadget-OrangeLinks.js, which is meant to deal with {{senseid}} fragments in URLs. Here it removes -aak from the language name, and then on line 155 it searches the category names of the Muak Sa-aak lemma entries for Muak Sa followed by a space and a lowercase character, which of course it doesn't find, because the next character after Muak Sa is always a hyphen. I knew the code wouldn't work for all languages, but didn't have a good solution. — Eru·tuon 02:44, 29 July 2021 (UTC)
@Erutuon: Thanks for your quick response! IMO that's a really nasty hack in OrangeLinks. Could you not check whether the fragment, including the supposed sense ID, is actually a canonical language name? I see in MediaWiki:UpdateLanguageNameAndCode.js that it's possible to invoke expandtemplates to expand a template, so you could use this to check whether a given fragment is a language name, or look directly in Module:languages/code to canonical name or a similar source for the list of languages. If none of these possibilities work for whatever reason, we can hardcode the list of languages with hyphens in them somewhere (and even update it using MediaWiki:UpdateLanguageNameAndCode.js). Benwing2 (talk) 02:59, 29 July 2021 (UTC)
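A JavaScript sketch contrasting the failure described above with a lookup against canonical language names. naiveLanguageFromFragment is a hypothetical reconstruction of the behaviour Erutuon describes (cut at a hyphen followed by a lowercase letter), not the gadget's actual code. languageFromFragment assumes the gadget has a set of canonical names available, which is exactly the open question in the replies that follow.

```javascript
// Hypothetical reconstruction of the naive rule: treat "-" plus a lowercase
// letter as the start of a sense ID and strip it.
function naiveLanguageFromFragment(fragment) {
  return fragment.replace(/_/g, " ").replace(/-[a-z].*$/, "");
}

// Lookup-based alternative: return the longest prefix that is a known language
// name and is either the whole fragment or followed by "-".
function languageFromFragment(fragment, languageNames) {
  const text = fragment.replace(/_/g, " ");
  if (languageNames.has(text)) return text; // bare language-section link
  // Try cutting at each hyphen, rightmost first, so the longest name wins.
  for (let i = text.lastIndexOf("-"); i > 0; i = text.lastIndexOf("-", i - 1)) {
    const candidate = text.slice(0, i);
    if (languageNames.has(candidate)) return candidate;
  }
  return null;
}

const names = new Set(["English", "Muak Sa-aak"]); // stand-in for the full list
console.log(naiveLanguageFromFragment("Muak_Sa-aak"));          // Muak Sa
console.log(languageFromFragment("Muak_Sa-aak", names));        // Muak Sa-aak
console.log(languageFromFragment("English-Noun_sense", names)); // English
```

The lookup also copes with sense IDs that start with a capital letter or contain hyphens themselves, at the cost of shipping or fetching the name list.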
@Benwing2: I'd like to see if we can put an automatically updated list of language names with hyphens in the gadget file itself, so that the gadget makes as few server requests as possible. It already makes lots of requests, so it'd be good not to add another. — Eru·tuon 03:14, 29 July 2021 (UTC)
Hmm, it turns out a sense ID doesn't always start with a lowercase letter (search query), as the script assumes. (I didn't read the whole note initially.) So it's not as simple as getting a list of names with a hyphen followed by a lowercase letter (which would be manageable). And all hyphenated names joined by pipes (for a regex) come to 8872 bytes (demo), which is a little much. It might be possible to compact them somewhat into a regex with something like this, translated to Lua or JavaScript. — Eru·tuon 04:15, 29 July 2021 (UTC)
A prefix-tree-based regex with all hyphenated language names is not much shorter than the naive one: 7040 bytes. — Eru·tuon 04:55, 30 July 2021 (UTC)
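For the curious, a prefix-tree regex of the kind mentioned above can be built in a few lines of JavaScript. This is a generic sketch, not the code behind the linked demo or the 7040-byte figure. It only factors out shared prefixes, which is why the saving over a plain alternation stays modest when names diverge early.

```javascript
// Escape characters that are special in JavaScript regular expressions.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a character trie; the key "" marks the end of a word.
function buildTrie(words) {
  const root = new Map();
  for (const word of words) {
    let node = root;
    for (const ch of word) {
      if (!node.has(ch)) node.set(ch, new Map());
      node = node.get(ch);
    }
    node.set("", null);
  }
  return root;
}

// Turn a trie node into a regex source string, factoring out shared prefixes.
function trieToRegex(node) {
  const branches = [];
  let canEnd = false;
  for (const [ch, child] of node) {
    if (ch === "") canEnd = true;
    else branches.push(escapeRegex(ch) + trieToRegex(child));
  }
  if (branches.length === 0) return "";
  if (canEnd) return `(?:${branches.join("|")})?`; // the word may stop here
  return branches.length === 1 ? branches[0] : `(?:${branches.join("|")})`;
}

const source = trieToRegex(buildTrie(["ab", "abc", "ad"]));
console.log(source); // a(?:b(?:c)?|d)
```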
The best solution would be to use some exotic character in the sense IDs instead of a hyphen: something that would never appear in a language name, like English~fairly_safe or English‽even_more_unambiguous_but_unfortunately_multibyte. [Edit: Or just a colon, which looks nicer: English:_stylish.] But switching the character would inevitably break some links until all pages were updated. It would have been so much easier if the early Wiktionarians eleven years ago had thought about this. — Eru·tuon 06:45, 29 July 2021 (UTC)
Does the character only need to be valid in URLs as an anchor (or fragment identifier)? There should be plenty of ASCII characters to go around. — surjection?? 14:15, 29 July 2021 (UTC)
How about @, ~, = or +? — surjection?? 14:28, 29 July 2021 (UTC)
Sure, any of those would work because they don't occur in language names. The only requirement for an id attribute is that it can't contain whitespace (ref). — Eru·tuon 20:24, 29 July 2021 (UTC)
(Another point: we'd mostly only have to change the anchor in Module:senseid, as most well-behaved modules use that anyway.) — surjection?? 14:21, 29 July 2021 (UTC)
@Surjection: Looks like Module:links is currently using Module:utilities to make its sense IDs. We probably need to search for all cases of bootlegged sense IDs and switch them to use Module:senseid. — Eru·tuon 20:24, 29 July 2021 (UTC)
You switched Module:utilities to use Module:senseid, and I switched Module:links and Module:script utilities to use Module:senseid directly because Module:utilities doesn't do anything extra. Now require("Module:utilities").make_id is only used in sandbox modules and a sandbox template. — Eru·tuon 21:23, 29 July 2021 (UTC)
@Erutuon: light_link in Module:links doesn't use Module:senseid yet, and it doesn't allow a trivial conversion either. — surjection?? 23:05, 29 July 2021 (UTC)
@Surjection: Fortunately light_link is only used in sandbox templates according to this search. I suppose I added it because Module:links/sandbox didn't exist yet. It should probably be moved there and deleted from Module:links as a failed experiment. — Eru·tuon 00:58, 30 July 2021 (UTC)
I went through the remaining cases (all modules that concatenate something on both sides of '-' or "-") and couldn't find any more cases that manually construct sense ID anchors. — surjection?? 22:37, 30 July 2021 (UTC)
I guess to change this we have to choose a new separator and accept that some links are going to be broken for a while. I'm partial to :_ as separator because it suggests what it is. English:_whatever is whatever in the English section. A colon never appears in language names and is never likely to in the future. There might be languages that use a colon as a letter, but they should use a letter character, and the English name is likely to omit the colon anyway. This would lock us in to never having something with :_ or : in place of the language name in the sense IDs, which doesn't seem a problem to me. — Eru·tuon 19:10, 31 July 2021 (UTC)
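A small JavaScript sketch of how anchors with the proposed :_ separator could be built and split again. It assumes spaces become underscores, as in MediaWiki anchors. makeSenseAnchor and parseSenseAnchor are hypothetical names, not functions in Module:senseid. Because language names never contain a colon, the first :_ is always the separator, whatever the sense ID itself contains.

```javascript
// Build an anchor such as "Muak_Sa-aak:_some_sense" (spaces become underscores).
function makeSenseAnchor(languageName, senseId) {
  return `${languageName}: ${senseId}`.replace(/ /g, "_");
}

// Split an anchor at the first ":_"; returns null for ordinary section anchors.
function parseSenseAnchor(anchor) {
  const i = anchor.indexOf(":_");
  if (i === -1) return null;
  return {
    language: anchor.slice(0, i).replace(/_/g, " "),
    senseId: anchor.slice(i + 2),
  };
}

const anchor = makeSenseAnchor("Muak Sa-aak", "some sense");
console.log(anchor); // Muak_Sa-aak:_some_sense
console.log(parseSenseAnchor(anchor)); // { language: 'Muak Sa-aak', senseId: 'some_sense' }
```

Unlike the hyphen scheme, no list of language names is needed to find where the language name ends.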
Sounds good to me (and certainly better than the existing -). — surjection?? 20:06, 31 July 2021 (UTC)
@Erutuon: I'm fine with :_ or other separators that won't appear in language names. I think we should bite the bullet and go ahead and make the change. If I'm not mistaken, the worst that will happen is that senseid links will break for a while and you'll end up at the top of the page; we won't get page errors or anything. We can reduce the time by purging the pages that use {{senseid}} and those that use the |id= param or similar params like |id2=; we can search through the latest dump to find the latter set of pages. Benwing2 (talk) 20:32, 31 July 2021 (UTC)
If there are no objections, I'll change it soon-ish. — surjection?? 21:14, 7 August 2021 (UTC)
Aren't there a whole bunch of links from other wikis to individual senseids? --Yair rand (talk) 00:40, 9 August 2021 (UTC)
I don't recall ever seeing any on English Wikipedia, and w:Template:wikt-lang, which I made to link to English Wiktionary, doesn't have a sense ID parameter. (The iwlinks table doesn't contain link fragments, so no help there.) — Eru·tuon 01:11, 9 August 2021 (UTC)
The only test that I can think of is to expand the wikitext of some pages and then search it for links to Wiktionary sense IDs. Perhaps some pages on English Wikipedia with a particularly large number of Wiktionary links. English Wikisource links to Wiktionary fairly often as well. The lists of pages could be generated from the replica databases. I'm doubtful there are any because, before I came along, English Wikipedia didn't have any modules with Wiktionary language data, and without that, linking to sense IDs is hard to do. — Eru·tuon 01:47, 9 August 2021 (UTC)
FWIW I do recall seeing someone add either a senseid or anchor to a Wiktionary page with an edit summary indicating that it was so they could link to it from Wikipedia. I don't know how many Wikipedia pages would link to specific senses (whether via senseid or anchor), but I expect it is >0. I would think someone could search Wikipedia (using the search function? or at least searching a database dump) for links to Wiktionary that used the format senseid links use (e.g. [[:wikt:foobar#LANGUAGENAME-butNowOtherText]], where, especially if searching the database dump, you could ignore the text after # unless there was a hyphen before the ]], and manually remove any specific languages whose names have hyphens if they turned out to be numerous), to find examples. I don't know that we should be beholden to not breaking such links, though; we're already breaking links (and can't easily track that) if we move a page to a different spelling, merge or split senses which have such links, reorder numbered etymology sections, etc. - -sche (discuss) 18:24, 10 August 2021 (UTC)
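The dump search described above could start from something like this JavaScript sketch. It scans wikitext for [[wikt:...]] or [[:wiktionary:...]] links whose fragment contains a hyphen and skips fragments that are just a hyphenated language name. findSenseIdLinks is a hypothetical helper. It does not see links produced by templates, so expanding the wikitext first, as suggested above, would still be needed for those.

```javascript
// Find [[wikt:Page#Fragment]]-style links whose fragment looks like an old sense ID.
function findSenseIdLinks(wikitext, hyphenatedLanguageNames = []) {
  const languageAnchors = new Set(hyphenatedLanguageNames.map((n) => n.replace(/ /g, "_")));
  const linkRe = /\[\[:?(?:wikt|wiktionary):([^\]|#]*)#([^\]|]+)/gi;
  const results = [];
  for (const m of wikitext.matchAll(linkRe)) {
    const fragment = m[2].trim().replace(/ /g, "_");
    if (!fragment.includes("-")) continue;       // e.g. #English
    if (languageAnchors.has(fragment)) continue; // e.g. #Serbo-Croatian
    results.push({ page: m[1].trim(), fragment });
  }
  return results;
}

const text = "See [[:wikt:foobar#English-rare_sense|foobar]], [[wikt:baz#English]] " +
  "and [[wiktionary:qux#Serbo-Croatian]].";
console.log(findSenseIdLinks(text, ["Serbo-Croatian"]));
// [ { page: 'foobar', fragment: 'English-rare_sense' } ]
```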

Lua errors

Does anyone else see all the 'Lua error: not enough memory. See ...' errors on the page a? Leasnam (talk) 06:17, 31 July 2021 (UTC)

@Leasnam: Yep, everybody around the world sees them, and they appear on several other pages in CAT:E too. Unfortunately there's no immediate solution that we're aware of. See Wiktionary:Lua memory errors. — Eru·tuon 06:56, 31 July 2021 (UTC)