Wiktionary:Grease pit



Welcome to the Grease pit!

This is an area to complement the Beer parlour and Tea room. Its purpose is specifically for discussing the future development of the English Wiktionary, both as a dictionary and as a website.

The Grease pit is a place to discuss technical issues such as templates, Lua modules, CSS, JavaScript, the MediaWiki software, extensions to it, the toolserver, etc. It is also a place to think in non-technical ways about how to make the best free and open online dictionary of "all words in all languages".

Others have understood this page to explain the "how" of things, while the Beer parlour addresses the "why".

Permanent notice

  • Tips and tricks about customization or personalization of CSS and JS files are listed at WT:CUSTOM.
  • Other tips and tricks are at WT:TAT.
  • Find information and helpful links about modules, Lua in general, and the Scribunto extension at WT:LUA.
  • Everyone is encouraged to expand both pages, or to come up with more such stuff. Other known pages with "tips-n-tricks" are to be listed here as well.

Grease pit archives
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018



February 2019

support for multiple transliterations in templates

The Hebrew entries under Category:usex with multiple transliterations feature transliteration based on modern Hebrew alongside scientific transliteration. A similar potential need for multiple transliterations was presented at Template talk:ja-usex#marking usexes as Classical?. Perhaps templates like {{usex}} and {{link}} need |tr2= and corresponding "qualifier" parameters. {{head}} already has such a parameter, although it seems to be for simultaneous use with |head2=. Would this be sensible? —Suzukaze-c 07:34, 1 February 2019 (UTC)

Everything in our current module infrastructure is built on the assumption that every language has one unique way to transliterate it, and we generally do not include multiple possible transliteration schemes side by side. Instead, we've always chosen one particular transliteration scheme as our standard and made everything adhere to it. I'm not sure that it's necessary to have multiple transliterations. If the goal is to reflect differences in pronunciation, then remember that the main goal of transliteration on Wiktionary is to give a Latin-script version of what is written, not to indicate how to pronounce it. Pronunciation details have always been put in the pronunciation section. —Rua (mew) 18:34, 2 February 2019 (UTC)
WT:HE TR describes two types of romanization and allows the usage of both 🤷 —Suzukaze-c 22:50, 4 February 2019 (UTC)

TemplateData and Module:parameters

I just noticed that some of our documentation pages use TemplateData, but we also have our own system for programmatically specifying template parameters in the form of Module:parameters and the tables of data that are given to it. Is there perhaps a way to combine these two, by either generating the Module:parameters data from TemplateData or the reverse? —Rua (mew) 18:29, 2 February 2019 (UTC)

That would be great, and maybe use it to generate the documentation as well, which needs to be kept in sync. TemplateData sort of already does that, but it's not very readable (one big table). – Jberkel 18:37, 2 February 2019 (UTC)
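One direction of the combination discussed above (deriving TemplateData from a parameter table) could look roughly like the sketch below. It is written in Python for illustration only; real code would be Lua under Scribunto. The `PARAMS` table is a simplified, hypothetical stand-in for the kind of table handed to Module:parameters, and the spec fields `required`, `type` and `default`, as well as the helper name `to_templatedata`, are assumptions. The output keys (`description`, `params`, `type`, `required`, `default`) follow the TemplateData JSON layout.

```python
import json

# Hypothetical, simplified stand-in for a Module:parameters-style spec.
PARAMS = {
    "1": {"required": True},
    "2": {},
    "tr": {},
    "nocat": {"type": "boolean"},
}

# Assumed mapping from spec types to TemplateData types; anything else becomes "string".
TYPE_MAP = {"boolean": "boolean", "number": "number"}


def to_templatedata(params, description=""):
    """Build a TemplateData-shaped dict from the simplified spec above."""
    out = {"description": description, "params": {}}
    for name, spec in params.items():
        entry = {"type": TYPE_MAP.get(spec.get("type"), "string")}
        if spec.get("required"):
            entry["required"] = True
        if "default" in spec:
            entry["default"] = str(spec["default"])
        out["params"][name] = entry
    return out


def to_json(params, description=""):
    """Serialize the result, e.g. for pasting into a <templatedata> block."""
    return json.dumps(to_templatedata(params, description), indent=2)
```

Generating documentation from the same table would then be a second renderer over `PARAMS`, which is what keeps the three in sync.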

Orange display of two-word headings at Watchlist page

The two-word linked headings displayed for items on my watchlist appear orange (eg, Rosa#Derived_terms). It's distracting and annoying. I use the orange display often to determine quickly whether an entry has a Translingual section. I hope there is a simple solution to this small problem. DCDuring (talk) 22:01, 4 February 2019 (UTC)

@DCDuring: Fixed, I think. Yep, turns out the solution was simple. — Eru·tuon 22:47, 4 February 2019 (UTC)
Thanks. Glad it was simple. DCDuring (talk) 22:49, 4 February 2019 (UTC)
It hasn't yet come through to my watchlist page. I'll let you know if the problem persists. DCDuring (talk) 22:51, 4 February 2019 (UTC)
OK DCDuring (talk) 22:52, 4 February 2019 (UTC)
Yeah, seems to take a couple of minutes for the new version of the gadget to show up. — Eru·tuon 23:35, 4 February 2019 (UTC)

Category:English terms with quotations

is full of entries in other languages. Ultimateria (talk) 03:00, 5 February 2019 (UTC)

They have various citations templates that don't use "lang=". DCDuring (talk) 09:16, 5 February 2019 (UTC)
As a (temporary) workaround, maybe the templates could be changed to not categorize unless a lang parameter was explicitly set? And then do a bot run in a second step. – Jberkel 09:35, 5 February 2019 (UTC)
I would support this. I want us to populate the categories in Category:Usage examples with the translation missing by language. (Why the awkward wording, by the way??) Ultimateria (talk) 22:31, 7 February 2019 (UTC)
@Sgconlaw: In reference to this edit, it's best not to have {{quote-book}} and others default to English. If English is the default, it is impossible to tell whether the quotation really is English (in which case the entry should be in Category:English terms with quotations), or whether someone has simply failed to supply the language code for a non-English quotation (in which case it shouldn't). So |lang=en should be required. — Eru·tuon 05:40, 16 February 2019 (UTC)
Ah, OK. This was a change that occurred when @Benwing2 updated {{quote-meta/source}} recently. I noticed that any quotation template that did not have an explicit language designation now defaults to English. I assumed that this was the desired operation of the templates. — SGconlaw (talk) 05:46, 16 February 2019 (UTC)
@Sgconlaw Hmmm. I didn't realize that missing lang= meant anything other than English. I think I can change this so that it categorizes differently. The reason I added the default was that the underlying code that displays a quote (see Module:usex/templates) requires a language code. Let me see if I can make the default 'und', and have it conditionalize on this to determine whether to add to some tracking category. Note that Category:Usage examples with the translation missing by language is not the right category; as it says, it's for usage examples that are in foreign languages and missing the translation, rather than instances of {{quote-*}} that are missing the language code. Benwing2 (talk) 05:53, 16 February 2019 (UTC)
Thanks, @Benwing2. — SGconlaw (talk) 06:02, 16 February 2019 (UTC)
@Sgconlaw, Ultimateria, DCDuring, Jberkel, Erutuon I changed all the templates to default to 'und' instead of 'en'. This changes the categorization to put them into Category:Undetermined terms with quotations. We can/should run a bot to fix them all. Benwing2 (talk) 06:28, 16 February 2019 (UTC)
@Benwing2: thanks. However, this creates an additional issue. In the past, if a quotation template did not have any language designation, it just wasn't categorized. Now, though, any such quotation template gets put into Category:Undetermined terms with quotations, which means that entries get put there even if there are already some templates with explicit language designations in the same entry. For example, disport which has some quotations with explicit language designations is correctly placed in Category:English terms with quotations. However, because of other quotations without language designations it also gets placed in Category:Undetermined terms with quotations, which in my view is not very desirable. To avoid this, all quotation templates would have to have an explicit language designation. Do we wish to compel editors to do this, or can we get around it in some way? Views welcome. — SGconlaw (talk) 06:41, 16 February 2019 (UTC)
Actually, one easy solution would simply be to make Category:Undetermined terms with quotations a hidden category. Shall we do this? — SGconlaw (talk) 06:43, 16 February 2019 (UTC)
There are some genuine undetermined terms that might need to use Category:Undetermined terms with quotations, so it's not good to throw pages that need cleanup in that category. It would be better to track the missing |lang= parameter with a separate cleanup category, something like "Quotation templates missing lang parameter" or "Quotation templates without language". Not sure how to achieve this. — Eru·tuon 07:02, 16 February 2019 (UTC)
I'd have to hack the module code to allow a language not to be passed when called from {{quote-*}}. This is possible but I think a better solution is to (a) make Category:Undetermined terms with quotations a hidden category, and (b) run a bot to fix all cases of a missing language parameter. This is easy enough to do based on the language section the quote is within. The only potential issue is if a term in a given language for some reason quotes some text in a different language and doesn't indicate the language. However, this seems pretty unlikely to me, enough so that maybe we don't have to worry about it. Any objections to me running the bot script? Benwing2 (talk) 07:53, 16 February 2019 (UTC)
@Erutuon The number of "genuine undetermined terms" with quotations is < 10, possibly = 0. Benwing2 (talk) 07:54, 16 February 2019 (UTC)
@Benwing2: Yeah, there don't seem to be any at all (search query: incategory:"Undetermined lemmas" incategory:"Undetermined terms with quotations"). Still, that's what the category is for. — Eru·tuon 09:21, 16 February 2019 (UTC)
I think the main category of quotes in languages other than the section they're in are in Etymology sections, which my bot can skip. Benwing2 (talk) 09:59, 16 February 2019 (UTC)
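The bot pass described above (take the language from the enclosing language section, and skip Etymology sections) might be sketched as follows. This is a hypothetical Python illustration, not the actual bot script: `LANG_CODES` is a made-up three-entry table, `add_missing_lang` is an invented name, and the template regex is a simplification that only handles single-line quote-* templates with no nested templates.

```python
import re

# Hypothetical name-to-code table; a real bot would use the full language data.
LANG_CODES = {"English": "en", "French": "fr", "German": "de"}

HEADER = re.compile(r"^(=+)[ \t]*(.*?)[ \t]*\1[ \t]*$")
# Simplification: single-line quote-* templates without nested templates only.
QUOTE = re.compile(r"\{\{(quote-[a-z]+)((?:\|[^{}]*)?)\}\}")


def add_missing_lang(wikitext):
    lang_code = None      # code of the current L2 language section, if known
    in_etymology = False  # quotes inside Etymology sections are left alone
    out = []

    def fix(match):
        name, args = match.group(1), match.group(2)
        if re.search(r"\|\s*lang\s*=", args):
            return match.group(0)  # language already given
        return "{{%s|lang=%s%s}}" % (name, lang_code, args)

    for line in wikitext.split("\n"):
        m = HEADER.match(line)
        if m:
            level, title = len(m.group(1)), m.group(2)
            if level == 2:
                # Sections missing from the table (e.g. Translingual) are skipped.
                lang_code = LANG_CODES.get(title)
                in_etymology = False
            else:
                in_etymology = title.startswith("Etymology")
        elif lang_code and not in_etymology:
            line = QUOTE.sub(fix, line)
        out.append(line)
    return "\n".join(out)
```

Leaving unknown sections untouched is a deliberately conservative choice here: anything the table does not cover stays in the cleanup category for a human to look at.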
I haven't thought this through and it can't be too common, but Translingual L2 sections are a problem, because, in principle, the taxonomic ones can be attested by any language (except "Translingual"). I also wonder about Translingual CJKV entries. DCDuring (talk) 19:27, 16 February 2019 (UTC)
@DCDuring Hmmm. Are you referring to translingual terms under the "Translingual" section or under specific languages? In the former case it's easy enough to ignore quotes in such sections. If you're talking about the latter, can you point me to any examples? Benwing2 (talk) 20:11, 16 February 2019 (UTC)
I was thinking about terms under the Translingual header. Often Translingual entries are not considered when automated (or even semi-automated) changes are implemented and then are not cleaned up afterward, leaving me with lots of unassisted cleanup. DCDuring (talk) 20:39, 16 February 2019 (UTC)
@DCDuring OK, I looked through all the Translingual quote-* entries with missing lang=. Almost all of them were in English; there were 5 pages (ipso jure, tat tvam asi, Unsupported titles/Vertical line, , x) with quotes in other languages, and I added the appropriate languages along with |nocat=1. For the remaining, I'll have my bot add |lang=en|nocat=1. I also added an extra check for |lang=en along with a translation= or t= param, which shouldn't happen. This caught (1) a few places where the translation parameter was being misused to add a footer note, cf. abstemious, cryptodepression (for which I added support for |footer= to specify such a thing); (2) a few places with a mixed English-and-some-other-language quote (cf. dude, with a quote partially in French and the translation used to translate the French portion); and (3) a few places with a legitimate translation out of an English-like language, mostly out of Middle English (cf. arrange, ashame), Scots (cf. daft) or some sort of weird English-like language: cf. ben, which has a quote from 1611 written in obsolete British thieves' cant: "A gage of ben Rom-bouse, / In a bousing-ken of Rom-vile, / Is benar than a Caster, / Pecke, pennam, lay, or popler, / Which we mill in deuse a vile." translated as "A pot of good wine, / In a pub of London, / Is better than a cloak, / Meat, bread, milk, or porridge, / Which we steal in the countryside.". Middle English and Scots examples can be moved under the appropriate header, but I have no idea what to do with the thieves' cant example; perhaps we should just leave it and keep the translation? Benwing2 (talk) 21:16, 17 February 2019 (UTC)
Thanks for addressing the exceptional cases. In the thieves' cant cite, the "translation" isn't a translation, it is a paraphrase. I don't think it belongs in the template. It could be hard formatted to give the same appearance. DCDuring (talk) 05:18, 18 February 2019 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── @DCDuring I fixed that quote to use footer=. I also ran my bot on the first 100 or so entries that were in CAT:Undetermined terms with quotations and spot-checked the results. The only weirdness comes from uncommon languages like Nauruan and Indo-Portuguese (both on page a) where the quoted text was indeed in that language but the work as a whole was written in some other language. I manually fixed those two cases to use worklang= to indicate the language of the work as a whole, and fixed {{quote-meta/source}} to correctly display both languages, so that it e.g. says "(in German, quote in Indo-Portuguese)" instead of just "(in Indo-Portuguese)", which suggests the book might be in Indo-Portuguese instead of German. There's no way of handling these automatically and they're fairly rare, so I'm inclined to just let the bot do its thing. Note that I also searched through all 68000+ entries in for quotes identified as English but containing a circumflexed or tilded letter, which is a possible indication that the language is wrong. In fact, every single one was in English; in a couple of cases a quote from an additional language was embedded inside the English quote, but in the vast majority of cases the accented character came either from a foreign name in the text or from a naturalized foreign word written as if it were in the source language (rôle, papier-mâché, jalapeño, etc.). So I'm going to run the bot starting tomorrow unless someone sees a good reason not to. Benwing2 (talk) 07:47, 18 February 2019 (UTC)
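The accented-letter check mentioned above could be approximated like this. It is a minimal Python sketch, assuming the quotes have already been extracted into `(page, lang, passage)` tuples (a hypothetical shape); the function names are invented for illustration. It decomposes each passage with Unicode NFD and looks for the combining circumflex or combining tilde.

```python
import unicodedata

CIRCUMFLEX = "\u0302"  # combining circumflex accent
TILDE = "\u0303"       # combining tilde


def has_circumflex_or_tilde(text):
    """True if any letter in text carries a circumflex or tilde."""
    decomposed = unicodedata.normalize("NFD", text)
    return any(ch in (CIRCUMFLEX, TILDE) for ch in decomposed)


def flag_suspect_quotes(quotes):
    """quotes: iterable of (page, lang, passage) tuples (hypothetical shape).

    Returns the pages whose quotes are tagged English but contain such letters.
    """
    return [page for page, lang, passage in quotes
            if lang == "en" and has_circumflex_or_tilde(passage)]
```

As the results reported above suggest, a hit is only a hint for manual review: naturalized words such as rôle or jalapeño trigger it just as a mislabelled quote would.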

I appreciate the additional effort you have undertaken to prevent undesirable consequences for somewhat unusual cases. I wonder whether we could have exceptional case flags in templates and elsewhere where templates or formatting were used in a way that departed from the norm to reduce or even ultimately eliminate the need to creatively locate exceptional cases. I'd be willing to insert such flags or (invisible) notice templates when I notice what looks like a likely problem and to put in time to try to normalize some such cases (probably only the easy ones). DCDuring (talk) 15:05, 18 February 2019 (UTC)
@DCDuring Can you clarify what you mean exactly? Benwing2 (talk) 16:10, 18 February 2019 (UTC)
Probably not exactly by the standards you would require.
Consider a named parameter for templates, "nonstd". The presence of such a parameter could trigger categorization into one or more maintenance categories and serve as a simple flag indicating that a bot should avoid altering the content of such template or categorizing. Obviously the flag could be ignored, but, once such a parameter were deployed, it might be desirable to allow less well-designed templates to run relying on such a parameter to avoid likely problems. I would want such a parameter to be set only where experienced editors found the underlying problem excessively hard, time-consuming, or controversial to resolve.
Consider also a template {{nonstd}} that indicated some departure from normal practice in wikitext or in formatting. Alternatively, consider adopting the practice of having bots bypass any L2 section that had {{rfc}}. The same principle of use would apply as for the named parameter.
I could imagine the named parameter (or parm 1 for the template) being set to specific values for specific widespread but not readily resolved problems. DCDuring (talk) 16:30, 18 February 2019 (UTC)
@DCDuring I could definitely implement e.g. the idea of bypassing L2 sections that have {{rfc}} in them. But I'm not sure I understand the use case for this or for {{nonstd}}. Can you give some examples? Benwing2 (talk) 02:39, 19 February 2019 (UTC)
If my suggestion doesn't strike a chord with you, it may not be worth pursuing. I suppose that I seem to keep seeing certain entries that recur in my occasional efforts to clean things up. They keep coming back because I can't figure out what to do with them. They seem to follow the letter of our rules, but are chock full of odd uses of templates, special characters, and other bits of strange. I hypothesize that these would be more likely to be problematic for bots as well. It may be that one can systematically identify L2 sections that have features likely to cause trouble. OTOH, perhaps there is so little commonality in the operation of bots that there is no identifiable class of entry that would give trouble to multiple bots. DCDuring (talk) 03:19, 19 February 2019 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── @Sgconlaw, Ultimateria, DCDuring, Jberkel, Erutuon I've done some more work in this area. I think things are pretty good now:

  1. If the lang= parameter is omitted, pages go into Category:Quotations with missing lang parameter instead of Category:Undetermined terms with quotations. The latter will only have entries if lang=und is explicitly given.
  2. I added termlang= to specify the language of the term, if different from lang= (the language of the quote). This should handle most of the situations that would otherwise need to use nocat=1 (e.g. an English quotation illustrating a Translingual term; a Middle English or Scots quotation illustrating an English term; a quote from a translation, as in Citations:Undecimber, illustrated using translations of Ancient Greek and Czech quotations; a quotation from a translating dictionary, as in nác, an uncommon Vietnamese term whose quote comes from a Vietnamese-French-Latin dictionary; etc.). If nocat=1 is used, the term is added to Category:Quotations using nocat parameter.
  3. I have rewritten cases where there's a "translation" of an English-language quotation (e.g. the quotation is in thieves' cant or Early Modern English) by putting the "translation" in the footer= parameter preceded by "[paraphrase]". Visually this looks the same as if translation= were used, except for the "[paraphrase]" text.
  4. I modified {{quote-meta/source/langhandler}} to add maintenance language "Please specify the language of the quote" if lang= is omitted. (Eventually I want to make this an error.)
  5. I'm running my bot on the remaining cases where lang= is omitted.

Benwing2 (talk) 21:12, 24 February 2019 (UTC)
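Points 1 and 2 of the list above can be read as a small decision procedure. The Python sketch below is one possible reading, not the module's actual code: `LANG_NAMES` is a tiny illustrative table, `quotation_categories` is an invented name, and two details are assumptions (that a missing lang= is checked before nocat=1, and that nocat=1 suppresses the "terms with quotations" category).

```python
# Tiny illustrative code-to-name table; not the real language data.
LANG_NAMES = {
    "en": "English",
    "enm": "Middle English",
    "mul": "Translingual",
    "und": "Undetermined",
}


def quotation_categories(args):
    """args: dict of template parameters as strings (hypothetical shape)."""
    lang = args.get("lang")
    if not lang:
        # Point 1: an omitted lang= is tracked separately from lang=und.
        return ["Quotations with missing lang parameter"]
    if args.get("nocat"):
        # Assumption: nocat=1 replaces the normal category with a tracking one.
        return ["Quotations using nocat parameter"]
    # Point 2: termlang= names the language of the term when it differs from the quote.
    term_lang = args.get("termlang") or lang
    return ["%s terms with quotations" % LANG_NAMES[term_lang]]
```

For example, a Middle English quotation under an English entry would pass lang=enm and termlang=en and land in the English category.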

Wow. That seems to address all stated concerns. Thanks. DCDuring (talk) 21:41, 24 February 2019 (UTC)
I don't think I've ever seen any use here of footer=. DCDuring (talk) 21:43, 24 February 2019 (UTC)
@DCDuring I added footer=, for this use and others in {{ux}} and such where source= was being abused. Benwing2 (talk) 21:49, 24 February 2019 (UTC)
I found some 27 instances of entries using {{quote-meta}} and "source=", only two of which had "[paraphrase]". I suppose it's not highly likely that there will be a conflict. In any event that would seem to be a bridge for another day. DCDuring (talk) 21:54, 24 February 2019 (UTC)
@DCDuring How did you find those cases? Benwing2 (talk) 22:45, 24 February 2019 (UTC)
Basic searchbox: 'hastemplate:"quote-meta" insource:/source\=/'. This probably catches some entries that are not relevant, but I have trouble with complex regexes. DCDuring (talk) 23:01, 24 February 2019 (UTC)
@DCDuring That brings up a ton of results but they're all pretty much cases where source= is inside a Google Books URL. One or two are R:... templates that legitimately have a source= argument. I didn't see any that are trying to pass source= to {{quote-book}} or similar. (This param isn't defined for quote-* anyway, but only for {{ux}}, {{uxi}} and {{quote}}.) Benwing2 (talk) 02:26, 25 February 2019 (UTC)
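The false positives described above come from matching the bare string source= wherever it occurs, including inside URLs. A stricter pattern that requires a preceding pipe can be sketched in Python as below. This uses Python's `re` syntax, not the search box's; the sample lines in the usage note are made up, and `uses_source_param` is an invented helper name.

```python
import re

LOOSE = re.compile(r"source=")          # roughly what the search above matches
PARAM = re.compile(r"\|\s*source\s*=")  # only a pipe-delimited template parameter


def uses_source_param(wikitext):
    """True if source= appears as a template parameter rather than inside a URL."""
    return PARAM.search(wikitext) is not None
```

A line such as `{{quote-book|url=https://books.google.com/books?id=abc&source=gbs_navlinks_s}}` matches `LOOSE` but not `PARAM`, whereas `{{ux|en|text|source=Somewhere}}` matches both. A pipe inside a URL would still fool the stricter pattern, so this narrows the list rather than replacing a manual check.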
Oh. I read too much into finding the two instances of "[paraphrase]". I should have looked at the instances more carefully, but such care is not in my nature. DCDuring (talk) 02:37, 25 February 2019 (UTC)
I'm sorry. What I had done was look for 'hastemplate:"quote-meta" insource:/footer\=/'. That is, I was looking for cases where there was a pre-existing use of "footer=" that might conflict with the use of it for paraphrases. The likelihood that there would be a conflict is not too high, but the quotations that use preexisting 'footer=' strike me as being much more likely than the average cite to also require a paraphrase. DCDuring (talk) 03:52, 25 February 2019 (UTC)
Nice job, @Benwing2! Thanks to the new categorization ("needing translation") a dozen or more quotes I've added over the years got translated recently, often by IPs. – Jberkel 23:16, 28 February 2019 (UTC)

Searching for transliterations

I was wondering how feasible it would be to have search match transliterations in {{head}}, placing them higher in search results. --{{victar|talk}} 22:43, 5 February 2019 (UTC)

Weird reference formatting at leuk#Dutch

In the References section, the reference is looking weird, with a line break in between that shouldn't be there. Does anyone know how to fix that? —Rua (mew) 14:19, 6 February 2019 (UTC)

Use cite-web, not quote-web. DTLHS (talk) 14:52, 6 February 2019 (UTC)

Inheritance of "Terms derived from PIE root"-style categories

I was wondering if it would be possible to make a bot so that categories of the form Category:xxxx terms derived from the PIE root *yyy- (e.g. Category:English terms derived from the PIE root *bʰer-) are automatically added to derivative words (which in this case means pasting a {{PIE root|xx|yyy-}} template with the language code changed).

You might be able to do it using the words that appear in an ===Etymology (#)=== section in a {{der}}, {{bor}}, {{inh}}, {{compound}}, {{prefix}} or {{suffix}} template. Ideally this would also work for certain formulae that are often found in place of those templates like From {{etyl|xx|zz}} {{m|xx|Word}} but that sounds tricky. I think false positives should be uncommon since other words mentioned in the etymology section are usually there for comparison (and use {{m}}, {{cog}}, etc). I can't think of a case which would have the etymology of a not-directly-related word but I suppose it's possible.

I had been thinking an easier way would be with the words listed in ====Descendants==== but then I stumbled on Latin refero#Descendants which contains words like "relate" from an irregular form of the same verb that doesn't contain the relevant root (should they be over at relatus instead maybe?) so I don't know if false positives would be too common. However, the general approach of using the descendants and derived terms sections is still usable if you start with the root pages themselves (which are meticulous about that sort of thing) and work upwards until you reach a non-Proto-Language page. For example, Reconstruction:Proto-Indo-European/Hreh₁dʰ- lists Reconstruction:Proto-Germanic/rēdaną which lists read#English where a bot would add {{PIE root|en|*Hreh₁dʰ-}}. ─ ReconditeRodent « talk · contribs » 04:00, 8 February 2019 (UTC)
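The second approach (start at the root page and follow Descendants links until an attested page is reached) amounts to a graph walk. The Python sketch below shows the idea under stated assumptions: the `DESCENDANTS` dict is a hand-made, hypothetical extract containing only the example from this post, and a real bot would have to build it by parsing the Descendants and Derived terms sections. The function name `pie_root_templates` is likewise invented.

```python
from collections import deque

# Hypothetical extract of Descendants links: page -> list of (lang_code, page).
DESCENDANTS = {
    "Reconstruction:Proto-Indo-European/Hreh₁dʰ-": [
        ("gem-pro", "Reconstruction:Proto-Germanic/rēdaną"),
    ],
    "Reconstruction:Proto-Germanic/rēdaną": [
        ("en", "read"),
    ],
}


def pie_root_templates(root_page, root, descendants):
    """Walk from a root page; return (page, template) for each attested page reached."""
    results = []
    seen = {root_page}
    queue = deque([root_page])
    while queue:
        page = queue.popleft()
        for lang, child in descendants.get(page, []):
            if child in seen:
                continue
            seen.add(child)
            if child.startswith("Reconstruction:"):
                queue.append(child)  # keep going through proto-language pages
            else:
                # Stop at the first non-reconstructed page and tag it.
                results.append((child, "{{PIE root|%s|%s}}" % (lang, root)))
    return results
```

Stopping at the first attested page is what avoids the refero-type problem: the walk never relies on descendants listed on ordinary entry pages.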

Category:Kombio language

I just created this category while passing through the wanted category list and noticed there is some data (at the least, the script, which is Latin). I'm not familiar with the Lua stuff that is needed to add this data, so can someone please show me how it's done? User: PalkiaX50 talk to meh 16:33, 8 February 2019 (UTC)

Click "Edit language data" and find the matching code block. DTLHS (talk) 16:51, 8 February 2019 (UTC)

Misspelled headers

I discovered a lot of non-language mainspace headers that need fixing after compiling a list. (I was curious just how complete the list of non-language headers in the OrangeLinks gadget was.) I cleaned up some of the less common ones by hand, but it's pretty tedious and probably should be done by bot. I could start a bot, but does anyone already do this kind of thing? — Eru·tuon 00:57, 9 February 2019 (UTC)

At one point I did some of this but I found it incredibly tedious and nobody else seemed to be interested. Furthermore it's an unending task that will never be complete unless we can automatically forbid nonstandard headers from being saved. I look at level 2 headers monthly but going beyond that is madness. DTLHS (talk) 01:00, 9 February 2019 (UTC)
Ullmann used to do that, I think monthly. DCDuring (talk) 02:27, 9 February 2019 (UTC)
Yes, when there were 100,000 entries and 5 users :). DTLHS (talk) 02:30, 9 February 2019 (UTC)
He did it until about a year before he died. DCDuring (talk) 02:37, 9 February 2019 (UTC)
There were 1.5MM pages in December 2009, about a quarter of what we had December 2018. He had it easy. DCDuring (talk) 14:31, 9 February 2019 (UTC)
This doesn't sound very encouraging, but maybe I'll try. I do at least have a fast header-gathering program that could be modified to help with the task, like to collect a list of pages with misspelled headers. — Eru·tuon 01:52, 10 February 2019 (UTC)
I did once make a bot script to do this, but I'm not really interested in doing it regularly. —Rua (mew) 16:49, 10 February 2019 (UTC)
I have run User:TheDaveRoss/Rare_headers on a couple of occasions to look for entries to clean up, but I haven't done it lately, and I didn't make a bot since the variety of mistakes is broad and most required some editorial judgement. If you come up with a bot which can reliably determine the best course of action I am sure it would be busy. - TheDaveRoss 21:04, 11 February 2019 (UTC)
I just created User:Erutuon/mainspace headers/possibly incorrect, using a whitelist method. It includes all headers besides language headers and a list of headers that I compiled by going through the non-language headers by hand and finding anything that wasn't an obvious misspelling and seemed to have a distinct purpose. Thus it includes some common but obviously incorrect headers like "Alternate forms" and "See Also", and excludes some rare headers that are or might be correct, like "Ambiposition" and "Converb".
There are some headers that a bot could easily correct, like miscapitalizations ("See Also" or "synonyms"), and obvious misspellings like "Adejctive" or "Pronounciation" (of which a list would have to be compiled). Maybe it could correct singulars to plurals ("Synonym" → "Synonyms", "Reference" → "References") or vice-versa ("Adjectives" → "Adjective"), but there would have to be some way to exclude the entries in which the less common version is intended ("Adjectives" in ز ر ع). At the very least it would be nice to have the easier cases corrected. — Eru·tuon 07:48, 12 February 2019 (UTC)
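The whitelist method and the "easy" corrections described above could be combined along these lines. This Python sketch is illustrative only: `KNOWN` holds six headers rather than the full list, `MISSPELLINGS` contains just the two misspellings quoted in this post, and `check_headers` is an invented name. Headers it cannot resolve are returned with `None` so that a human can decide.

```python
import re

# Tiny illustrative whitelist; the real list of accepted headers is much longer.
KNOWN = {"Adjective", "Noun", "Pronunciation", "References", "See also", "Synonyms"}
# Hand-compiled misspelling table (only the examples quoted in the post).
MISSPELLINGS = {"Adejctive": "Adjective", "Pronounciation": "Pronunciation"}

# Level 3 and deeper only, so level-2 language headers are not checked.
HEADER = re.compile(r"^(={3,})[ \t]*(.*?)[ \t]*\1[ \t]*$", re.MULTILINE)
BY_LOWER = {h.lower(): h for h in KNOWN}


def check_headers(wikitext):
    """Return (header, suggestion) pairs; suggestion is None when unclear."""
    report = []
    for m in HEADER.finditer(wikitext):
        title = m.group(2)
        if title in KNOWN:
            continue
        # Known misspelling first, then a pure capitalization difference.
        fix = MISSPELLINGS.get(title) or BY_LOWER.get(title.lower())
        report.append((title, fix))
    return report
```

Singular/plural swaps such as "Adjectives" are intentionally not auto-corrected in this sketch, since the intended header depends on where it is used.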
In the Arabic case you gave, an Adjectives header is used inside Derived terms, which is a section that shouldn't have any subsections. That means the header isn't a misspelling, but rather a misuse of a header altogether. —Rua (mew) 16:42, 13 February 2019 (UTC)
That may be, but the point is that in such a case the appropriate action for a bot to take isn't clear to me, unlike in cases where "Adjectives" is a part-of-speech header and should be replaced with "Adjective". — Eru·tuon 19:40, 13 February 2019 (UTC)
It seems clear to me. If the bot finds something that's wrong, but doesn't have a rule to fix it, then notify the bot owner that it needs human attention. —Rua (mew) 20:34, 19 February 2019 (UTC)
Which human? There are tens of thousands of such errors and I have compiled lists of them before. There is not enough attention to fix them at a rate greater than at which they are created. DTLHS (talk) 23:17, 19 February 2019 (UTC)
An abuse filter could, theoretically, prevent the addition of any header not in an approved list of headers, couldn't it? But given that the list of approved headers would be rather long, we should investigate how "expensive" such a filter would be. But if the warning message directed people to a list of approved headers, it could cut down on a lot of errors, like "Alternate form". Yes, this is an unending problem. If the code for the script to find nonstandard headers is posted somewhere on-wiki, then anyone interested could run it periodically and work on entries as they had time. (Is there a WMFlabs tool to display all headers in use on a wiki, which could be used to find and work on nonstandard ones?) - -sche (discuss) 00:06, 20 February 2019 (UTC)
@-sche: I don't know of a WMFLabs tool, but see the list of all non-language mainspace headers from the latest dump. The problem with the scripts that I used to generate the list of all headers and the list of possibly incorrect headers is that they are C programs, so people would have to gather dependencies and compile them; they would be easier to use (though probably slower and more memory-hungry) if they were translated into some scripting language like Python or Lua. I can publish them on Github if I get my files organized. [Edit: Repository started.]
My list of possibly correct headers that I used to generate the list of possibly incorrect headers is 227 entries long and 2587 bytes, so fairly long. The AbuseFilter format seems to have support for arrays, but a 227-item array would be pretty big and probably not very efficient to search. — Eru·tuon 00:31, 20 February 2019 (UTC)

Category:Xiongnu

I noticed this on the WantedCategories list and saw that {{autocat}} is adding it to Category:Terms derived from Xiongnu...the category should be called Category:Xiongnu language but I don't know how to fix it. Can someone show me? User: PalkiaX50 talk to meh 16:34, 10 February 2019 (UTC)

This is because Xiongnu is currently classified as an etymology language in our module system. (That is, its data is found in Module:etymology languages/data and it doesn't get to have entries of its own.) Etymology languages have a category name that is the name of the language without "language" added to it, and Module:category tree/derived cat links to this category name. So for Xiongnu to have the category "Xiongnu language" rather than just "Xiongnu", it would have to be promoted to a full language (in one of the submodules of Module:languages). I'm not qualified to judge whether that's appropriate. — Eru·tuon 20:41, 10 February 2019 (UTC)
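The naming rule described above fits in a few lines. This is a Python illustration of the rule as stated in this reply, not the code of Module:category tree/derived cat; the function name and the boolean flag are hypothetical.

```python
def language_category(canonical_name, is_etymology_language):
    """Category title for a language, following the rule described above."""
    if is_etymology_language:
        # Etymology languages: the bare name, with no "language" suffix.
        return "Category:" + canonical_name
    return "Category:" + canonical_name + " language"
```

Under this rule, `language_category("Xiongnu", True)` gives "Category:Xiongnu", and the same call with `False` gives "Category:Xiongnu language", which is why the category name changes only if the language's classification does.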

Category:User de-AT

Remove the category Category:German language, as Category:User de and Category:User en aren't in a language category either, and add it to Category:User de, as de-AT is a subform of de like en-US and en-AU are subforms of en. --Brown*Toad (talk) 06:03, 11 February 2019 (UTC)

Good catch. Done. - -sche (discuss) 08:00, 12 February 2019 (UTC)

addToToolbar not working

$('#wpTextbox1').wikiEditor('addToToolbar', { ... no longer works for me. Did something change? --{{victar|talk}} 18:11, 11 February 2019 (UTC)

If no one here knows anything, you could try posting on mw:Extension talk:WikiEditor/Toolbar customization where this function is described. MediaWiki:Gadget-DeveloperEditorTweaks.js seems to still be working and it uses the function. — Eru·tuon 19:55, 11 February 2019 (UTC)
Thanks, I'll post there, but it looks like that module hasn't changed since December. --{{victar|talk}} 20:25, 11 February 2019 (UTC)
It looks like ext.wikiEditor.toolbar was changed to ext.wikiEditor. I wonder how many other modules broke because of this change. --{{victar|talk}} 20:40, 11 February 2019 (UTC)
Oh, then this must be the same as the bug at User talk:Dixtosa/AjaxEdit.js § Gadgetification. [Edit: link to relevant change] Yes, probably some other scripts will be broken. It is a good idea to look at the deprecation notices in the browser console and update the scripts to use the non-deprecated modules before the deprecated modules are removed. — Eru·tuon 21:14, 11 February 2019 (UTC)
Looks like no other scripts use ext.wikiEditor.toolbar (no search results), so that's good. — Eru·tuon 21:22, 11 February 2019 (UTC)
Thanks for checking. --{{victar|talk}} 21:37, 11 February 2019 (UTC)

need help with Template:quote-hansard

I'm sure I must be missing something, but I'm at a loss as to how the column parameter works in Template:quote-hansard. In particular, it doesn't seem to show the number given, as can be seen in the example on the documentation page, but more importantly to me, I would like to know how to suppress it altogether, since it isn't marked as mandatory, yet it is displayed anyhow. I'd appreciate some insight. --188.143.99.17 08:39, 13 February 2019 (UTC)

If you add "|columns=" (text between but excluding the quotes), it will suppress it. Panda10 (talk) 20:04, 13 February 2019 (UTC)
Thanks, works like a charm. --188.143.99.17 20:46, 13 February 2019 (UTC)
@Panda10: there was a typo in {{quote-hansard}}. You shouldn't have to use the workaround any more. — SGconlaw (talk) 06:32, 16 February 2019 (UTC)

Surname request

Can someone please run a script and create a list of all members of Category:English surnames ending in -ian or -yan? --Vahag (talk) 09:13, 13 February 2019 (UTC)

https://tools.wmflabs.org/dixtosa/index.php?suffix=ian&category=English+surnames
https://tools.wmflabs.org/dixtosa/index.php?suffix=yan&category=English+surnames. Dixtosa (talk) 09:40, 13 February 2019 (UTC)
Thanks. --Vahag (talk) 06:28, 14 February 2019 (UTC)
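For anyone who already has the category members as a plain list of titles, the same suffix filter can be sketched locally. This is only an illustration, not the code behind the tool linked above; the function name `filter_by_suffix` and the sample titles are made up.

```python
def filter_by_suffix(titles, suffixes=("ian", "yan")):
    """Return the titles that end in any of the given suffixes (case-insensitive)."""
    lowered = tuple(s.lower() for s in suffixes)
    # str.endswith accepts a tuple, so one call checks every suffix.
    return [t for t in titles if t.lower().endswith(lowered)]


# Hypothetical sample; a real run would use the members of Category:English surnames.
sample = ["Kardashian", "Smith", "Ryan", "Jones", "Petrosyan"]
matches = filter_by_suffix(sample)
```

Note that a purely textual filter also picks up names such as "Ryan", so the result still needs a manual look.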

bon voyage

счастли́вого пути́ (sčastlívovo putí) Is it only me, or does the transliteration of the first Russian word on this page end in -vovo instead of -vogo? Sobreira ►〓 (parlez) 19:17, 13 February 2019 (UTC)

It's because the transliteration scheme used here is actually a mix of transliteration and transcription. Per utramque cavernam 19:20, 13 February 2019 (UTC)
Oh, @Per utramque cavernam, don't tell me that there are dialects where they say -/vovo/! Sobreira ►〓 (parlez) 10:10, 14 February 2019 (UTC)
@Sobreira: /v/ is the standard pronunciation of <г> in the endings -ого and -его. Per utramque cavernam 20:20, 16 February 2019 (UTC)

Quote templates like {{quote-book|...}}

The |nocat= parameter needs to be fixed: it should add no category at all, instead of adding the incorrect Category:Undetermined terms with quotations. See for example avoid like the plague (term is English; quote in etymology section is Latin) and nác (term is Vietnamese; quote is in Portuguese and Latin). --Hamator (talk) 09:34, 16 February 2019 (UTC)

See discussion above. Benwing2 (talk) 09:59, 16 February 2019 (UTC)
@Hamator I have added support for nocat=. Benwing2 (talk) 01:01, 17 February 2019 (UTC)
I have also added support for specifying multiple languages in lang= and worklang=, so that you can e.g. say |lang=vi,pt,la. Benwing2 (talk) 01:05, 17 February 2019 (UTC)
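A rough sketch of how a comma-separated value such as |lang=vi,pt,la could be split into individual codes. This is not the Lua code in the quote modules; the function name `split_lang_codes` is hypothetical, and the tolerance for stray spaces and empty items is an assumption.

```python
def split_lang_codes(value):
    """Split a comma-separated parameter value into a list of language codes."""
    # Strip whitespace around each item and drop empty items (e.g. from "vi,,pt").
    return [code.strip() for code in value.split(",") if code.strip()]


codes = split_lang_codes("vi,pt,la")
```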

Automatic multicolumn layout for lists

I think there used to be an easy way of automatically formatting long lists into multiple columns, but I don't see it in the page on layout. I thought it was a semicolon in front of each entry, but that just seems to make the entry bold.

I know that there are the templates top3, mid3, etc, but these require laborious counting of entries and updating when new entries are added.

Does this formatting option still exist? If not, can it be created? — Paul G (talk) 07:36, 18 February 2019 (UTC)

I have to say I've never heard of this, nor do I believe there is any easy way of creating such a formatting option, as it would require changing the way wikitext works. I would suggest just using {{der3}}, {{rel3}}, etc., depending on which section in an entry you are creating the list in, as these templates automatically balance the columns. — SGconlaw (talk) 08:33, 18 February 2019 (UTC)
Maybe those are what I was thinking of, then. Thanks. — Paul G (talk) 05:54, 21 February 2019 (UTC)

Cirrus section search

Can you search for content in specific sections? I'd like to run a query in the form hastemplate:quote-video insection:Etymology. According to the docs this is not possible (maybe with insource: and a regexp, but that's slow and messy). If more editors find this useful I can open a phabricator ticket. – Jberkel 09:51, 18 February 2019 (UTC)
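Until something like insection: exists, one workaround is to check sections locally on wikitext that has already been downloaded (from a dump or the API). The sketch below is an assumption-laden illustration, not a CirrusSearch feature: the function name `template_in_section` is invented, headings are recognised with a simple regular expression, and a section is taken to end at the next heading of any level.

```python
import re

# Matches wikitext headings such as "==English==" or "===Etymology 1===".
HEADING = re.compile(r"^(=+)[ \t]*(.*?)[ \t]*\1[ \t]*$", re.MULTILINE)


def template_in_section(wikitext, template, section):
    """Return True if {{template}} is used under a heading starting with `section`."""
    headings = list(HEADING.finditer(wikitext))
    for i, match in enumerate(headings):
        if not match.group(2).startswith(section):
            continue
        # The section body runs up to the next heading, or to the end of the page.
        end = headings[i + 1].start() if i + 1 < len(headings) else len(wikitext)
        body = wikitext[match.end():end]
        if re.search(r"\{\{\s*" + re.escape(template) + r"\s*[|}]", body):
            return True
    return False


# Hypothetical page text for illustration.
page = "==English==\n===Etymology===\n{{quote-video|en|x}}\n===Noun===\n{{en-noun}}\n"
found = template_in_section(page, "quote-video", "Etymology")
```

Matching on the start of the heading text means "Etymology 1", "Etymology 2" and so on are covered too.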

Triple brace abuse filter

My bot has run into this abuse filter a few times. I'm wondering if the filter is legit or should be changed. An example is conull:

#* {{quote-journal|year=2016|date=|author=Anton Bernshteyn|title=Measurable versions of the Lovász Local Lemma and measurable graph  colorings|journal=arXiv|url=http://arxiv.org/abs/1604.07349|doi=|volume=|issue=|pages=
|passage=Moreover, if the combinatorial structure on <math>X</math> is "induced" by the <math>[0;1]</math>-shift action of a countable group <math>\Gamma</math>, then, even without any local finiteness assumptions, there is a Borel choice for <math>f</math> which satisfies the constraints on an invariant '''conull''' set (i.e., with <math>{{{1}}}</math>). }}

Maybe it should be changed to not complain about triple braces inside of math tags? I'm not sure because I don't know exactly what the triple braces inside of a math tag do. Benwing2 (talk) 15:36, 19 February 2019 (UTC)

Triple braces inside math tags are an error. This mainly comes from Visviva's old tracking pages. They need to be replaced with the actual math from what they're quoting. DTLHS (talk) 15:38, 19 February 2019 (UTC)
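To find the affected passages in bulk, a bot could look for math tags whose content contains a triple brace. This is a minimal sketch with an invented function name (`math_with_triple_braces`); it has nothing to do with how the abuse filter itself is written.

```python
import re

# Non-greedy match of the content between <math> and </math>.
MATH_TAG = re.compile(r"<math>(.*?)</math>", re.DOTALL)


def math_with_triple_braces(wikitext):
    """Return the contents of every <math> tag that contains a triple brace."""
    return [m.group(1) for m in MATH_TAG.finditer(wikitext) if "{{{" in m.group(1)]


# Shortened version of the conull passage quoted above.
sample = "a <math>X</math> set (i.e., with <math>{{{1}}}</math>)"
broken = math_with_triple_braces(sample)
```

Each hit still has to be repaired by hand with the math from the quoted source, as noted above.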

Question about Finnish noun/adjective accelerated creation links

Hi, I'm wondering whether there is anyone around who could do what it takes to update these links. What I'm requesting is to make the links for nominative and accusative plurals generate an entry that defines the word as both "nominative plural of x" and "accusative plural of x", because those two forms are always the same. User: The Ice Mage talk to meh 14:35, 20 February 2019 (UTC)

@The Ice Mage: I've made the necessary change in Module:fi-nominals. Pinging @Surjection to confirm that this is correct, because I don't know a great deal about Finnish morphology. — Eru·tuon 22:58, 21 February 2019 (UTC)
@Erutuon It causes ACCEL to error because there are no rules for the accusative forms. When they are added, only the accusative plural gets mentioned when creating plurals, which is not correct. — surjection?〉 10:11, 22 February 2019 (UTC)
@Surjection: Oh, I see. The gadget only uses the last "form-of" acceleration parameter when there are two around the same link because it uses an object to store acceleration parameters (one value per key). I should have thought of that. Could use the code "plural-nominative-accusative-form-of" and modify Module:accel/fi to generate the correct output for it. — Eru·tuon 10:24, 22 February 2019 (UTC)
It is a possibility, but how to make the module return multiple defs? — surjection?〉 10:27, 22 February 2019 (UTC)
Oh, that's right. But fortunately I found a solution: just change one of the links in the table to "plural-accusative-form-of". — Eru·tuon 11:10, 22 February 2019 (UTC)
@Erutuon: It works, thank you! :) User: The Ice Mage talk to meh 12:56, 22 February 2019 (UTC)
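The "one value per key" behaviour described above can be illustrated with a Python dict standing in for the JavaScript object. The key name "form" and the two values are hypothetical, chosen only to show why the second parameter replaced the first.

```python
def collect_accel_params(pairs):
    """Store acceleration parameters in a mapping with one value per key."""
    params = {}
    for key, value in pairs:
        # A later value for the same key silently overwrites the earlier one.
        params[key] = value
    return params


# Two "form" parameters on the same link: only the last one survives.
pairs = [("form", "nominative-plural"), ("form", "accusative-plural")]
params = collect_accel_params(pairs)
```

This is why only the accusative plural showed up in the generated entry until the link codes were changed.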

greenification of Italian forms

The feminine and plural forms of Spanish nouns and adjectives show up as green in the headwords, but the forms of Italian ones are red. Would it be possible for someone to greenify the Italian? SemperBlotto (talk) 10:06, 21 February 2019 (UTC)

{{auto cat}} error

I get an error when I try to add {{auto cat}} to Category:en:Towns in Hungary and Category:Towns in Hungary. Error: "The automatically-generated contents of this category has errors. The label given to the {{topic cat}} template is not valid." Can someone please help? Thanks. Panda10 (talk) 15:00, 21 February 2019 (UTC)

After a bit of poking around, I figured out that you have to update "Module:category tree/topic cat/data/Places". It's not very self-evident; I'm not sure how it can be made more obvious to editors. — SGconlaw (talk) 16:05, 21 February 2019 (UTC)
@Sgconlaw: Thank you for the information and for all the corrections/updates you made! Panda10 (talk) 16:51, 21 February 2019 (UTC)

Question about {{place}}

I'd like to add Hungarian towns to Category:en:Towns in Hungary by using {{place|en|town|c/Hungary}} but it's not working. The template places the town into Category:en:Towns. I wonder if Module:place/data has to be edited. If yes, can someone please help with the update? I'd rather not create system-wide problems with incorrect additions. Thanks. Panda10 (talk) 20:16, 21 February 2019 (UTC)

Could you check what's wrong with Szentendre? It should be placed in Category:en:Towns in Hungary. Adam78 (talk) 10:04, 22 February 2019 (UTC)

Use {{c}} or {{C}}, e.g. {{c|en|Towns in Hungary}}. DonnanZ (talk) 20:09, 22 February 2019 (UTC)
That would work but {{place|en|town|c/Hungary}} is supposed to do that automatically. Except it doesn't. How come it works for Canadian towns? See Bon Accord. Panda10 (talk) 22:14, 22 February 2019 (UTC)
@Panda10: It works if you just remove the link from "Hungary": {{place|en|town|c/Hungary}} — Eru·tuon 22:18, 22 February 2019 (UTC)
Thank you for the hint, but it still doesn't seem to work. I tried ?action=purge as well, both on the entry page and the category page, but to no avail. I tried using another browser too, similarly to no effect. Can there be any hope? Adam78 (talk) 22:47, 22 February 2019 (UTC)
@Adam78, Panda10: This edit activates the "Towns in Hungary" category. — Eru·tuon 23:38, 22 February 2019 (UTC)
Thank you so much! You worked wonders. :) Adam78 (talk) 00:01, 23 February 2019 (UTC)

Interface admins

Can I nominate @Surjection and @JohnC5 as interface admins? Two very helpful people in this realm. @Stephen G. Brown --{{victar|talk}} 16:55, 22 February 2019 (UTC)

Category:Categories with invalid label

I think there has been a change somewhere; some language categories are tagged with this and others aren't. What's the story? Shouldn't {{auto cat}} be used any more? For example: Category:Norwegian Nynorsk terms derived from Latin. DonnanZ (talk) 20:01, 22 February 2019 (UTC)

@Donnanz: Fixed and added an example to the testcases to ensure this doesn't happen again. — Eru·tuon 21:45, 22 February 2019 (UTC)
@Erutuon: Ah, wonderful, the number of entries in the category is slowly dropping as I write this. I noticed it when I added Gothic for Bokmål and Nynorsk, that is fixed too. Cheers. DonnanZ (talk) 22:22, 22 February 2019 (UTC)
@Erutuon: There could be some in there with incorrect labels anyway, like Category:Greek words prefix with απ- (should be prefixed?). DonnanZ (talk) 01:02, 23 February 2019 (UTC)
Well, that got a short shrift. There's still a lot of stuff stuck in the category which may need proper modules; I added the module for Category:en:Towns in Alberta and created Category:Towns in Alberta. DonnanZ (talk) 12:53, 23 February 2019 (UTC)

They won't move

Could someone help me please? At el.wiktionary we have changed the name of a language category (from an old name to a new name). Everything looks fine: the words have the correct new category name at the bottom of their page. But they will not move! I need to open each page and edit something (a space, a line, anything) so that it will move to its new category. Is this expected? Can one avoid it? sarri.greek (talk) 23:54, 22 February 2019 (UTC)

@Sarri.greek: It's expected. The server takes a while (sometimes days) to update the contents of a category. (Sometimes I hurry it along by using the Pywikibot touch script.) — Eru·tuon 23:58, 22 February 2019 (UTC)
A... Thank you Eru. I thought it was my fault... sarri.greek (talk) 00:02, 23 February 2019 (UTC)
You don't need to actually change anything. Just click Edit, then Publish changes (a "null edit"), and that will prompt the system to update the categories, without any typing or anything shown in the revision history.
When I'm trying to clear something like this, I open up a bunch of them in separate tabs and switch from tab to tab. That way I can do a specific step on a dozen pages while the first one is responding to what I just did, and then do the next step, etc. It saves a lot of time when I'm doing hundreds of null edits: every second you save adds up to a minute for every 60 times you do something. Chuck Entz (talk) 00:25, 23 February 2019 (UTC)
Thank you Chuck Entz. Good to know! sarri.greek (talk) 16:29, 23 February 2019 (UTC)
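For larger batches, the MediaWiki API's purge action with forcelinkupdate is, as far as I know, the scripted counterpart of a null edit for refreshing category membership. The sketch below only builds the request parameters and sends nothing; the helper name `purge_params` and the sample titles are made up, and the exact behaviour should be checked against the API documentation before relying on it.

```python
def purge_params(titles):
    """Build the parameters for a purge request that also refreshes link tables."""
    return {
        "action": "purge",
        "format": "json",
        # Ask the server to update the links tables, which include category links.
        "forcelinkupdate": "1",
        # Multiple titles are separated with a pipe character.
        "titles": "|".join(titles),
    }


# Hypothetical titles; the parameters would be sent as a POST request to api.php.
params = purge_params(["λέξη", "σπίτι"])
```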

Category:Norwegian Nynorsk words that are no longer standard

It's a category with invalid label, but I'm not sure how to fix that. It has to be kept, of course. DonnanZ (talk) 18:39, 23 February 2019 (UTC)

If it's not going to be used by other languages, you can just write your own category description. Maybe it would make sense to put it under Category:Norwegian Nynorsk terms by orthographic property. You could also add categories for forms that were used before individual spelling changes; compare Category:Russian pre-1918 spellings and Category:Ukrainian pre-1990 spellings, which are placed under Category:Russian archaic forms and Category:Ukrainian archaic forms. — Eru·tuon 20:05, 23 February 2019 (UTC)
Another possible parent category: Category:Norwegian Nynorsk superseded forms (see Category:Superseded forms by language for existing examples). — Eru·tuon 20:41, 23 February 2019 (UTC)
OK, maybe not as easy as I hoped; I will have a look. Thanks. DonnanZ (talk) 21:22, 23 February 2019 (UTC)
Hmm, actually Category:Norwegian Nynorsk superseded forms is a pretty good fit. How about just using it instead of Category:Norwegian Nynorsk words that are no longer standard in {{nn-former}}? Or you could have {{nn-former}} place forms in categories for individual spelling reforms, and put those categories under Category:Norwegian Nynorsk superseded forms. — Eru·tuon 21:31, 23 February 2019 (UTC)
@Erutuon: I think I fixed it by changing it to {{catfix|nn}} and adding Category:Norwegian Nynorsk language so it now appears in there. It seems to work, and got rid of the invalid label. Let me know if that's too irregular. DonnanZ (talk) 21:45, 23 February 2019 (UTC)
That's better, but I'm going to try the idea of putting words in categories for individual spelling reforms, and putting those categories in Category:Norwegian Nynorsk superseded forms. — Eru·tuon 21:56, 23 February 2019 (UTC)
Do what you like. Should those four categories be added to Category:Norwegian Nynorsk words that are no longer standard? DonnanZ (talk) 22:14, 23 February 2019 (UTC)
No, I think that category should be deleted because it's functionally equivalent to Category:Norwegian Nynorsk superseded forms. — Eru·tuon 22:17, 23 February 2019 (UTC)
If the idea is to standardise this type of category across all languages, OK, fair enough. You made it happen pretty quickly. DonnanZ (talk) 23:04, 23 February 2019 (UTC)
Yeah, that's my intention because I like the idea of being able to go to Category:Superseded forms by language and find categories for each language that has undergone spelling reforms. — Eru·tuon 23:46, 23 February 2019 (UTC)
The old category has gone now. I just hope our Nynorsk contributors can find the new one! DonnanZ (talk) 00:46, 24 February 2019 (UTC)

How to replace #formatdate:DATE magic word with Lua?

@Erutuon, Rua Anyone know how to replace the {{#formatdate:DATE}} call with the equivalent Lua call? Benwing2 (talk) 06:45, 25 February 2019 (UTC)

I haven't dealt with dates very much, but as far as I can tell there's no exact equivalent; the nearest thing that I could find was mw.language:formatDate, which I used in Module:time. There's also os.date, but that's a bit lower-level. — Eru·tuon 07:21, 25 February 2019 (UTC)
@Erutuon Thanks. Do you (or anyone else) have any idea how I'd even implement that functionality? It appears to fetch the user's default date preference and use that, and I don't know how to do that from Lua. Benwing2 (talk) 00:37, 26 February 2019 (UTC)
I'm thinking of filing a Phabricator bug concerning this, but I'm not quite sure how to do this. Any pointers? Benwing2 (talk) 23:55, 26 February 2019 (UTC)
Huh, it looks like you found the solution in Module:User:Benwing2/quote-meta. I didn't try entering the name of the parser function as "#formatdate" with the number sign. — Eru·tuon 20:38, 28 February 2019 (UTC)

Is it possible to get conjugation raw data?

I am trying to get raw data and display the data on my own. Other parts seem relatively easy to handle, but the conjugation part is just an empty placeholder and the HTML renderer inserts the table later. For example, something like this for a French verb.

====Conjugation====
{{fr-conj-auto}}

Is there any way to get all the conjugation data for a verb? Or would the only way be extracting the data from HTML source code manually? --Sin Jeong-hun (talk) 12:15, 25 February 2019 (UTC)

There are a couple of ways. If you are using the API to get the data, you can use Expandtemplates, which will return the wikitext with the templates converted into their output. If you are using the data dump, you could re-implement the logic used in Module:fr-verb and re-create what the output would have been. The first will be substantially easier, but won't help you unless you are using the API. - TheDaveRoss 13:33, 25 February 2019 (UTC)
Thank you for your help. I could get the HTML for the conjugation table using the API like this, `https://en.wiktionary.org/w/api.php?action=expandtemplates&text={{fr-conj-auto}}&title=%C3%A9couter&prop=wikitext`. I have a small additional question. Can I get the full raw tags, also by using the API? I was using a page url like `https://en.wiktionary.org/w/index.php?title=pour&action=raw`, because I could not find an API (using the "api.php") to do that. --Sin Jeong-hun (talk) 13:55, 25 February 2019 (UTC)
You might be looking for API:Revisions with action=parse; here is livrer as an example. - TheDaveRoss 15:00, 25 February 2019 (UTC)
Have a look at {{se-infl-verb-even|ealli|output=Wikidata}}. —Rua (mew) 15:12, 25 February 2019 (UTC)
I think what you want is the generate_forms interface built into Module:fr-verb. Benwing2 (talk) 15:31, 25 February 2019 (UTC)
This is intended for exactly that purpose. Benwing2 (talk) 15:32, 25 February 2019 (UTC)
Thank you for the reply, but I cannot understand how to use {{se-infl-verb-even|ealli|output=Wikidata}}, even after reading the linked page. I am not trying to edit Wiktionary entries; I want to retrieve conjugation data from Wiktionary. Could you give me an example for it? For example, what is the API URL to get all the conjugations for the verb "écouter" using {{se-infl-verb-even|ealli|output=Wikidata}}? --Sin Jeong-hun (talk) 12:51, 26 February 2019 (UTC)
For écouter, I find that expanding {{#invoke:fr-verb|generate_forms}} on the page écouter seems to work. {{se-infl-verb-even}} is a Northern Sami template so you can't get French verb forms from it. — Eru·tuon 21:30, 26 February 2019 (UTC)
It is not possible (as far as I am aware) to call modules remotely (via API or otherwise). For what I understand your goals to be it doesn't seem like modules are a useful tool. - TheDaveRoss 23:31, 26 February 2019 (UTC)
There is also the scribunto-console API action that is used by MediaWiki:Gadget-libLua.js on Wikipedia. The last time I looked, I couldn't find documentation, but here is a query that calls Module:fr-verb to generate the forms of écouter and converts them to JSON. — Eru·tuon 23:41, 26 February 2019 (UTC)
There's also an existing template {{fr-generate-verb-forms}} that simply calls {{#invoke:fr-verb|generate_forms}}, which you can presumably use from the Expandtemplates API. Benwing2 (talk) 23:54, 26 February 2019 (UTC)
Thanks for pointing out scribunto-console, I had never seen that before. - TheDaveRoss 17:31, 27 February 2019 (UTC)
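Putting the Expandtemplates suggestions together, the request URL can be built as sketched below. The endpoint and the action, title, text and prop parameters are the ones quoted earlier in this thread; format=json and the helper name `expandtemplates_url` are additions for illustration. The sketch only builds the URL and does not fetch anything.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wiktionary.org/w/api.php"


def expandtemplates_url(title, text):
    """Build an Expandtemplates URL that expands `text` as if it were on page `title`."""
    params = {
        "action": "expandtemplates",
        "format": "json",
        "title": title,   # page context, so the template knows which verb is meant
        "text": text,     # wikitext to expand
        "prop": "wikitext",
    }
    # urlencode percent-encodes the braces and the non-ASCII title.
    return API_ENDPOINT + "?" + urlencode(params)


url = expandtemplates_url("écouter", "{{fr-conj-auto}}")
```

Passing {{fr-generate-verb-forms}} or {{#invoke:fr-verb|generate_forms}} as the text should, per the replies above, return the forms themselves rather than the rendered table.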

Category:Thesaurus

Hey. Would it be possible to get a "Recent additions to the category" bit on that page, like the one at Category:Spanish nouns? --Wonderfool early February 2019 (talk) 13:55, 26 February 2019 (UTC)

Template:sv-noun-form-def, but for uncountable nouns

Would anyone be so kind as to create a version of sv-noun-form-def, but for uncountable nouns? The current template yields "definite singular of [term]". If a template for uncountable nouns was created, I believe that it should read "definite of [term]". Thanks in advance! —VulpesVulpes42 (talk) 15:09, 26 February 2019 (UTC)

A definite form of an uncountable noun is still a singular; does it really matter? I treat them as definite singulars in Norwegian. DonnanZ (talk) 10:01, 28 February 2019 (UTC)
@Donnanz You may very well be right, but personally, I do not see uncountable nouns as singular at all. I would for instance never say "en botanik", which I feel is just as awkward as the English translation: "a botany". VulpesVulpes42 (talk) 13:49, 28 February 2019 (UTC)
@VulpesVulpes42: You have a point, but we don't include the indefinite article in entries anyway, as far as I know. It doesn't appear in the declension table for Swedish botanik, for instance. You have to work that one out by looking at the gender. The Norwegian page for botanikk shows (en/ein) in brackets, which could be an option. DonnanZ (talk) 14:08, 28 February 2019 (UTC)
@Donnanz: Maybe I worded my response poorly. I think that the indefinite article should not be included in entries, especially not in entries for uncountable nouns, since it is grammatically incorrect to do so under any circumstance. Whereas the inflection table for countable nouns is split up into "singular" and "plural", uncountable nouns lack those categories, and instead it just says "uncountable". What I am saying is that we should be consistent, and not list the definite forms of uncountable nouns as "singular". VulpesVulpes42 (talk) 18:08, 28 February 2019 (UTC)
@VulpesVulpes42: OK, you can do it this way: {{inflection of|[term]||def|lang=sv}}. I just tried this with elvedeltaet, which gives the wording you require. I also tried {{definite of}}, but that gives the wording "definite singular of". You can see both results there, so have a look now; I will amend this entry at the end of the day, as elvedelta is countable. DonnanZ (talk) 18:35, 28 February 2019 (UTC)
@Donnanz: Thank you! Hope that I wasn't too much of a hassle. VulpesVulpes42 (talk) 18:52, 28 February 2019 (UTC)
@VulpesVulpes42: No problem, as long as you're happy with that solution. DonnanZ (talk) 19:03, 28 February 2019 (UTC)

Andalusian Arabic in Latin script

Could someone add Latn as a script to Andalusian Arabic (xaa) in Module:languages/data3/x? Currently Latin-alphabet entries are displayed strangely, e.g. portocǎli. --Lvovmauro (talk) 08:21, 27 February 2019 (UTC)

@Lvovmauro: That's because they should be written in Arabic, not Latin. If you want to add a transcription, use |tr=. --{{victar|talk}} 08:32, 27 February 2019 (UTC)
Some words are only attested in the Latin alphabet. I'm not going to make up an unattested Arabic spelling. --Lvovmauro (talk) 09:19, 27 February 2019 (UTC)
Makes sense to me. Added. — Eru·tuon 18:30, 27 February 2019 (UTC)

Hiding pronunciation in certain cases

This would be a good idea in some cases, such as for English export, where it takes up a fair amount of space. Can this be done with a hide/show function? DonnanZ (talk) 17:18, 28 February 2019 (UTC)

Maybe something along the lines of what we did for derived terms, etc.? @Erutuon, Victar. — SGconlaw (talk) 17:20, 28 February 2019 (UTC)
For a start it wouldn't need columns, so maybe a simple existing template could be adapted. DonnanZ (talk) 17:36, 28 February 2019 (UTC)
That's why I made {{links-list}} a generic template. --{{victar|talk}} 18:00, 28 February 2019 (UTC)
Template:links-list/documentation is apparently not currently available. DCDuring (talk) 18:44, 28 February 2019 (UTC)
Also, how should it be categorized? DCDuring (talk) 18:45, 28 February 2019 (UTC)
I have no idea how it works (lack of documentation), can you hide/show? DonnanZ (talk) 19:13, 28 February 2019 (UTC)
Something like {{rel-top}}, but without columns, would be good. ({{rel-top}} itself produces two columns, so it isn't suitable; I tried it.) DonnanZ (talk) 10:54, 1 March 2019 (UTC)

March 2019

Table templates

Does anybody know why Hittite inflection table templates {{hit-decl}}, {{hit-decl-adj}}, and {{hit-conj}} aren't collapsing anymore? They used to be rendered collapsed by default. – Tom 144 (𒄩𒇻𒅗𒀸) 14:16, 3 March 2019 (UTC)

@Tom 144: Fixed by removing a semicolon. (The semicolon was causing problems for jQuery in MediaWiki:Gadget-VisibilityToggles.js.) — Eru·tuon 20:26, 3 March 2019 (UTC)
@Erutuon: Thank you. –Tom 144 (𒄩𒇻𒅗𒀸) 20:28, 3 March 2019 (UTC)

Template for categorizing pseudo-Anglicisms, etc

Related to Wiktionary:Beer parlour/2019/March#Pseudo-X-isms_by_language, should we create one or more templates for categorizing entries as CAT:Pseudo-anglicisms by language, CAT:Pseudo-Italianisms by language, etc? We could either have different templates {{pseudo-Anglicism}}, {{pseudo-Italianism}}, etc, or one template like {{pseudo-borrowing}} or {{pseudo-loan}} that would take language codes like {{pseudo-borrowing|fr|en}} or {{pseudo-loan|fr|en}} (for "French pseudo-Anglicisms"). The latter would have to have its own (template-internal, non-Lua?) list of parameter-2-language-codes–to–text to account for "en" being "Anglicism" instead of "English", etc, at least for the relatively few divergent cases like that. (We could perhaps set it up to put unrecognized languages into a holding category that we could check periodically for correctness, or just assume that anything that didn't have a "special" name specified should use the language's canonical name + ism.) - -sche (discuss) 17:58, 3 March 2019 (UTC)

If we are to do this, definitely create a single template {{pseudo-borrowing}}, {{pseudo-loan}}, etc. Benwing2 (talk) 21:00, 3 March 2019 (UTC)
I'm not sure these can always be called pseudo-borrowings. In some cases, most speakers would tell you that they're still in the purported original language. In others, they're considered to be a language that doesn't actually exist. Then there are various stereotypical "foreign accents" that don't sound at all like anything real people would say. It's like speakers of the recipient language have this imaginary language in their heads that they may or may not identify with some real language (that they know nothing about). It has its own vocabulary, grammar and phonology, however limited, and is different from the speakers' own language, but also from any other language. In some ways it's like a pidgin, and in others like a conlang. Another parallel is spirit-possession languages, ritual languages, rhyming slang, thieves' cants and other specialized registers. Chuck Entz (talk) 22:53, 3 March 2019 (UTC)
Hmm... but (re your first sentence) anything that can't be called a pseudo-borrowing isn't covered by this; I'm only proposing to take entries which already describe themselves as e.g. pseudo-anglicisms in their etymology section, and templatize that description in the hopes of making such entries easier to track and regularly categorize. (The word "borrowing" or "loan" wouldn't show up anywhere in the entry, and I'm not proposing to categorize them as e.g. "derived from English", just to templatize the existing categorization as "pseudo-anglicisms" etc.) From my perspective, the fact that some cases might be unclear does not seem any more of an impediment to this template than to any other etymology template. - -sche (discuss) 23:38, 3 March 2019 (UTC)
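The proposed lookup with its fallback can be sketched as follows. Everything here is hypothetical: the table contents, the function name `pseudo_category`, and the tiny canonical-name table standing in for the real language data. Only the rule itself (a special name if one is listed, otherwise canonical name + "ism") comes from the proposal above.

```python
# Divergent cases only: "en" is "Anglicism", not "Englishism".
SPECIAL_NAMES = {"en": "Anglicism"}

# Tiny stand-in for the real language data, for illustration only.
CANONICAL_NAMES = {"en": "English", "fr": "French", "it": "Italian", "de": "German"}


def pseudo_category(lang_code, source_code):
    """Return a category name such as 'French pseudo-Anglicisms'."""
    # Use the special name if one is listed, else canonical name + "ism".
    label = SPECIAL_NAMES.get(source_code) or CANONICAL_NAMES[source_code] + "ism"
    return f"{CANONICAL_NAMES[lang_code]} pseudo-{label}s"


category = pseudo_category("fr", "en")
```

The fallback produces "Germanism" and "Italianism" correctly, but it would also produce odd names for languages whose "-ism" is irregular, which is where the holding category for review would come in.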

Headers of Thai templates

I just noticed that headers have recently been removed from Thai language related templates. Formerly, the templates allowed users to add titles to the headers of drop-down lists, which worked somewhat like the templates "trans-top" and "trans-bottom". For example:

But, now, in the Thai templates, the title parameter does not work any more and the headers are not displayed any longer. For example, the following code:

{{th-syn|title=snake|อสรพิษ|โฆรวิษ|วิษธร|เงี้ยว}}
gives the following plain list without title:
snake

And this causes a mess in some entries where several lists are placed in the same section, because there is no title to tell which list belongs to which definition. An example is in the synonym section of the entry การเวก (gaa-rá-wêek). So, I think we should do something to fix this problem. --Miwako Sato (talk) 07:33, 4 March 2019 (UTC)

@Miwako Sato: I fixed Module:columns so that the title is displayed in the same way that it is displayed for {{der3}} and similar templates. If Thai editors in general prefer the way the list looked before, that can be achieved by replacing Module:columns with Module:columns/old in Module:th. — Eru·tuon 21:18, 4 March 2019 (UTC)
Thank you very much! --Miwako Sato (talk) 04:40, 5 March 2019 (UTC)

@Erutuon: I just noticed something. I think the header should be displayed only when the parameter "title" is used. But, now, when the parameter "title" is not used, the header displays the title of the section it belongs to instead, which is redundant. See the sections "Alternative forms" and "Derived terms" in the entry เจริญ (jà-rəən) for example. --Miwako Sato (talk) 06:24, 5 March 2019 (UTC)

@Miwako Sato: Yes. That was true before the list layout changed. The default header is set in Module:th. — Eru·tuon 06:34, 5 March 2019 (UTC)

Tocharian B Genders

I'm not sure if this is the exact best place to put this, but it's a technical question about Wiktionary machinery, and I see nowhere better to post it.

Tocharian B is a gendered language, but I don't see any simple way of showing the genders of nouns for the entries I create, unlike in Latin or Ancient Greek, which can have their genders easily shown through formatting text. Looking at the other Tocharian B noun entries that I didn't create, it doesn't look like any of them have their genders shown, which leads me to believe that there isn't any code set up to display Tocharian B genders.

Would it be possible to fix this in the Lua code, and, if not, how can I work around it? GabeMoore (talk) 19:36, 4 March 2019 (UTC)

@GabeMoore: the noun template {{txb-noun}} accepts gender as the second parameter; it seems that the issue is that nobody has added the genders yet. If you are not familiar with the use of these templates, they go immediately under the part-of-speech header (e.g. ===Noun===) and they look something like {{txb-noun|gandha|m}}. Hope that helps, and this is a perfectly reasonable place for this request. - TheDaveRoss 19:52, 4 March 2019 (UTC)
Thanks so much. A lot of the nouns have gender that either changes or is unknown in the plural. How would I denote this? GabeMoore (talk) 14:06, 6 March 2019 (UTC)

Obligate non-scriptio continua in a Mandarin Chinese example

Most of the time, we assume that Mandarin Chinese is an unspaced scriptio continua. But there is a political slogan in Mainland China that includes an obligate space between the two halves of the six-character phrase, and it appears in the original text of an example that I am adding (and elsewhere, so it is not idiosyncratic or a typo). Is there a way to add a space between the actual characters (not just a space in the pinyin) in a zh-x? Here's the page with the example: 四十埠 (Sìshíbù). --Geographyinitiative (talk) 20:46, 4 March 2019 (UTC)

A solution was found which can be seen on the page. --Geographyinitiative (talk) 01:45, 5 March 2019 (UTC)
Now I’m curious. Is it known what the rationale is for this unusual mandatory space? Is it to avoid an ambiguity in which 在行动 applies to just 雷锋 instead of the whole slogan 学雷锋?  --Lambiam 22:02, 5 March 2019 (UTC)

Minor change to "Module:category tree/PIE root cat"

Could someone please modify "Module:category tree/PIE root cat" so that when it displays, for example, the description line "English terms that originate ultimately from the Proto-Indo-European root *ḱe-" on category pages, the link is to "Reconstruction:Proto-Indo-European/ḱe" and not "Reconstruction:Proto-Indo-European/ḱe-" (a redirect)? Also, perhaps there should be a full stop at the end of the description line. Thanks. — SGconlaw (talk) 09:17, 6 March 2019 (UTC)

I added the period to the category descriptions. The cause is actually {{PIE root}}, not the category tree module. {{PIE root}} always adds a hyphen to the PIE root in the category name; the hyphenless equivalent is {{PIE word}}. (I see I need to luaify {{PIE word cat}}.) — Eru·tuon 09:47, 6 March 2019 (UTC)
Oh, I didn't realize that the PIE element in question was a word rather than a root. Thanks. — SGconlaw (talk) 12:18, 6 March 2019 (UTC)
@Sgconlaw: I don't know if the distinction is very rigorous. Practically, you use {{PIE root}} if the entry name ends in a hyphen, otherwise {{PIE word}}. — Eru·tuon 20:46, 6 March 2019 (UTC)
{{PIE root}} is of course meant to be used for roots... —Rua (mew) 15:23, 8 March 2019 (UTC)
@Rua: Okay, so I just mean I'm not sure if *ḱe is considered a root or not given that it's written with a hyphen in its headword line but without in its entry name. — Eru·tuon 19:17, 8 March 2019 (UTC)
I wouldn't call it a root, as it doesn't take part in word formation the same way normal PIE roots do. PIE roots always begin and end with a consonant. —Rua (mew) 19:42, 8 March 2019 (UTC)

{{ja-pron}} is not suited for the Japanese dialects

@Poketalker, Mellohi!, Suzukaze-c, KevinUp, Eirikr, Dine2016 I tried to add dialectal accents to the Japanese entries, but I found many problems with the template. Although it produces detailed IPA, its phonological features apply only to the Tokyo dialect. For example:

  • Tokyo type (東京式) accents are distinguished only by the position of the downstep, but Keihan type (京阪式) accents also distinguish the pitch at the beginning of the word, and Two-patterns type (二形式) accents have settled into two patterns determined by the position from the end of the phrase.
    • Related to the accent differences, the pattern of vowel devoicing can also vary. Some Western Japanese dialects, including Kansai, don't have devoiced vowels.
  • Medial ザ行 /z/ is in free variation [dz/dʑ ~ z/ʑ] in Tokyo, but is [z/ʑ] in many areas of Western Japan including Kansai, and [dz] in most areas of the Tōhoku region.
    • There are also dialects that distinguish between ジ [ʑi]・ズ [zu] and ヂ [dʑi]・ヅ [dzu], in Kōchi, Miyazaki and Kagoshima prefectures.
  • ウ /u/ is unrounded [ɯ] in Tokyo, but rounded [u] in Mie prefecture and westward, and a central vowel [ʉ] in Tochigi prefecture and northward.
    • In Northern Tōhoku, part of ウ段 /u/ has merged with イ段 /i/ into /ɨ/.
  • The Niigata dialect preserves the difference between /oː/ and /ɔː/ from the medieval era.
  • Some dialects have fused vowels /ɛ/, /ø/ and /y/.
  • Some Eastern Japanese dialects merge イ段 /i/ and エ段 /e/ into /e̝/.
  • The Kagoshima dialect has final stop consonants ッ [ʔ̚ ~ t̚], ㇲ [s] and ㇱ [ɕ].

It's totally impossible to express these features in the template's present state. Help from someone who can tweak the template is needed.--荒巻モロゾフ (talk) 14:23, 9 March 2019 (UTC)

Um... what do you think of the “unified Japanese” approach on my user page? --Dine2016 (talk) 15:23, 9 March 2019 (UTC)
That's a neat idea. To explain the history of accent patterns, it's good enough to put Tokyo, Osaka and Kagoshima dialects (they complement each other from the mergers of the accent pattern from the proto form) under the entry "Modern Japanese" basically. In order to handle various phonemes and accents, it is better if there are fields to write IPA freely.--荒巻モロゾフ (talk) 16:12, 9 March 2019 (UTC)
We absolutely need to re-evaluate our templates. They are suited only for the modern Tokyo dialect, to the detriment of other forms of Japanese. I started a personal remake of ja-pron with this in mind, but I didn't like my code. Perhaps I should revisit it. —Suzukaze-c 05:53, 10 March 2019 (UTC)

Profiling Lua memory usage and processing time

@Rua, Erutuon, JohnC5 Does anyone have any hints on how to:

  1. Profile where exactly the memory usage and processing time of a given page is going?
  2. Reduce the memory usage and processing time?

The only memory usage info I've been able to find so far is when you preview a page, it lists the total memory usage. But it doesn't break out the usage by function or module or anything, which would help a lot. I vaguely remember an old thing that listed the processing time per function but I can't find it any more.

As for reducing memory usage, I discovered at least for Module:quote that replacing precached references to require("foo") with calls to require each time I need to use a module reduces memory usage at the expense of processor time. This was enough to make black no longer throw memory-usage errors, although I'm really scraping at the margins; the majority of memory usage appears to come from the voluminous translation tables. Any other ideas? I imagine there must be ways of optimizing Module:links and/or the modules called by Module:links, and this would be useful, since they're used so often. Benwing2 (talk) 19:31, 9 March 2019 (UTC)

OK, scrap that, the reduction in memory usage was because some code was commented out. Reenabling that code bumps up the memory usage to where it was, leading me to think that avoiding precaching modules doesn't help. Benwing2 (talk) 19:35, 9 March 2019 (UTC)
We could change the translation table format entirely, so that the whole contents is passed to a template and processed all at once. A major cause of memory usage is actually memory leaks between module invocations, so that the more modules are called on a page the higher the usage climbs. If we make the entire translation table one module call, it may work. —Rua (mew) 19:41, 9 March 2019 (UTC)
(edit conflict) Each page has a report on how much time the transclusions of each template have taken. It is in the HTML source code (search "Transclusion expansion time report") and can be accessed using JavaScript, but as far as I can tell isn't displayed anywhere. There doesn't seem to be any information on how much memory each module uses. —This unsigned comment was added by Erutuon (talk • contribs) at 19:44, 9 March 2019 (UTC).
@Rua Can you explain more? What sort of memory leaks are you referring to, how do they happen and how can they be eliminated? I did notice, for example, that the block that handles the chapter= parameter in Module:quote (lines 245-281) is where most of the memory is going. This code calls 3 modules but each one looks small, so I don't quite understand what's going on. Benwing2 (talk) 19:47, 9 March 2019 (UTC)
From past experiments, it seems that if you call {{t}} a ton on the same page, it uses more memory than just a single {{t}}. That doesn't make sense if you assume that each invocation is completed and all memory is freed before the next one is processed. Instead, it appears that the software processes all the invocations in parallel, but from a shared memory pool. Thus, the more invocations are processed in parallel, the higher memory usage becomes. At the same time, of course, the parallel processing speeds it up.
If the entire translation table is reduced to a single invocation, that limits the ability for each separate call to {{t}} to be done in parallel, and forces the whole thing to be processed serially as one thread of execution. Thus, instead of the invocations for each {{t}} adding up to the memory usage, you only have the memory used by that single transclusion.
There are other optimisations you can do as well with this method. With {{t}}, each invocation has to search for and import the appropriate language module, and this is then kept in memory for any future calls to {{t}} so that the whole thing doesn't slow down to a crawl. However, with translation tables we already know it's highly likely that we'll need all language modules. If we transclude all language modules in one go (via Module:languages/alldata) in advance, but don't keep them in memory after, that may be more efficient. —Rua (mew) 19:57, 9 March 2019 (UTC)
You're doing a lot of string concatenation in this block. This is bad and creates a lot of garbage that may not go away. DTLHS (talk) 20:01, 9 March 2019 (UTC)
@Rua, DTLHS Interesting ... if it's processing in parallel it must first divide the page up into chunks, because you only see the out-of-memory errors appearing after a certain point. However, I notice that all errors start after a certain point on the page, which can change slightly depending on code changes, which suggests that if it's parallelizing it must do it in small chunks. There are some very strange things about memory usage, though; as for the block in question in Module:quote, all the memory usage is coming from the call to Module:roman numerals, which adds 2 MB even though it's only invoked once (string concatenation doesn't appear to be the culprit). Guarding it with a regex call to see if the thing actually looks like a Roman numeral gets back that 2 MB. I don't see what that one call to that module could possibly be doing to gain 2 MB. Furthermore, adding cached modules at the top sometimes *reduces* memory usage, and triggering an error in the middle of the page in one spot also reduces memory usage. There's definitely something weird going on here. Benwing2 (talk) 20:32, 9 March 2019 (UTC)
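The guard described above (check whether the value even looks like a Roman numeral before touching the expensive module) can be sketched in plain Lua. This is only an illustration under stated assumptions: `load_roman_module` is a hypothetical stand-in for requiring Module:roman numerals, `chapter_to_number` is an invented name, and the pattern check is a guess at what "looks like a Roman numeral" means.

```lua
-- Hypothetical stand-in for a costly module; counts how often it is loaded.
local load_count = 0
local function load_roman_module()
	load_count = load_count + 1
	local values = { I = 1, V = 5, X = 10, L = 50, C = 100, D = 500, M = 1000 }
	return {
		to_arabic = function(numeral)
			local total, prev = 0, 0
			-- Walk right to left; a smaller value left of a larger one is subtracted.
			for i = #numeral, 1, -1 do
				local value = values[numeral:sub(i, i):upper()]
				if value < prev then
					total = total - value
				else
					total = total + value
					prev = value
				end
			end
			return total
		end,
	}
end

-- Only load the module when the chapter consists solely of Roman numeral letters.
local function chapter_to_number(chapter)
	if not chapter:find("^[IVXLCDMivxlcdm]+$") then
		return nil
	end
	return load_roman_module().to_arabic(chapter)
end
```

With this shape, a chapter such as "Chapter 3" never triggers the load at all, which is the saving reported above; why the real module costs 2 MB is not explained by this sketch.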
@Benwing: DTLHS is certainly correct that Module:quote is using a lot of memory on string concatenation. This could perhaps be avoided by refactoring the code to store high-frequency strings in a submodule which is loaded with mw.loadData. I also find it a bit odd that you haven't used Module:parameters (though this in itself might add extra overhead). Also, as @Victar may have informed you, we are working on a Module:cite-meta project, which I think could be efficiently combined with Module:quote. That way, a lot of the parameter processing, string storage, and output styling could be unified in one place. The other thing would be to get Module:Quotations to run through Module:quote as well, but that may be a project for the future. —*i̯óh₁n̥C[5] 21:13, 9 March 2019 (UTC)

I did an experiment on User:Rua/sandbox. Have a look at the source code for the page. The entire translation table contents is passed to the module Module:User:Rua/translations new, which then does its own simpler template parsing. It also directly loads all the language data at once, but still uses the regular Module:links for processing. The funny braces are just to prevent the software from treating it as templates before being passed to the module.

Results with the new attempt:

  • Lua time usage: 0.369/10.000 seconds
  • Lua memory usage: 12.88 MB/50 MB

Results with the original translation table from black:

  • Lua time usage: 0.780/10.000 seconds
  • Lua memory usage: 28.22 MB/50 MB

It's twice as fast and uses only half the memory. —Rua (mew) 20:39, 9 March 2019 (UTC)

It's too bad we can't do any real profiling and have to rely on trial&error. I opened phab:T188492 last year, but not much has happened. – Jberkel 20:44, 9 March 2019 (UTC)
Another wise proposal that has been made by @Erutuon is to alter the entrytitle generation code to be slightly more efficient. Right now it goes through each accented character and tries to replace it with the unaccented equivalent. Erutuon suggested that it might be faster in many cases to use mw.ustring.toNFD to decompose the diacritics and then remove them. Further efficiency might be gained by doing nothing at all if the NFC and NFD strings are the same length. —*i̯óh₁n̥C[5] 21:20, 9 March 2019 (UTC)
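A minimal sketch of the decompose-then-strip idea, written in plain Lua so it runs outside the wiki. The normalizer is passed in as `to_nfd` (on the wiki this would presumably be mw.ustring.toNFD); the function name `remove_diacritics` and the choice to strip only the combining range U+0300 to U+036F are assumptions for illustration, and the length shortcut assumes the input is already in NFC.

```lua
-- Removes combining diacritical marks (U+0300..U+036F) from UTF-8 text.
-- `to_nfd` is an assumed normalizer supplied by the caller.
local function remove_diacritics(text, to_nfd)
	local decomposed = to_nfd(text)
	-- Shortcut: if decomposition did not change the length, there were no
	-- precomposed accented characters to take apart (assumes NFC input).
	if #decomposed == #text then
		return text
	end
	-- U+0300..U+033F are encoded as bytes CC 80..CC BF.
	local stripped = decomposed:gsub("\204[\128-\191]", "")
	-- U+0340..U+036F are encoded as bytes CD 80..CD AF.
	stripped = stripped:gsub("\205[\128-\175]", "")
	return stripped
end
```

A real language would usually strip only some marks (Ancient Greek keeps none of its accents in entry names, other languages keep some), so the byte ranges here would need to be narrowed per language.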
@Rua Your suggestion above of using the new code in Module:User:Rua/translations new is a great idea. Are there reasons not to adopt it, and if so what? We could use this for now only on pages that otherwise trigger memory errors, rather than trying to convert pages en masse. Benwing2 (talk) 22:30, 9 March 2019 (UTC)
We probably want a better solution than the weird Unicode braces I used, though. It was just a quick test/proof of concept. Someone else would need to work it out more to get it fully functional. —Rua (mew) 22:49, 9 March 2019 (UTC)
I've implemented the idea that John described, of a remove_diacritics field in the entry_name and sort_key fields of language data tables. Latin and Ancient Greek now use it. When I tested the change in the sandbox, I saw a significant decrease in memory usage in a page with a lot of Ancient Greek links, but that may have been due to the way I was testing it. At the very least it works. — Eru·tuon 01:58, 14 March 2019 (UTC)
@Erutuon: This is a great idea. It won't make a big difference, but a lot of African languages could use that as well. —Μετάknowledgediscuss/deeds 02:43, 14 March 2019 (UTC)
How many of our users like to see, use, or need to use the full set of translations simultaneously? Could users choose which subset of languages are to be served to them? Could users choose not to be served any translations? DCDuring (talk) 23:04, 9 March 2019 (UTC)
The problem with dealing with issues via settings is that a lot of users (even those with accounts) often won't be logged in. Equinox 23:09, 9 March 2019 (UTC)
Could we not read or infer their preferred language somehow? And couldn't translations be made available based on session-specific preferences? Encouraging registration seems like a good thing. Having preset clusters of translations (eg, language families or languages in a given script) could speed the process of selecting such preferences. DCDuring (talk) 23:17, 9 March 2019 (UTC)
@Rua I'm thinking it wouldn't be too hard to make your prototype fully functional:
  1. In place of weird Unicode chars, we create templates {{tt}}, {{tt+}}, {{tt-check}}, {{tt+-check}} that are parallel to {{t}}, {{t+}}, etc. but do nothing except echo their arguments in some format, e.g. {{tt|ary|كحال|tr=kḥāl}} might generate ⦃⦃t¦ary¦كحال¦tr=kḥāl⦄⦄. These templates should be non-Lua so they don't add to the Lua overhead. This way, other template calls can still be embedded into the argument passed to the translations-new function.
  2. In place of directly calling full_link from translations-new, extract out the appropriate code from Module:translations and call that, so the output is exactly the same whether translations-new is used or not.
If you think this is a good idea, I'll implement it. Benwing2 (talk) 01:21, 10 March 2019 (UTC)
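To make the proposal concrete, here is a rough sketch of how a single module call could pick the echoed pseudo-templates back out of its input. It assumes the ⦃⦃…¦…⦄⦄ format used in the example above; the function names `split_plain` and `parse_calls` are invented for illustration, and nested calls and explicit numbered parameters such as 1= are not handled.

```lua
local OPEN, CLOSE, SEP = "⦃⦃", "⦄⦄", "¦"

-- Splits text on a literal separator (no pattern matching).
local function split_plain(text, sep)
	local parts, start = {}, 1
	while true do
		local from, to = text:find(sep, start, true)
		if not from then
			parts[#parts + 1] = text:sub(start)
			return parts
		end
		parts[#parts + 1] = text:sub(start, from - 1)
		start = to + 1
	end
end

-- Returns a list of { name = ..., args = { ... } } for each echoed call.
local function parse_calls(text)
	local calls, pos = {}, 1
	while true do
		local open_from, open_to = text:find(OPEN, pos, true)
		if not open_from then break end
		local close_from, close_to = text:find(CLOSE, open_to + 1, true)
		if not close_from then break end
		local fields = split_plain(text:sub(open_to + 1, close_from - 1), SEP)
		local call = { name = fields[1], args = {} }
		local index = 0
		for i = 2, #fields do
			local key, value = fields[i]:match("^([^=]*)=(.*)$")
			if key then
				call.args[key] = value -- named parameter such as tr=
			else
				index = index + 1
				call.args[index] = fields[i] -- positional parameter
			end
		end
		calls[#calls + 1] = call
		pos = close_to + 1
	end
	return calls
end
```

Because the delimiters are multi-byte UTF-8 sequences, the literal (plain) search is safe on bytes; the parsed arguments could then be handed to whatever code Module:translations already uses to format a single translation.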
@Benwing2 I'm fine with it. Be aware that such a "passthrough" would ignore unrecognised parameters, which normally trigger errors through the use of Module:parameters. It would be nice if we could retain the parameter checking somehow, either inside the passthrough template or directly in the new translation module. Perhaps the passthrough could be implemented in Lua, but in an extremely simple module that just iterates through args and spits everything back out again verbatim, leaving the translation module to do the checking. —Rua (mew) 15:11, 10 March 2019 (UTC)
@Rua That is a good point. I don't think it's very easy to implement the parameter checking inside a template. I think for the moment it doesn't matter all that much as this would presumably only be necessary on heavily trafficked pages, but once I implement it I will try implementing the passthrough in Lua the way you suggest, to see how much extra overhead it adds. An alternative is to not use a passthrough at all, but to use e.g. angle brackets and colons in place of braces and pipe signs; this should also work but might be a bit ugly. Benwing2 (talk) 19:36, 10 March 2019 (UTC)

@DTLHS, JohnC5 Are you sure it's using a lot of memory doing string concatenation? It's true it's doing some string concatenation but not generally of very large strings. The main way of building up the output is through the add() function, which adds to an array that is concat()ed at the end. I could rewrite all string concatenations inside of add() as multiple calls to add() if you think it really makes that much difference. As for Module:parameters, I didn't use that because the first pass is a direct port of Template:quote-meta/source. In a future pass I may fix it up to use Module:parameters. As for Module:cite-meta, this is the first I've heard of it. Is its goal to replace the Template:cite-meta template? That template works rather differently from Template:quote-meta/source (or at least its output is quite different), so I'm not sure how much code could be shared, although I imagine some of it. Feel free to hack on Module:quote or even move your code in Module:cite-meta into it. (If you're going to keep a separate module, I'd call it just Module:cite rather than Module:cite-meta.) Benwing2 (talk) 01:29, 10 March 2019 (UTC)

@Benwing2: I've always understood the most efficient method of string manipulation in Lua to be to create large tables of strings and then call :concat() at the end to avoid having intermediate phases lying around. Another way I've seen Rua do it is to put the final layout on another page, load that in and replace all the {{{param_names}}} in one fell swoop.
As for Module:cite-meta (which I agree should be Module:cite, if separate), we've been keeping this one under wraps to a certain extent. The notion around this one is to tag all the parts of the citation so that using CSS magic, you can control what citation type and how much to show. That way, the user who wants "simple, clean" citations can choose how much information is shown, and everyone else can get as much info as they want in whatever format. I think the quote templates could benefit from the same treatment. —*i̯óh₁n̥C[5] 02:12, 10 March 2019 (UTC)
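The two string-building approaches mentioned in this exchange can be sketched as follows. This is a generic illustration, not the code in Module:quote: `new_builder` and `fill_layout` are invented names, and the {{{name}}} placeholder syntax is simply borrowed from template parameters.

```lua
-- Approach 1: collect pieces in a table and join them once at the end,
-- instead of growing a string with repeated ".." concatenation.
local function new_builder()
	local parts = {}
	local builder = {}
	function builder.add(text)
		parts[#parts + 1] = text
	end
	function builder.build()
		return table.concat(parts)
	end
	return builder
end

-- Approach 2: keep the layout as one string and substitute every
-- {{{name}}} placeholder in a single gsub pass.
local function fill_layout(layout, params)
	return (layout:gsub("{{{([%w_]+)}}}", function(name)
		-- Returning nil leaves an unknown placeholder untouched.
		return params[name]
	end))
end
```

Whether either approach measurably lowers the memory reported on a page is exactly what the thread leaves open; the sketch only shows the shape of the code.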

{{multitrans}}

@Rua, Erutuon, JohnC5, Jberkel, DTLHS, Victar, -sche I implemented Rua's suggestion of processing the whole translation table together. See User:Benwing2/black-opt. This uses a template {{multitrans}} that surrounds a whole set of translation tables and requires that {{t}} inside of the template call be renamed to {{tt}}, and {{t+}} inside of the template call be renamed to {{tt+}}. (Things will still work if you don't rename, but you won't get the efficiency benefits.) This brings the memory usage down from 48M (just below the 50M limit) to 34M, and processing time from 2.09 to 1.29 seconds. There are probably further optimizations I could do, e.g. the implementations of {{tt}} and {{tt+}} aren't very smart about numbered params; with some additional work there to not generate blank numbered params, the memory usage might be brought down further. I haven't yet created pass-through templates {{tt-check}} and {{tt+check}} or documented {{multitrans}} except in the module code Module:translations/multi, but that isn't too hard. Benwing2 (talk) 22:48, 10 March 2019 (UTC)

I'm curious to know how it compares when there are multiple Translations sections on the page. Then you have to wrap each one separately. —Rua (mew) 22:51, 10 March 2019 (UTC)
You could also see what changes if you eliminate {{redlink category}} in the passthrough template. It does some things that may be rather expensive. —Rua (mew) 22:53, 10 March 2019 (UTC)
@Rua I will try eliminating the {{redlink category}} call. But I'm not sure what you're referring to about multiple Translations sections; in User:Benwing2/black-opt I used only two calls to {{multitrans}}, one wrapping all the adjective translation tables and one wrapping all the noun translation tables. Benwing2 (talk) 22:55, 10 March 2019 (UTC)
Ah, I missed that, never mind then. —Rua (mew) 22:56, 10 March 2019 (UTC)
@Rua Eliminating {{redlink category}} leads to essentially no observable difference in Lua memory or time. Benwing2 (talk) 23:01, 10 March 2019 (UTC)
Same story for optimizing the numbered params in {{tt}}. Benwing2 (talk) 23:05, 10 March 2019 (UTC)
I'm sure that's because of the opt-out list in the template, which has already eliminated any module calls through {{redlink template}} for entries on the list. The one unanswered question so far is how this interacts with the translation adder- does it get confused by the wrapper template and, if not, will it have to be modified to make this work? Chuck Entz (talk) 23:34, 10 March 2019 (UTC)
@Chuck Entz Yes, that will need to be modified. Benwing2 (talk) 00:42, 11 March 2019 (UTC)
BTW could someone point me to the JavaScript code for the translation adder? I'm not super familiar with JavaScript or how it works inside of Wiktionary. Benwing2 (talk) 00:49, 11 March 2019 (UTC)
Maybe it's this? MediaWiki:Gadget-TranslationAdder.js If so, I'll need some help fixing this as I'm not very familiar with JavaScript. Benwing2 (talk) 00:54, 11 March 2019 (UTC)
Yeah, that's it. I've wanted to improve it because it is kind of antiquated and messy, but I haven't quite grasped how it works (or MediaWiki:Gadget-Editor.js, which it depends on) so I haven't done much yet. — Eru·tuon 01:07, 11 March 2019 (UTC)
@Erutuon Any way you could hack on this? It should add {{tt}} instead of {{t}} if the translation-terms section is surrounded by {{multitrans}} (i.e. preceded by "\n{{multitrans|data=}}\n" and not preceded by a later occurrence of "\n}}\n"). If that's hard, you can look in the translation-terms section and see if there are any existing instances of {{tt}} or {{tt+}}. Also, when converting {{t}} to {{t+}}, it should correspondingly convert {{tt}} to {{tt+}}. I think the rebalancing code works fine without change. Benwing2 (talk) 05:43, 11 March 2019 (UTC)
@Benwing2: I'll take a look at it. It might simplify things to require that {{multitrans}} will enclose all of a given Translations section and not just part of it, so that the gadget does not have to determine where the template begins and ends, and can assume that the whole section should use {{tt}} and {{tt+}} for newly added translations. — Eru·tuon 08:56, 11 March 2019 (UTC)
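The position check described above (is the insertion point after an opening {{multitrans}} and not after a later closing line?) is simple enough to sketch. Lua is used here only for illustration, since the gadget itself is JavaScript; the opening marker "{{multitrans|data=" on its own line and the helper names are assumptions.

```lua
-- Returns the start of the last literal occurrence of `needle`
-- that begins before position `limit`, or nil if there is none.
local function last_plain(text, needle, limit)
	local last, start = nil, 1
	while true do
		local from = text:find(needle, start, true)
		if not from or from >= limit then
			return last
		end
		last = from
		start = from + 1
	end
end

-- True if `pos` lies after an opening {{multitrans|data= line with no
-- closing "}}" line in between.
local function inside_multitrans(text, pos)
	local opened = last_plain(text, "\n{{multitrans|data=\n", pos)
	if not opened then
		return false
	end
	local closed = last_plain(text, "\n}}\n", pos)
	return not closed or closed < opened
end
```

If {{multitrans}} is required to enclose a whole Translations section, as suggested in the reply above, this check reduces to looking for the opening line anywhere in the section.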

Another optimization

Another optimization that we could do regarding memory is to change the way we currently structure the language data. Right now, we have a neat table of all the properties of a language, but in practice most of those properties go completely unused for the majority of use cases, yet we still have to import all the data and keep it in memory. Perhaps if we split the language data into smaller modules containing a subset of the data, then this data could be loaded only if it's actually needed. For example, most templates don't care about a language's parent or family, so we could put that in a separate module so that it can be loaded only when someone actually needs it. We could look into other pieces of data we could split off too to reduce the size of the data import needed. Perhaps even just one module per data point. Further splitting codes by second letter or something may also work, although that would increase the number of modules by a lot. —Rua (mew) 23:26, 10 March 2019 (UTC)
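One way to picture the split is a wrapper that loads each property's data only on first access. The sketch below is plain Lua with in-memory loaders standing in for separate data modules; the property names, the sample values, and `new_language_data` are all hypothetical, and on the wiki each loader would presumably wrap a call such as mw.loadData on its own submodule.

```lua
-- Records which data sets were actually loaded (for demonstration only).
local load_log = {}

-- Hypothetical per-property loaders with illustrative sample values.
local data_sources = {
	names = function() return { en = "English", grc = "Ancient Greek" } end,
	families = function() return { en = "gmw", grc = "grk" } end,
}

-- Returns a table whose fields are loaded on first access and then cached.
local function new_language_data(sources)
	return setmetatable({}, {
		__index = function(cache, property)
			local loader = sources[property]
			if not loader then
				return nil
			end
			load_log[#load_log + 1] = property
			local data = loader()
			rawset(cache, property, data)
			return data
		end,
	})
end
```

A template that only needs names would then never pay for the family data. Whether many small modules beat one large one under Scribunto's memory accounting would still have to be measured.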

Template:ja-r

I created {{ja-r/multi}} and {{ja-r/args}} based on Rua and Benwing's method to reduce memory in Han character entries. They can replace {{ja-r}} and reduce Lua memory and time usage somewhat. Example:

{{der-top3}}
* {{ja-r|人%人|ひと%びと|rom=-}}, {{ja-r|人々|ひとびと|[[people]]}}
* {{ja-r|人%垣|ひと%がき}}
...
{{der-bottom}}

{{der-top3}}
{{ja-r/multi|data=
* {{ja-r/args|人%人|ひと%びと|rom=-}}, {{ja-r/args|人々|ひとびと|[[people]]}}
* {{ja-r/args|人%垣|ひと%がき}}
...
}}
{{der-bottom}}

In and , replacing the {{ja-r}} templates in one or two Japanese derived terms sections brings the pages under the Lua memory limit, so they no longer have module errors. — Eru·tuon 05:35, 11 March 2019 (UTC)

@Erutuon Awesome!!! Benwing2 (talk) 05:43, 11 March 2019 (UTC)

Using simpler linking functions

A technique that I used on several Appendix pages, like Appendix:Ancient Greek endings and Appendix:English doublets, and in some templates like {{grc-correlatives}}, {{Chinese-numbers}}, and {{ca-decl-ppron}}, is to link the terms without using Module:links at all. Module:links is complex because it has to handle many different types of input. For instance, it has to ensure that {{l|mul|:}} links to Unsupported titles/Colon and that {{l|en|word#Etymology}} links to word#Etymology and not to word#Etymology#English, separately process each nested link in {{l|en|[[this]] [[word]]}}, and so on. This increases Lua memory and processing time.

In many situations, many of these features are not needed, so resources can be saved by avoiding Module:links and the templates that use it. For instance, if a list of terms does not contain any hash marks (#), the hash-recognizing feature is not needed. If there are no punctuation marks or whitespace characters that are not allowed in titles, you don't have to check for unsupported titles. If there are no nested links, the linking function doesn't have to check for nested links and process them specially. One or more of these conditions is often true in lists of terms in a particular language and in inflection tables.

So for instance Module:grc-link handles Appendix:Ancient Greek endings, and as far as linking is concerned (the module does other more complicated stuff), it only has to remove some diacritics to get the correct entry name and then create the correct link syntax. Even though it contains more than two thousand links, the page renders quickly and uses very little Lua memory.
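As a rough illustration of how little a restricted linking function needs to do, here is a sketch in plain Lua. It is not the code of Module:grc-link: `quick_link` and `link_list` are invented names, the language tagging that real links carry is omitted, and the caller is assumed to guarantee that terms contain no hash marks, nested links, or characters unsupported in titles.

```lua
-- Builds a section link for one term. `entry_name` defaults to the term
-- itself; pass a different value when diacritics must be removed.
local function quick_link(term, language_name, entry_name)
	entry_name = entry_name or term
	return "[[" .. entry_name .. "#" .. language_name .. "|" .. term .. "]]"
end

-- Links a whole list; `make_entry_name` is an optional function that maps
-- a displayed term to its entry name.
local function link_list(terms, language_name, make_entry_name)
	local links = {}
	for i, term in ipairs(terms) do
		local entry_name = make_entry_name and make_entry_name(term) or nil
		links[i] = quick_link(term, language_name, entry_name)
	end
	return table.concat(links, ", ")
end
```

Everything Module:links does beyond this (unsupported titles, fragment handling, nested links) is skipped, which is where the savings come from and also why the shortcut is only safe on controlled input.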

{{ca-decl-ppron}} uses a more complicated technique because you can't insert a table into a template. It invokes Module:quick link, which basically transcludes the content of another template and then processes it. I don't like the technique because it's hard to reproduce, but at least the template uses fewer Lua resources now.

{{Chinese-numbers}} and {{grc-correlatives}} also generate tables, but the wikitext is instead found in Lua modules (Module:zh-numbers and Module:grc-correlatives). The modules modify the wikitext to add links and then return it. This technique is easy to understand, but it is somewhat weird to embed a large amount of wikitext in a Lua module. — Eru·tuon 08:34, 11 March 2019 (UTC)

Lua error te

While fixing an error on Tee#German (removed Dutch Low Saxon and added it to thee#Dutch, and moved West Frisian up), I was browsing some of the other entries and came across an error message at te#Swedish. I don't quite know how to fix this myself; it doesn't appear that my alteration caused the error (reverting my changes did not make the error message disappear). Servien (talk) 23:02, 9 March 2019 (UTC)

Very interesting. See the immediately preceding discussion for an instance of similar symptoms. DCDuring (talk) 01:29, 10 March 2019 (UTC)
It's a Lua memory error. The page has had it for ages; it's certainly not your fault. — Eru·tuon 07:00, 10 March 2019 (UTC)
For whatever it's worth, during the process of making a number of edits to the page replacing {{l}}s etc with bare wikilinks with manual language-section-tagging, I noticed that each one or two removed wikilinks generally reduced memory enough that another one or two lines (of {{head}}s, {{label}}s, or declension-table lines) worked, basically pushing the start of the error down the page (as if each of them was using roughly as much memory as the other). - -sche (discuss) 07:23, 10 March 2019 (UTC)

Cleanups of etymology templates

Just a heads-up that I am planning on making the following changes:

  1. Changing {{back-formation}} not to output a final period, and fix up uses of this template so that if nodot= is present it's removed, and otherwise a period is added after the template. (I think this is the only such template that still adds a period at the end.)
  2. Changing all occurrences of |lang= in {{back-formation}}, {{clipping}}, {{deverbal}}, {{deverbative}}, {{ellipsis}}, and {{blend}} to use |1=, and move all other numbered parameters over by one. All these templates currently support both |1= and |lang= for specifying the language. After this change I'll remove support for |lang= in these templates.
  3. Renaming calls to {{deverbative}} to use {{deverbal}}.
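The parameter change in item 2 amounts to moving |lang= into position 1 and shifting every numbered parameter up by one. A minimal sketch on an already-parsed argument table follows; `move_lang_to_first` is an invented name, and the actual conversion would be done by a bot on wikitext rather than by this function.

```lua
-- Returns a new argument table in which |lang= has become |1= and all
-- numbered parameters have moved over by one. Tables without |lang= are
-- assumed to use |1= for the language already and are returned unchanged.
local function move_lang_to_first(args)
	if args.lang == nil then
		return args
	end
	local shifted = { args.lang }
	for key, value in pairs(args) do
		if type(key) == "number" then
			shifted[key + 1] = value
		elseif key ~= "lang" then
			shifted[key] = value
		end
	end
	return shifted
end
```

For example, the arguments of a call like {{back-formation|editor|lang=en}} would come out as those of {{back-formation|en|editor}}.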

In the longer run I'd like to do the following, but I won't do them yet as they may be controversial:

  1. Rewrite all uses of {{prefix}}, {{suffix}}, {{confix}} and maybe {{compound}} in terms of {{affix}} and eventually eliminate those preceding three or four templates.
  2. Rewrite all uses of {{circumfix}}, {{infix}}, etc. that use |lang= to use |1=, moving the numbered parameters over by one; and then eliminating the compatibility support for |lang=.

Benwing2 (talk) 05:59, 11 March 2019 (UTC)

You should be careful about converting them to {{affix}} because that may give the wrong result if there are hyphens in the terms. For example, with PIE root plus suffix, the root should not be treated as a prefix despite the hyphen. —Rua (mew) 10:47, 11 March 2019 (UTC)
Probably forgot {{doublet}}, {{univerbation}}, {{reduplication}}, {{rebracketing}}, which are of the same code. Fay Freak (talk) 13:45, 11 March 2019 (UTC)
Is there some way to have templates like {{prefix}} and {{suffix}} cater to situations where the various elements of a word originate from different languages? It is not uncommon, for example, for a word to have a stem that is from Greek or Latin, and then an English affix like -ic or -ity. — SGconlaw (talk) 14:24, 11 March 2019 (UTC)
@Sgconlaw: The way I do it is using {{der}} for the foreign word, plus {{suffix|en||ic}} (or whatever), or {{prefix|en|[prefix]|}} plus foreign word. DonnanZ (talk) 17:30, 11 March 2019 (UTC)
|langN= (“The language code to use for this particular part”), which is documented at {{affix}}, is for all such cases, if I err not. Fay Freak (talk) 15:45, 11 March 2019 (UTC)
Thanks for making templates more consistent. Didn't know we had {{univerbation}}. Shouldn't it take the terms which make up the new word as parameters, @Fay Freak? – Jberkel 17:14, 11 March 2019 (UTC)
@Jberkel: It could, but this is above my coding capabilities. Also often you write something between the two terms, or it is one link already, for example on عَبْقَر(ʿabqar); عَيْن الْبَقَر(ʿayn al-baqar) can be an entry and عَبْقَر(ʿabqar) is, the same with قُرْوُسْطِيّ(qurwusṭiyy); or there isn’t a term that can be linked and it is SOP, as on Langschwert; one would also need to consider in the coding that one of the parts can be in a different language code-switched into the language or the like. The template has been created because it still eases, linking to the appendix correctly and categorizing. Fay Freak (talk) 17:49, 11 March 2019 (UTC)
OK. In addition to the above, I'm going to do the following:
  1. Rename lang= to 1= on the other templates mentioned by User:Fay Freak: {{doublet}}, {{univerbation}}, {{reduplication}}, {{rebracketing}}, as well as synonyms.
  2. Reduce some synonyms. In particular, I notice the following synonyms:
    1. {{back-formation}} has synonyms {{back-form}}, {{backform}}, {{bac}} and {{bf}}. I see no reason for having so many synonyms, and only {{back-form}} and {{bf}} are mentioned in the docs, so I'll rename {{backform}} -> {{back-form}} and {{bac}} -> {{bf}}. In general I think we only need one synonym, which should be shorter than the original and short enough to type easily, and we only need such synonyms if the original is sufficiently long as to make typing it in full be annoying, hence {{blend}} doesn't need a shorter synonym. It's bizarre, for example, that {{calque}} (which is already pretty short) has four synonyms: {{cal}}, {{calq}}, {{clq}} and {{loan translation}}. Such profusion of synonyms IMO serves little purpose and just makes bot processing harder and more error-prone.
    2. {{doublet}} has synonyms {{doublet of}} and {{etymtwin}}, which will be renamed to {{doublet}}.
    3. {{metanalysis}} will be renamed to {{rebracketing}}.
    4. {{reduplicated}} will be renamed to {{reduplication}}.
  3. Convert these templates to Lua. I'll do that after the cleanup. All except {{blend}} and {{univerbation}} are basically a single link + some additional text and a category, and should be implemented with a single Lua function, with another function to handle {{blend}} and {{univerbation}} (similar to the code that implements {{compound}}, and maybe sharing that code).
Benwing2 (talk) 01:29, 12 March 2019 (UTC)
@Rua Thanks for pointing out the issue with PIE roots and such. I think {{affix}} should be extended to support such things (terms that aren't affixes but look like affixes) by using a special character to prefix such non-affixes. There are various reasons for doing this besides just making it easier to convert {{prefix}} and {{suffix}}; you might, for example, want to say something like {{affix|ine-pro|*prō-|^*bher-|*-ye-|*-ti}} (where I've used the caret ^ to indicate that the term should not be interpreted as an affix) and not have to worry about which (if any) of the *fix variants to use. What should that character be? Asterisk (*) is out because that indicates reconstructions; backslash (\) is a possibility as it has a similar function in code, but it might look ugly; exclamation point (!) is a possibility but I use it in a different (almost opposite) sense in {{affixusex}} and sisters; what do you think of caret (^)? Other possibilities are e.g. tilde (~), pound sign (#) or percent (%). Benwing2 (talk) 01:40, 12 March 2019 (UTC)
OK, I finished the first part of the cleanup and converted all the etymology templates to Lua. {{blend}} and {{univerbation}} now take multiple parts, similar to {{compound}}, and the others take a single part. Now I'm going to do the following: rewrite all uses of {{suffix}}, {{prefix}}, {{circumfix}}, {{infix}}, etc. that use |lang= to use |1=, moving the numbered parameters over by one; and then eliminate the compatibility support for |lang=. Benwing2 (talk) 02:03, 13 March 2019 (UTC)
I implemented support in {{affix}} for a caret (^) to mean non-affixal interpretation. I also fixed another issue preventing use of {{affix}} in East Asian languages, where the language-specific hyphen character was set to the empty string so that {{prefix}} and {{suffix}} wouldn't automatically add it. This setting formerly made it impossible to use {{affix}} for East Asian prefixes and suffixes. I fixed things so that you can use a regular hyphen to indicate an affix in these languages, but the displayed and linked term won't have a hyphen. I also fixed several edge-case bugs (e.g. the hyphen wasn't correctly added to translit in {{prefix}} and {{suffix}}, although the code clearly intended to do so), and cleaned up a lot of duplicative code. Benwing2 (talk) 01:42, 15 March 2019 (UTC)
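A minimal sketch of the caret convention described above, written in Python rather than Lua for brevity. The names `Part` and `classify_part` are hypothetical and this is not the code of the actual module; it only illustrates how a leading caret could override the hyphen-based guess about whether a term is a prefix, suffix or interfix.

```python
from dataclasses import dataclass


@dataclass
class Part:
    term: str  # the term, with any caret marker removed
    kind: str  # "prefix", "suffix", "interfix", "base" or "non-affix"


def classify_part(raw: str) -> Part:
    """Classify one positional argument of a hypothetical {{affix}}-like call."""
    if raw.startswith("^"):
        # Caret: never treat the term as an affix, whatever hyphens it carries.
        return Part(raw[1:], "non-affix")
    # Ignore the reconstruction asterisk when looking at the hyphens.
    core = raw.lstrip("*")
    starts, ends = core.startswith("-"), core.endswith("-")
    if starts and ends:
        kind = "interfix"
    elif starts:
        kind = "suffix"
    elif ends:
        kind = "prefix"
    else:
        kind = "base"
    return Part(raw, kind)


# The example from the discussion: {{affix|ine-pro|*prō-|^*bher-|*-ye-|*-ti}}
parts = [classify_part(p) for p in ["*prō-", "^*bher-", "*-ye-", "*-ti"]]
```

Here the root written with a trailing hyphen is kept as a plain term because of the caret, while the other three parts are classified from their hyphens alone.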

Cleanups of form-of templates

While we are at it, I should mention the form-of templates. Dating from the old days, when the language was assumed to be English if not specified, they take the linked word as the first parameter and accept only |lang=, not a positional language parameter. Changing this would be a rather large job, though probably the consistent thing to do, and it is on topic since some of them, like {{clipping of}}, imply etymology; but I mention these templates mainly because of |nocap= and |nodot=, which are an annoyance. Some have a dot at the end by default, some do not. This is irritating and cannot or should not have to be memorized by the user. Furthermore it wastes precious effort of the fingers to type this parameter and “|nocap=1” every time one uses the templates, particularly {{alternative form of}} (which puts no dot by itself, but has the capitalization problem). As with the now standing practice of capitalization and dots in glosses, the setting should be inferred from the language code, i.e. dot and capitalization for the line in English entries and no dot and no capitalization in Arabic entries, unless specified otherwise. Fay Freak (talk) 13:45, 11 March 2019 (UTC)

Yes, I would support standardization of this category of templates. As they are mostly used in definitions, I would support the addition of a full stop at the end of the statement as the default, coupled with the ability to specify |nodot=1 or |nodot=yes to turn it off. — SGconlaw (talk) 14:18, 11 March 2019 (UTC)
Default for English, to be sure. In other languages the definition lines shan’t have dots nor begin capitalized. Fay Freak (talk) 15:42, 11 March 2019 (UTC)
I see no reason why English should be formatted differently. There should be no dot or capital for English if there is none for any other language. —Rua (mew) 17:22, 11 March 2019 (UTC)
I see none either, and I am strongly against this formatting in English; I was only stating the current “rule” derived from practice. Dots at the end of the lines are noise; they don’t bear information. Similarly, I am also against dots at the ends of footnotes, though some professors seem to think that they are required because footnotes are “sentences”. The capitalization even makes distinctions go away, say when there is an English word meaning one thing and a word written the same but capitalized meaning another thing. Plus the consistency argument. Fay Freak (talk) 17:49, 11 March 2019 (UTC)
I have to disagree, because I believe definitions should be treated as sentences and capitalized and punctuated as such. In some cases definitions consist of more than one sentence, and so will have to be punctuated. For consistency, those which consist of only one sentence should be given the same treatment. — SGconlaw (talk) 18:06, 11 March 2019 (UTC)
Only in those cases. Then the dot is a separator. Else you could as well use a semicolon, bar, star, or something ornate. It is arbitrary to see the glosses as “sentences” and to put dots behind them. In fact the word “gloss” tells us that they aren’t sentences, any more than in a Medieval manuscript if a word is explained in a gloss it ends with a dot. Also if they were sentences the headwords would be SOP. You understand this one? The glosses translate parts of sentences, phrases, but generally not sentences (sometimes as glosses of interjections, but this is exceptional and can also be omitted). Putting a dot after glosses and footnotes is simple hypercorrection. It is like ending parentheses with dots, wrong the same way.
John – some guy I know from school. – has given me this book.
This is wrong. Fay Freak (talk) 19:59, 11 March 2019 (UTC)
Oh, and by the way there is a practice not to end simple etymologies with dots, apparently standard in Russian entries. I remember having observed @Benwing2 removing such dots; so it looks on перепра́вить (pereprávitʹ). A dot wouldn’t do anything here, neither when we put the word “from” before the derivation. Saying this or that is “a sentence” because it is the abbreviation of a “sentence proper” or an imagined sentence in the strict sense and hence must be treated the same is just essentialist delusion. The question must be what is brought about with the signs. You might observe that it is not wrong at all if, when one nowadays chats, one sends single sentences without punctuation marks. This is because they don’t add any meaning, or even dilute it because the speaker did not want to decide between dot, , !, ?. In a “serious” work it cannot be otherwise for single full sentences. One sometimes puts the dots in these because of the dogged old expectation that “every sentence must end with a punctuation mark” but there isn’t even this expectation with database-like entries on the internet. Fay Freak (talk) 20:22, 11 March 2019 (UTC)
@Fay Freak, Rua, Sgconlaw English is fundamentally different from foreign languages because the definitions of English terms are paraphrases using other English terms (normally full sentences or long phrases), while the definitions of foreign terms are English equivalent terms (often single words). For example, the definition of English umbrella is written as follows:
  1. Cloth-covered frame used for protection against rain or sun.
while the definition of Portuguese guarda-chuva is written as follows:
  1. umbrella
Note the difference in formatting. This is fairly consistent across Wiktionary and is the reason I remove final periods when they occur in Russian definitions. For this reason I think the use of capital letters and periods in templates like {{alternative form of}} should be different for English vs. foreign languages: Capital letter and period in English, lowercase letter and no period in foreign languages. Benwing2 (talk) 01:47, 12 March 2019 (UTC)
👍 — SGconlaw (talk) 01:57, 12 March 2019 (UTC)
I agree with Benwing. - -sche (discuss) 05:46, 12 March 2019 (UTC)
That's for full definitions. But form-of templates are not full definitions, they are glosses in all languages, so they shouldn't have a period. It's inconsistent when the exact same definition gets a period in one language and not others. —Rua (mew) 11:08, 12 March 2019 (UTC)
@Rua There is no consistency. {{misspelling of}}, for example (and other templates using {{deftempboiler}}) do include a final period; see e.g. concious, mispronounciation. {{alternative form of}} (and others using {{#invoke:form of|form_of_t}}) do not. But both include a capital letter at the beginning, which is wrong for non-English languages. If we are to follow your idea that form-of templates are glosses, there shouldn't be either a capital letter or a period in any language. Otherwise, we should have both capital letter and period in English, but not any other language. Either way, the current inconsistent situation needs to be fixed. Benwing2 (talk) 03:20, 13 March 2019 (UTC)
We may have to have a poll or vote on this. — SGconlaw (talk) 11:58, 13 March 2019 (UTC)
This only concerns the formatting in English, if that is supposed to change. I was concerned here with the default formatting and dotting in foreign languages, which I said should be inferred from the language code and made consistent; that much is not in doubt. How English lines should be formatted is a separable issue. @Benwing2 Also, we missed {{unknown}} in the template cleanup; it shares that code too. Fay Freak (talk) 17:06, 13 March 2019 (UTC)
The issues of English vs non-English can't be separated. If the same definition occurs in both languages, it should be formatted the same way too. "Plural of" definitions should not be capitalised in one language but not another, that's inconsistent. —Rua (mew) 20:35, 14 March 2019 (UTC)
We will need a poll, I think. Even within English the usage is totally inconsistent, e.g. Template:misspelling of has a capital letter and period, Template:alternative form of has a capital letter and no period, and Template:en-past of has no capital letter and no period. So far, User:Rua is the only person disagreeing with my suggestion of handling periods and capital letters in English vs. other languages. BTW @Fay Freak I cleaned up {{unknown}}, {{onomatopoeic}} and {{spelling pronunciation}} to use only |1=, not |lang=, and obsoleted/deleted {{unk.}} (use {{unk}} instead) and {{Onomatopoeic}} (use the lowercase equivalent). Benwing2 (talk) 00:08, 15 March 2019 (UTC)
@Fay Freak Also {{adverbial accusative}}, and fixed up that one and {{unknown}}, {{onomatopoeic}} and {{spelling pronunciation}} to use Lua (which should flush out some bad param usages). Benwing2 (talk) 00:32, 15 March 2019 (UTC)
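The proposal discussed in this thread, inferring the initial capital and the final dot from the language code unless overridden, could look roughly like the following Python sketch. The function name `format_form_of` and its `cap`/`dot` override arguments are hypothetical stand-ins for |nocap= and |nodot=; the real templates are implemented in Lua and may behave differently.

```python
from typing import Optional


def format_form_of(text: str, lang: str,
                   cap: Optional[bool] = None,
                   dot: Optional[bool] = None) -> str:
    """Format a form-of definition line.

    Defaults follow the "English vs. other languages" proposal:
    capital letter and final dot for "en", neither for any other code.
    Passing cap= or dot= explicitly overrides the default.
    """
    is_english = lang == "en"
    use_cap = is_english if cap is None else cap
    use_dot = is_english if dot is None else dot
    if text:
        first = text[0].upper() if use_cap else text[0].lower()
        text = first + text[1:]
    if use_dot and not text.endswith("."):
        text += "."
    return text


english = format_form_of("alternative form of colour", "en")
portuguese = format_form_of("Alternative form of cor", "pt")
english_no_dot = format_form_of("alternative form of colour", "en", dot=False)
```

With these defaults an editor only has to type an override in the exceptional case, which is the point made above about saving keystrokes.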

Character insertion table inserting some characters twice

I use that character insertion table – not entirely sure what it is called – that appears below the edit window, to insert characters and symbols. In the "Miscellaneous" menu, clicking on the superscript "a" and "o" (ª, º), for some reason, inserts two symbols instead of one. I wonder if this can be fixed. I guess this is quite a minor point, but it would be nice if someone could look into it. Thanks. — SGconlaw (talk) 09:04, 12 March 2019 (UTC)

@Sgconlaw: Thanks for the report. Fixed. — Eru·tuon 19:59, 12 March 2019 (UTC)
Thanks! It's working as expected now. — SGconlaw (talk) 11:56, 13 March 2019 (UTC)

Issue with Module:ko-headword at 타일

Hello,

The new Korean entry 타일 (tail) currently shows empty brackets: 타일 • () when it should show 타일 • (tail) - with a transliteration in brackets. (Notifying TAKASUGI Shinji, HappyMidnight.) --Anatoli T. (обсудить/вклад) 00:42, 14 March 2019 (UTC)

Hmm, I had to create two entries (for now): ​​타일 and 타일. The former is bad but what's wrong with it and what's the difference? --Anatoli T. (обсудить/вклад) 00:45, 14 March 2019 (UTC)
The first one has invisible Unicode characters at the front: when I hover over the link in Firefox, I get %E2%80%8B%E2%80%8B타일. I don't know why the translit wouldn't appear, but perhaps it's a helpful bug? —Suzukaze-c 04:03, 14 March 2019 (UTC)
(edit conflict) @Suzukaze-c: Thanks! that's the first thing I suspected but I haven't checked very well. There are two ZWNJ characters. Yes, it's good that the module failed but without a descriptive message. It fails on South East Asian languages with an error message. Deleting the entry now. --Anatoli T. (обсудить/вклад) 04:17, 14 March 2019 (UTC)
​​타일 contains two zero-width spaces (U+200B). I’ll delete it. — TAKASUGI Shinji (talk) 04:15, 14 March 2019 (UTC)
Too late, gone :) Thank you both. --Anatoli T. (обсудить/вклад) 04:17, 14 March 2019 (UTC)
@TAKASUGI Shinji, Suzukaze-c: BTW, I created the bad entry from a translation at [[tile]], not my copypasta error. --Anatoli T. (обсудить/вклад) 04:19, 14 March 2019 (UTC)
Turns out there are a fair number of pages with zero-width spaces. — Eru·tuon 04:46, 14 March 2019 (UTC)
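For anyone wanting to check a title offline, here is a small Python sketch (the helper name `invisible_chars` is made up for illustration) that decodes a percent-encoded title like the one seen in the browser and lists any invisible format characters, using only the standard library. It relies on the assumption that the characters of interest have Unicode general category Cf, which is the case for U+200B.

```python
import unicodedata
from urllib.parse import unquote


def invisible_chars(title: str):
    """Return (index, code point, name) for every format character (category Cf)."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(title)
        if unicodedata.category(ch) == "Cf"
    ]


# The link target reported above, as it appeared in the browser status bar.
bad_title = unquote("%E2%80%8B%E2%80%8B타일")
found = invisible_chars(bad_title)
```

Running the same check over a dump of page titles would be one way to produce the list of affected pages.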
@Erutuon: Yes, thanks, many South East Asian entries copied from Sealang dictionary or Wikipedia need to be quality-checked. Proper correct entries with automated transliteration won't even work for languages such as Thai, Khmer and Burmese. --Anatoli T. (обсудить/вклад) 04:59, 14 March 2019 (UTC)
@Atitarev: It seems like Module:ko-headword was programmed to not transliterate if the entry isn't pure hangeul, which results in empty parentheses. I've changed it so that it produces a meaningful error message instead. —Suzukaze-c 04:48, 14 March 2019 (UTC)
@Suzukaze-c: Thanks, we need to check for exceptions, though. Arabic numbers, etc, may be part of valid titles, e.g. 7월 (7wol, “July”) = 칠월 (chirwol, “July”). The former is more common and a standard way of writing months in Korean. --Anatoli T. (обсудить/вклад) 04:59, 14 March 2019 (UTC)
Actually, I simplified my statement a bit. The module doesn't complain if the hangeul parameter is provided (as with hanja terms like 韓國). diff is my solution. —Suzukaze-c 05:23, 14 March 2019 (UTC)
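The behaviour described here (fail loudly instead of showing empty parentheses, unless a hangeul parameter supplies the text) might be sketched as follows. This is Python, not the module's Lua, and `text_to_transliterate` is a hypothetical name; allowing ASCII digits, spaces and hyphens is an assumption prompted by the 7월 example, not something the module is known to do.

```python
def is_hangul_syllable(ch: str) -> bool:
    # Precomposed Hangul syllables occupy U+AC00..U+D7A3.
    return "\uac00" <= ch <= "\ud7a3"


def text_to_transliterate(pagename: str, hangeul=None) -> str:
    """Pick the text to romanize, raising a descriptive error if it is unusable."""
    # An explicit hangeul value (e.g. for hanja titles) takes precedence.
    source = hangeul if hangeul is not None else pagename
    unexpected = [
        ch for ch in source
        if not (is_hangul_syllable(ch) or ch in "0123456789 -")
    ]
    if unexpected:
        codes = ", ".join(f"U+{ord(ch):04X}" for ch in unexpected)
        raise ValueError(
            f"cannot transliterate {source!r}: unexpected characters {codes}"
        )
    return source
```

For example, `text_to_transliterate("韓國", hangeul="한국")` returns the hangeul text, while a title with leading zero-width spaces raises an error naming U+200B.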
@Suzukaze-c: Thanks, I didn't think about hangeul. rv= parameter is still required, if the reading is completely irregular, unless we respell the words completely, e.g. respell 십육 as "심뉵". If we do, we may consider adding long vowels but there's too much to do. Think about all geminations and all cases described at Template:ko-IPA/documentation. --Anatoli T. (обсудить/вклад) 05:32, 14 March 2019 (UTC)
Hm, I see. Perhaps we could try doing whatever the Thai infrastructure is doing (reading {{th-pron}}→reading {{ko-IPA}}). —Suzukaze-c 05:46, 14 March 2019 (UTC)
I don't object to have a more phonetic transliteration. The default transliteration is fine too but if a term is respelled with some additional parameters, we might as well transliterate eg 음식값 (eumsikgap) as something like "ēūmsikkap". "Tuttle Learner's Korean-English Dictionary" is already using a very phonetic transliteration, no long vowels are catered for, though. --Anatoli T. (обсудить/вклад) 05:57, 14 March 2019 (UTC)
I agree (I've also wondered in the past about including long vowels in the romanization), but I also haven't studied enough Korean, and wouldn't feel confident modifying Module:ko-pron appropriately _(:3 」∠ )_ —Suzukaze-c 06:09, 14 March 2019 (UTC)
The current pronunciation module is perfect and the quality of most entries is very high. We just need to merge "pron" with "translit", automate things. One of the reasons for not transliterating long vowels is digraphs, the other, maybe, the vowel length is not too prominent, semi-long and optional (?). Very phonetic transcription will invariably confuse someone who just wants to know the script and understand etymologies a bit more. But we already have two parts in the pronunciation box: eg at 선로 (線路, seollo): Revised Romanization "seollo" and Revised Romanization (translit.) "seonlo" --Anatoli T. (обсудить/вклад) 06:20, 14 March 2019 (UTC)
Long digraphs are supposed to be encoded by U+035E COMBINING DOUBLE MACRON after the first letter. e.g. e͞u. Of course, font support may be poor. --RichardW57 (talk) 06:37, 25 March 2019 (UTC)
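As a concrete illustration of that encoding, the following Python lines build the example string by placing U+035E between the two letters of the digraph. The helper name `mark_long_digraph` is hypothetical.

```python
import unicodedata

DOUBLE_MACRON = "\u035e"  # COMBINING DOUBLE MACRON


def mark_long_digraph(digraph: str) -> str:
    """Insert the combining double macron after the first letter of a two-letter digraph."""
    if len(digraph) != 2:
        raise ValueError("expected exactly two letters")
    return digraph[0] + DOUBLE_MACRON + digraph[1]


long_eu = mark_long_digraph("eu")  # three code points: e, U+035E, u
```

Whether the macron actually spans both letters on screen depends on the font, as noted above.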

Ancestors in Module:etymology languages/data

Recently the Proto-Oghuz language was moved to Module:etymology languages/data during Victar's work on the Turkic languages. It is still set as the ancestor of Old Anatolian Turkish, which is the ancestor of Ottoman Turkish, which is the ancestor of Turkish. To let the language modules find Proto-Turkic, the parent language of Proto-Oghuz, as an ancestor of Turkish (and prevent module errors in etymologies such as ötmek), Proto-Oghuz has trk-pro (Proto-Turkic) given as its ancestor.

The etymology templates can't interpret the parent value in the data module as an ancestor, because it isn't the same thing. For instance, qfa-sub-grc (Pre-Greek) has the parent qfa-sub (substrate), but Pre-Greek does not descend from "substrate". Similarly, American English (en-US) isn't a descendant of English (en), it's a subvariety. So an ancestors value has to be given. This is the only ancestor given in the module; an etymology language only needs an ancestor if it in turn is the ancestor of a regular (non-etymology) language.

Writing this note for DTLHS mainly, who removed the ancestors for trk-ogz-pro (Proto-Oghuz), and because the discussion on this issue was held in Discord. User:KevinUp, Crom daba, Victar, Surjection, and I were involved. — Eru·tuon 05:52, 14 March 2019 (UTC)

Indeed, we cannot rely on the parent field for ancestors because it doesn't really make sense for some of those languages. That doesn't mean it couldn't be changed to be that way, but that requires wider changes and the ancestors field is the only proper solution right now. — surjection?〉 11:42, 14 March 2019 (UTC)
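To make the distinction concrete, here is a rough Python model of an ancestor lookup that follows only the ancestors field and never the parent field. The table layout, the function name `get_ancestors` and the codes `tr`, `ota` and `trk-oat` are assumptions for illustration; only `trk-pro`, `trk-ogz-pro`, `qfa-sub-grc`, `qfa-sub`, `en-US` and `en` are taken from the discussion, and the real data modules contain far more than this.

```python
# Illustrative data only, not a copy of the real modules.
LANGUAGES = {
    "tr": {"ancestors": ["ota"]},                # Turkish
    "ota": {"ancestors": ["trk-oat"]},           # Ottoman Turkish
    "trk-oat": {"ancestors": ["trk-ogz-pro"]},   # Old Anatolian Turkish
    "trk-pro": {},                               # Proto-Turkic
    "en": {},                                    # English
}
ETYMOLOGY_LANGUAGES = {
    "trk-ogz-pro": {"ancestors": ["trk-pro"]},   # Proto-Oghuz
    "qfa-sub-grc": {"parent": "qfa-sub"},        # Pre-Greek: parent is not an ancestor
    "en-US": {"parent": "en"},                   # American English: a subvariety
}


def get_ancestors(code: str) -> list:
    """Collect all ancestors of a code by following 'ancestors' only."""
    data = {**LANGUAGES, **ETYMOLOGY_LANGUAGES}
    chain = []
    queue = list(data.get(code, {}).get("ancestors", []))
    while queue:
        ancestor = queue.pop(0)
        if ancestor not in chain:
            chain.append(ancestor)
            queue.extend(data.get(ancestor, {}).get("ancestors", []))
    return chain


turkish_ancestors = get_ancestors("tr")
```

In this model Proto-Turkic is reachable from Turkish only because Proto-Oghuz carries an explicit ancestors value, while Pre-Greek and American English have no ancestors at all despite having a parent.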

Transcription of Proto-Norse

Can someone fix it so that transcriptions don’t show up when having a Proto-Norse reconstruction with Latin letters within t:desc, t:m etc.? Jonteemil (talk) 18:01, 14 March 2019 (UTC)

@Jonteemil: Done. Just had to add Latn to the scripts for Proto-Norse. — Eru·tuon 20:31, 14 March 2019 (UTC)
This doesn't seem like the right way to go. Proto-Norse was never written in the Latin script, so why do we have reconstructions in Latin script? Would we have reconstructions for Gothic, OCS or Ancient Greek in Latin script? —Rua (mew) 20:33, 14 March 2019 (UTC)
Adding Latin to the list of scripts is not meant as a declaration that Proto-Norse should be written in Latin script, only as an acknowledgement that it is and that the modules have to know that fact or they will generate a transliteration for a Latin-script word and add rune-specific classes to it that can make it display with strange fonts for those with rune-appropriate fonts installed. If Proto-Norse shouldn't be written in Latin script, someone can go replace Latin script with runes or move it to the transliteration parameter, or whatever is necessary. — Eru·tuon 20:40, 14 March 2019 (UTC)
Now it shows up in Category:Proto-Norse language, which certainly does seem like a declaration that Proto-Norse can be written in that script. —Rua (mew) 20:48, 14 March 2019 (UTC)
Unfortunately language categories can only reflect our actual language data, which in turn is dictated by practical considerations and not by various other theoretical ideals. (It is also impossible at the moment to have the family tree simultaneously display Scots as the descendant of Northern Middle English and Northumbrian Old English and display Northern Middle English and Northumbrian Old English as subvarieties of Middle English and Old English respectively, as would be more accurate than displaying Scots as the descendant of Middle English and Northumbrian Old English and Northern Middle English separately.) If you have another solution to the problem of Old Norse written in Latin script being displayed with Runic fonts and having transliteration added to it, please tell me. — Eru·tuon 21:22, 14 March 2019 (UTC)
I think Rua already alluded to the solution, which is to have reconstructions use runic script. —Μετάknowledgediscuss/deeds 21:26, 14 March 2019 (UTC)
Sorry, I was wrong. I realize what should be done is to move the Latin script to the transliteration parameter. I'll undo my edit. — Eru·tuon 21:57, 14 March 2019 (UTC)

The reason why the reconstructions are in the Latin script is that Svensk etymologisk ordbok has them in Latin characters. Also, I unfortunately don’t know which rune corresponds to which letter. Jonteemil (talk) 23:15, 14 March 2019 (UTC)

You should put Latin-script versions in the transliteration parameter (|tr= or |tr1=, |tr2=, etc. depending on the template). They will be tracked in Category:Proto-Norse terms needing native script and someone can add the Runic version. The current practice seems to be that reconstructed Proto-Norse terms are rendered in Runic script (see Category:Proto-Norse lemmas, which includes some terms in the Reconstruction namespace). — Eru·tuon 23:59, 14 March 2019 (UTC)
Ah, that makes sense! Jonteemil (talk) 16:11, 15 March 2019 (UTC)
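The problem described earlier in this thread, Latin-script text being treated as Runic when the language lists only one script, can be modelled with a toy script detector. Everything below is an assumption made for illustration: the rough code point ranges, the names `best_script` and `needs_transliteration`, and in particular the rule that a language with a single listed script gets that script without inspecting the text.

```python
# Rough code point ranges, good enough for a demonstration.
SCRIPT_RANGES = {
    "Runr": [(0x16A0, 0x16FF)],
    "Latn": [(0x0041, 0x005A), (0x0061, 0x007A), (0x00C0, 0x024F)],
}


def count_in_script(text: str, code: str) -> int:
    """Count the characters of text that fall inside the ranges of a script."""
    return sum(
        any(lo <= ord(ch) <= hi for lo, hi in SCRIPT_RANGES[code])
        for ch in text
    )


def best_script(text: str, language_scripts: list) -> str:
    # Assumed shortcut: with only one script listed, nothing is checked.
    if len(language_scripts) == 1:
        return language_scripts[0]
    # Otherwise pick the listed script matching the most characters
    # (ties go to the script listed first).
    return max(language_scripts, key=lambda code: count_in_script(text, code))


def needs_transliteration(text: str, language_scripts: list) -> bool:
    return best_script(text, language_scripts) != "Latn"
```

Under these assumptions a Latin-script form is tagged as Runic and sent to transliteration when only `Runr` is listed, which is why putting the Latin-script form in the transliteration parameter avoids the issue without changing the language's script list.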

IP Ban

hello!

this is not a constructive post. the constructive one was denied bc of specific spammer habits. here it is: hello! anyone considered an etymology based on "Magog" https://en.wikipedia.org/wiki/Gog_and_Magog on the "demagogue" discussion page.

obviously, my ip is banned. the reason: https://de.wikipedia.org/wiki/Benutzer:Baumfreund-FFM/R%C3%BCckblick#Administrative_Beitragszahlen_zum_Jahreswechsel_2016/17_(31._Dezember_2016) deletes 24thousands entries per year. at 240 work days, thats 100 per day. on a 10hrs day its 10 per hour. this guy slaughters one entry after the other. every 6minutes. hour for hour, day for day, year for year. my entry the other day impressed him so much that he woke up and banned me. that happens only 900 times a year due to him. only 3 times a day. day for day, year for year.

because its not possible to slaughter the entries on wikipedia plus the ones on wiktionary, these ip bans are very important to keep the wikis free. free in the definition of "empty".

and yes: i dont know if the grease page is the appropriate page for my defacement here. instead of "to deface" there is a free spot on the wiktionary synonyms page of "to grease".—This unsigned comment was added by 46.223.1.175 (talk).

First of all, actions taken at any other wiki have no effect on Wiktionary, so there is no IP ban involved. If there were, you would not be able to post any message, and you obviously just posted here. The message you got is from an abuse filter. Abuse filters are automated tests that the system runs while processing your edit to look for spam and vandalism. The abuse filter in question was created years ago, so it has nothing to do with you. I won't go into details, but the reason you triggered the filter had to do with the unnecessary URL you included in your edit: if you had written [[w:Gog and Magog]] or {{w|Gog and Magog}} instead of https://en.wikipedia.org/wiki/Gog_and_Magog, your edit would have had no problems. It just happened to fit a pattern that is almost never found except in edits by spambots. Sorry for the trouble.
As for your question: the derivation from an Ancient Greek term that was presumably once in actual use for demagogues is simple and obvious, so why try to dig up cryptic biblical references to enemies in prophecies? Chuck Entz (talk) 01:25, 16 March 2019 (UTC)
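For editors who habitually paste Wikipedia URLs, a small helper along these lines could rewrite them as interwiki links before saving. It is a Python sketch with a made-up function name, handles only English Wikipedia article URLs, and says nothing about what the abuse filter itself matches.

```python
import re
from urllib.parse import unquote

# Matches English Wikipedia article URLs; the title stops at whitespace, '#' or '?'.
_WIKIPEDIA_URL = re.compile(r"https?://en\.wikipedia\.org/wiki/([^\s#?]+)")


def to_interwiki(text: str) -> str:
    """Replace English Wikipedia URLs in text with [[w:Title]] links."""
    def replace(match):
        title = unquote(match.group(1)).replace("_", " ")
        return f"[[w:{title}]]"
    return _WIKIPEDIA_URL.sub(replace, text)


converted = to_interwiki(
    'an etymology based on "Magog" https://en.wikipedia.org/wiki/Gog_and_Magog'
)
```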

Catfix not working anymore

The fixup of categories to add language tags and section links isn't working anymore on Category:Old Dutch lemmas. —Rua (mew) 19:28, 15 March 2019 (UTC)

@Rua: Fixed. — Eru·tuon 19:37, 15 March 2019 (UTC)

Requesting change to template "langname-mention"

Hi, can someone change {{langname-mention}} so that it displays only the name of the language when the third parameter is a hyphen? For some reason, if the third parameter is a hyphen, it treats it literally as a hyphen rather than an empty string, which isn't the behaviour of other templates. --Florian Blaschke (talk) 01:28, 16 March 2019 (UTC)

Why are you using this template? What purpose does it serve that {{cog}} (or {{noncog}}) would not be better for? —Μετάknowledgediscuss/deeds 02:54, 16 March 2019 (UTC)
It was created after a proposal of mine. It is for use in discussions, where {{cog}} or {{noncog}} are not appropriate. --Florian Blaschke (talk) 22:49, 19 March 2019 (UTC)
@Florian Blaschke: Done. — Eru·tuon 22:58, 19 March 2019 (UTC)
Thanks a bunch! --Florian Blaschke (talk) 23:08, 19 March 2019 (UTC)

CAT:E lots of 'em, xme no longer valid

Someone removed a language code causing lots of errors. Benwing2 (talk) 14:46, 16 March 2019 (UTC)

@Victar, it's better to do as much of the moving as possible before changing up a code so that you can avoid this. —Μετάknowledgediscuss/deeds 15:39, 16 March 2019 (UTC)
Huh, I thought I got most of these. Thanks, I'll fix them now. --{{victar|talk}} 15:47, 16 March 2019 (UTC)

Spanish words with a dot

Hey all. Can someone make a list of all Spanish entries containing a dot? I have a feeling we have lots of incorrect abbreviations around here - S.T.D for example should be S. T. D. We have Category:Spanish terms spelled with ., which is underpopulated and pretty useless. --I learned some phrases (talk) 08:44, 18 March 2019 (UTC)

Does User:I_learned_some_phrases/Spanish_Dots work for you? Also, I hate when we have spaces in acronyms/abbreviations/initialisms. - TheDaveRoss 14:27, 20 March 2019 (UTC)
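A list like this can be produced from any plain list of entry titles. The Python sketch below filters titles containing a dot and, for initialisms made of single letters, suggests the spaced form argued for in the request (S. T. D.); both function names are hypothetical, and whether the spaced form is actually preferred is exactly what is disputed above.

```python
import re

# Single letters, each followed by a dot and optional space, with an optional final dot.
_INITIALISM = re.compile(r"(?:[^\W\d_]\.\s?)*[^\W\d_]\.?")


def titles_with_dot(titles):
    """Keep only the titles that contain a full stop."""
    return [t for t in titles if "." in t]


def suggest_spaced(title: str):
    """Return the 'X. Y. Z.' form of a single-letter initialism, or None if not applicable."""
    if "." not in title or not _INITIALISM.fullmatch(title):
        return None
    letters = re.findall(r"[^\W\d_]", title)
    return ". ".join(letters) + "."


candidates = titles_with_dot(["S.T.D", "casa", "EE.UU."])
suggestion = suggest_spaced("S.T.D")
```

Abbreviations with multi-letter parts, such as EE.UU., are deliberately left alone by the pattern.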

Portuguese greenlinks

Plurals ending in -ais of adjectives ending in -al are using the template {{feminine singular of}} rather than {{plural of}}. Ultimateria (talk) 22:57, 19 March 2019 (UTC)

Colour table templates

In the Template:table:colors subpages, do we have a standard for dealing with multiple terms for one colour in a language? If not, let's pick one and document it. Looking at German for example (Template:table:colors/de) we have inconsistencies within the one template (some with a second colour in brackets, some with spaces between terms, some with commas between terms). -Stelio (talk) 14:10, 20 March 2019 (UTC)

[POLL] Further cleaning up form-of templates

Our handling of form-of templates is completely inconsistent. Some (the majority) use {{#invoke:form of|form_of_t}}, with either a capital or lowercase letter at the beginning of the template text (depending on the template) and no trailing period, while a significant minority use {{#invoke:form of|alt_form_of_t}} (formerly {{deftempboiler}}), with a capital letter at the beginning of the template text (unless |nocap=1 is used) and a trailing period (unless |nodot=1 or |dot= is used). For example, {{obsolete form of}} has a capital letter and period, while {{obsolete spelling of}} has a capital letter without a period and {{en-past of}} has a lowercase letter without a period. I argued above that all such templates should have a capital letter and period in English, but a lowercase letter without a period in other languages, for the following reason:

English is fundamentally different from foreign languages because the definitions of English terms are paraphrases using other English terms (normally full sentences or long phrases), while the definitions of foreign terms are English equivalent terms (often single words). For example, the definition of English umbrella is written as follows:
  1. Cloth-covered frame used for protection against rain or sun.
while the definition of Portuguese guarda-chuva is written as follows:
  1. umbrella
Note the difference in formatting. This is fairly consistent across Wiktionary and is the reason I remove final periods when they occur in Russian definitions. For this reason I think the use of capital letters and periods in templates like {{alternative form of}} should be different for English vs. foreign languages: Capital letter and period in English, lowercase letter and no period in foreign languages.

User:Sgconlaw and User:-sche agreed with me, User:Fay Freak appears to agree, while User:Rua disagrees. I'd like to take a poll to see what people think:

  1. Leave everything the way it is, with all the inconsistencies.
  2. Be consistent, using a trailing period and capital letter for English, and no trailing period or capital letter for other languages.
  3. Be consistent, using a capital letter but no trailing period for English, and no trailing period or capital letter for other languages.
  4. Be consistent, the same way across all languages, using a trailing period and capital letter everywhere.
  5. Be consistent, the same way across all languages, using a capital letter but no trailing period everywhere.
  6. Be consistent, the same way across all languages, using no trailing period or capital letter everywhere.

My personal feeling is I'd be fine with either #2 or #3 and probably OK with #6 as well; I feel more strongly about not having a trailing period or capital letter in foreign languages than the exact formatting in English. Benwing2 (talk) 21:29, 20 March 2019 (UTC)

BTW don't worry too much about technical issues involving changing the way templates handle trailing periods, I'm pretty sure I can figure out a way to do a bot run to automatically fix whatever we decide. Benwing2 (talk) 21:31, 20 March 2019 (UTC)
Consistency across languages makes more sense I think. Why would the exact same definition have one format in one language and another format in another language? Definitely 4-6. —Rua (mew) 21:38, 20 March 2019 (UTC)
#4. I don't mind the odd capital in non-Eng entries. There's also {{clipping of}} and co. – Jberkel 22:31, 20 March 2019 (UTC)
  • #4, with #2 as a backup if there's no consensus for #4. — SGconlaw (talk) 08:02, 21 March 2019 (UTC)
  • #4 is my preference too. -Stelio (talk) 08:52, 21 March 2019 (UTC)
Regarding the final dot, it must be kept in mind that it's much easier to add a dot than it is to remove it. Adding it means literally just typing a dot. Removing it means nodot=1, which is much longer. —Rua (mew) 23:52, 21 March 2019 (UTC)
I personally hate #4 in that IMO the trailing period looks very wrong in non-English entries, which are full of short glosses, not full sentences. If we are to settle for the same across all languages, I'd much prefer #6. Also keep in mind that, after looking through all the form-of templates, the majority of them don't have either a capital letter or period. I have also found the automatic capital letter/period very confusing and error-prone when trying to format entries properly with additional text. Benwing2 (talk) 20:06, 22 March 2019 (UTC)

Templates/modules not minding what namespace they're in again

In e.g. this revision of Citations:recensent, the {{der}} and {{lb}} both put the page into categories. In the past, for at least a while, this was not the case and categories only got added to pages in certain namespaces. (I seem to recall seeing this issue pointed out twice before and it being fixed at least one of those times.) Is this something to be fixed, or would checking for namespace be too expensive Lua-memory-wise / would it be preferable to make checking categories for pages from unapproved namespaces an occasionally recurring TODO task? - -sche (discuss) 05:29, 21 March 2019 (UTC)

I guess Module:etymology and Module:labels will have to be modified so that they don't add categories in the Citations namespace. They use Module:utilities to format categories and the set of namespaces in which Module:utilities adds categories includes Citations, but the same list doesn't work for all modules; {{citation}} needs to be able to add categories to the Citations namespace, but {{der}} and {{lb}} shouldn't. — Eru·tuon 05:45, 21 March 2019 (UTC)
Ah, right, I remember now that that was an impediment before. Well, as to {{der}} etc, it's not just the Citations: namespace that they shouldn't categorize in (or, alternatively, shouldn't be used in and would need to be cleaned up out of): they shouldn't categorize in most namespaces, it's only a small list of namespaces where they should categorize (mainspace, Reconstruction:, maybe Appendix:; where else?). - -sche (discuss) 07:14, 21 March 2019 (UTC)
Right, but there's no need to worry about any other namespaces because Module:utilities already disables categorization there. The only namespace in which categorization needs to be manually disabled (that is, in which Module:etymology and Module:labels should not call the format_categories function from Module:utilities) is Citations. (I suppose it would be more straightforward or understandable to send a custom list of allowed namespaces to Module:utilities, but that would require changes to the categorization function.) — Eru·tuon 07:39, 21 March 2019 (UTC)
Actually, maybe it would be easier to have to enable categorization in the Citations namespace than to disable it. Most likely most templates or modules want to avoid categorization in the Citations namespace. — Eru·tuon 08:06, 21 March 2019 (UTC)
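The opt-in approach suggested in the last comment could look roughly like the sketch below. It is written in Python purely for illustration, whereas the real modules are Lua. The name format_categories comes from the discussion, but the signature shown here, the default namespace set, and the allow_citations flag are all assumptions.

```python
# Hypothetical sketch: the namespace set and signature are assumptions,
# not the actual behaviour of Module:utilities.
DEFAULT_NAMESPACES = frozenset({"", "Reconstruction", "Appendix"})


def format_categories(categories, namespace, allow_citations=False):
    """Return category wikilinks, or "" when the namespace should not be categorized."""
    allowed = set(DEFAULT_NAMESPACES)
    if allow_citations:
        # Only callers that opt in (e.g. a citation template) categorize Citations pages.
        allowed.add("Citations")
    if namespace not in allowed:
        return ""
    return "".join(f"[[Category:{name}]]" for name in categories)
```

With this shape, {{der}} and {{lb}} would need no change at all, and only a template like {{citation}} would pass the extra flag.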

New rare initial for Khmer is required

(Notifying Stephen G. Brown, Octahedron80): Hi. Khmer ហ្ស៉េន "zen" can't be transliterated. I'm getting the error "Lua error in Module:km-pron at line 269: Error handling initial ហស៉." Could anyone with Lua skills please add this new initial, ហ + ស, pronounced as /z/? It must be rare. It should follow the class of the initial consonant ហ. The whole term with diacritics should transliterate as "zeen", if using {{m|km|ហ្ស៉េន}}, for example. --Anatoli T. (обсудить/вклад) 23:43, 21 March 2019 (UTC)

(Notifying Dixtosa, Kc kennylau, Rua, Ruakh, ZxxZxxZ, Erutuon, Jberkel): Calling Lua experts as well. --Anatoli T. (обсудить/вклад) 23:44, 21 March 2019 (UTC)

I see ហស (class 1) and ហស៊ (class 2) already have /z/ in consonant table. Are you sure that ហស៉ is valid? (misspelling?) And which class is it? --Octahedron80 (talk) 23:57, 21 March 2019 (UTC)

@Octahedron80: Yes, interesting, thank you. ហ្សេន (zeen) already produces "zeen", which is already "a-series". It must be invalid then. I can't understand the module well. --Anatoli T. (обсудить/вклад) 00:12, 22 March 2019 (UTC)

sort_key for Northern Tepehuan

Could someone add the following to the entry for Northern Tepehuan (ntp) at Module:languages/data3/n?

	sort_key = {
		from = {"á", "é", "í", "ó", "ú"},
		to   = {"a", "e", "i", "o", "u"}},

--Lvovmauro (talk) 05:30, 22 March 2019 (UTC)

Added. DTLHS (talk) 05:46, 22 March 2019 (UTC)
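For readers unfamiliar with the sort_key field, the effect of the from/to lists above can be sketched as follows. This is a Python illustration, not the Lua that the language modules actually run. The helper name make_sort_key is hypothetical, and the sketch assumes that each "from" string is simply replaced by the "to" string at the same position.

```python
# Hypothetical helper; assumes plain pairwise substitution of `from` by `to`.
FROM = ["á", "é", "í", "ó", "ú"]
TO = ["a", "e", "i", "o", "u"]


def make_sort_key(term, sources=FROM, targets=TO):
    """Replace each accented vowel with its unaccented counterpart."""
    for source, target in zip(sources, targets):
        term = term.replace(source, target)
    return term


# Made-up strings for demonstration, not real Northern Tepehuan words.
words = ["áb", "ac", "aa"]
print(sorted(words, key=make_sort_key))  # ['aa', 'áb', 'ac']
```

Without the key, a plain code-point sort would put "áb" after "ac", which is the behaviour the requested sort_key is meant to avoid.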

Substitutable params in documentation template data?

@Rua Do you (or does anyone) know how to transclude text inside of the <templatedata> tag that holds template data for documentation? It doesn't appear to work at all. I tried putting the start tag, end tag and contents in three different templates but that doesn't work either. This appears to mean that there's no way to templatize the template data and share it across the documentation of multiple similar templates (e.g. the form-of templates). If so, this really sucks as it means we have to copy the entire template data on each documentation page. (In general the template data appears badly thought out, e.g. why does it have to be visible on the doc page? Why can't it live on another page?) Benwing2 (talk) 01:46, 23 March 2019 (UTC)

The tag magic word works: {{#tag:templatedata|{{template with TemplateData tag contents}}}}. — Eru·tuon 02:26, 23 March 2019 (UTC)
@Erutuon Thanks! Benwing2 (talk) 12:34, 24 March 2019 (UTC)
@Erutuon Do you know if you can stick the template data inside of <includeonly>...</includeonly> tags and still have it work? That way it won't clutter up the visible documentation. Benwing2 (talk) 03:54, 25 March 2019 (UTC)
@Benwing2: That's beyond my knowledge. I haven't worked with TemplateData much. — Eru·tuon 04:36, 25 March 2019 (UTC)
@Erutuon OK, thanks. Benwing2 (talk) 04:48, 25 March 2019 (UTC)

Why can't I save my page?

Why can't I save my page? 194.228.13.62 13:16, 23 March 2019 (UTC)

I have tried to save my page, but a box with this message

This action has been automatically identified as harmful. Unconstructive edits will be quickly reverted, and egregious or repeated unconstructive editing will result in your account or IP address being blocked. If you believe this action to be constructive, you may submit it again to confirm it.

A brief description of the abuse rule which your action matched is: probably vandalism. If you believe your edit was flagged in error, you may report it on the Wiktionary:Grease pit.

still shows. Why can't I JUST save my page and add it here? WHAT IS WRONG??? 194.228.13.62 13:17, 23 March 2019 (UTC)

We have abuse filters to prevent vandalism and badly formatted entries. They sometimes catch formatting errors. Every month or so such a filter catches an error in something I am trying to save. For me it is usually unmatched brackets. DCDuring (talk) 13:27, 23 March 2019 (UTC)
And are you able to save my page? I'm unable to add the text here, because the filter won't let me. I can't even add the name here. 194.228.13.62 13:30, 23 March 2019 (UTC)
Spell it out - one letter at a time. SemperBlotto (talk) 14:23, 23 March 2019 (UTC)
@SemperBlotto you can see which edits by a particular user have triggered abuse filters here.
@194.228.13.62 it looks like your entry was flagged because of the combination of characters that form the word (i.e. it has almost no vowels). I created the page (čtvrthrst) based on your first edit attempt; now that it exists, I think you should be able to modify it if necessary. Sorry for the inconvenience and thanks for contributing despite the hurdles. - TheDaveRoss 15:46, 23 March 2019 (UTC)
Thanks a lot! 194.228.13.62 15:48, 23 March 2019 (UTC)
Lol, our abuse filters blocked Czech for looking too much like gibberish. —Rua (mew) 16:02, 23 March 2019 (UTC)

Does anyone have a better solution for this ugly code?

In Module:fy-adjectives there are a few lines like this:

data.forms["pred|comd"] = data.forms["pred|comd"] or {}; table.insert(data.forms["pred|comd"], form)
data.forms["indef|n|s|comd"] = data.forms["indef|n|s|comd"] or {}; table.insert(data.forms["indef|n|s|comd"], form)

The idea is to first set the value to an empty table if it's currently nil, then append a new value to either that empty table or the existing one, depending on which was the case. It's rather ugly code but I can't think of something elegant at the moment. @Erutuon, Benwing2 Does anyone have an idea for nicer code? —Rua (mew) 21:37, 23 March 2019 (UTC)

You can use Module:auto-subtable. — Eru·tuon 21:46, 23 March 2019 (UTC)
Thank you, I figured someone would have made a nice solution already. Is there a reason why the metatable is removed at the end? —Rua (mew) 21:51, 23 March 2019 (UTC)
Removing before make_table is probably necessary, because the code in make_table is checking whether fields are nil (local forms = data.forms[param]; if not forms then ... end). And I've run into weird bugs when the metatable is left on the table, so I figure it's generally good to remove it by default after the code that actually needs it. — Eru·tuon 22:03, 23 March 2019 (UTC)
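As a rough analogy only (Module:auto-subtable is Lua and works through a metatable), the same pattern can be illustrated in Python with collections.defaultdict. The function name collect_forms and the placeholder form strings are made up for this sketch. Converting back to a plain dict at the end plays the role of removing the metatable, so that later lookups of a missing slot do not silently create an empty entry.

```python
from collections import defaultdict


def collect_forms(pairs):
    """Group (slot, form) pairs into lists without 'create if missing' boilerplate."""
    forms = defaultdict(list)  # a missing slot is created on first access
    for slot, form in pairs:
        forms[slot].append(form)
    # Convert to a plain dict so that later lookups of a missing slot
    # no longer create an empty entry (the analogue of removing the metatable).
    return dict(forms)


forms = collect_forms([
    ("pred|comd", "form1"),
    ("pred|comd", "form2"),
    ("indef|n|s|comd", "form1"),
])
print(forms.get("pred|sup"))  # None: the missing slot was not auto-created
```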

edit request: MediaWiki:Common.css

These are my proposed changes. They mostly have to do with CJK and consistency, although I also did minor cleanup elsewhere. —Suzukaze-c 22:14, 23 March 2019 (UTC)

Bot Task- Remove All Empty Parameters

Every time a template gets converted to use Module:parameters, CAT:E fills up with "{fill-in-the-blank} is not used by this template" errors for empty named parameters: the parameter name followed by "=", and nothing else. I'm assuming these are there in the first place because people are either substing templates that have all the parameters present so they can be filled in by hand, or people are copy-pasting the empty template into entries and customizing it for each entry. These parameters do nothing and consequently are ignored by both the system and the editors.

This seems to me like a wholly preventable problem. It should be fairly simple to remove all of these by bot without any ill effects. It would no doubt be time-consuming up front, but would save as much or more time that would otherwise be spent cleaning up module errors later on, errors that temporarily break the entries and make a bad impression on site visitors until they're fixed.

A possible second step would be to add an abuse filter once they're gone to remind contributors not to leave empty parameters (I can see how this might interact poorly with the way preload templates are used for new entries, though).

Anyone interested? Chuck Entz (talk) 16:11, 24 March 2019 (UTC)

First of all, using unrecognized parameters in a template should not throw an error; there is no reason for that not to fail gracefully from a presentation perspective. Categorizing them for cleanup is one thing; having a big, red error message show in place of what might otherwise be a perfectly acceptable template output is just silly. We should design modules to ignore unexpected parameters instead, and only give errors when parameters are missing or of the wrong type.
Second, I think your idea to remove blank labeled parameters (and empty parameters when applicable) is sound, as long as nobody can provide instances when they are beneficial.
Third, sure I would be happy to run a bot to do some cleanup once the specifics are nailed down. - TheDaveRoss 16:40, 24 March 2019 (UTC)
@Chuck Entz My apologies, in this case the error messages about dot= arose as follows: (1) form_of_t in Module:form of was allowing but ignoring 'dot'; (2) I created a tracking category to find all uses of 'dot'; (3) I cleaned them up; (4) I disallowed 'dot'; (5) some errors ensued because the tracking category I created failed to track cases with empty dot=. As these are only a few, I've been cleaning them up as they come, but if you want I'll temporarily add 'dot' back as an allowed param and reenable the tracking for empty dot=, and then clean them up over several weeks as the site gradually reprocesses all the pages. Benwing2 (talk) 18:27, 24 March 2019 (UTC)
@Chuck Entz I went ahead and added 'dot' back and fixed tracking for empty dot=. Benwing2 (talk) 02:19, 25 March 2019 (UTC)
@Benwing2: That might take a while. I noticed that all templates invoking Module:form of seem to end in " of", so I went through the dump and gathered instances of templates ending in " of", gathered the templates that include |dot=, and created this list, if you can use it for a bot run. — Eru·tuon 03:59, 25 March 2019 (UTC)
@Erutuon Thanks, this is helpful. Most of these are using the alt_form_of_t form-of templates, which do support dot= and by default add a final period, but I have the full list of both types of templates, so I can sort things out. Benwing2 (talk) 04:08, 25 March 2019 (UTC)
@Benwing2: Whoops, the list may be missing a few entries because I didn't trim the template names. I've added trimming of ASCII whitespace and am regenerating the list. Not sure if I need to handle Unicode whitespace too, which would be more complicated. [Edit: Never mind, none were missing.] — Eru·tuon 04:38, 25 March 2019 (UTC)
@Erutuon OK, cool. Benwing2 (talk) 04:49, 25 March 2019 (UTC)
@TheDaveRoss I disagree that we should just ignore unrecognized params, vs. throwing an error on unrecognized params, because these unrecognized params generally represent mistakes (particularly typos). If we display an error, the person who made the error will generally see it and fix it; otherwise they won't know they made a mistake, and the error may never be fixed, even if it sits in a silent tracking category. As an example, the *fix templates used to ignore and silently track cases where the user accidentally wrote 't', 'tr', etc. (in place of 't1'/'t2'/'t3', ..., 'tr1'/'tr2'/'tr3'/etc.). Before I turned on the code to make these cases an error, I went through and manually emptied the categories, which was a lot of work because there were over 1,000 such cases. This could not be done by bot because there was no way to automatically determine which of 't1', 't2', 't3', etc. was correct. By throwing an error, we ensure that such incorrect uses don't accumulate. Benwing2 (talk) 18:34, 24 March 2019 (UTC)
@TheDaveRoss BTW you need to be careful about blithely removing all empty named params because there may be templates where such params are significant. For example, the various templates that formerly used {{deftempboiler}} used to recognize empty dot= to indicate that the default trailing period should be suppressed. When I converted these templates to use alt_form_of_t in Module:form of, I wrote a bot script to find all uses of empty dot= and convert them to nodot=1 (because having empty params be significant is very fragile and generally a bad idea); in this case removing them would have been incorrect. In general there's no way to know whether a given empty or blank param is significant except by analyzing the template or Lua code that implements the template in question. Benwing2 (talk) 18:39, 24 March 2019 (UTC)
@Benwing2: I agree with your second point, which is what my third point was about. There will be times when it is not appropriate to remove them and we need to be conscious of that. Re your first point, having a big red error message in the middle of an entry (or many entries) is not a particularly elegant means to alert folks to their errors, especially when the error doesn't actually prevent the correct outcome from being presented. If I stuck a stanza= into {{quote-song}} but otherwise populated the necessary parameters correctly that needn't prevent the quote from displaying at all. In fact some (possibly) useful information might be included. It is a minor point, but I don't like showing errors in production environments unless they are unavoidable. - TheDaveRoss 22:06, 24 March 2019 (UTC)
@TheDaveRoss: Yeah, I understand your point, I just don't see any other practical way in general of alerting people that they made a mistake (since they're almost always in fact mistakes, usually a misspelling of a parameter). Having an error on a lot of pages is definitely a bad idea but usually it's only a few, and they get fixed quickly. Benwing2 (talk) 22:50, 24 March 2019 (UTC)
@Benwing2: I suppose, ideally, the error would show on the preview and be suppressed once the page is saved. I am not sure if Modules can tell if they are in a preview context or not, perhaps some clever use of subst: could accomplish this. - TheDaveRoss 23:07, 24 March 2019 (UTC)
Module:Check for unknown parameters on Wikipedia uses frame:preprocess( "{{REVISIONID}}" ) == "" to determine if it is in preview mode and generate different text, but it's maybe simpler to generate the same text but use the previewonly class to hide it, as {{IPA}}, {{Q}}, {{ar-root}}, and {{grc-IPA}} do. — Eru·tuon 23:24, 24 March 2019 (UTC)
It's a good idea to track the unrecognized parameters so that they can be cleaned up when a template is first converted to use Module:parameters. The module errors can be turned on when no instances have unrecognized parameters. To find unrecognized parameters, it's not reliable to use a tracking template when lots of widely transcluded modules are being edited: pages will update slowly and "what links here" page for the tracking template will take a while to fill up. (That's been happening lately because of all of Benwing2's work. I think it would be a good idea to make fewer edits to widely transcluded modules, by using sandbox modules for instance. That's the practice often taken on Wikipedia.) Dump analysis is a more reliable method. I do have a program to grab all instances of a template for easier analysis, if that would be useful. — Eru·tuon 20:01, 24 March 2019 (UTC)
@Erutuon Generally I do use userspace modules. Sometimes I've not but I'll be better about that in the future. As for dump analysis, that is a good idea; I've been doing all my work using refs and categories but it looks like the dump is only 600-800 MB compressed (I expected more like 10+ GB based on working with Wikipedia dumps). BTW do you have experience working with Toolforge? If so any comments as to how difficult it is to set up, how reliable, etc.? Benwing2 (talk) 21:38, 24 March 2019 (UTC)
@Benwing2: Yeah, the Wiktionary dump is a lot smaller than the Wikipedia one. At one point I considered searching the Wikipedia dump for something or other, but gave up when I saw how huge it was.
I haven't used Toolforge, but Dixtosa created a tool for finding entries with a particular suffix so maybe he would have some comments on it. After the recent discussion about replacing {{redlink category}} with a toolserver, I've been thinking seriously about trying to devise a toolserver that uses one or more of my dump analysis programs, for instance one that would allow people to get or search all instances of a particular template, or to find all instances that use a particular parameter. That could be complicated, though, and I don't have any experience with web servers. — Eru·tuon 22:21, 24 March 2019 (UTC)
@Erutuon I did my dissertation work partly on Wikipedia; the dump was around 9 GB bz2-compressed at the time, maybe 30 GB uncompressed, and I imagine it's grown significantly since then. Processing it took a while, and doing things like sorting the entries wasn't easy on a 16 GB laptop, since it couldn't all fit in memory. Benwing2 (talk) 22:34, 24 March 2019 (UTC)
The latest dump is 6.34 GB uncompressed. I don't load it all at once into memory but I use an XML iterator which is fast enough for my purposes. DTLHS (talk) 23:31, 24 March 2019 (UTC)
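The streaming approach mentioned here can be sketched with Python's standard xml.etree.ElementTree.iterparse. This is only an illustration under stated assumptions: the tiny SAMPLE below stands in for a real dump and omits the XML namespace that real MediaWiki exports put on every tag, and the function name iter_pages is made up.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a dump; real dumps use a namespaced MediaWiki export schema.
SAMPLE = b"""<mediawiki>
  <page><title>foo</title><revision><text>{{m|en|bar}}</text></revision></page>
  <page><title>baz</title><revision><text>plain text</text></revision></page>
</mediawiki>"""


def iter_pages(stream):
    """Yield (title, text) one page at a time without loading the whole file."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "page":
            title = elem.findtext("title")
            text = elem.findtext("revision/text") or ""
            yield title, text
            elem.clear()  # drop the finished page's children to keep memory low


titles = [title for title, text in iter_pages(io.BytesIO(SAMPLE)) if "{{m|" in text]
print(titles)  # ['foo']
```

For a real dump, the stream would be an opened (and possibly bz2-decompressed) file rather than an in-memory buffer, and the tag comparisons would need to account for the namespace prefix.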
The all-revisions version is big, the current-revisions version is smaller. In other good news, the toolserver has access to replica databases, so you don't have to parse XML dumps; you can just query the actual MW database structure for the vast majority of tasks. They are also much more up-to-date. - TheDaveRoss 23:46, 24 March 2019 (UTC)
Haha, no I'm not going to do 5 million queries for each report I want to run. The things I actually care about are most definitely not captured in the "MW database structure". DTLHS (talk) 23:51, 24 March 2019 (UTC)
For sure it is not useful in every case, but in this case (finding empty parameters for particular template calls) a single regex query could easily return a list of all pages which match. Further you can create tables on toolserver, so things like capturing a complete list of all titles by language can be done once and then leveraged moving forward, etc. - TheDaveRoss 00:07, 25 March 2019 (UTC)
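A regex-based pass over wikitext for the empty-parameter cleanup proposed in this thread might look like the sketch below. It is a minimal illustration, not an existing bot: the names strip_empty_params and skip are hypothetical, it only handles template calls that contain no nested templates, and it ignores complications such as piped links inside parameter values. The skip set reflects the caution raised above that some templates treat an empty parameter as significant.

```python
import re

# A template call with no nested braces: {{name|params...}}
TEMPLATE = re.compile(r"\{\{([^{}|]+)(\|[^{}]*)?\}\}")
# "|name=" with nothing after the "=" before the next "|" or the end of the params.
EMPTY_PARAM = re.compile(r"\|\s*[^|=\s][^|=]*=\s*(?=\||$)")


def strip_empty_params(wikitext, skip=frozenset()):
    """Remove empty named parameters, except in templates listed in `skip`."""
    def fix(match):
        name, params = match.group(1), match.group(2) or ""
        if name.strip() in skip:
            # An empty parameter may be meaningful here, so leave the call untouched.
            return match.group(0)
        return "{{" + name + EMPTY_PARAM.sub("", params) + "}}"
    return TEMPLATE.sub(fix, wikitext)


print(strip_empty_params("{{head|en|noun|sc=|head=foo}}"))  # {{head|en|noun|head=foo}}
```

Which templates belong in the skip set can only be decided by reading the template or module code, so a real run would need that list worked out first.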