Wiktionary:Grease pit/2021/May

Importing of the Alar dictionary corpus

Alar is an open-source Kannada dictionary made by V. Krishna that has become a very popular online source; it has been described as "comprehensive" by Deccan Herald and has received positive feedback from Kannada news sources (see links on the bottom of this page). Even better, its corpus is licensed under ODbL 1.0 (source), which allows for sharing, creating, and adapting the database (including for commercial use) given attribution, share-alike and keeping it open (although you can also add DRM to it if you also offer a free version), and is available as a single YAML file on GitHub. (Side note: Fenakhay brought up some issues on Discord using this link from OpenStreetMap; however, I think this applies to a specific situation on OSM where users can change the license of their contributions, while all contributions to WMF wikis are GFDL/CC-BY-SA dual-licensed per meta:Licensing update, and the "produced works" should still be compatible with the project, as even static images (as opposed to dynmaps or pure data) of WMF maps are still compatible with OSM's licensing as detailed here.)

With over 150k Kannada entries with corresponding English definitions, it would significantly enhance our current coverage of the language, which currently has only 1122 lemmas in Wiktionary. So, given these circumstances, can we use a tool to run through the file, incorporate these entries into Wiktionary, and properly attribute them to Alar/the data source? I think that, if we could, it would prove to be very useful, given the scope and pre-formatting of the entries' data as well as, more uniquely, how recent it is (open data like this is rarely available for recent works; it's usually for works that are old enough to be in the public domain). MSG17 (talk) 02:31, 5 May 2021 (UTC)

@MSG17: I'm not sure the license is compatible. People can use Wiktionary's data for commercial/restricted purposes; the Share-Alike requirement would seem to prevent that. —Μετάknowledge^{discuss/deeds} 03:51, 5 May 2021 (UTC)[reply]

@Metaknowledge: According to the full license text, the rights of users "explicitly include commercial use, and do not exclude any field of endeavour". As previously mentioned, as long as access to a free version of the data is offered, parallel restricted distribution is allowed (see 4.7.b in the full license). Once again, I would like to point out that Wiktionary and other WMF projects are also ShareAlike-licensed, with the explicit motivation of making it easier to incorporate ShareAlike content. According to Creative Commons, the CC-BY-SA 3.0 license used is even more restrictive, stating "You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits." MSG17 (talk) 12:04, 5 May 2021 (UTC)

Indeed, CC-SA definitely does not exclude commercial reuse; that would be CC-NC (which is not allowed on Wikimedia projects except under a claim of fair use). —Mahāgaja · talk 20:40, 18 June 2021 (UTC)
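As a rough illustration of the "tool to run through the file" asked about above, here is a minimal Python sketch that turns one already-parsed entry into draft wikitext with an attribution line. Everything specific in it is an assumption: the field names (`entry`, `defs`, `type`) are guesses at the shape of the YAML data and should be checked against the actual file, the sample entry is made up for illustration rather than copied from Alar, and the wording of the attribution line is hypothetical and would need community agreement.

```python
from collections import defaultdict

# Assumed shape of one parsed entry (e.g. the result of loading the YAML).
# The real field names may differ; this sample is illustrative only.
SAMPLE_ENTRY = {
    "entry": "ಮರ",
    "defs": [
        {"type": "noun", "entry": "a tree"},
        {"type": "noun", "entry": "wood; timber"},
    ],
}

# Hypothetical attribution line; the actual wording would need consensus.
ATTRIBUTION = "* V. Krishna, ''Alar'' (ODbL 1.0)"


def entry_to_wikitext(entry: dict) -> str:
    """Render one parsed entry as a draft Kannada section."""
    # Group the glosses by part of speech, keeping their original order.
    by_pos = defaultdict(list)
    for definition in entry["defs"]:
        by_pos[definition["type"]].append(definition["entry"])

    lines = ["==Kannada=="]
    for pos, glosses in by_pos.items():
        lines += ["", f"==={pos.capitalize()}===", f"{{{{head|kn|{pos}}}}}", ""]
        lines += [f"# {gloss}" for gloss in glosses]
    lines += ["", "===References===", ATTRIBUTION]
    return "\n".join(lines)


if __name__ == "__main__":
    print(entry_to_wikitext(SAMPLE_ENTRY))
```

Generating drafts for human review, rather than saving pages directly, would leave room to settle the licensing question first.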

Lua error: not enough memory

On se, starting at Serbo-Croatian, all I see is "Lua error: not enough memory". If anything important and Lua-related was changed recently, maybe it would be a good idea to change it back. -Xbony2 (talk) 14:06, 5 May 2021 (UTC)

@Xbony2: See Wiktionary:Lua memory errors. J3133 (talk) 14:12, 5 May 2021 (UTC)[reply]

Ah. Thanks. -Xbony2 (talk) 15:05, 5 May 2021 (UTC)[reply]

Template:switch parser function

Is there a limit to the number of branches in a #switch function? — Salt marsh. 06:47, 7 May 2021 (UTC)[reply]

"A #switch can contain over 1,000 branches, but for better speed should be split to have less than 100 branches, in each of multiple or nested parts.". —Μετάknowledge^{discuss/deeds} 16:08, 7 May 2021 (UTC)[reply]

Alagoas, Amapá, and Sergipe

Alagoas is defined in the Portuguese section as "A state in the the Northeast Region of Brazil. Capital: Maceió." with an extra "the." Problem is, when I tried to edit it to remove the typo, I instead got a template saying "place|pt|state|macroregion/pt:Northeast Region|c/pt:Brazil|capital=pt:Maceió."

How do I fix this??

EDIT: Amapá has the same problem as well!

TheTheRemover (talk) 19:20, 11 May 2021 (UTC)[reply]

Search finds nothing

I listened to a Russian video and thought they said "можедельник", so I looked up this word here in English Wiktionary, but the search (Special:Search/можедельник) found nothing. It turns out there is an entry for можжевельник, which was the right word (juniper bush), I had only two letters wrong. But even a search for "можевельник" (just one letter missing, Special:Search/можевельник) gives no results at all. Is this really a state-of-the-art search function in the year 2021? Now, the same search in Russian Wikipedia (w:ru:Special:Search/можедельник) does find the entry. What's different? --LA2 (talk) 08:53, 16 May 2021 (UTC)[reply]

@LA2: It's unfortunate that the search suggestions are less helpful here for Cyrillic text. The search engine is a MediaWiki extension that we on Wiktionary have little control over, and I don't know how the search suggestions work. You will be more likely to get an answer on a Wikimedia IRC channel like #wikimedia-tech on Freenode, or on mw:Help talk:CirrusSearch. Perhaps there are language-specific search suggestion rules and the search suggestions here are English-based and the ones on the Russian Wikipedia are Russian-based? Just a guess. — Eru·tuon 05:10, 17 May 2021 (UTC)[reply]

@LA2: I just noticed in Tech News that they're working on fixing cross-site search inconsistencies like this: see the task T219550 on Phabricator. — Eru·tuon 21:38, 7 June 2021 (UTC)[reply]
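To make the "two letters wrong" observation concrete, the following Python sketch computes the Levenshtein edit distance between the queries and the entry title, and uses it for a naive "did you mean" lookup. This is not how CirrusSearch works; the title list and the `suggest` helper are hypothetical and exist only to illustrate the idea.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


# Hypothetical stand-in for the list of entry titles.
TITLES = ["можжевельник", "можно", "ельник"]


def suggest(query: str, titles=TITLES, max_distance: int = 2) -> list:
    """Return titles within max_distance edits of the query, closest first."""
    scored = sorted((levenshtein(query, title), title) for title in titles)
    return [title for distance, title in scored if distance <= max_distance]


if __name__ == "__main__":
    print(levenshtein("можедельник", "можжевельник"))  # one insertion, one substitution
    print(suggest("можевельник"))
```

A real search index would not compare the query against every title; this only shows that both misspellings sit within two edits of the target.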

`{{sumerogram of}}` and `{{akkadogram of}}`

I have created these two small form-of templates, as they are used a great deal in Akkadian entries. However, most definitions say "Sumerogram for". Should this be addressed, or is "of" good enough? Cheers, Swaare (talk) 09:41, 17 May 2021 (UTC)

Different lemma inflections in headword lines

I've encountered a presentation issue with Pali verbs in some non-Roman scripts. Because emboldening does not work well in some scripts, enlargement is being used as an alternative or supplement. This is seen in headwords, quotations and usage examples. Unfortunately, Pali verb roots should only be given in the Roman script; there is no benefit in giving them in other scripts, because they are not words. (The Pali names of roots end in vowels; their English names mostly don't.) How can I induce Module:headword to recognise that they are in the Roman script? --RichardW57m (talk) 12:48, 18 May 2021 (UTC)

I've found the answer. The solution is |f1sc= of {{head}}. I implemented the solution in {{pi-verb}} on 20 February 2023. --RichardW57m (talk) 10:09, 16 March 2023 (UTC)[reply]

I don't know if a similar consideration potentially applies to Sanskrit - perhaps they do intend to have dozens of manually translated and maintained pages for Sanskrit roots. It can be argued that Sanskrit roots are translingual. --RichardW57m (talk) 12:48, 18 May 2021 (UTC)[reply]

Character Folding

Do we control character folding for the Wikimedia-implemented searches of Wiktionary? For example, I can generally find a Roman-script word without having to enter the diacritics. I assume this is done by some form of character folding. Do we control this mechanism? Could we set it up to not distinguish U+1004 MYANMAR LETTER NGA and U+105A MYANMAR LETTER MON NGA? In some (font-dependent) contexts, these two characters are indistinguishable. In the Default Unicode Collation Element Table (DUCET), and presumably therefore to ICU, they are defined to be as distinct as 'a' and 'b' are. Experimentation has shown that the search for a word encoded with one of the characters will not find a word encoded with the other. --RichardW57m (talk) 13:18, 18 May 2021 (UTC)

Have we, and if so how, solved the problem of multiple encodings for Malayalam chillus? --RichardW57m (talk) 13:18, 18 May 2021 (UTC)[reply]
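The kind of folding asked about here can be sketched in Python as a function that reduces text to a search key. It is only an illustration of the idea, not a description of how the Wikimedia search backend is configured: the fold table and the `search_key` function are hypothetical, and the rule of dropping combining marks only after Latin base letters is an assumption made so that marks in other scripts are left alone.

```python
import unicodedata

# Hypothetical fold table: map MON NGA onto NGA so the two compare equal.
FOLD_TABLE = str.maketrans({"\u105A": "\u1004"})


def search_key(text: str) -> str:
    """Reduce text to a key under which look-alike spellings compare equal."""
    out = []
    base_is_latin = False
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            if base_is_latin:
                continue  # drop diacritics, but only those on Latin letters
        else:
            # Remember whether the most recent base character is Latin.
            base_is_latin = unicodedata.name(ch, "").startswith("LATIN")
        out.append(ch)
    return "".join(out).translate(FOLD_TABLE).casefold()


if __name__ == "__main__":
    # Both encodings of the same-looking syllable yield one key.
    print(search_key("\u105A\u102B") == search_key("\u1004\u102B"))
    print(search_key("Café"))
```

Multiple encodings of one visible string, such as the Malayalam chillus mentioned above, could in principle be handled the same way by mapping each sequence to a single preferred form before indexing.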

Please make Wiktionary Read & Edit mobile lite app

Wiktionary would greatly benefit from an editing app, as the literature search needed to add content to entries is very simple. Word translations and the like can be conveniently added from a smartphone.

A mobile app is very much crucial and needed for Wiktionary. The old Wiktionary mobile app had 1 million+ downloads.

So please make a Wiktionary lite app/progressive web application based on the MonoBook responsive skin or similar. Wiktionaries are losing a huge number of potential edits due to the absence of an app. Thank you! Vis M (talk) 21:54, 18 May 2021 (UTC)

@Vis M: I actually requested this in last year's Community Tech Wishlist, but Wikimedia has put apps on hold (and I think it wasn't that well received anyway). Unfortunate, as I think Wiktionary would really benefit from an app (there's definitely a lot of potential for features such as highlighting terms and getting definitions), but understandable. MSG17 (talk) 23:53, 4 July 2021 (UTC)

Yes, I had also made a request for making web apps for wikt:, voy:, etc. I wish they will make a web app one day Vis M (talk) 00:43, 5 July 2021 (UTC)[reply]

LCCN number apparently not recognized by catalog.loc.gov?

Hey- could you look at this LCCN for me? On the Dabancheng page, I've got a quotation from a 1993 book, and I want to add the LCCN link. When you click on LCCN: 93-83190 it says: "Your search found no results." But on this page in the book (near the top of the page) it clearly says "Library of Congress Catalog Card Number: 93-83190". What gives??? --Geographyinitiative (talk) 18:44, 19 May 2021 (UTC)[reply]
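One thing that might be worth checking, although nothing in this thread confirms it is the cause, is whether the catalog expects the number in normalized form. Under the Library of Congress normalization rules for LCCNs, the hyphen is removed and the serial part after it is left-padded with zeros to six digits, so the printed form and the searchable form can differ. A minimal Python sketch of that normalization (the function name is my own):

```python
def normalize_lccn(lccn: str) -> str:
    """Normalize a printed LCCN such as '93-83190' to its hyphenless form."""
    value = "".join(lccn.split())    # remove all whitespace
    value = value.split("/", 1)[0]   # drop a slash and anything after it
    if "-" in value:
        prefix, serial = value.split("-", 1)
        if not serial.isdigit() or len(serial) > 6:
            raise ValueError(f"unexpected serial part in LCCN: {lccn!r}")
        # Left-pad the serial number with zeros to six digits.
        value = prefix + serial.zfill(6)
    return value


if __name__ == "__main__":
    print(normalize_lccn("93-83190"))
```

If a search for the normalized number also finds nothing, the book may simply be missing from the online catalog.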

Translations — target languages

For some weeks now (on my screen at least) the "select target languages" option appears not to be working. However, on closer inspection today, it is working but the choice buttons are invisible. So you can make a choice, but only by guessing/clicking where the button might be. — Salt marsh. 10:48, 21 May 2021 (UTC)[reply]

@Saltmarsh: This should fix it. The stylesheet was linking to images that had apparently been moved. — Eru·tuon 08:19, 28 May 2021 (UTC)[reply]

@Erutuon: thank you — Salt marsh. 05:09, 29 May 2021 (UTC)[reply]

swahili verb conjugations

neither the template nor the module of sw-conj has been edited lately, so i assume that this is yet another instance of code changes lurking behind a template that isnt easy to see.

basically, i want to know if this is working as intended. don't get me wrong .... i love the way it looks now .... but i suspect Im in the minority. also, is it just sw-conj or have other templates also been expanded?

see Template:sw-conj/documentation#Example_usage for an example of what it looks like now.

thanks,

—Soap— 12:48, 22 May 2021 (UTC)[reply]

@Soap I tried to improve it. Did it help? This, that and the other (talk) 05:26, 17 June 2021 (UTC)[reply]

Not really, no .... I was being polite up top because I could tell that people put really hard work into this and I didn't want to make it seem like I had no appreciation for others' hard work. But the change you made really just made the text bigger .... I was talking about the recent changes to the code, which I can't find because I assume they're in a template somewhere.

Basically to spell it all out ... do we really need all those forms? It makes the whole page difficult to navigate on slower devices, both PC and mobile. I dont know if the code could be made to run faster if it loads as a collapsible by default, or if the browser would just load them all and then collapse ... so that might not help. This discussion went for a month with no replies so I dont expect to turn the community's opinion around, though .... and I get it, ... if people aren't complaining then there's not really a problem. Thanks, —Soap— 11:04, 17 June 2021 (UTC)[reply]

@Soap: Could you clearly explain what the problem is? In your original post you basically just said "look at the template" without any further information, assuming that it looks exactly the same for everyone and that the problem is obvious. Now you mention something about collapsible by default. Is the problem that it's uncollapsed for you? It's not uncollapsed for me unless I press "Show inflection" in the sidebar. — Eru·tuon 17:38, 17 June 2021 (UTC)[reply]

Yes it loads collapsed for me on my tablet, but on my PC it loads in full form and occupies 80% of the screen on penda (with a lot of the rest being Zulu). i know now that not everyone sees the same thing, but i think it would be at least nice to make it load collapsed by default on all devices. thanks, —Soap— 21:22, 17 June 2021 (UTC)[reply]

@Soap: Try clicking "Hide inflection" in the sidebar. That should collapse it. — Eru·tuon 21:37, 17 June 2021 (UTC)[reply]

Okay thanks, ...if to hide is the default behavior, I may have clicked it onto show some time ago and forgotten about it. Still i wish it were easier to see when and where a template has changed, because Ive certainly been looking at Swahili verbs before and not noticed such large templates. Thank you, —Soap— 15:50, 19 June 2021 (UTC)[reply]

There is a limit on how many bytes of output a template can deliver. I found that some of my cruder inflection tests hit it when I added transliteration. Possibly that limit has changed. As for all those forms, well every word in every language... Some of the Indo-European inflection tables are quite impressive. --RichardW57m (talk) 11:57, 22 June 2021 (UTC)[reply]

Categories added to documentation page because of examples

Hello,

I tried all possible combinations of noincludes and includeonlys, but I don't seem to be able to prevent the documentation page of my template from being added to categories because of the examples shown on it. How can I present examples so that only the text output is kept and the categories are not included? Thx. Sitaron (talk) 20:10, 22 May 2021 (UTC)

I think you need to have your module check the namespace before adding categories. I can't think of any reason for a module to add categories in the Template namespace- really, the only namespaces with entries are mainspace (0) and Reconstruction (118). There are a few Appendix-only entries for artificial languages and the like, but not for languages that are allowed in mainspace. Chuck Entz (talk) 20:25, 22 May 2021 (UTC)[reply]

Thank you, it worked like a charm! Sitaron (talk) 20:58, 22 May 2021 (UTC)[reply]
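For anyone hitting the same problem, the logic of the namespace check suggested above looks roughly like this. It is a Python sketch of the idea only; in an actual module it would be written in Lua, where the namespace number should be obtainable from `mw.title.getCurrentTitle().namespace`. The function name and the set of namespaces are assumptions based on the reply above (mainspace is 0, Reconstruction is 118).

```python
# Namespaces that hold entries, per the reply above: mainspace and Reconstruction.
CATEGORIZING_NAMESPACES = {0, 118}


def format_categories(namespace_id: int, categories: list) -> str:
    """Return category wikitext, or nothing outside entry namespaces."""
    if namespace_id not in CATEGORIZING_NAMESPACES:
        return ""  # e.g. Template (10): show the example, skip the categories
    return "".join(f"[[Category:{name}]]" for name in categories)


if __name__ == "__main__":
    print(repr(format_categories(10, ["English lemmas"])))  # documentation page
    print(format_categories(0, ["English lemmas"]))         # real entry
```

With this in place, the examples on the documentation page still render their text output, and only pages in entry namespaces are categorized.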

quote-journal incorrect placement of translator

In isopiptesis, the article was translated by Shoemaker, but it reads as if the entire journal was translated by Shoemaker. Can this be fixed? This, that and the other (talk) 10:48, 23 May 2021 (UTC)[reply]

Ping @Benwing2 This, that and the other (talk) 05:31, 1 June 2021 (UTC)[reply]

@This, that and the other, Sgconlaw I'm thinking this can be fixed by placing the word "in" after the translator instead of before, but I'm not sure if this will break anything else. User:Sgconlaw you are more familiar with the semantics of these params; does this make sense? Is translator= always used for the translator of an article rather than of a collection? If so, how is the translator of the collection specified? Benwing2 (talk) 02:41, 12 June 2021 (UTC)[reply]

@Benwing2 If the whole collection is translated by one person, you could argue in favour of adding a new parameter like "work-translator" that outputs the translator in its current position. Even so, it would not be absolutely incorrect to unconditionally place the translator in front of the article title, since they did indeed translate that article (along with all the rest of the articles in the work). This might be the simplest solution. This, that and the other (talk) 07:30, 13 June 2021 (UTC)[reply]

Romanian counties in Module:place/shared-data

I am currently working on etymologies of village names in Romania and the Category:ro:Villages in Romania category is getting a bit crowded, it would help to categorize them by county and, as far as I can tell, this would require a change in Module:place/shared-data, which is protected.

The county dictionary should look like this:

export.romanian_counties = {
	["Alba County"] = {},
	["Arad County"] = {},
	["Argeș County"] = {},
	["Bacău County"] = {},
	["Bihor County"] = {},
	["Bistrița-Năsăud County"] = {},
	["Botoșani County"] = {},
	["Brașov County"] = {},
	["Brăila County"] = {},
	["Buzău County"] = {},
	["Caraș-Severin County"] = {},
	["Cluj County"] = {},
	["Constanța County"] = {},
	["Covasna County"] = {},
	["Călărași County"] = {},
	["Dolj County"] = {},
	["Dâmbovița County"] = {},
	["Galați County"] = {},
	["Giurgiu County"] = {},
	["Gorj County"] = {},
	["Harghita County"] = {},
	["Hunedoara County"] = {},
	["Ialomița County"] = {},
	["Iași County"] = {},
	["Ilfov County"] = {},
	["Maramureș County"] = {},
	["Mehedinți County"] = {},
	["Mureș County"] = {},
	["Neamț County"] = {},
	["Olt County"] = {},
	["Prahova County"] = {},
	["Satu Mare County"] = {},
	["Sibiu County"] = {},
	["Suceava County"] = {},
	["Sălaj County"] = {},
	["Teleorman County"] = {},
	["Timiș County"] = {},
	["Tulcea County"] = {},
	["Vaslui County"] = {},
	["Vrancea County"] = {},
	["Vâlcea County"] = {},
}

Bogdan (talk) 12:20, 24 May 2021 (UTC)[reply]

@Bogdan: Done. — Fenakhay ^{(تكلم معاي · ما ساهمت)} 10:43, 25 May 2021 (UTC)[reply]

Thank you, @Fenakhay. The problem is that we currently have "county/Vâlcea" everywhere rather than "county/Vâlcea County", so it categorizes them wrongly. It would be better to have something like what exists for Russian, where the function "russian_placename_to_key" handles oblasts (so that "obl/Arkhangelsk" is categorized in Category:ru:Villages in Arkhangelsk Oblast). Bogdan (talk) 11:14, 25 May 2021 (UTC)

@Bogdan: Added. — Fenakhay ^{(تكلم معاي · ما ساهمت)} 12:35, 25 May 2021 (UTC)[reply]

@Bogdan, Fenakhay I changed the categories to include the suffix ", Romania", as I think it makes things a lot clearer; a category like Category:Villages in Neamț County is fairly obscure compared with Category:Villages in Neamț County, Romania. The key_to_placename entry was also missing, and I added it. I moved the existing categories created by User:Bogdan; the remaining higher-level categories will populate on Jun 4 when I run my script to create categories in Special:WantedCategories. Benwing2 (talk) 02:01, 2 June 2021 (UTC)[reply]
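The mapping discussed in this thread can be sketched as follows. This is a Python illustration of the behaviour described, not the code in Module:place/shared-data, which is Lua. The names `romanian_placename_to_key` and `village_category` are hypothetical, modelled on the "russian_placename_to_key" function mentioned above, and the county set is only an excerpt of the list posted earlier.

```python
# Excerpt of the county keys listed earlier in this thread.
ROMANIAN_COUNTIES = {"Alba County", "Neamț County", "Vâlcea County"}


def romanian_placename_to_key(placename: str) -> str:
    """Turn 'Vâlcea' into the table key 'Vâlcea County'; keep full names as-is."""
    if placename.endswith(" County"):
        return placename
    return placename + " County"


def village_category(lang_code: str, placename: str):
    """Return the village category for a county, or None if it is unknown."""
    key = romanian_placename_to_key(placename)
    if key not in ROMANIAN_COUNTIES:
        return None
    # The ', Romania' suffix follows the naming described in the reply above.
    return f"Category:{lang_code}:Villages in {key}, Romania"


if __name__ == "__main__":
    print(village_category("ro", "Vâlcea"))
```

Accepting both the short and the full form means existing "county/Vâlcea" usages would not need to be edited.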

Categories at the bottom of each page

This is an idea I have been thinking about: can categories for each language, on a page with more than one language, be shown under each language instead of being grouped at the bottom of the page? For instance, the page for Arne has no fewer than six languages. Or is this a discussion for the Beer Parlour? DonnanZ (talk) 20:30, 24 May 2021 (UTC)

If you turn on Tabbed languages at Special:Preferences#mw-prefsection-gadgets then the categories are grouped on the relevant language's tab. —Mahāgaja · talk 20:46, 24 May 2021 (UTC)[reply]

I can see a problem with editing each language (you can't click on the language heading), but I will give it a trial. Thanks. DonnanZ (talk) 21:40, 24 May 2021 (UTC)[reply]

@Donnanz: If you click the small Edit button at the upper right of the tab frame, that opens the L2 heading for editing. —Mahāgaja · talk 09:51, 25 May 2021 (UTC)[reply]

@Mahagaja: Found it, thanks. DonnanZ (talk) 11:14, 25 May 2021 (UTC)[reply]

Another category problem

For many categories we have a box showing "Recent additions to the category", and another for "Oldest pages ordered by last edit". Past experience has taught me that the latter box is rarely accurate, and I wouldn't mind if this feature were removed, as both boxes on the right-hand side of the page can interfere with the page layout. An example of this is Category:en:Places in Dorset, England, where the "Pages in category" listing is all in one column, instead of two, because the two boxes on the right drop below the start of the listing; the list of subcategories isn't long enough to compensate. Removal of the second box, "Oldest pages ordered by last edit", would cure the problem. Unless there are other bright ideas around. DonnanZ (talk) 21:01, 24 May 2021 (UTC)

Having modules check language name against language code in Reconstruction namespace

I just cleaned up a few Proto-West Germanic entries that were sitting unnoticed in categories for Proto-Germanic, and it occurred to me that it would be fairly easy to flag such errors automatically: the language name is always right there between "Reconstruction:" and the page name, and it's always the canonical name; our templates couldn't link to it otherwise.

Thoughts? Chuck Entz (talk) 09:11, 25 May 2021 (UTC)[reply]

I've written a program to generate a list of misplaced categories in Reconstruction entries from the SQL dump. It takes the full list of categories for each reconstruction page and eliminates any that reference the right language (part-of-speech categories, topic and list categories, and request categories like Requests for etymologies in Proto-Indo-European entries), and eliminates tracking categories like Terms with redundant transliterations or categories that reference something other than the language of the entry like requests for native script. Lots of special cases, but it works.

{{reconstructed}} could also parse the wikitext and look for templates (like {{der}} or {{inh}}) where the language code doesn't match the title of the page. But it's a bit more indirect, requires writing out all the template-specific behavior, and I'd rather not make the poor overworked servers do this for us. Looking at the categories will at least catch many uses of the wrong language code in templates because they often add categories. — Eru·tuon 05:25, 26 May 2021 (UTC)[reply]

We could use a library of programs (compiled or not) and regexes to find such 'improvement opportunities', so we are not totally dependent on the authors and so we can build upon them. DCDuring (talk) 14:00, 26 May 2021 (UTC)[reply]

Just published the code to GitHub. This one would be easier to run than most of my programs and scripts. Most of the others, like these, require some more difficult tasks like compiling Lua and some Lua libraries written in C, and downloading modules from Wiktionary to places where Lua can find them. I could probably make it easier to set up if I spent some time figuring it out. A few people have asked to use some of my code and it was an involved process. It's even hard for me; I haven't transferred most of my scripts to my new computer so lots of my lists are out of date.

One option would be to put this stuff in a Toolforge tool so that multiple people could access it. Some of my scripts could be set up in my existing tool Templatehoard because they deal with templates. — Eru·tuon 18:54, 29 May 2021 (UTC)[reply]
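The core of the check proposed at the top of this thread can be sketched in a few lines of Python. This is a much simplified illustration and not the program published to GitHub: the language list is a tiny hypothetical sample, the function names are my own, and a category is flagged only when it names some other known language without naming the page's own.

```python
# Tiny hypothetical sample; a real run would use every canonical language name.
KNOWN_LANGUAGES = {"Proto-Germanic", "Proto-West Germanic", "Proto-Indo-European"}


def language_from_title(title: str) -> str:
    """Extract the language name between 'Reconstruction:' and the page name."""
    prefix = "Reconstruction:"
    if not title.startswith(prefix):
        raise ValueError(f"not a Reconstruction page: {title!r}")
    return title[len(prefix):].split("/", 1)[0]


def misplaced_categories(title, categories, known_languages=KNOWN_LANGUAGES):
    """Return categories that name another language but not the page's own."""
    own = language_from_title(title)
    flagged = []
    for category in categories:
        if own in category:
            continue  # mentions the right language, e.g. lemma or derivation cats
        if any(lang in category for lang in known_languages if lang != own):
            flagged.append(category)
    return flagged


if __name__ == "__main__":
    print(misplaced_categories(
        "Reconstruction:Proto-West Germanic/handu",
        ["Proto-Germanic lemmas", "Proto-West Germanic nouns"],
    ))
```

Tracking categories that name no language pass through untouched, which covers some of the special cases mentioned above; the rest would still need explicit handling.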

Categories in user pages

Why is the category Category:English lemmas added to UserːEntity137/Sandbox, a user page? Does the template not check the namespace? J3133 (talk) 15:06, 29 May 2021 (UTC)[reply]

@J3133: The title has the triangular colon ː... didn't notice until I saw it had Category:English terms spelled with ː. — Eru·tuon 17:52, 29 May 2021 (UTC)[reply]
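A short Python sketch shows why the look-alike character defeats any namespace check: only the ASCII colon separates a namespace from a page name, so the title above is read as a mainspace page. The helper names are hypothetical; the character names come from Python's `unicodedata` module.

```python
import unicodedata


def namespace_of(title: str) -> str:
    """Return the text before the first ASCII colon, or '' for mainspace."""
    namespace, separator, _ = title.partition(":")
    return namespace if separator else ""


def colon_lookalikes(title: str) -> list:
    """List characters whose Unicode name ends in ' COLON' (not the ASCII colon)."""
    return [
        (ch, unicodedata.name(ch))
        for ch in title
        if unicodedata.name(ch, "").endswith(" COLON")
    ]


if __name__ == "__main__":
    title = "User\u02D0Entity137/Sandbox"
    print(repr(namespace_of(title)))  # empty string: treated as mainspace
    print(colon_lookalikes(title))
```

A maintenance report built on `colon_lookalikes` could list mainspace titles that were probably meant for another namespace.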

Bug in Tibetan transliteration

Right now, for Tibetan འགས་པ ('gas pa), it auto-generates the incorrect Wylie transliteration 'ags pa. The correct form should be 'gas pa, as confirmed by the Tibetan Living Dictionary, which means there is a bug in Module:bo-translit. Can anyone fix this? RcAlex36 (talk) 02:44, 30 May 2021 (UTC)[reply]

Fixed. Bula Hailan (talk) 03:51, 31 May 2021 (UTC)[reply]
