Module talk:zh/data/glosses

From Wiktionary, the free dictionary

Adding a character

@Wyang Hi Frank, (tuō) is missing. Can it be added in any order? --Anatoli T. (обсудить/вклад) 02:48, 14 January 2015 (UTC)

Hmm, actually, it doesn't show the gloss at 金蟬脱殼/金蝉脱壳 but is OK in (tuō). Should it actually be 金蟬脫殼/金蝉脱壳 (jīnchántuōqiào)? Moved entry. --Anatoli T. (обсудить/вклад) 02:51, 14 January 2015 (UTC)
Yes, only traditional is found here to save time and ensure conversion correctness. Wyang (talk) 07:04, 14 January 2015 (UTC)

High error rate in taxonomic names?

I was led to this module by an error I discovered. I looked at about a dozen other names and discovered two more errors. 25% would be a high error rate. The errors seem to have been in the module from the beginning. I don't know how to conveniently extract the taxonomic names. An approach to checking them would be to create a text file with wikilinked names and then load the page into Wikipedia, Wikispecies, and Wiktionary. A name blue-linked in at least one of the projects could be presumed correct. The remainder could be checked more laboriously against these sources or automagically against a source like the Catalogue of Life. DCDuring (talk) 00:40, 28 November 2018 (UTC)
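The check page described above could be generated along these lines. This is only a rough sketch in Python: the function name and the sample names are illustrative assumptions, and the list of names would still have to be extracted from the module by some other means.

```python
def build_check_page(names):
    """Return wikitext with one plain wikilink per unique name, sorted.

    Previewing the same text on Wikipedia, Wikispecies and Wiktionary
    would show each name as a blue or red link on that project.
    """
    unique = sorted(set(names))
    return "\n".join(f"* [[{name}]]" for name in unique)


# Hypothetical sample input, not an extract from the module.
sample = ["Tribulus terrestris", "Euryale ferox", "Tribulus terrestris"]
print(build_check_page(sample))
```

Plain wikilinks are used rather than interwiki links because interwiki links are not coloured by whether the target exists.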

@DCDuring {{taxlink}} doesn't expand properly. See 芡實 for example. — justin(r)leung (t...) | c=› } 01:21, 24 April 2019 (UTC)

I also noted that w: doesn't expand properly and presumably {{vern}} doesn't expand properly.
  1. Can they be made to expand properly?
  2. Can I get a count of how many times Chinese entries include taxonomic names (and vernacular names) in the etymology boxes? I suppose I could do a run against the html dump similar to what I do with the wikitext dump. DCDuring (talk) 01:55, 24 April 2019 (UTC)
@Justinrleung: Any thoughts? DCDuring (talk) 00:08, 25 April 2019 (UTC)
@DCDuring: Template calls don't expand from Lua modules in general. Using [[w:XXX|XXX]] or any other interwiki link should be fine. — justin(r)leung (t...) | c=› } 03:36, 25 April 2019 (UTC)
But w: didn't seem to display properly. I wouldn't have noticed otherwise.
@Justinrleung: Is there some other way to count the uses of the taxonomic name labels? DCDuring (talk) 10:38, 25 April 2019 (UTC)
@DCDuring: ''[[species:Euryale ferox|Euryale ferox]]'' for 芡 looks fine to me on 芡實. That said, you won't be able to use it for tracking. — justin(r)leung (t...) | c=› } 22:05, 25 April 2019 (UTC)
Using the HTML dumps wouldn't be simple either. Then I just won't consider such links in the link counts that determine the priorities for adding taxonomic name entries, nor will I correct errors in taxonomic names that are in this module. I will continue to correct others I find that are not implemented through this module. There may be inconsistency between the names in the module and the names in the entries. I don't understand why Chinese entries need to bypass Wiktionary's entries. Or is there some systematic effort to determine whether the WP or Species links have been rendered unnecessary? DCDuring (talk) 22:41, 25 April 2019 (UTC)
I can't undo my changes, presumably due to the subsequent changes you made. Feel free to undo mine as best you can. DCDuring (talk) 22:45, 25 April 2019 (UTC)
@DCDuring: This data module is presumably a less expensive way of retrieving definitions. The errors/inconsistencies are indeed frustrating, but please continue to make changes to this module as you see fit. I don't have a better solution than to leave the {{taxlink}}-like uses in {{zh-forms}} out of the count for now. — justin(r)leung (t...) | c=› } 04:09, 28 April 2019 (UTC)
@DCDuring: Actually, I think I found a way to expand the template. You just need to use the function taxlink(taxon, level, alt, arg_ver, arg_nomul, arg_i), just like in my last edits. — justin(r)leung (t...) | c=› } 04:59, 28 April 2019 (UTC)
@Justinrleung: Should watch out to see if expanding a template many times ends up increasing memory greatly in some of the single-character pages. If so, leaving the template syntax and expanding it when retrieving the gloss would probably be more efficient. — Eru·tuon 05:06, 28 April 2019 (UTC)
@Erutuon: Thanks for the reminder. I don't think this module is invoked on single-character pages because {{zh-forms}} would not need a gloss, so I don't think it'd be a problem. — justin(r)leung (t...) | c=› } 05:09, 28 April 2019 (UTC)
@Erutuon: After second thought, it may cause problems if we increase the usage of the function in this module, eh? — justin(r)leung (t...) | c=› } 05:12, 28 April 2019 (UTC)
@Justinrleung: Yeah. Resource usage will depend on how many times the function is called. I did a test which shows that expanding {{taxlink}} is somewhat costly (at least in processing time) if you do it 1000 times (provided that the template content is different each time). You could test the resource usage with a sandbox module, or just go ahead and use the function, and if CAT:E starts to fill up, revert and figure out a less costly method (which I could help with), or go with the less costly method right away. I don't have great ideas for testing resource usage aside from creating two versions, editing a module that uses this module, and switching between the two versions while using "preview page with this template" on a suitable page. — Eru·tuon 07:14, 28 April 2019 (UTC)
  • I don't think this process is worth the effort for the benefit of the processes connected with {{taxlink}} and {{vern}}. The Perl scripts that I run count occurrences of {{taxlink}} that appear in wikitext in the XML dump "Articles, templates, media/file descriptions, and primary meta-pages". The occurrences in Module:zh/data/glosses do not leave any trace in that file. What I can do for the taxonomic names in this module is use the searchbox for instances of "[[w:", check manually for the existence of Wiktionary entries and remove "w:". And as I add taxonomic and vernacular name entries I can try to remember to check this page or Module space generally for occurrences of the names and do the same replacement. All that is lost is the ability to count instances, which reduces the likelihood that I will add entries for those names in any given period, which may become ever. {{taxlink}} and {{vern}} don't appear in the HTML dump either. I can also extract occurrences of "[[w:" and the following wikitext from this page and manually check for correct spelling, obsolete names etc.
I was hoping to just fold what this page does into my existing process, but that seems logically impossible. I hope that the effort to help me has yielded knowledge with other applications. DCDuring (talk) 11:55, 28 April 2019 (UTC)
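The extraction of "[[w:" occurrences mentioned above might look roughly like this in Python. It is a sketch only: the function name is made up, and the sample text merely imitates the style of a gloss table rather than quoting the module.

```python
import re

# Matches [[w:Target]] and [[w:Target|label]]; captures the target only.
W_LINK = re.compile(r"\[\[w:([^\]|]+)(?:\|[^\]]*)?\]\]")


def extract_w_targets(module_text):
    """Return the sorted, de-duplicated targets of [[w:...]] links."""
    return sorted({m.group(1).strip() for m in W_LINK.finditer(module_text)})


# Hypothetical excerpt; the real input would be the module's source text.
sample = '''
["芡"] = "[[w:Euryale ferox|Euryale ferox]]",
["蒺"] = "[[w:Tribulus terrestris]]",
'''
for target in extract_w_targets(sample):
    print(target)
```

The resulting list could then be checked by hand for misspellings and obsolete names, or against the existence of Wiktionary entries.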
I see now that what you've done at least partially addresses my stated problems. The part that it does not address, I see now, should not be addressed, because the number of entries that may contain a taxonomic name can be huge, overwhelming other uses. See for example Special:WhatLinksHere/Tribulus_terrestris which shows literally thousands of such links in Chinese entries. DCDuring (talk) 22:44, 28 April 2019 (UTC)
@DCDuring, Erutuon: I've edited it so that you can now use taxlink template syntax in the gloss strings. The taxlink template should only be expanded when a gloss with taxlink is retrieved. — justin(r)leung (t...) | c=› } 18:14, 29 April 2019 (UTC)
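To illustrate the idea of expanding only on retrieval, here is a minimal Python sketch. It is not the module's actual Lua code: the gloss table, the wording of the second gloss, and the stand-in expansion (an italic Wikispecies link) are all assumptions made for the example.

```python
import re

# Hypothetical gloss table; the real data lives in a Lua module.
GLOSSES = {
    "芡": "{{taxlink|Euryale ferox|species}}",
    "脫": "to shed; to take off",
}

# Two-argument taxlink calls only, which is all this sketch handles.
TAXLINK = re.compile(r"\{\{taxlink\|([^|}]+)\|([^|}]+)\}\}")


def expand_taxlink(match):
    """Stand-in for real template expansion: an italic link to Wikispecies."""
    taxon = match.group(1)
    return f"''[[species:{taxon}|{taxon}]]''"


def get_gloss(char):
    """Look up a gloss and expand any taxlink syntax only at retrieval time."""
    raw = GLOSSES.get(char)
    if raw is None:
        return None
    return TAXLINK.sub(expand_taxlink, raw)


print(get_gloss("芡"))
```

Glosses that are never retrieved are never expanded, so the cost scales with the number of glosses a page displays rather than with the size of the table.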
Maybe I could figure out which entries are using glosses from this module that contain {{taxlink}} and {{vern}} by getting all instances of {{zh-forms}} and {{zh-hanzi-box}}, determining which glosses would be displayed in the template output, and then finding any glosses that contain the templates: maybe most of the time the glosses for the characters in the page title. Not sure how simple the logic for gloss retrieval is, though. — Eru·tuon 19:03, 29 April 2019 (UTC)
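A rough Python sketch of that idea follows, under the simplifying assumption that the displayed glosses are exactly those of the characters in the page title. The real gloss-retrieval logic may well differ, and the function name and sample data are hypothetical.

```python
import re

# Pages of interest call one of these templates.
TEMPLATE_CALL = re.compile(r"\{\{(?:zh-forms|zh-hanzi-box)\b")
# Glosses of interest contain one of these templates.
TRACKED = re.compile(r"\{\{(?:taxlink|vern)\|")


def chars_with_tracked_glosses(title, wikitext, glosses):
    """Return the characters of `title` whose gloss contains {{taxlink}}
    or {{vern}}, provided the page calls {{zh-forms}} or {{zh-hanzi-box}}."""
    if not TEMPLATE_CALL.search(wikitext):
        return []
    return [c for c in title if TRACKED.search(glosses.get(c, ""))]


# Hypothetical data for illustration only.
glosses = {"芡": "{{taxlink|Euryale ferox|species}}", "實": "fruit"}
print(chars_with_tracked_glosses("芡實", "{{zh-forms|s=芡实}}", glosses))
```

Running this over every page title and wikitext in a dump would give a per-entry list that could feed a count of uses.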