Module talk:zh

Latest comment: 2 years ago by Fish bowl in topic Delete glyph origin sections in M.extract_gloss

{{#invoke:zh|sortkey_conv|纟04}}: Module error: The function "sortkey_conv" does not exist.

Wyang (talk) 01:49, 17 April 2013 (UTC)

sort

There doesn't seem to be any error in these revisions, but {{#invoke:zh|sort|老頭兒}} still produces 'lao3lao3lao3'. It seems the loop structure in this function is broken. Wyang (talk) 04:04, 19 April 2013 (UTC)

Fixed. Wyang (talk) 12:53, 19 April 2013 (UTC)
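Illustrative sketch (Python, not the module's Lua code): the thread does not say what the actual bug was, so the following only shows one hypothetical way a broken loop can yield 'lao3lao3lao3', namely by reading the first character on every iteration. The `READINGS` table and both function names are assumptions made for this example.

```python
# Hypothetical lookup table of numbered readings, for illustration only.
READINGS = {"老": "lao3", "頭": "tou2", "兒": "er2"}

def sort_key_buggy(text):
    """Loops once per character but never advances the index."""
    key = ""
    for _ in range(len(text)):
        key += READINGS[text[0]]  # bug: always looks up the first character
    return key

def sort_key_fixed(text):
    """Looks up each character in turn."""
    return "".join(READINGS[ch] for ch in text)
```

With this table, the buggy version repeats the first reading three times for 老頭兒, while the fixed version concatenates one reading per character.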

Name

Since this module contains code specific to Mandarin, it should be named Module:cmn, although Module:cmn-common would fit better with the names of other modules. —CodeCat 00:24, 20 April 2013 (UTC)

Rename it whatever you want. I will not edit on en.wikt any more if such a renaming is carried out. Wyang (talk) 11:34, 20 April 2013 (UTC)

Wow... don't you think that is a bit excessive? If you don't want to collaborate with Wiktionary and instead expect Wiktionary to do things your way, why do you think this is a good place for you? —CodeCat 12:35, 20 April 2013 (UTC)
It probably isn't now. Unfamiliar and non-contributing editors having an undue level of power in determining the policies, telling others "you should do this, do that, because that is in accordance with my perception of this issue", blah blah blah. Bye. Wyang (talk) 05:29, 22 April 2013 (UTC)
@Wyang. No need to be oversensitive about the zh and cmn issue. It's a long-standing argument with differing opinions; the position of Western linguists also differs to some extent from the positions in mainland China, overseas Chinese communities and other countries. See Wiktionary:Requests_for_moves,_mergers_and_splits#Module:zh_to_Module:cmn-common_and_Module:Hani-common. Please say something there. --Anatoli (обсудить/вклад) 23:39, 23 April 2013 (UTC)

apostrophe before vowels on pinyin

@Wyang Could you add an automatic apostrophe before any vowel (except the semi-vowels w, y) if the syllable is not the first one? :) E.g. 晚安 (wǎn'ān). --Anatoli (обсудить/вклад) 06:06, 26 March 2014 (UTC)

Hi. Done. Wyang (talk) 06:21, 26 March 2014 (UTC)
Thank you for the quick fix. You're a good programmer. --Anatoli (обсудить/вклад) 07:07, 26 March 2014 (UTC)
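Illustrative sketch (Python, not the module's Lua code) of the rule requested above: when joining pinyin syllables, insert an apostrophe before any non-initial syllable that begins with a vowel. The function names are assumptions for this example. Tone-marked letters are decomposed first so that "ā" is recognised as "a".

```python
import unicodedata

VOWELS = set("aeiou")  # "ü" decomposes to "u" plus a combining diaeresis

def starts_with_vowel(syllable):
    """True if the syllable's first letter, stripped of diacritics, is a vowel."""
    if not syllable:
        return False
    # NFD splits a tone-marked letter into base letter + combining mark.
    first = unicodedata.normalize("NFD", syllable.lower())[0]
    return first in VOWELS

def join_pinyin(syllables):
    """Concatenate syllables, adding "'" before non-initial vowel-initial ones."""
    parts = []
    for index, syllable in enumerate(syllables):
        if index > 0 and starts_with_vowel(syllable):
            parts.append("'")
        parts.append(syllable)
    return "".join(parts)
```

Syllables beginning with w or y are left alone because those letters are not in the vowel set, which matches the exception for semi-vowels in the request.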

zh-forms gloss for 兀

"duplicate of Big Five A461" doesn't seem like a particularly useful gloss... (it's also a curious one because it seems that U+5140 is A461, and the Unicode duplicate would be U+FA0C) —umbreon126 18:49, 9 May 2015 (UTC)

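Illustrative sketch (Python): the relationship between the two code points mentioned above can be checked with Unicode normalization, since U+FA0C is a compatibility ideograph that normalizes to U+5140. The helper name is an assumption for this example, and nothing here touches the Big Five mapping itself.

```python
import unicodedata

UNIFIED = "\u5140"    # 兀 as a CJK unified ideograph
DUPLICATE = "\ufa0c"  # the compatibility ideograph discussed above

def is_compat_duplicate(char, of):
    """True if `char` is a distinct code point that NFC-normalizes to `of`."""
    return char != of and unicodedata.normalize("NFC", char) == of
```

Calling `is_compat_duplicate(DUPLICATE, UNIFIED)` returns True, while swapping the arguments returns False, which is consistent with the gloss belonging on U+FA0C rather than on U+5140.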
zh-l 占星師 not converting into simplified

占星師 does not automatically add simplified characters. I suspect it has to do with 占, since 星師/星师 and 師/师 (shī) work, and 占星術 (zhānxīngshù) and 占星學 (zhānxīngxué) don't. — Justinrleung (t...)c=› 02:00, 13 November 2015 (UTC)

What does this module do/what's it for?

I would be grateful to be pointed to the documentation. Thanks. Hongthay (talk) 08:00, 4 October 2016 (UTC)

This is a general-purpose utilities module for Chinese, providing support for new entry creation, trad <-> simp conversion, zh-l, zh-der, etc. Wyang (talk) 06:03, 5 October 2016 (UTC)
I ask because many entries are linked to by Module:zh/data/nan-pron/000 etc., where it appears pronunciations are also stored. This brings up the issue of duplication (of data and of the effort to maintain it). Is there a plan to replace/merge? I could imagine that the module is intended to provide more of a reference reading, whereas the entry could have more variants. Hongthay (talk) 16:13, 5 October 2016 (UTC)
Yeah, they are there for reference. These nan-pron modules are called by {{zh-new}}, which will try to detect readings in varieties of Chinese recorded in other dictionaries. All the existing entries should already have been updated with these readings, like in this edit, and entries which haven't been created yet will pick up readings from here if they are created using {{zh-new}}. Wyang (talk) 06:14, 6 October 2016 (UTC)
They're also used as a reference for {{zh-usex}}. —suzukaze (tc) 06:41, 6 October 2016 (UTC)

Gloss-grabbing

I made some changes to the extract_gloss function (used by {{zh-forms}} and {{zh-see}}).

It no longer looks in the Etymology section. In 非洲和尚, it was grabbing "The loss of the nasal –ŋ coda in the ancient northwestern dialect of Middle Chinese, and" from the Etymology section, thinking this was a gloss.

And now it only grabs definitions from the Translingual and Chinese sections. In 夏时制, it was grabbing the definition from the Japanese section of 夏時, as there is no Chinese section. — Eru·tuon 06:24, 22 January 2018 (UTC)
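Illustrative sketch (Python, not the module's Lua code) of the two rules described above: take the first definition line only from the Translingual or Chinese language sections, and never from an Etymology section. It assumes level-2 headings name the language and that definitions are lines starting with "# ". The function name mirrors extract_gloss, but the body is an assumption.

```python
import re

ALLOWED_LANGUAGES = {"Translingual", "Chinese"}
SKIPPED_SECTION_PREFIXES = ("Etymology",)

# Matches wikitext headings such as "==Chinese==" or "===Etymology 1===".
HEADING = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")

def extract_gloss(wikitext):
    """Return the first definition found in an allowed section, else None."""
    language = None
    section = None
    for line in wikitext.splitlines():
        heading = HEADING.match(line)
        if heading:
            level, title = len(heading.group(1)), heading.group(2)
            if level == 2:
                language, section = title, None  # new language section
            else:
                section = title
            continue
        if language not in ALLOWED_LANGUAGES:
            continue  # e.g. a Japanese section is ignored entirely
        if section is not None and section.startswith(SKIPPED_SECTION_PREFIXES):
            continue  # text inside an Etymology section is never a gloss
        if line.startswith("# "):
            return line[2:].strip()
    return None
```

A page with only a Japanese section yields None under this sketch, which corresponds to the 夏時 case described above.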

Optimizing der

I think we need to optimize M.der because a page is getting Lua errors starting from where it calls {{zh-der}}. This guy [1] on Stack Overflow recommends against using table.insert and ipairs because of the overhead. I am not comfortable enough with Lua or Chinese to make the change... @Wyang what do you think? —Internoob 04:42, 25 October 2018 (UTC)

The page was running out of time for script execution. I found that changing the replace_chars function to use string.gsub sped it up greatly (about 6 times faster!), so the problem should not arise again for a while at least. mw.ustring.gsub is quite slow when used over and over, apparently.... (The server is inconsistent. The page had errors when I first visited it, but when I previewed it, it was fine. But it was still taking about 7 seconds, which isn't great, and now it's down to about 1.) — Eru·tuon 05:35, 25 October 2018 (UTC)
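Illustrative sketch (Python, not the module's Lua code): the comment above does not show the new replace_chars, so this is only an assumed analogue of a byte-oriented approach. It matches each UTF-8 encoded character with a byte pattern and replaces it through a single table lookup in one pass, instead of running a Unicode-aware substitution once per mapping. The pattern and the function signature are assumptions.

```python
import re

# One UTF-8 encoded character: a lead byte followed by continuation bytes.
UTF8_CHAR = re.compile(rb"[\x00-\x7f\xc2-\xf4][\x80-\xbf]*")

def replace_chars(text, mapping):
    """Replace characters per `mapping` in a single pass over the bytes."""
    table = {src.encode("utf-8"): dst.encode("utf-8") for src, dst in mapping.items()}

    def swap(match):
        char = match.group(0)
        return table.get(char, char)  # unmapped characters pass through

    return UTF8_CHAR.sub(swap, text.encode("utf-8")).decode("utf-8")
```

The design point is that the text is scanned once regardless of how many entries the mapping has. Any speed-up in a given environment would need to be measured rather than assumed.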

Remove '\n' before '|' in all matched TEMPLATE wikitexts

@Erutuon, Justinrleung, Suzukaze-c, Octahedron80, WOSlinker

{{ExampleTemp
|apple
|banana
}}

should work as well as

{{ExampleTemp|apple|banana}}

However, the former sometimes does not work because the match pattern does not allow for it, as in the "syn_saurus" function. 恨国党非蠢即坏 (talk) 08:28, 29 November 2019 (UTC)

@恨国党非蠢即坏: The formatting should be without line breaks for consistency in the code. — justin(r)leung (t...) | c=› } 06:22, 30 November 2019 (UTC)
@Justinrleung: Other templates like zh-pron and zh-forms support line breaks. If we are talking about "consistency", syn_saurus should support them too. Otherwise support for line breaks just varies arbitrarily between templates, without any documentation mentioning this potentially buggy design. 恨国党非蠢即坏 (talk) 07:38, 30 November 2019 (UTC)
I also question the necessity of protecting this module. 恨国党非蠢即坏 (talk) 07:38, 30 November 2019 (UTC)
@恨国党非蠢即坏: {{zh-pron}} is the only one that should have line breaks. {{zh-forms}} and most other templates should not have line breaks. This is the type of consistency I am talking about: not across all templates but within each template. {{zh-syn-saurus}} does allow line breaks, but it relies on {{zh-syn-list}} not having line breaks. The same goes for {{zh-l}}, which relies on {{zh-pron}} having line breaks (among other formatting constraints). This module is protected because it is linked to many pages, so any edits to the module may cause major changes. — justin(r)leung (t...) | c=› } 19:34, 30 November 2019 (UTC)
@Justinrleung: But the problem is that you should not leave buggy code there that is known only to you or an exclusive club of elite editors. I was confused for quite a while before I could locate the cause in this module. An ordinary editor should not be required to do this.
Now if you really want to protect mod:zh, may I move the thesaurus code out of it? This part is not used that frequently and thus doesn't need protection. I think protection should be applied only where it is really necessary, as this is a wiki. 恨国党非蠢即坏 (talk) 07:59, 1 December 2019 (UTC)
@恨国党非蠢即坏: The documentation page on {{zh-syn-list}} ({{zh-der}}) should be clear on how the formatting should look (although it should be explicit). @Suzukaze-c, KevinUp do you think we should fix this or leave this as a way to force consistency in formatting? — justin(r)leung (t...) | c=› } 08:09, 1 December 2019 (UTC)
I prefer having Chinese lists without line breaks because such lists can get very long, and we usually don't sort these lists manually. For Latin script languages, I prefer line breaks for easy addition/removal of entries. In some languages such as Japanese, line breaks are used because other templates may be inserted within another template. KevinUp (talk) 08:28, 1 December 2019 (UTC)
@KevinUp: I don't understand how "can get very long" becomes a reason for no line breaks. Aren't line breaks mostly used to break long lines to make them easier to read and edit? Also, if Chinese lists "can get very long", why can't lists in Latin script languages? 恨国党非蠢即坏 (talk) 09:10, 1 December 2019 (UTC)
One important aspect of wiki editing is to keep things consistent. We've been using single-line lists for {{zh-der}} since the beginning, and until now no one has complained about it. 水#Compounds has up to 1200 compounds, and editors who use the mobile interface will find it much more difficult to scroll through such a list if the list is on multiple lines. The reason why Chinese compound lists can get very long is that all the Chinese lects (Mandarin, Cantonese, Min Nan, etc.) are subsumed under a unified Chinese section, unlike other languages on Wiktionary. If some Chinese entries used the single-line format and others the multiple-line format, it would lead to inconsistent formatting. If other editors agree that a multiple-line format is more suitable for Chinese compounds, then that format shall be used. KevinUp (talk) 17:36, 1 December 2019 (UTC)
  • @KevinUp: "mobile interface" argument:
    1. Even if editors do not have to scroll through a long list, readers do, because there are still line breaks when the template is displayed. There is no reason to ignore readers while taking care of editors.
    2. Locating one term in order to change or remove it in an unshaped cluster of characters is as difficult as, if not more difficult than, locating it by scrolling further.
  • "format inconsistency" argument: book uses t:der3; dictionary uses t:der4. The more important aspect of wiki editing is to have enough flexibility to make content easy to read and edit. 恨国党非蠢即坏 (talk) 04:06, 2 December 2019 (UTC)
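Illustrative sketch (Python, not the module's Lua code) of the request that opened this thread: strip line breaks that come directly before "|" or "}}" so that the multi-line and single-line forms of a template call match the same pattern. ExampleTemp is the placeholder name used above. The function names are assumptions, and nested templates are deliberately not handled.

```python
import re

def normalize_template(wikitext):
    """Drop newlines that immediately precede "|" or "}}"."""
    return re.sub(r"\n(?=\||\}\})", "", wikitext)

def template_args(wikitext, name="ExampleTemp"):
    """Return the positional arguments of a simple, non-nested template call."""
    flat = normalize_template(wikitext)
    match = re.fullmatch(r"\{\{" + re.escape(name) + r"\|(.*)\}\}", flat)
    return match.group(1).split("|") if match else None
```

Normalizing once before matching keeps each individual pattern simple, at the cost of an extra pass over the text. The alternative is to allow optional whitespace inside every pattern.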

Delete glyph origin sections in M.extract_gloss

@Fish bowl, Justinrleung: The function M.extract_gloss(content, useetc) deletes etymology sections. Can glyph origin sections also be deleted, to avoid the error in 𧙻? -- 13:01, 19 March 2022 (UTC)

Done. Fish bowl (talk) 18:09, 19 March 2022 (UTC)
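Illustrative sketch (Python, not the module's Lua code) of a pre-pass that deletes both kinds of section before any gloss is looked for. It assumes a section runs from its heading to the next heading of any level. The function name and the prefix list are assumptions for this example.

```python
import re

# Matches wikitext headings such as "===Glyph origin===".
HEADING = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")

def delete_sections(content, prefixes=("Etymology", "Glyph origin")):
    """Remove each matching heading and the text up to the next heading."""
    kept = []
    skipping = False
    for line in content.splitlines():
        heading = HEADING.match(line)
        if heading:
            # Every heading decides afresh whether its section is dropped.
            skipping = heading.group(2).startswith(prefixes)
        if not skipping:
            kept.append(line)
    return "\n".join(kept)
```

Because the drop ends at the next heading of any level, part-of-speech sections nested under a numbered Etymology heading are kept.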