Module talk:yue-pron

From Wiktionary, the free dictionary

Bring back Yale Y

What just happened to the Yale initial "y", which corresponds to Jyutping "j" (as in joeng for yeung)? --Lo Ximiendo (talk) 13:05, 11 May 2014 (UTC)

@Lo Ximiendo Fixed. --kc_kennylau (talk) 13:27, 11 May 2014 (UTC)

Cantonese ukh instead of uhk?

How come there's ukh instead of uhk in the Jyutping to Yale part? Also, how about some Penkyamp, Yựtyựt, and Guangdong? --Lo Ximiendo (talk) 04:35, 2 June 2014 (UTC)

@Lo Ximiendo What on earth do you mean by ukh and uhk? --kc_kennylau (talk) 04:37, 2 June 2014 (UTC)
As in (liù) (luk6). Is there supposed to be something like luhk instead of lukh? What about the three I just mentioned? --Lo Ximiendo (talk) 04:41, 2 June 2014 (UTC)
Well I won't oppose if you do it. --kc_kennylau (talk) 05:00, 2 June 2014 (UTC)

@kc_kennylau Don't think I agree here. Please see: [1], [2] (pg. 8), Cantonese phonology#Initial consonants and Cantonese phonology#Historical change. The set of alveolar consonants is less palatalised in Hong Kong Cantonese than in Guangzhou Cantonese. The official Guangdong Romanization#Cantonese even has a separate set of initials for Guangzhou Cantonese. Wyang (talk) 23:41, 24 June 2014 (UTC)

ipa_preprocess and pairs() not guaranteeing order of iteration

Hi. There is a minor technical issue with ipa_preprocess and pairs. Specifically, it's the following code:

  for regex,replace in pairs(ipa_preprocess) do
    syllable[i] = mw.ustring.gsub(syllable[i],regex,replace)
  end

I propose making a minor change to use ipairs() since pairs() does not guarantee order. Specifically, the following would need to be changed:

  • Change the ipa_preprocess table to create an array-based table instead of a hash-based one.
local ipa_preprocess={
	[1]={'a','ă'},[2]={'yu','y'}, [3]={'ăă', 'a'}, [4]={'uk', 'ŭk'}, [5]={'ik', 'ĭk'}, [6]={'ou', 'ŏu'},[7]={'eoi','eoy'},[8]={'ung','ŭng'}, [9]={'ing','ĭng'}, [10]={'ei', 'ĕi'}
}
  • Change the iteration to use ipairs instead of pairs
  for tbl_idx,tbl in ipairs(ipa_preprocess) do
    syllable[i] = mw.ustring.gsub(syllable[i],tbl[1],tbl[2])
  end

A longer explanation follows below the break.


The pairs() snippet can produce different results on different Lua implementations, because "element order with pairs is arbitrary". The official guidance is not to rely on the order of pairs() when iterating.

For example, the pronunciation section of 重用 eventually calls this module with code like this: {{#invoke:yue-pron|jyutping_to_ipa|yung6}}. The problem comes from the "yu" and "ung" patterns: one implementation applies "yu" before "ung", and another applies it after. Specifically:

Lua 5.1 (currently used by Wikipedia):

  Value    Action
  yung6    initial value
  yng6     after replacing "yu" with "y"
  yng6     after replacing "ung" with "ŭng" (no effect)
  yːŋ²²    after processing the rest of the code

Luaj (a Lua implementation in Java):

  Value    Action
  yung6    initial value
  yŭng6    after replacing "ung" with "ŭng"
  yŭng6    after replacing "yu" with "y" (no effect)
  Error: Unrecognised initial: "y"    after processing the rest of the code

Note that the pairs() order varies between implementations because pairs() iterates over a hash table: Lua 5.1 implements the hash table one way, and Luaj implements it another.
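To make the order dependence concrete, here is a minimal sketch in plain, standalone Lua (not Scribunto, so string.gsub stands in for mw.ustring.gsub; that substitution is safe here only because these patterns contain no magic characters). It applies just the two rules involved to "yung6", once in each order. apply_rules is a hypothetical helper, not a function in Module:yue-pron.

```lua
-- Apply an ordered list of {pattern, replacement} rules to a syllable.
-- ipairs() visits indices 1, 2, 3, ... in order, so the result is deterministic.
local function apply_rules(syllable, rules)
  for _, rule in ipairs(rules) do
    -- string.gsub returns two values; the parentheses keep only the string.
    syllable = (string.gsub(syllable, rule[1], rule[2]))
  end
  return syllable
end

-- Order observed with Lua 5.1 in the trace above: "yu" first, then "ung".
local yu_first  = { {'yu', 'y'}, {'ung', 'ŭng'} }
-- Order observed with Luaj: "ung" first, then "yu".
local ung_first = { {'ung', 'ŭng'}, {'yu', 'y'} }

print(apply_rules('yung6', yu_first))   --> yng6
print(apply_rules('yung6', ung_first))  --> yŭng6
```

Because the rules live in an array walked with ipairs(), the order is fixed by the table itself and stays the same under Lua 5.1, Lua 5.2 and Luaj.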

Furthermore, Scribunto may upgrade from Lua 5.1 to 5.2 at some point. If that happens, this module could break, because Lua 5.2 uses a new "random" hash implementation, so the iteration order of pairs() will be unpredictable. See this reported instance of random pairs() order after upgrading from 5.1 to 5.2.

Consequently, I propose the change above. I know it has no visible effect today, but it will help future-proof the module. It is also better practice not to rely on the order of pairs() when applying several regex replacements in sequence.

Let me know if you need more info. Thanks. Gnosygnu (talk) 21:23, 6 September 2014 (UTC)

Didn't understand everything above, but "yung6" is an invalid Jyutping syllable (it should be "jung6"), and I have fixed it in the entry. Also pinging @Kc_kennylau. Wyang (talk) 23:34, 7 September 2014 (UTC)
@Wyang Thanks for the explanation. I have no knowledge of Chinese / Jyutping syllables and didn't realize that "yung6" was invalid. Just so you know: "yung6" occurs only two more times in the en.wiktionary.org dump for 2014-08-19: 别有用心 and 別有用心. If you want, I can fix the pronunciation entries there by replacing "yung6" with "jung6". Gnosygnu (talk) 01:33, 9 September 2014 (UTC)
Thanks, I have fixed these and five other occurrences of Jyutping 'yung'. Wyang (talk) 02:10, 9 September 2014 (UTC)
Ok. Didn't know about the other three. Thanks for the follow-up. Gnosygnu (talk) 01:07, 10 September 2014 (UTC)
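Since the root cause was an invalid romanisation rather than the table order alone, a possible guard is to reject syllables whose initial is not a Jyutping initial. A minimal sketch in plain Lua, assuming the standard 19 Jyutping initials plus the null initial and the syllabic nasals m and ng; jyutping_initial is a hypothetical helper, not part of Module:yue-pron, and it does not handle changed tones such as 6-2. The error text mirrors the one in the Luaj trace.

```lua
-- Jyutping initials, two-letter ones first so they win over "n", "g", "k".
local INITIALS = { 'gw', 'kw', 'ng', 'b', 'p', 'm', 'f', 'd', 't', 'n',
                   'l', 'g', 'k', 'h', 'w', 'z', 'c', 's', 'j' }

-- Return the initial of a syllable such as "jung6" ('' for a null initial),
-- or nil plus an error message if the initial is not recognised.
local function jyutping_initial(syllable)
  local letters = syllable:match('^(%a+)[1-6]$')   -- letters + one tone digit
  if not letters then return nil, 'malformed syllable' end
  if letters == 'm' or letters == 'ng' then return '' end  -- syllabic nasals
  for _, ini in ipairs(INITIALS) do
    -- An initial must be followed by a vowel letter (y starts finals like yu).
    if letters:sub(1, #ini) == ini and letters:sub(#ini + 1, #ini + 1):match('[aeiouy]') then
      return ini
    end
  end
  if letters:match('^[aeiou]') then return '' end  -- null initial, e.g. "aa3"
  return nil, 'Unrecognised initial: "' .. letters:sub(1, 1) .. '"'
end

print(jyutping_initial('jung6'))             --> j
print(select(2, jyutping_initial('yung6')))  --> Unrecognised initial: "y"
```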
@Gnosygnu Effort appreciated. Do you want me to do it, or will you? --kc_kennylau (talk) 11:47, 8 September 2014 (UTC)
@kc_kennylau Cool. Thanks for the confirmation. I can do it if you're fine with it. I'll make the change tomorrow evening. Feel free to change your mind, or jump in before then. Just so you know: I copied the current module to a sandbox page and tested with the TemplateSandbox extension. I also visually compared the official page vs the sandbox page for the two links I gave Wyang above. Also, FYI, I generated the integer order of ipa_preprocess based on the output of the following code:
local p = {}  -- module table that Scribunto expects every module to return

function p.test( frame )
  local ipa_preprocess={
    ['a']='ă',['ăă']='a',['ei']='ĕi',['ik']='ĭk',['uk']='ŭk',['ou']='ŏu',['ing']='ĭng',['ung']='ŭng',['yu']='y',['eoi']='eoy'
  }

  local rv = ''
  for k,v in pairs(ipa_preprocess) do   --pairs() will iterate in an arbitrary order
    rv = rv .. k .. ';'                 --rv will capture arbitrary order of pairs() iteration
  end
  return rv
end

return p

Gnosygnu (talk) 01:46, 9 September 2014 (UTC)

I've applied the changes tonight and double-checked with the three urls above. All looks well. Let me know if there is anything else. Otherwise, thanks Wyang / kc_kennylau for the guidance and help. Gnosygnu (talk) 01:07, 10 September 2014 (UTC)

Use a modified/extended version of Jyutping instead?

@justinrleung, Fish bowl: There are several sounds that Jyutping does not support (some of these are already in de facto use on Wiktionary, but the non-Jyutping outputs may be wrong):

  • en (in gen?)
  • et (in , get, set, get到, etc)
  • om (in form)
  • (by extension op?)
  • bl, kl, etc. (in onomatopoeia (yue:虢礫緙嘞 etc), Kra-Dai substrate/loans(冚唪唥), and English loans (slim))
  • pr, kr, wr, etc. (in English loans, e.g. present pre6sen1, crazy kra1si4) (this is probably code-switching and therefore controversial)
  • some entering tones have a tone change to mid rising, e.g. , 鹿, which has produced some odd output. This will have to be changed; they should probably be listed as unsupported in most of the romanisations.

I suggest that we include these (not necessarily all of them, especially the controversial r ones) in WT:YUE and update the module accordingly. (PS: the templates don't seem to support the recent 2018 *cough* additions of a and oet to Jyutping, so they have to be changed anyway.) -- Wpi31 (talk) 16:21, 5 July 2022 (UTC)

@Wpi31: I'm pretty sure our modules can handle en, et, om, a, oet and the entering tones with tone change. WT:YUE may not have the full list of rimes. What is wrong with the output that you would like to see changed? — justin(r)leung (t...) | c=› } 16:37, 5 July 2022 (UTC)
A couple of issues:
  • a is merged into aa in Yale; oet outputs oet in Yale which is invalid and merged into eot in Cantonese Pinyin and Guangdong; should be "colloquial sounds not defined"
  • entering tones with tone changes display correctly, but it is a relatively new/underdocumented linguistic feature and isn't really specified in the romanisations. We should not be making assumptions and pretending that they are valid ones.
  • en, et, om, op should follow em and ep to display "colloquial sounds not defined" for Yale (and probably some others)
-- Wpi31 (talk) 17:09, 5 July 2022 (UTC)
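The Yale behaviour requested in the list above amounts to a lookup table of finals that Yale has no spelling for. A minimal sketch in plain Lua, assuming the set is exactly the finals named in this thread (a, oet, em, ep, en, et, om, op); YALE_UNDEFINED and yale_defined are hypothetical names and do not come from Module:yue-pron.

```lua
-- Finals with no Yale spelling (assumed from this discussion, not from the module).
local YALE_UNDEFINED = {
  a = true, oet = true,
  em = true, ep = true, en = true, et = true, om = true, op = true,
}

-- Jyutping initials, longest first, used only to strip the initial off.
local INITIALS = { 'gw', 'kw', 'ng', 'b', 'p', 'm', 'f', 'd', 't', 'n',
                   'l', 'g', 'k', 'h', 'w', 'z', 'c', 's', 'j' }

-- Return the final of a toneless syllable such as "gem" -> "em".
local function split_final(letters)
  for _, ini in ipairs(INITIALS) do
    local rest = letters:sub(#ini + 1)
    if letters:sub(1, #ini) == ini and rest:match('^[aeiouy]') then
      return rest
    end
  end
  return letters  -- null initial or syllabic nasal: the whole thing is the final
end

-- true if Yale can spell the syllable; otherwise false plus the message to show.
local function yale_defined(syllable)
  local letters = syllable:match('^(%a+)[1-6]$')
  if not letters then
    return false, 'malformed syllable'
  end
  if YALE_UNDEFINED[split_final(letters)] then
    return false, 'colloquial sounds not defined'
  end
  return true
end

print(yale_defined('gaa1'))              --> true
print(select(2, yale_defined('gem6')))   --> colloquial sounds not defined
```

Keeping the set in data rather than in branching code means other romanisations can carry their own tables, which fits the later correction that Guangdong Romanization does define em, ep and et.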
@Wpi31: Thanks for going ahead and fixing these issues. I've corrected the code for Guangdong Romanization because em, ep and et are indeed defined in certain dictionaries using Guangdong Romanization, like 廣州話方言詞典 by 饒秉才 et al. — justin(r)leung (t...) | c=› } 21:12, 6 July 2022 (UTC)
@Justinrleung: Thanks for the corrections, Guangdong Romanization is definitely something I'm not familiar with.
Also, after thinking about it, the consonant clusters should not be defined solely for the benefit of English loans; otherwise, for completeness, we would also need -s and -f as codas, making it too far-flung for a Cantonese romanization system. (Though I feel the -l ones would be worth adding for the onomatopoeia and Kra-Dai substrates/loans?) -- Wpi31 (talk) 07:19, 7 July 2022 (UTC)
No opinion due to lack of familiarity with HK Cantonese English borrowings, although I will note that bl was acknowledged in Cantonese Sin Wenz. —Fish bowl (talk) 20:54, 6 July 2022 (UTC)

Tone numbers should not be superscript

See https://lshk.org/jyutping-scheme/. I think the change needed is to remove the sup tags at Module:yue-pron#L-352. RZuo (talk) 10:00, 13 October 2023 (UTC)

@RZuo I have a feeling this was an intentional choice? Pinging @Justinrleung @Wpi @RcAlex36 @ND381 for comment. Theknightwho (talk) 12:46, 13 October 2023 (UTC)
I think we've discussed this before; the superscripts were added long before LSHK added this rule (I believe it was here). Also, the majority of online Cantonese dictionaries use superscripts in Jyutping, including CantoDict and words.hk, as well as a print dictionary by 商務印書館 that I have on hand. The ones that do not are far fewer and typically more scholarly/official, such as 漢語多功能字庫 and 小學學習字詞表.
I don't think we need to change this, at least in the short term and unless the superscript-less version gains popularity. – wpi (talk) 13:05, 13 October 2023 (UTC)
It's not a rule added by LSHK. Jyutping was not intended to use superscripts from the start, in the original design. People just blindly follow other systems and do the same to Jyutping, which was designed to avoid any problems with non-ASCII text. The entire Jyutping system can be written in ASCII without any formatting. RZuo (talk) 13:13, 13 October 2023 (UTC)
I think superscript tone numbers are more appealing, and they are only used for display, to visually separate the letters from the numbers. All wiki code uses plain numbers, and copying doesn't cause any problems, whether the tone numbers are automated or manually added: 廣東話广东话 (gwong2 dung1 waa6-2).
Even if you need to paste into MS Word, you can use the "keep text only" option and you get a plain "gwong2 dung1 waa6-2".
Superscript tone numbers are now used for all Chinese topolects that use tone numbers. This is the first time we've received this complaint. I wonder why, and what problems you're facing. Anatoli T. (обсудить/вклад) 02:06, 20 October 2023 (UTC)
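The point that superscripts are presentation only can be illustrated with a minimal sketch in plain Lua: tone specifications (including changed tones such as 6-2) are wrapped in <sup> tags for display, and stripping the tags gives back the plain ASCII romanisation. The function names are hypothetical; this is not how Module:yue-pron or Module:zh-pron is actually written.

```lua
-- Wrap each tone specification (a digit, optionally followed by "-" and a
-- changed-tone digit, e.g. "6-2") in <sup> tags, for display only.
local function to_display(jyutping)
  return (jyutping:gsub('(%a)(%d[%d%-]*)', '%1<sup>%2</sup>'))
end

-- Remove the display tags again; the result is the plain romanisation.
local function to_plain(html)
  return (html:gsub('</?sup>', ''))
end

-- True if the string contains only ASCII bytes (nothing above 127).
local function is_ascii(s)
  return not s:find('[\128-\255]')
end

local plain = 'gwong2 dung1 waa6-2'
print(to_display(plain))  --> gwong<sup>2</sup> dung<sup>1</sup> waa<sup>6-2</sup>
print(to_plain(to_display(plain)) == plain)  --> true
print(is_ascii(plain))  --> true
```

Whichever display is chosen, the stored form stays plain ASCII; the disagreement below is only about which form readers see.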
@RZuo: Do you see any non-ASCII characters in "gwong2 dung1 waa6-2", and what is wrong with the formatting for display purposes, as in 廣東話广东话 (gwong2 dung1 waa6-2)? Are you having any copy-paste issues? Anatoli T. (обсудить/вклад) 03:00, 20 October 2023 (UTC)
The problem is that superscript is misleading. It gives readers the impression that Jyutping tone numbers must be superscript, as in other romanisation schemes, and when they want to write Jyutping elsewhere, superscript formatting is sometimes harder to enter.
It also gives the impression that the tone numbers are not part of the romanisation and that people should omit them if they just want THE romanisation. For example, people might think that 東 is dung, and that 1 is just an indicator that should not be included if they write Jyutping directly as part of the text proper.
Cantonese is not Chinese. Many words are better, or even only possible, written in the Latin alphabet.
In short, superscript is just a customary formatting choice of yours, which is unnecessary and misleading. It was ruled out by the original design back in 1993 and is now explicitly discouraged by its author, LSHK. RZuo (talk) 08:54, 20 October 2023 (UTC)
@RZuo: What people think may be relevant if no reference document is provided. We have quite a lot of romanisation systems, and not everything is intuitive, or intuitive to everyone, or even intuitive at all, for example when they are based on spellings or pronunciations, or use characters or markers that mean something else in another language or dialect. (I could give examples, but I hope you know what I mean.)
What is missing is a description of the notation at Wiktionary:About_Chinese/Cantonese (@Justinrleung, @Wpi) stating that a specific format was chosen. When a decision is documented, no-one will be blamed. At (dung1), superscript 1 means the first tone and it's not optional. Anatoli T. (обсудить/вклад) 09:13, 20 October 2023 (UTC)
Indeed, that page needs work. There have also been complaints about the unclear meaning of - (which is used for changed tones), because in {{zh-pron}} we only link to the Wikipedia page on Jyutping, although this is already explained on the about page. – wpi (talk) 10:22, 20 October 2023 (UTC)
I was just about to make an edit request for Module:zh-pron regarding this too. I believe lines 299, 301 and 326 in Module:zh-pron might also be relevant. H78c67c (talk) 23:21, 19 October 2023 (UTC)