User talk:Wyang/Archive1




As far as I know, no one has expressed an opinion on the edits you tried to retract. I accidentally blocked the IP that added them (presumably you), but I thought I was blocking another IP when I did it. I unblocked the first IP as soon as I discovered my error. My apologies for any misunderstanding. Chuck Entz (talk) 21:12, 18 January 2013 (UTC)

All those IPs were me. Wyang (talk) 09:53, 22 January 2013 (UTC)
I'm sure there are lots of people from other countries who disagree with the way we drive on the right side of the road in the US, but I've never heard of any of them driving on the left side just to make a point; it wouldn't change anything important, but it sure would make a mess of things... Chuck Entz (talk) 11:29, 22 January 2013 (UTC)
You are using an inappropriate analogy. Driving on which side of the road is unimportant because the purpose of driving is to reach a destination, and the options are basically equally efficient in this respect. But here the right side of the road is full of bumps and hollows and may not even lead to the desired destination, whilst the left side is well-paved and -targeted. I know this has been raised a zillion times but whenever the issue was raised, the decision-making coterie has been reluctant to realise the benefits and opposed it fiercely. Yes the issue is complicated, but asking people to comply with whatever rules the previous people came up with but not the most logical rules is not the way to resolve this. This issue is going to be raised another zillion times and people need to examine the issue impartially and accept that a change in format is hugely beneficial to further editing. Wyang (talk) 10:38, 24 January 2013 (UTC)


What is this? Mglovesfun (talk) 11:01, 22 January 2013 (UTC)


Could you please provide the pīnyīn transcription and translations of citations as separate lines instead of in a template? For an example of what I mean, see the quotation at Νεφελοκοκκυγία. —Μετάknowledgediscuss/deeds 08:30, 24 January 2013 (UTC)

I don't think Pinyin is necessary. Wyang (talk) 10:38, 24 January 2013 (UTC)
zh-n. is not an intuitive template, and traditionally any Wiktionary templates that use mouseover to show more information, like certain Persian conjugational templates (if my memory serves me well), have an obvious explanatory line at the top. —Μετάknowledgediscuss/deeds 02:16, 25 January 2013 (UTC)

Note: Template talk:pinyin-analyser. —Μετάknowledgediscuss/deeds 07:20, 3 February 2013 (UTC)

bí mật


Good job! Only điều bí mật is also a synonym for bí mật in my sources. Please add, don't replace words. Do you mind adding a {{Babel}} to your user page? --Anatoli (обсудить/вклад) 02:02, 2 February 2013 (UTC)

điều bí mật just means "secret things", to emphasise that "bí mật" which has both adjective and noun senses, is used here as a noun. Same for sự which is used if the following word has both verb and noun senses. Wyang (talk) 10:39, 2 February 2013 (UTC)
About the header of Babel, I got into the weird part of YouTube, which I may not mind. --Lo Ximiendo (talk) 10:19, 3 February 2013 (UTC)
The deal with words such as điều or sự is to display them, anyway. One can use alt=điều bí mật, e.g. điều bí mật (changing now). Oh, please don't be too sensitive on us using "cmn" instead of "zh". Your language hasn't been destroyed. We cover dialects as well, especially "yue" and "nan". We had long discussions and votes, so we had to compromise. If you add "cmn" in the Babel, users will know that you speak Chinese, especially its standard and most known form - Mandarin. --Anatoli (обсудить/вклад) 11:43, 3 February 2013 (UTC)
zh-N, cmn-4 will appear as cmn-N, cmn-4. Wyang (talk) 00:33, 4 February 2013 (UTC)
Not sure what you mean, sorry. Do you mean that {{User_cmn}} and {{User_cmn-4}} have wrong wordings? --Anatoli (обсудить/вклад) 00:44, 4 February 2013 (UTC)
I have replaced 中文 with 普通話/國語 and 普通话/国语 in language user templates, even though it doesn't cover all names for Mandarin, these templates separate Mandarin speakers from 粵語/粤语 speakers, etc. --Anatoli (обсудить/вклад) 00:54, 4 February 2013 (UTC)
I speak non-Mandarin Chinese natively (and I don't really know which branch of Chinese this dialect falls under under ISO 639), and can communicate well (4) in Modern Standard Chinese (MSC). I can't find templates to accurately describe this situation. {{User zh}} doesn't exist any more, and {{User cmn-4}} is confusing MSC (no code) with Mandarin (a group of Chinese dialects, code: cmn). Wyang (talk) 01:41, 4 February 2013 (UTC)
The branches ("dialects") are somewhat determinable by region... I'm sure you could figure out what it's called in English by the ISO if you look it up on Wikipedia. If you feel comfortable telling me where you grew up, I could help finding possibilities for you to check. I think grouping MSC with actual Beijing region dialects is probably the best way to solve that particular subproblem, just based on coverage and similarity. —Μετάknowledgediscuss/deeds 01:47, 4 February 2013 (UTC)
We don't have templates for all dialects. That may be a problem, but small dialects are usually not in big demand. You can add your dialect to your user page, so people can find you. We merge Mandarin and MSC here, treating entries/translations in standard Mandarin and Northern Chinese as one language, using {{qualifier}}, marking words as regional dialects and using other means. Everything is solvable; you can even create templates for your dialect, but discuss the technical details first. --Anatoli (обсудить/вклад) 01:52, 4 February 2013 (UTC)
Which dialect is unimportant, because that is too specific. It doesn't make sense that {{User zh}} doesn't exist, while {{User ar}} or {{User ms}} do. All are macrolanguages. Arabic or Malay speakers are likely to perceive themselves as speaking Arabic or Malay (not some variety that has an ISO code) when encountering non-speakers, the same way that Chinese speakers do. Wyang (talk) 02:37, 4 February 2013 (UTC)
Standard or most common Arabic (including certain colloquialisms common to various dialects, and loanwords) is "ar". For dialects we have "ary", "arz", etc. Arabic wasn't heavily discussed; we didn't have battles and multiple votes about it. As I said, a while ago we reached a compromise for translations:
* Chinese:
*: Cantonese:
*: Mandarin:
*: Min Nan:
Having "Chinese" as the main header for entries was rejected by some of your compatriots and Taiwanese people and others. "Mandarin" is more specific than "Chinese". When one says "an Arabic word", no-one immediately questions which variety, whereas "a Chinese word" raises questions like "Mandarin or Cantonese". If we used zh instead of cmn, yue and nan, and "Chinese" instead of "Mandarin", "Cantonese" and "Min Nan", we would have a mix-up. In any case, things are the way they are; if you want to open this can of worms, then you can start a discussion in the Beer parlour. I personally don't want any change and other Chinese editors (native or learners) got used to the status quo. --Anatoli (обсудить/вклад) 03:03, 4 February 2013 (UTC)


* Chinese:
*: Cantonese: [[安全]]
*: Classical Chinese: [[安全]]
*: Gan: [[安全]]
*: Hakka: [[安全]]
*: Huizhou: [[安全]]
*: Jinyu: [[安全]]
*: Mandarin: [[安全]]
*: Middle Chinese: [[安全]]
*: Min Bei: [[安全]]
*: Min Dong: [[安全]]
*: Min Nan: [[安全]]
*: Min Zhong: [[安全]]
*: Old Chinese: [[安全]]
*: Pu Xian: [[安全]]
*: Xiang: [[安全]]
*: Wu: [[安全]]

as in 祕密 or 安全. Specificness is hardly an improvement. Besides, equating "Mandarin" with (or using it to denote) "Standard Chinese" or "Written vernacular Chinese" is just outright wrong. Wyang (talk) 03:17, 4 February 2013 (UTC)


Thank you for the great Mandarin entries, as well as the Korean bits... if you're ever looking for more red links to create, there are a bunch at User:Tooironic#Common Words that could use some attention. Again, thanks! —Μετάknowledgediscuss/deeds 05:30, 5 February 2013 (UTC)

OK. Is there a longer list? Wyang (talk) 05:35, 5 February 2013 (UTC)
Yeah, there's also Appendix:HSK list of Mandarin words/Elementary Mandarin (almost done) + Intermediate + Advanced. Please use the HSK categories as in the example entries (most bluelinked belong to them) if you decide to work on them. Note the difference between trad. and simp. in categorizations. --Anatoli (обсудить/вклад) 05:42, 5 February 2013 (UTC)
Thanks. I've updated {{cmn new}} to account for those (damn) templates. Wyang (talk) 05:59, 5 February 2013 (UTC)
I wouldn't have a clue how to use it but thanks. Perhaps it's just easier to create entries manually. --Anatoli (обсудить/вклад) 06:12, 5 February 2013 (UTC)
Awesome template! @Anatoli, I made a generalised version independently based on the same idea at {{new entry}}, but it's a lot worse because it's mainly for minority languages with less infrastructure. For this one, the documentation is currently at Template talk:cmn new. —Μετάknowledgediscuss/deeds 06:45, 5 February 2013 (UTC)
I've seen it but I still don't understand. Do I need to add it to my User:Atitarev/common.js to be able to use it? What triggers this template and when and how do I use the parameters? --Anatoli (обсудить/вклад) 07:04, 5 February 2013 (UTC)


From experience I know that very large switch statements like the one in that template are very slow. I hope you'll take that into account and not use this template often, or always substitute it. You can also try an alternative approach, by using subpages, one for each character, in the same way as {{langrev}}. That would be faster I think, especially when there are many options. —CodeCat 03:44, 8 February 2013 (UTC)


Hi, do both of the Proto-Sino-Tibetan words given at နေ really mean "sun, day"? Or does *g-na-s mean something else? —Angr 20:33, 13 February 2013 (UTC)

Yeah, my bad. Corrected. Wyang (talk) 22:14, 13 February 2013 (UTC)
Speaking of which, is နေ့ (ne., day) also from *nəj, in spite of the different tone? —Angr 21:26, 10 March 2013 (UTC)
Yes, according to Paul Benedict's Sino-Tibetan: A Conspectus. Wyang (talk) 11:05, 11 March 2013 (UTC)

Q about cmn vs. msc

First off, many thanks for your various ZH and KO term additions! (I don't suppose you have any more detail about 아귀 (agwi) etym 3, like first appearance or quotes or anything?)

I read above that “equating "Mandarin" with (or using it to denote) "Standard Chinese" or "Written vernacular Chinese" is just outright wrong.” However, the EN WP article on Standard Chinese says right in the first line that MSC == Mandarin, leaving me confused. I ask purely out of ignorance -- I studied some 普通话 for a couple semesters in university, but most of my time is taken up with Japanese. Given my meager understanding of the wide varieties of Chinese, I'm left wondering what MSC as a spoken lect would equate to, if not Mandarin? I thought Mandarin was the proper English label for 普通话, and I thought too that 普通话 was the same thing as MSC, but perhaps I'm way off the mark? Does "Mandarin", as you understand it, mean the Beijing dialects more specifically? Curious, -- Eiríkr Útlendi │ Tala við mig 07:46, 17 February 2013 (UTC)

Thanks. For agwi, I could only find quotations and dialectal forms (agu, akku), not Middle Korean forms. The addition of the obsolete sense of "mouth" by KYPark seems reasonable but needs checking though (may be dialectal instead; I only know of agari). A possible etymological connection between these is interesting: ag- ([1][2], ağız) is the common Altaic word for "mouth", "surviving" (from an Altaicist's POV) in Modern Korean as agari (derogatory: "mouth").
Wrt MSC, "Mandarin" is the name for a group of Chinese dialects, while MSC is a standardised variety of Chinese. There is no "Standard Mandarin" really, as MSC and written vernacular Chinese (the standardised written form of MSC) serve as de facto standards for spoken (in PRC, ROC, sg) and written (all Chinese-speaking regions) Chinese. Wyang (talk) 01:10, 18 February 2013 (UTC)
  • Thanks for both answers. Interesting about ag-; I note also that Turkish ağız purportedly derives from Proto-Turkic *āgıŕ, and that final "r" appears to have echoes in KO agari and JA anguri (“agape, gawping”). That said, JA anguri looks like it might ultimately derive from verb aku, “to open”. That might still be traceable to Altaic “mouth” words, but it seems to get tenuous, unless Altaic also has words of similar sound that have to do with “opening”. I note that KO 열다 (yeolda) doesn't seem to include any such ak or ag elements, though I suppose this might be the result of some phonetic change from an earlier form. That said, it looks like Old Turkic had aç- (“to open”), from Proto-Turkic *aç-, *ač- (to open), which is interestingly close to JA root ak “to open”. -- Eiríkr Útlendi │ Tala við mig 01:24, 18 February 2013 (UTC)
Sorry, Wyang but I'm sure your answer to the second question is biased. In the Western world, "Mandarin" (language) stands for two things - 1) the most common Chinese dialect (or group of dialects) - 官话 (Guānhuà), 北方话 (Běifānghuà) and 2) the standard Chinese (Putonghua, Guoyu, Huayu) - 普通话 (Pǔtōnghuà), 国语 (Guóyǔ), 华语 (Huáyǔ). It's just a reality. People study Mandarin at universities. Even though "standard Chinese" would be a more correct term, it's seldom used, even in academic circles. Dictionary names still use just "Chinese", e.g. Chinese-English dictionary. --Anatoli (обсудить/вклад) 01:27, 18 February 2013 (UTC)
Yes, but the majority of people who call it that probably think of Chinese as a simple dichotomy between Mandarin and Cantonese.. Wyang (talk) 02:57, 18 February 2013 (UTC)
There may be some who do, but dictionaries only describe the language as it is used. I can attest that Mandarin classes, where people are especially aware of what Mandarin actually is, still use either Mandarin or Chinese to refer to the standard Chinese language they study, even when they study standard Chinese. There are too many names and too many language codes. The current practice is not based on a lack of knowledge or on confusion but on a compromise. We use the "Mandarin" header, even if we talk about Northern Chinese dialects (not a standard Chinese term), like , etc. --Anatoli (обсудить/вклад) 03:33, 18 February 2013 (UTC)
I believe it is an inappropriate and inefficient compromise, as the 15 or so headings for Chinese will largely turn out to be reduplications of each other eventually. Wyang (talk) 03:43, 18 February 2013 (UTC)
You probably mean a different issue now. Words that are 95-99% identical in dialects but are split into Mandarin, Cantonese, etc.? There are not many editors eager to develop dialects. Min Nan and Cantonese are a slight exception. I don't think you'll have luck persuading the community to merge them into one language but if you stay longer, you may get a case. --Anatoli (обсудить/вклад) 03:51, 18 February 2013 (UTC)
That's what I meant. Using "Mandarin" to denote something that should be more appropriately labelled "Chinese" only seems fine now because there are currently practically no additions in other varieties, but it will increasingly appear less appropriate as the category of "Mandarin" starts to become saturated and other varieties grow. Wyang (talk) 04:02, 18 February 2013 (UTC)
Well, the community warmed up over time to merging Serbo-Croatian varieties, Romanian and Moldavian, Indonesian and Malay wasn't successful, Hindi and Urdu was never attempted. You can always try and raise it again at Beer parlour. What are you suggesting? Having ==Chinese== header and list all dialect pronunciations? --Anatoli (обсудить/вклад) 04:12, 18 February 2013 (UTC)
Yes, examples: compound, character. I agree with you; I too (highly) doubt this will pass if raised. Wyang (talk) 04:17, 18 February 2013 (UTC)
Wyang, I'm on your side because fewer bytes in many entries would be nice for me. --Lo Ximiendo (talk) 04:33, 18 February 2013 (UTC)
  • FWIW, I agree as well, not least as there is simply so much overlap between the various Chinese languages/dialects. It just seems wrong to have the headword and etym duplicated so many times over one single page. And most defs, too, at that.
Though that does raise the question of how to handle cases where the same word has different definitions in the different langs/lects. -- Eiríkr Útlendi │ Tala við mig 04:44, 18 February 2013 (UTC)
Regional words can be marked with Category:Regional context labels (or, if it exists in too many varieties, simply "dialectal"). Wyang (talk) 04:49, 18 February 2013 (UTC)
Hmmm. 'Chinese' is linguistically inaccurate and likely to cause a godawful mess. On the bright side, we already have a godawful mess that is arguably worse. The pronunciation section could also be solved by means of Lua if all topolects go straight from romanisation to IPA without a hitch (Lua will hopefully also remove the need for overly complex templates like py-to-ipa and grc-ipa-rows). —Μετάknowledgediscuss/deeds 05:02, 18 February 2013 (UTC)
@Wyang. All depends how strong the case is, how you present it. You need to know the moods of other Chinese editors and your possible opponents - their arguments. The arguments will need to be addressed. I'd hate to set up votes myself, since my only vote on banning entries like "Planck常数" in Chinese failed (Wiktionary:Votes/pl-2011-10/Mixed script Mandarin entries), even with a compromised solution to having them as soft redirects. I would probably support your idea in the vote. --Anatoli (обсудить/вклад) 05:04, 18 February 2013 (UTC)
Just wondering, what would the potential "godawful mess" be like? Wyang (talk) 05:37, 18 February 2013 (UTC)
Well, I'll try to explain more clearly. This approach, or to be exact an approach very similar to it, is what they use at zh.wikt, right? It works at zh.wikt because Chinese is something that everybody there knows and that enough people are willing to upkeep. Around here, the merge wouldn't be pretty. Sure, Hakka and Wu will go without a fight. But Cantonese entries, for example, will sometimes diverge from other languages or have a different level of detail and merging those will often take a human, not to mention that I'm already assuming that somebody is running a bot to do all the easy parts and to import data from zh.wikt, most likely. There are a lot of characters and shared words. So, if you're volunteering to write and run a bot, to sift through entries and to edit your way through massive categories, I still might not support, but I wouldn't oppose. The problem is that if we don't have someone, we could get a mess at least as bad as this one, especially if our format got frozen half-and-half Chinese-based/topolect-based or something horrible like that. —Μετάknowledgediscuss/deeds 16:04, 18 February 2013 (UTC)

Appendix:Proto-Sino-Tibetan/m/s-glak ~ m-glaŋ

For technical reasons, we really oughtn't to have a slash in the title... —Μετάknowledgediscuss/deeds 23:43, 23 February 2013 (UTC)

Is the m/s actually a part of the reconstruction, or does it indicate alternative reconstructions? Normally, alternative forms get their own page. —CodeCat 23:47, 23 February 2013 (UTC)
It is part of the reconstruction; there is often variability in the prefix. I see it seems to be uninterpretable by {{reconstructed}}. Don't know how to fix it though - perhaps create a {{reconstructed/sit-pro}}? Wyang (talk) 23:50, 23 February 2013 (UTC)
I don't know anything about Sino-Tibetan, but are you saying that *m/s- is a prefix, but it's not known whether it was *m- or *s-? —CodeCat 23:53, 23 February 2013 (UTC)
*m- and *s- were the prefixes reconstructable that could be added to the stem/root to form derivatives. So the whole etymon (treated as a single unit, a word family which would contain multiple allofams) is written *m/s-. Wyang (talk) 23:57, 23 February 2013 (UTC)
Doesn't that technically mean that no single form is actually reconstructable for Proto-Sino-Tibetan proper? If this word has two different prefixes that cannot be cognates, then it seems to me that this word didn't actually exist in PST but was only formed later, and that one branch used s- while the other used m-. —CodeCat 00:16, 24 February 2013 (UTC)
It is possible for differently prefixed forms to exist in a language simultaneously, with the derived words having divergent or largely identical meanings. For example: ཉལ་བ. Wyang (talk) 00:21, 24 February 2013 (UTC)

Etymology of Latvian tirgus

As you requested on the talk page, I have just added an etymology to that word (plus examples, derived terms, and a picture). --Pereru (talk) 09:51, 26 February 2013 (UTC)

Thanks! Looks great as usual. Wyang (talk) 23:30, 26 February 2013 (UTC)

All those ko new templates

What are they for? And why so many? —CodeCat 14:44, 26 February 2013 (UTC)

They allow for semi-automated creation of Korean entries and IPA transcriptions. Very helpful, too. —Μετάknowledgediscuss/deeds 23:29, 26 February 2013 (UTC)
Yes. For edits like the one at 경쟁력. Wyang (talk) 23:30, 26 February 2013 (UTC)
Why not just write a Module that does it? That wouldn't require dozens of templates and would be much easier to do. —CodeCat 23:39, 26 February 2013 (UTC)
I find MediaWiki code more familiar to work with. I've never used Lua before (so it probably wouldn't be easier, for me..). Wyang (talk) 23:42, 26 February 2013 (UTC)
The more I work with Lua, the more I realise how useful it is. It allows you to remove templates and parameters that aren't actually necessary because Lua is able to split strings and look at individual characters. A Lua function that converts, say, Hangul to IPA could be written in only a few lines of code. —CodeCat 00:09, 27 February 2013 (UTC)
OK, will study this when I have time. (A few lines probably won't be enough, considering the complexities in Korean phonology). Wyang (talk) 09:43, 27 February 2013 (UTC)
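
The string-splitting step CodeCat mentions can be illustrated with the standard Unicode arithmetic for precomposed Hangul syllables. The following is only a minimal sketch, written in Python rather than Lua for illustration; the function names `decompose_hangul` and `transliterate` are hypothetical, and the letter tables are a plain jamo-by-jamo transliteration, not IPA. It deliberately applies no sound-change rules, which is where the complexity Wyang points to would come in.

```python
# Jamo-by-jamo labels in Unicode order (19 initials, 21 medials, 28 finals).
INITIALS = ["g", "kk", "n", "d", "tt", "r", "m", "b", "pp", "s",
            "ss", "", "j", "jj", "ch", "k", "t", "p", "h"]
MEDIALS = ["a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "wa", "wae",
           "oe", "yo", "u", "wo", "we", "wi", "yu", "eu", "ui", "i"]
FINALS = ["", "g", "kk", "gs", "n", "nj", "nh", "d", "l", "lg", "lm", "lb",
          "ls", "lt", "lp", "lh", "m", "b", "bs", "s", "ss", "ng", "j", "ch",
          "k", "t", "p", "h"]

HANGUL_BASE = 0xAC00   # first precomposed syllable
HANGUL_LAST = 0xD7A3   # last precomposed syllable


def decompose_hangul(syllable):
    """Return (initial, medial, final) labels for one precomposed syllable."""
    code = ord(syllable)
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        raise ValueError("not a precomposed Hangul syllable: %r" % syllable)
    index = code - HANGUL_BASE
    initial = index // (21 * 28)        # 588 syllables per initial
    medial = (index % (21 * 28)) // 28  # 28 finals per medial
    final = index % 28
    return INITIALS[initial], MEDIALS[medial], FINALS[final]


def transliterate(word):
    """Join the per-syllable labels with hyphens; no phonological rules applied."""
    return "-".join("".join(decompose_hangul(ch)) for ch in word)


print(transliterate("경쟁력"))  # gyeong-jaeng-ryeog
```

The output for 경쟁력 is a letter-by-letter rendering only. Turning it into a pronunciation would need further rules applied across syllable boundaries (for example, the nasalisation of ㄹ after ㅇ), so a usable converter would be longer than this.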

黄泉 etym

I dimly remember reading about the reason that the afterworld is associated with yellow springs, but it's been quite a while and I've forgotten most of the story. Could you add something about that to the etym, assuming of course that you're familiar with the tale? It's a bit obscure otherwise.  :) -- Eiríkr Útlendi │ Tala við mig 15:29, 13 March 2013 (UTC)

Sure, added. Don't know if my explanation is understandable though :) Wyang (talk) 03:57, 14 March 2013 (UTC)

Mandarin translation of dowsing

Thanks for splitting the SoP translation. I was just being lazy. It's so much easier just to use the JavaScript tool. --Anatoli (обсудить/вклад) 23:30, 20 March 2013 (UTC)

No worries. Wyang (talk) 23:33, 20 March 2013 (UTC)

Middle Chinese → Sino-Xenic

I was looking around the Tubes for a handy chart showing how Middle Chinese initials, medials, and finals generally change upon entrance into Japanese, Korean, and Vietnamese, but I could only find charts with example words, for the most part, not generalizations about the phonemes. To what degree is it in fact predictable, and if that degree is high enough, is there a chart anywhere? Thank you —Μετάknowledgediscuss/deeds 17:51, 23 March 2013 (UTC)

I'm pretty confused... *nyijH and *sijH have the same syllable coda and tone AFAICT. The Sino-Japanese descendants have the same vowel, but the Korean and Vietnamese ones don't. Why is that? —Μετάknowledgediscuss/deeds 18:20, 23 March 2013 (UTC)
There are detailed explanations of how each Middle Chinese initial/final/tone corresponds to modern Chinese and Sinoxenic readings in many publications written in CJKV languages, however maybe not so much in English. I had a search of Wikipedia and could only find this rather (unnecessarily) rudimentary page at Sino-Japanese vocabulary; the other language versions are more detailed: ja:音読み, ja:漢音, ja:呉音, ko:한국 한자음, zh:漢越音. As for predictability, I estimate 90-95% of modern readings to be regular. The percentage is a lot less in Chinese varieties with prominent literary/colloquial distinctions. The reason for the difference in the vowels is the difference in MC initials. This -ij rhyme corresponds to:
  • Japanese: -i
  • Middle Korean: -uy (velar and laryngeal initials) (> Modern i), -o (coronal sibilant initials) (> a), -i (else)
  • Vietnamese: -ư (coronal sibilant initials), -i/y (else)

Wyang (talk) 07:45, 24 March 2013 (UTC)
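
The correspondences listed above for the -ij rhyme amount to a lookup keyed on language and initial class. A minimal Python sketch of that lookup follows; the table values are transcribed from the list, while the function name `ij_reflex` and the class labels used as keys are assumptions made for illustration.

```python
# Reflexes of the Middle Chinese -ij rhyme, as given in the list above.
# "default" stands for the "else" case in each language.
IJ_REFLEXES = {
    "Japanese": {"default": "-i"},
    "Middle Korean": {
        "velar": "-uy",
        "laryngeal": "-uy",
        "coronal sibilant": "-o",
        "default": "-i",
    },
    "Vietnamese": {"coronal sibilant": "-ư", "default": "-i/y"},
}


def ij_reflex(language, initial_class):
    """Look up the reflex of -ij for a language and an initial class label."""
    table = IJ_REFLEXES[language]
    return table.get(initial_class, table["default"])


for lang in IJ_REFLEXES:
    print(lang, ij_reflex(lang, "coronal sibilant"), ij_reflex(lang, "other"))
```

On the reading that *sijH has a coronal sibilant initial and *nyijH falls under "else", this reproduces the pattern asked about: the two rows agree for Japanese and differ for Korean and Vietnamese.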

I see, excellent. And thank you for the Wikipedia links; they are very slow reading for me, but better than the English resources by far. How often does the initial affect the vowel like that? (PS: All you have to do is learn Lua and you will be worthy of worship. Your knowledge is impressive in the extreme.) —Μετάknowledgediscuss/deeds 16:14, 24 March 2013 (UTC)
I'm flattered :) Influence by initial or medial glide is very common. Almost every rime corresponds to multiple finals in the modern language, with the exact reflex depending on initial or glide (-y-, -w-, -ɣ-). Just found a good Wikipedia article explaining the correspondence between MC finals and Beijing Mandarin ones: Middle Chinese finals. Wyang (talk) 00:31, 25 March 2013 (UTC)

Found an issue

Sorry to bug you. Please read Template_talk:ja-romaji#Category:Japanese_romaji_without_a_main_entry. --Anatoli (обсудить/вклад) 04:30, 4 April 2013 (UTC)

Replied there. Wyang (talk) 06:47, 4 April 2013 (UTC)
The problem keeps coming back, please see Template_talk:ja-romaji#Category:Japanese_romaji_without_a_main_entry. We have removed all deprecated parameters, so it should be easier to code. --Anatoli (обсудить/вклад) 11:43, 7 April 2013 (UTC)



Please say if you have an opinion on this topic. --Anatoli (обсудить/вклад) 04:35, 8 April 2013 (UTC)


Hi Wyang, I love the automatic entry-creation script, but the IPA they generate seems to be incorrect. See 拆開 for example. Is this fixable? ---> Tooironic (talk) 00:48, 9 April 2013 (UTC)

Thanks, but how is the IPA incorrect? Wyang (talk) 01:01, 9 April 2013 (UTC)
Also, that template allows more functions:
|type= : 21 (eg. 市政厅), 12, 22 (if length > 2)
|e1= , |e2= : new etymology section, definitions for the first and second characters
|c1= , |c2= : if components for etymology are different from the first and second characters
| : more definitions
|wp= : link to zh.wikipedia
市政厅: {{subst:cmn new/a|p1=shì|p2=zhèng|p3=tīng|n|[[city]] [[council]]|type=21|c1=市政|c2=厅|e1=municipal government|e2=hall; designation for a certain level of government|wp=y}}
落實: {{subst:cmn new/a|p1=luò|p2=shí|a|[[workable]]; [[practical]]|v|to [[implement]]; to [[put into effect]]|e1=to fall, to settle|e2=true, real}}
Wyang (talk) 01:10, 9 April 2013 (UTC)
Thank you, that's extremely helpful. I don't actually speak IPA but, I dunno, it doesn't look right to me. E.g. at 内地 is that really how it's written? So many numbers... ---> Tooironic (talk) 01:15, 9 April 2013 (UTC)
That's just the tones and the tone sandhis in Beijing Mandarin. Superscripts 1-5 are the same as tone symbols ˩˨˧˦˥; they are easier to recognise typographically. Wyang (talk) 01:19, 9 April 2013 (UTC)
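
The equivalence between superscript digits and tone letters described here is a one-to-one character mapping. A minimal Python sketch, assuming the superscripts are the Unicode characters ¹ ² ³ ⁴ ⁵ (the function name `to_tone_letters` is hypothetical and not part of any template mentioned in this thread):

```python
# Superscript digits 1-5 mapped to the tone letters from extra-low (1) to extra-high (5).
SUPERSCRIPT_TO_TONE_LETTER = str.maketrans({
    "\u00b9": "\u02e9",  # superscript 1 -> extra-low tone bar
    "\u00b2": "\u02e8",  # superscript 2 -> low tone bar
    "\u00b3": "\u02e7",  # superscript 3 -> mid tone bar
    "\u2074": "\u02e6",  # superscript 4 -> high tone bar
    "\u2075": "\u02e5",  # superscript 5 -> extra-high tone bar
})


def to_tone_letters(ipa):
    """Replace superscript tone digits with tone letters; leave other characters alone."""
    return ipa.translate(SUPERSCRIPT_TO_TONE_LETTER)


print(to_tone_letters("a\u2075\u00b9"))
```

A falling contour written with superscripts 5 and 1 comes out as the two tone letters ˥˩, and plain (non-superscript) digits are left untouched.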
Wyang, how do you invoke the accelerated entry creation? --Anatoli (обсудить/вклад) 03:50, 9 April 2013 (UTC)
I got it, thanks. Have you documented the code somewhere? c1 and c2 don't seem to work. I tried {{subst:cmn new/a|p1=shì|p2=zhèng|p3=tīng|n|[[city]] [[council]]|type=21|c1=市政|c2=厅|e1=municipal government|e2=hall; designation for a certain level of government|wp=y}}. Also how do you create entries for simp. or trad. only (not both)?--Anatoli (обсудить/вклад) 03:58, 9 April 2013 (UTC)
It worked when I tried on 市政厅 ([3]). The instructions are at {{cmn new}}. This code is for one entry only, so for simp+trad, you have to submit the code on both pages, and for one of simp/trad, it's one submission at the missing entry. Wyang (talk) 04:10, 9 April 2013 (UTC)
Nice tool! I got it to work with the shorter version (cmn new/a) on 废除 and 廢除.
User:Ruakh has developed a nifty tool (User:Ruakh/Tbot.js) for accelerated Russian entries creation from translation sections. I enabled it here. Do you think you could create the same for Mandarin? Just clicking on a red link (green if the script is enabled) in translations creates an entry in Russian. I'm just filling the rest manually (inflection, etc.). --Anatoli (обсудить/вклад) 04:28, 9 April 2013 (UTC)
The cases are a little different. In translations {{t|cmn for trad is not assigned a transcription, so it would not (?) be possible to extract pinyin for the missing trad entry. I'm not used to writing .js things like that, so having me to digest what's written there probably will take days. The current code is simple enough, for me... Wyang (talk) 04:41, 9 April 2013 (UTC)
I thought I'd let you know. The transcription for trad. is the same as for simpl., so tr= can be copied from simplified.
After enabling 'cmn' and clicking on the green link (减弱 - appears green on mine in Translation section) in abate#Translations I instantaneously got this:


# [[abate]] {{gloss|to bring down or reduce to a lower state}}
The gloss, the part of speech, tr is all there, only the code is generic and uses {{t|head. I find both yours and his work amazing pieces for accelerated development, keep up the good job. --Anatoli (обсудить/вклад) 04:52, 9 April 2013 (UTC)
With cmn the trad-simp conversion has to be done in addition, which is preferably achieved through substitution of existing trad-simp lists (which is what {{cmn new}} is doing). I envisaged filling a missing entry with the code {{subst:cmn new/a|p1=PINYIN|PoS|defn}} when clicked, then having to decompose the PINYIN into syllables separated by |p2= etc. Since trad form is not assigned a |tr= in translation sections, one would have to copy the pinyin manually. So for simp it's one extra step of decomposition of PINYIN, while for trad it's two extra steps. The overall process is not hugely simpler than substituting {{cmn new}} from scratch; that way the definition and SoPness are also checked (which is important for cmn as the definitions are likely different from the translation glosses). Wyang (talk) 05:04, 9 April 2013 (UTC)
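
The decomposition step described here, turning one pinyin string into numbered |p1=, |p2= parameters, can be sketched as follows. This is a hedged illustration in Python, not the template's own logic: the function name `build_cmn_new` is hypothetical, the syllable splitting itself is assumed to have been done already, and only the parameter layout quoted earlier in this thread is reproduced.

```python
def build_cmn_new(syllables, pos, definition):
    """Assemble a {{subst:cmn new/a|...}} call from pre-split pinyin syllables."""
    if not syllables:
        raise ValueError("at least one pinyin syllable is required")
    parts = ["subst:cmn new/a"]
    # One numbered parameter per syllable: p1=, p2=, p3=, ...
    parts += ["p%d=%s" % (i, s) for i, s in enumerate(syllables, start=1)]
    # Part of speech and definition follow as positional parameters.
    parts += [pos, definition]
    return "{{" + "|".join(parts) + "}}"


print(build_cmn_new(["luò", "shí"], "v", "to [[implement]]; to [[put into effect]]"))
```

The caller still has to supply the pinyin for a traditional-form entry by hand, as noted above, since nothing in this sketch reads the translation tables.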

Adding new languages


Sorry to be a nuisance. Could you please write a basic script to create new Russian and Japanese entries (in this order) like you did for Mandarin, Korean and Vietnamese?

A basic Russian (ru) noun entry is very simple but it needs gender (g) and transliteration. For example: "лютик" - a buttercup (leaving the entry uncreated)



# [[buttercup]]

Interjections, conjunctions, particles, prepositions use {{head||ru|preposition... Japanese entries are more complicated, divided into hiragana, katakana, kanji and I don't know if it's feasible. --Anatoli (обсудить/вклад) 00:45, 11 April 2013 (UTC)

No worries. Russian one done at {{ru new}}. Don't really know how Japanese entries should be formatted. If you can point to me all format possibilities, that'd be great. Wyang (talk) 01:12, 11 April 2013 (UTC)
Wow! That was quick. Thank you! Will test and come with some feedback. Japanese can be basic and more complex, depends how far you're willing to go. Didn't give as I didn't know if you would agree. --Anatoli (обсудить/вклад) 01:36, 11 April 2013 (UTC)
Japanese should be alright. Forms are easy to detect (see if it's a pure hiragana or katakana string, assign it as such if so; otherwise, kanji if no kana present, mixed if kana present), and script conversion shouldn't be an issue as well (hira to kana and to romaji, or the reverses, depending on requirement). Wyang (talk) 01:40, 11 April 2013 (UTC)
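
The detection rule outlined here (pure hiragana, pure katakana, otherwise kanji if no kana is present and mixed if some is) can be sketched with Unicode ranges. This is an illustrative Python sketch, not the code behind any template in this thread; the function names are hypothetical, and treating the prolonged sound mark ー as katakana is an assumption.

```python
def is_hiragana(ch):
    # Hiragana letters, small and full size (U+3041..U+3096).
    return "\u3041" <= ch <= "\u3096"


def is_katakana(ch):
    # Katakana letters (U+30A1..U+30FA), plus the prolonged sound mark U+30FC
    # (assumption: it counts as katakana for detection purposes).
    return "\u30a1" <= ch <= "\u30fa" or ch == "\u30fc"


def detect_script(term):
    """Classify a term as 'hiragana', 'katakana', 'mixed' or 'kanji'."""
    if not term:
        raise ValueError("empty term")
    if all(is_hiragana(c) for c in term):
        return "hiragana"
    if all(is_katakana(c) for c in term):
        return "katakana"
    if any(is_hiragana(c) or is_katakana(c) for c in term):
        return "mixed"
    # No kana at all; following the rule above this is labelled kanji,
    # although the sketch does not verify that the characters are CJK.
    return "kanji"


for term in ["あいこくしん", "アニメ", "愛国心", "集る"]:
    print(term, detect_script(term))
```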
JA noun example 愛国心, kanji, need to provide kanjitab (this is done simpler than Chinese hanzi), wikified hiragana, romaji:


# [[patriotism]]
JA noun example あいこくしん, hiragana, need to provide romaji and {{ja-def}}:


# {{ja-def|愛国心}} [[patriotism]]
JA noun example アニメ , katakana, need to provide romaji and, hidden index (convert katakana to hiragana アニメ -> あにめ but without voiced consonants (e.g. が (ga)-> か (ka)) - this part may be hard, will check with Haplology or Eirikr). It's fine if just a parameter, without any tricks.


# [[anime]]
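
A minimal sketch of the hidden-index conversion described above (katakana to hiragana, then dropping voicing marks), written in Python for illustration. The function name `kana_sort_key` is hypothetical and this is not the code of any existing template; it assumes that stripping the combining dakuten and handakuten after Unicode decomposition is an acceptable way to "devoice" kana.

```python
import unicodedata

# Combining voiced (dakuten) and semi-voiced (handakuten) sound marks.
VOICING_MARKS = {"\u3099", "\u309a"}


def kana_sort_key(term: str) -> str:
    """Convert katakana to hiragana and strip voicing marks (illustrative only)."""
    hiragana = []
    for ch in term:
        # Katakana U+30A1..U+30F6 sit exactly 0x60 above their hiragana twins.
        if "\u30a1" <= ch <= "\u30f6":
            ch = chr(ord(ch) - 0x60)
        hiragana.append(ch)
    # NFD splits e.g. が into か + combining dakuten, which is then dropped.
    decomposed = unicodedata.normalize("NFD", "".join(hiragana))
    stripped = "".join(c for c in decomposed if c not in VOICING_MARKS)
    return unicodedata.normalize("NFC", stripped)


print(kana_sort_key("アニメ"))
print(kana_sort_key("ガラス"))
```

The prolonged sound mark ー lies outside the shifted range and is left untouched here; whether the index should expand it to a vowel is a separate decision.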


Thanks. Will look into these. (Forgive me if I seem unresponsive and distracted by real life...) Wyang (talk) 01:58, 11 April 2013 (UTC)

Hi, no pressure at all, just reminding that I still need Japanese templates. A simple template, without IPA or script conversions will do, as long as the formatting matches the above. --Anatoli (обсудить/вклад) 03:25, 12 April 2013 (UTC)
Done: see {{ja new}}. Wyang (talk) 04:01, 12 April 2013 (UTC)
Great stuff, thank you! Only I don't understand how the template detects the script. Is it automatic or by the params used?
Please see ラジオカセット, the romaji has "ッ" in "rajiokaseッto". --Anatoli (обсудить/вклад) 04:16, 12 April 2013 (UTC)
It's automatic. Yeah, I haven't done geminate consonants yet... Now done. Wyang (talk) 04:31, 12 April 2013 (UTC)
I see. It's only for nouns at the moment, isn't it? Even so, this revision of 集る is not so great. :) --Anatoli (обсудить/вклад) 04:33, 12 April 2013 (UTC)
Fixed. (I didn't know what's the parameter for mixed script, so I just put 'm') Other PoS enabled too. Code has been altered, see the template for example usages. Wyang (talk) 04:48, 12 April 2013 (UTC)
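
For reference, the automatic script detection described in this thread (pure hiragana, pure katakana, kanji if no kana is present, otherwise mixed) can be sketched as follows. This is an illustration in Python, not the actual code behind {{ja new}} or Module:ja; the function name and the returned labels are assumptions.

```python
def is_hiragana(ch: str) -> bool:
    # Hiragana Unicode block.
    return "\u3041" <= ch <= "\u309f"


def is_katakana(ch: str) -> bool:
    # Katakana Unicode block, which includes the prolonged sound mark ー.
    return "\u30a0" <= ch <= "\u30ff"


def detect_script(term: str) -> str:
    """Classify a Japanese headword as hiragana, katakana, kanji or mixed."""
    if not term:
        raise ValueError("empty term")
    if all(is_hiragana(c) for c in term):
        return "hiragana"
    if all(is_katakana(c) for c in term):
        return "katakana"
    if any(is_hiragana(c) or is_katakana(c) for c in term):
        return "mixed"
    # No kana at all: treated as kanji, as in the description above.
    return "kanji"


for word in ["愛国心", "あいこくしん", "アニメ", "集る"]:
    print(word, detect_script(word))
```

Note that a term mixing hiragana and katakana without kanji also comes out as "mixed" in this sketch, and anything without kana (including Latin text) falls through to "kanji"; a real template would probably need to handle those cases separately.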

Script detecting

Is M.script() in Module:ja for detecting the script? If so, there are easier ways to do so. --Z 08:21, 11 April 2013 (UTC)

More requests

I am a serial pest and I'd like to ask you for two more very simple templates - pinyin and romaji, if you haven't created them yet. (I hope I can learn from these how to create my own).

Pinyin is standard; romaji is still in a bit of limbo, but we have many thousands of romaji entries, so I don't see them being reverted soon.

Pinyin can have one or two parameters (and more) per line, e.g. dòngnéng:



# {{pinyin reading of|動能|动能}}

Romaji can have up to six params, e.g. akachan:



They are not as important and the templates are already very easy. Perhaps this could be done differently, by some accelerated method like English plurals (green links) or something. --Anatoli (обсудить/вклад) 06:50, 12 April 2013 (UTC)

No worries. Done at {{cmn new/py}} and {{ja new/rom}}. (despite the fact that I do not support keeping these romanised entries in the long term...) Wyang (talk) 12:53, 12 April 2013 (UTC)
Excellent! Thanks a lot. —This unsigned comment was added by Atitarev (talk • contribs).

rs value for 纯洁


I think Template:cmn new generated an incorrect rs value for "纯洁". It should be 纟04, not 糸04. But I used the "t" parameter, not "s" (by mistake). It didn't matter on "厂长" though, where I also used "t" (I wonder how the script figured it out). --Anatoli (обсудить/вклад) 01:12, 17 April 2013 (UTC)

I see. They are basically the same radical, one is the combining or simplified form (纟) of the radical form (糸). Characters should be listed under the radical forms. I don't think I'm able to separate the two at the code level, since the subpages of Index:Chinese radical (where I extracted my rs values from) do not differentiate the two. I suggest merging the combining or simplified forms into the radical forms (like those subpages), which can be done if you replace all the {{{rs|}}} in Category:Mandarin headword-line templates with {{#invoke:zh|sortkey_conv|{{{rs|}}}}} (They are protected so I can't edit them). Wyang (talk) 01:39, 17 April 2013 (UTC)
There is no need to specify traditional/simplified. They are detected automatically using conversion lists ({{zh-trad-to-simp}}, {{zh-simp-to-trad}}). Wyang (talk) 01:41, 17 April 2013 (UTC)
Thank you. Automatic script detection seems to work well. I know that 纟 is a simplified form of 糸. They've usually been sorted differently though. You lost me with your suggestion. Are you basically suggesting that both simplified and radical forms be sorted the same way, if they use radicals, not pinyin, for sorting? Will this also affect characters like 門 and 开? Not sure if I understand this correctly. --Anatoli (обсудить/вклад) 02:17, 17 April 2013 (UTC)
I was suggesting that the combining forms and simplified forms of the radicals be treated as identical to the basic radical forms. It would affect 門 but not 开. Or, even better, just sort everything in pinyin and get rid of this parameter altogether (currently {{cmn-noun}} uses an awful mix of sorting methods if you have a look at the code). Wyang (talk) 03:35, 17 April 2013 (UTC)
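
The merging suggested above (treating combining or simplified radical forms as identical to the basic radical forms) could look roughly like the sketch below. It is written in Python for illustration and does not reproduce the `sortkey_conv` function in Module:zh; the function name, the mapping table and the "radical + two-digit stroke count" key format are assumptions based on the 纟04 / 糸04 example in this thread, and the table lists only a handful of radicals.

```python
# Partial, illustrative table: simplified/combining radical form -> basic form.
RADICAL_VARIANTS = {
    "纟": "糸",
    "门": "門",
    "讠": "言",
    "钅": "金",
    "饣": "食",
}


def merge_radical_sortkey(rs: str) -> str:
    """Rewrite an rs value such as '纟04' so it sorts under the basic radical."""
    if not rs:
        return rs
    radical, rest = rs[0], rs[1:]
    return RADICAL_VARIANTS.get(radical, radical) + rest


print(merge_radical_sortkey("纟04"))
print(merge_radical_sortkey("厂00"))
```

Values whose radical has no entry in the table, such as 厂, pass through unchanged.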
From a Western perspective, I was wondering this whole time why we don't just sort by pinyin. I assumed it is out of respect to traditional Chinese lexicography. In any case, I'd support that. —Μετάknowledgediscuss/deeds 03:57, 17 April 2013 (UTC)
Sorry, I put the wrong character, I meant 開 and 开, the two equivalents (t/s).
@Metaknowledge. The arguable benefit of sorting by radical rather than pinyin was meant for people (primarily Chinese) not familiar with pinyin, such as speakers of dialects. It is especially applicable to overseas Chinese, where pinyin was not taught and various input methods exist that don't rely on the pronunciation of characters. The designer of this method, User:A-cai, comes from Taiwan, where dictionaries are structured more around radicals and where romanisation systems, including two variants of pinyin, have changed over time and still don't enjoy the full support of the population.
I already expressed my support for sorting words by numbered pinyin. If you're able to do it, go ahead. All Chinese-speaking editors have already expressed support. I think for people not knowing Chinese, it's still possible to find words they want just by entering them in the search window. Sorting by pinyin is more beneficial for learners than for native speakers.
This discussion reminded me of another thing I wanted to ask Chinese-speaking editors. (Tooironic wasn't very enthusiastic about it when I asked him a while ago.) Please take a look at Category:Japanese_terms_by_their_individual_characters. I think creating categories for terms by their individual characters is a brilliant idea. In Chinese their number can be quite big, but finding words that use a specific character is very helpful for learners. And creating categories can be done automatically, can it not? Adding to categories is done via Template:ja-kanjitab. We could tweak Template:Hani-forms and Template:zh-hanzi to automatically add words to categories like e.g. Category:Mandarin terms spelled with 始. What do you think? User:Daniel Carrero has created this, see User_talk:Daniel_Carrero#Template:ja-kanjitab. --Anatoli (обсудить/вклад) 04:44, 17 April 2013 (UTC)
I don't think I support creating such categories (for Chinese and Japanese). It's going to be more troublesome to maintain those categories for Chinese. All one has to do to find all compounds and related pages of is to go to [4] or some page like this. Wyang (talk) 05:08, 17 April 2013 (UTC)
The trouble with those links is that they mix in all other languages using the same character, translations and pinyin entries. The format is not user-friendly either.
Daniel might be able to help with making it work. When a Japanese entry is created, the categories are added automatically. Their structure is identical and only differs by the kanji and sort value. The category Category:Japanese_terms_spelled_with_始 lists only Japanese words, no others. Just looking at the list is educational and shows how words can be created with the character, especially common infixes or suffixes. I think creation of categories can be automated as well. --Anatoli (обсудить/вклад) 05:24, 17 April 2013 (UTC)
Creating these is easy, but I am not particularly fond of the idea. These should be information provided at the character page, which should explain the definitions and the relevant compounds. It shouldn't be done with thousands of categories. Wyang (talk) 05:29, 17 April 2013 (UTC)

IPA 今兒 and 今儿

It may be hard to generate correct IPA for erhua (儿化) by your template. Besides, the reading can be as expected, like 女儿. Anyway, could you fix the IPA and check the entries otherwise, please?

For words with 儿 we agreed to make alternative forms, like 沒事兒 and 没事儿, unless it's a different word. 今儿 probably won't qualify as an alternative form of 今天. --Anatoli (обсудить/вклад) 01:56, 19 April 2013 (UTC)

|p1= should be 'jīnr' and |p2= is empty (although the template automatically generates pronunciation for the second character too... will solve this with the new Module:zh-based template). The pronunciation is still generatable by substituting {{py-to-ipa}}, eg. {{subst:py-to-ipa|jīn|er1=y}}. I've created {{erhua form}}. These are not really alternative forms, they are diminutive forms (like Dutch -je, English -ling) with sometimes different meanings. Wyang (talk) 01:58, 19 April 2013 (UTC)
Thanks. Just on the erhua forms. It may be worth separating words that simply attach 儿 (没事 -> 没事儿) from those that replace another form: 今儿 = 今天, 这儿 = 这里. I think 今 is seldom used as "today" and 这 doesn't mean "here". --Anatoli (обсудить/вклад) 02:05, 19 April 2013 (UTC)
这儿 is formed from 这 ("this, such, here"), not 这里. The development of 这 -> 这里 is parallel to the development of 这 -> 这儿. The original un-erhua-ed forms may not be in use colloquially, but etymologically this is how erhua forms are generated. Wyang (talk) 02:10, 19 April 2013 (UTC)
I'm aware of this (origin). I just found Category:Mandarin erhua terms, could you perhaps repoint the template and/or merge the categories? --Anatoli (обсудить/вклад) 02:33, 19 April 2013 (UTC)
Repointed the template. Also, how can you enable sorting by pinyin? --Anatoli (обсудить/вклад) 02:54, 19 April 2013 (UTC)
They should be sorted now. Wyang (talk) 03:25, 19 April 2013 (UTC)
Thank you. --Anatoli (обсудить/вклад) 03:26, 19 April 2013 (UTC)

Template:cmn new and erhua

Could you have a look at this, please? The generated pinyin and IPA weren't right:

{{subst:cmn new/a|p1=fū|p2=qī|p3=diànr|n|{{erhua form|family-run shop}}}}

Oops, forgot the entry: 夫妻店儿, created by Tooironic. --Anatoli (обсудить/вклад) 12:10, 19 April 2013 (UTC)
Yeah... Avoid erhua entries for now. Let me work on Module:zh. Wyang (talk) 12:07, 19 April 2013 (UTC)
OK, no worries. --Anatoli (обсудить/вклад) 12:10, 19 April 2013 (UTC)

Etym for 年度#Japanese

I was hoping you might be able to help with the etymology for Japanese 年度. I suspect this was in use in China some time back, but there's a slim chance it's a more modern coinage. Do you have any insight? If so, please change the etymology there as appropriate. TIA, -- Eiríkr Útlendi │ Tala við mig 16:49, 15 May 2013 (UTC)

Come back


You should come back, there's a lot of work with Mandarin or Chinese if you want to call it so. --Anatoli (обсудить/вклад) 01:56, 24 May 2013 (UTC)

JA etyms for Buddhist terms

I'm chewing on the etymologies for various kinds of 如来. I'm working on the assumption that Buddhist terms would have been imported from Chinese wholesale; JA-JA dictionaries give etymologies traced back to Sanskrit, which must have come via China, especially considering the history of Buddhism and even literacy in Japan.

However, I'm uncertain if names like 大日如来 were brought into Japanese as an integral unit, or if the name portion of 大日 came into Japanese, with the 如来 added in Japan. I ask because I'm noticing that some of the w:Five Dhyani Buddhas show up on the ZH WT with the epithet 佛 instead of 如來, such as zh:w:不空成就佛 vs. ja:w:不空成就如来. I do see that zh:w:不空成就如來 redirects to the zh:w:不空成就佛 entry, and google:"不空成就如來是" does find over 100K hits, but in terms of etymologies, I'm not sure what's best.

For now, I'm going on the assumption that the names were imported into JA as integral units from Middle Chinese. Please hit me over the head with the cluebat as necessary.  :) ‑‑ Eiríkr Útlendi │ Tala við mig 17:43, 24 July 2013 (UTC)

大日如來 is attested in "大日經" ("大毗盧遮那成佛神變加持經"), translated by Śubhakarasiṃha into Chinese in 724. I'd agree with your assumption. Wyang (talk) 03:54, 25 July 2013 (UTC)

Etymologies again -- KO and JA

Thank you for your etymological activities of late, I very much appreciate the fuller picture of various KO terms.

Along similar lines, I was wondering about 아버지 (abeoji). Modern JA has 祖父, (ōji, grandfather; old man), deriving from OJP opoji, which at first glance looks like a possible relative to KO 아버지 (abeoji).

That said, OJP opoji is itself a compound of opo “big, great; many” (root of modern JA 大きい (ōkii, big, great), 多い (ōi, many)) + ji < chi = (chi, honorific form of address for males). Any chance that 아버지 (abeoji) is also of compound derivation?

TIA, ‑‑ Eiríkr Útlendi │ Tala við mig 18:43, 25 July 2013 (UTC)

abeoji was abi in the 15th century. The form abeoji clearly violates vowel harmony and seems to be of late origin, formed from abi + some sort of suffix -Aji. The -Aji (-아지/어지) suffix is probably the same suffix as in 바가지 (< , "gourd"), 싸가지 (< , "hope" < "bud"), or even the diminutive suffix -ngaji in 송아지, 강아지. The first component abi was probably Altaic: Turkish aba, Mongolian abu. I'm not sure of the etymology of the Japanese words. Wyang (talk) 00:28, 26 July 2013 (UTC)
  • Brilliant, exactly the kind of detail I was hoping for. It sounds pretty clear from your description then that opoji and abeoji are only superficially similar, and that OJP chi, "male" is no match for Middle Korean aji, "diminutive suffix".  :) Thank you! ‑‑ Eiríkr Útlendi │ Tala við mig 01:12, 26 July 2013 (UTC)

“Phonosemantic interpretation” of Chinese characters

Just a heads-up that I’ve commented on a discussion you’ve participated quite extensively in: Wiktionary:Beer parlour/2013/June#Revert Cheers!

—Nils von Barth (nbarth) (talk) 14:53, 1 August 2013 (UTC)

Re: Tea etymology, Nils von Barth

Hello, Wyang. You have new messages at Nbarth's talk page.
Message added 15:31, 14 August 2013 (UTC). You can remove this notice at any time by removing the {{talkback}} template.

老头儿 and 노틀

Do you have a source to back up why you think these two terms are related? I can't find anything on the Internet. ---> Tooironic (talk) 09:22, 1 September 2013 (UTC)

For example, [5]. Wyang (talk) 12:52, 3 September 2013 (UTC)

Re Tibetan pic

Hello, Wyang. You have new messages at I'm so meta even this acronym's talk page.
Message added 16:40, 31 October 2013 (UTC). You can remove this notice at any time by removing the {{talkback}} template.
Hello, Wyang. You have new messages at I'm so meta even this acronym's talk page.
Message added 10:43, 6 November 2013 (UTC). You can remove this notice at any time by removing the {{talkback}} template.

Korean templates


How do I use your Korean template(s) to generate RR transcription for Korean terms? --Anatoli (обсудить/вклад) 22:51, 9 November 2013 (UTC)

Hi, just created a new template. Please use {{subst:ko new/rr|가방}}. Wyang (talk) 03:13, 11 November 2013 (UTC)
Thank you! Do the rules match those described by Shinji, his Korean link and the French templates? If you don't know, then I guess I will need to run test cases with your templates as well and see if they are acceptable. Can it work for single words only, not strings with spaces? --Anatoli (обсудить/вклад) 03:22, 11 November 2013 (UTC)
It basically matches the official guidelines, except that it uses dashes in consonant + vowel syllable divisions, i.e. bur-yaseong instead of buryaseong (This can be changed by replacing all "-" in the produced string). This template was written before the advent of Lua, so it is quite slow and may not be useful with long strings, but for simple strings like 5 - 6 characters this should be sufficient. To incorporate Lua into this template would probably involve a quite substantial rewrite. Wyang (talk) 03:29, 11 November 2013 (UTC)
I just wanted to verify what it does. If it works accurately, even if only with short strings, it can still be used to verify {{user|Kephir}}'s module or to add test cases based on the results from your template. E.g. {{subst:ko new/rr|있습니다}} worked OK - "issseumnida" but {{subst:ko new/rr|갋}} gives "ga". You probably need to test it with Module:ko-translit/testcases examples (some examples don't match the RR transcription rules). --Anatoli (обсудить/вклад) 03:50, 11 November 2013 (UTC)
Yes double consonants in codas were not included - couldn't be bothered to do research and generate a large matrix of how to romanise all combinations (also because exceptions are quite common), so just left out this bit altogether. Wyang (talk) 03:58, 11 November 2013 (UTC)

"measure word", "counter" and "classifier" - headers


You might be interested in this topic: Wiktionary:Beer_parlour/2013/November#Measure_word. --Anatoli (обсудить/вклад) 01:44, 29 November 2013 (UTC)


Thank you for adding an etymology to this entry. I'd love to add etymologies like these to the Vietnamese Wiktionary, but I've had very little luck finding etymologies apart from modern loanwords. What sources do you consult?

Also, I noticed that the Mường word cal³ is given in an orthography I'm not familiar with. The Vietnamese Wiktionary has been using the Vietnamese-based orthography that seems to be ubiquitous among Vietnamese academic and government sources, since the Mường live in Vietnam. (This orthography uses Vietnamese tone marks rather than tone numbers.) Where can I find out about the orthography you're using?

 – Minh Nguyễn (talk, contribs) 05:39, 16 December 2013 (UTC)

Hi Minh. The Sealang Mon-Khmer Comparative Dictionary is a very useful resource for this purpose, which includes results from Shorto's Mon-Khmer Comparative Dictionary, and Ferlus' unpublished 2007 manuscript "Lexique de racines Proto Viet-Muong" (from the POV of Vietnamese). The notation is per Ferlus (2007) - The Mường form using Vietnamese diacritics appears to be chẳl. Wyang (talk) 06:00, 16 December 2013 (UTC)
Wow, that's awesome! What's the copyright status on the database? Some of the citations have tooltips that say, "Do not cite entries from this manuscript!" What's that about?
If I'm not mistaken, spellings like cal³ in the database are just IPA transcriptions with Chinese-style tone numbers. It would be more appropriate to use the "local orthography" field in {{term}}. Some linguists use ad-hoc transcriptions, but the Vietnamese-based one seems to be prevalent in the media and other dictionaries. Would you mind if I changed transcriptions like cal³ to chẳl when I see them?
 – Minh Nguyễn (talk, contribs) 06:51, 16 December 2013 (UTC)
Sure, I have changed it myself. Are there any good online resources describing the phonology or orthography of tieng Muong, or other Vietic languages? Googling in English and Vietnamese does not seem to yield much useful. The tooltip note means the work is a preliminary unpublished manuscript and is subject to errors. Wyang (talk) 21:38, 16 December 2013 (UTC)
I've found very little online, but here's a decent primer on the Mường orthography (Flash Paper) written in Vietnamese. In general, Vietnamese glosses are given after Mường words. Let me know if you have any questions about this document. – Minh Nguyễn (talk, contribs) 13:09, 17 December 2013 (UTC)
Excellent, thanks! :) Wyang (talk) 01:15, 18 December 2013 (UTC)

Template:vi new

I'm trying to find new Vietnamese words but what is this template supposed to do for me? TeleComNasSprVen (talk) 07:44, 28 December 2013 (UTC)

There's no documentation or examples but I have just created [[cùng nhau]] using this code on a blank page: {{subst:vi new|cùng|nhau|adv|[[together]]}}. --Anatoli (обсудить/вклад) 13:13, 28 December 2013 (UTC)

NEW templates


Could you document your templates a bit, like Template:vi new and Template:ko new? A few examples would do. Not sure what happened with phim (thanks for fixing!). I used {{subst:vi new|phim|n|film}}. What did I do wrong? --Anatoli (обсудить/вклад) 02:11, 6 January 2014 (UTC)

Hi, should be
{{subst:vi new|phim||n|[[film]]|ee={{etyl|fr|vi}} {{term|phim|lang=fr}}.}}
Wyang (talk) 02:18, 6 January 2014 (UTC)

Inaccuracy in the Korean verb template

Hi, please join this discussion, if it's okay with you. --Anatoli (обсудить/вклад) 09:43, 23 January 2014 (UTC)

Zhuyin and erhua


Thanks for your efforts on the conversion module! I posted a question there (described as I personally see it), copying here:

Issue with erhua: Although erhua is very unpopular in Taiwan, we still need to convert erhua forms correctly, but there may be back-conversion problems. ㄦ is equivalent to a full syllable "ēr" (first tone without a tone mark). To convert Pinyin like wánr and dàir we probably need to produce ㄨㄢˊㄦ˙ and ㄉㄞˋㄦ˙. Converting them backward would give wáner (wán+er) and dàier (dài+er). I can't find a definite explanation of how to transliterate erhua using Zhuyin, but Pleco uses ㄦ˙ (with a neutral tone marker) to render the "-r" suffix.

Could you reply at Wiktionary:Grease_pit/2014/January#Converting_numbers_to_some_other_symbols_in_Lua, please?

I have also put there my ideas about what needs to happen next, when coding and testing is complete, please comment, if you can. --Anatoli (обсудить/вклад) 22:30, 27 January 2014 (UTC)

Hi, replied at Module talk:PinyinBopo-convert/testcases. Wyang (talk) 22:58, 27 January 2014 (UTC)
I've commented at Module_talk:PinyinBopo-convert/testcases#7_tests_failed. Your feedback is important on this, if you're familiar with erhua spelling in Zhuyin. --Anatoli (обсудить/вклад) 01:01, 28 January 2014 (UTC)

Pinyin dì'èr shǒu, hm


Could you take a look at Module:PinyinBopo-convert/testcases, please? "dì'èr shǒu" becomes "ㄉㄧˋ 'ㄜˋㄦ ㄕㄡˇ" but should be "ㄉㄧˋ ㄦˋ ㄕㄡˇ". Perhaps if apostrophes are removed before the conversion it'll work. Also, what would the Zhuyin for Pinyin "hm" look like, as in 噷 (also hèn)? I'm just trying to cover all corner cases, not trying to bombard you with requests :). --Anatoli (обсудить/вклад) 22:09, 28 January 2014 (UTC)

Hi, no worries. The former was taken into account and {{#invoke:PinyinBopo-convert|convert|dì'èr shǒu}} works as expected. However, the apostrophe in PAGENAME fails to be recognised no matter what I do to the module. Thus {{#invoke:PinyinBopo-convert|convert|{{PAGENAME}}}} fails at dì'èr shǒu. I am not sure what can be done to fix this. As for the latter, how should 'hm' etc. be transcribed in Zhuyin? Wyang (talk) 22:40, 28 January 2014 (UTC)
Thanks. I'll post a question on Wiktionary:Grease_pit/2014/January regarding the apostrophes, they are quite common in pinyin. I'll also search more for "hm", the dictionaries I've checked so far only had "hèn". --Anatoli (обсудить/вклад) 22:48, 28 January 2014 (UTC)
I have imported the whole article from my Pleco dictionary (after setting transliteration to Zhuyin); all usage examples also get Zhuyin:
{interjection} (expressing disapproval or reproach) humph
噷, 别提了。
ㄏㄇ˙, ㄅㄧㄝˊㄊㄧˊ ㄌㄜ˙.
Humph, don't bring that up.
噷, 算了吧。
ㄏㄇ˙, ㄙㄨㄢˋ ㄌㄜ˙ ㄅㄚ˙.
Humph, forget about it.
So, string "hm" should be just "ㄏㄇ˙"--Anatoli (обсудить/вклад) 23:17, 28 January 2014 (UTC)
OK, should be like that now. Wyang (talk) 23:20, 28 January 2014 (UTC)
Thanks. I have asked a question about apostrophes. Have you tried using codes for apostrophes, rather than literals? --Anatoli (обсудить/вклад) 23:54, 28 January 2014 (UTC)
The apostrophe only fails to be recognised when it is part of PAGENAME, using it inside the string is not buggy (as I said above). For the latter, I am not sure I understand what you mean in the question.. Wyang (talk) 23:59, 28 January 2014 (UTC)
Sorry for being dumb, I have misunderstood you :) Could you clarify what you're trying to achieve at Wiktionary:Grease_pit/2014/January#Handling_conversion_with_Lua_with_apostrophes because I can't help you here? Note I had to put double quotes around "dì'èr shǒu" in the test case but it's still reporting as "failed". --Anatoli (обсудить/вклад) 00:07, 29 January 2014 (UTC)
I am terrible at making myself understood. I have replied there, hopefully it is understandable. Wyang (talk) 00:15, 29 January 2014 (UTC)
No-no. It makes sense, I didn't read it carefully the first time. Maybe a silly suggestion, but have you tried storing PAGENAME in a variable, displaying it first, removing the apostrophe, displaying it again, then converting and checking the result, in this order? It's not easy to debug Lua, I know. --Anatoli (обсудить/вклад) 00:21, 29 January 2014 (UTC)
No, I haven't tried that. I guess I will wait for the more knowledgeable to kindly shed some light on the problem first, and resort to my dilettantish skills if all else fails... Wyang (talk) 00:31, 29 January 2014 (UTC)
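
One possible explanation, offered only as an assumption and not confirmed anywhere in this thread, is that the page name reaches the module with the apostrophe HTML-encoded (e.g. as `&#39;`) rather than as a literal character. The Python sketch below illustrates the "decode, display, then convert" debugging approach suggested above; the function names are hypothetical and the real module is written in Lua.

```python
import html


def normalise_title(title: str) -> str:
    """Decode HTML entities and unify apostrophe variants before conversion."""
    decoded = html.unescape(title)  # "&#39;" -> "'"
    # Treat the typographic apostrophe like the plain ASCII one.
    return decoded.replace("\u2019", "'")


def split_syllables(title: str) -> list:
    """Split a pinyin title into words, and words into apostrophe-separated parts."""
    words = normalise_title(title).split()
    return [word.split("'") for word in words]


raw = "dì&#39;èr shǒu"          # hypothetical encoded page name
print(repr(raw))                # inspect what was actually received
print(repr(normalise_title(raw)))
print(split_syllables(raw))
```

Printing the raw string with `repr` before and after decoding makes it visible whether the apostrophe was a literal character or an entity in the first place.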
More "weird" Pinyin and Zhuyin: ng= (with various tones or neutral, Pleco lists a few) as in and . is both a Han character and a Zhuyin symbol reserved for non-Mandarin sounds and interjections like this. --Anatoli (обсудить/вклад) 00:56, 29 January 2014 (UTC)
Should be OK now. Wyang (talk) 01:06, 29 January 2014 (UTC)
It might be more complicated as "ng" is realised as ńg, ňg, ǹg with tones. I would add it myself but I don't understand the code well. :) BTW. re 第二手 - User_talk:Atitarev#.E7.AC.AC.E4.BA.8C.E6.89.8B, Tooironic (talkcontribs) has some concerns about limiting it to Taiwan, me too :) --Anatoli (обсудить/вклад) 01:30, 29 January 2014 (UTC)
I think it is done too now. Wyang (talk) 02:08, 29 January 2014 (UTC)


Hi Wyang. What makes you think this is Taiwanese Mandarin? I've heard Mainlanders use it before. ---> Tooironic (talk) 01:48, 29 January 2014 (UTC)

I don't know, I just thought it sounded strange, so I assumed it is a Taiwanese usage. To me it just means "secondary (information)". The more common term is 二手. It seems 二手 is more common in Taiwan as well. Wyang (talk) 01:59, 29 January 2014 (UTC)
I've changed to "rare" for the correct categorisation (Mandarin terms with rare senses, not rare forms), it doesn't mean I agree it's rare. I don't know. --Anatoli (обсудить/вклад)
I've been seeking used bookshops in Taipei recently and they all seem to use just 二手 rather than 第二手, or the other purported Taiwanese term 中古 for that matter. — 18:57, 5 February 2014 (UTC)



Could you have a look at Zhuyin for fèiyong, please? I've added it to Module:PinyinBopo-convert/testcases--Anatoli (обсудить/вклад) 00:52, 6 February 2014 (UTC)

Hi, it seems Lua is having a bit of a meltdown at the moment, if you try {{#invoke:PinyinBopo-convert|convert|anything}}. I don't know what is going on. Wyang (talk) 00:58, 6 February 2014 (UTC)
Sorry. I don't understand it but other cases seem to be alright at Module:PinyinBopo-convert/testcases. It seems i+y cause this. yòng looks OK. I wish I could help more. --Anatoli (обсудить/вклад) 01:04, 6 February 2014 (UTC)
There is no problem now. Could you have a look at Module talk:PinyinBopo-convert? Are all the conversions there working on your computer? Wyang (talk) 01:13, 6 February 2014 (UTC)
Yes, they do. I am on a lookout for these. Thank you!
Sorry to be a serial pest. I have 2 requests for {{ko new}} and {{ja new}}, when you have time and if you have interest. In Korean, the hangulisation template should be orphaned and deleted, IMO. We should use the standard {{etyl}} for loanwords (doesn't apply to Sino-Korean, Sino-Japanese, etc.). With Japanese, the template should produce simpler output, hiragana being the first parameter.

E.g. 招き猫. Instead of:

# beckoning cat; figure of a cat with one paw raised

Should be just:

{{ja-noun|まねき ねこ}}
# beckoning cat; figure of a cat with one paw raised

Note that hiragana and katakana may have spaces, e.g. "まねき ねこ", which are not displayed but produce a more user-friendly romaji. Not urgent, but it would be great to have. I will take a "no" for an answer if you'd rather not change. Appreciate your efforts! --Anatoli (обсудить/вклад) 01:26, 6 February 2014 (UTC)

There have been many changes to the standard format of a Japanese entry, thanks to the simplification efforts by User:Haplology. I have changed Template:ja new to adapt (it seems) to those changes. As for Korean, you can use |ee=league in 리그. To change it to other languages, you can use |el=fr ... |el= is 'en' by default. Wyang (talk) 02:03, 6 February 2014 (UTC)
Yes, you guys are doing a great job. I have created 予習 using the modified template. Thank you again. --Anatoli (обсудить/вклад) 02:20, 6 February 2014 (UTC)
Thanks. The suru function also works:
{{subst:ja new|よしゅう|s|[[preparation]] for a lesson|to [[prepare]] to lessons}}
. Wyang (talk) 02:22, 6 February 2014 (UTC)
Great feature! --Anatoli (обсудить/вклад) 02:32, 6 February 2014 (UTC)

Using {{zh-hanzi-box}} in {{cmn-new}}

I created this template some time ago to encompass both {{zh-hanzi}} and {{Hani-forms}}. I am just wondering if you are willing to include it in your {{cmn-new}} template? JamesjiaoTC 00:51, 11 February 2014 (UTC)

Thanks, done. Someone bot-possessing should have obsoleted those long ago... Wyang (talk) 00:58, 11 February 2014 (UTC)

Re : Errors in Chinese character etymologies

Thanks for your comments on my edits.

I tried to correct them following your comments. I kept some of the wrong etymologies and labelled them as mnemonics. Is that good practice for the Chinese character etymology section?

Feel free to comment on or modify my edits, as I am a beginner on Wiktionary and English is not my native language.

Is this the right way to answer your message?

Meihouwang (talk) 15:34, 20 February 2014 (UTC)

Pinyin with apostrophes

Hi Wyang, what do we do about pinyin that has apostrophes e.g. 反而, 西安? It seems to generate a question mark when I put the apostrophe in the pronunciation template. ---> Tooironic (talk) 22:45, 25 February 2014 (UTC)

Hi, the template can handle pinyin with apostrophes correctly. The small superscript in the IPA represents the semi-glottal stop, found in the onset of certain null-initial syllables. Wyang (talk) 03:05, 26 February 2014 (UTC)
Ah I see, thank you. ---> Tooironic (talk) 03:42, 26 February 2014 (UTC)


Is there a template to clearly show the tone sandhi change here (i.e. yībān as yìbān)? ---> Tooironic (talk) 06:07, 2 March 2014 (UTC)

Hi. There is only one pronunciation (not variant pronunciations), and Pinyin only writes the non-sandhi form (yi1ban1) [6][7]. Please see the page now. Cheers, Wyang (talk) 11:05, 2 March 2014 (UTC)
Looks great now, thanks. Is there a way to mention tone sandhi in the pronunciation box? I think that would be helpful to users who may not understand why the change occurs. ---> Tooironic (talk) 08:48, 4 March 2014 (UTC)
  • On a side note, I'm having second thoughts about this not being a variant pronunciation. The 國語辭典 (a trusted Taiwan dictionary) lists 一般 as yībān, even in the pronunciation sample, along with example sentences, listen here: [8] Do you think this is a Taiwan variant perhaps? I can't recall ever hearing a mainlander pronouncing it as yībān. ---> Tooironic (talk) 08:52, 4 March 2014 (UTC)
    • To me the two pronunciations sound like the pronouncer's attempt to pronounce the two syllables as if they are in isolation, probably as a consequence of the Pinyin orthography being the non-sandhi version (yi1ban1) (i.e. spelling pronunciation). Instead, the two 一般's in example sentences show tone sandhi. I don't think it is a Taiwan variant, at most a rare one. There are many online resources describing the tone sandhi patterns of 一 and 不 in Taiwanese Mandarin - [9][10][11]. Wyang (talk) 11:28, 4 March 2014 (UTC)
      @Tooironic: Ah, another thing. For the pronunciation template, it is not necessary to specify the audio file if the filename is 'zh-PINYIN.ogg'. Using '|a=y' suffices. Wyang (talk) 11:30, 4 March 2014 (UTC)
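
To illustrate the sandhi pattern discussed in this thread, here is a minimal Python sketch that derives the spoken tone of 一 from the written (non-sandhi) numbered pinyin. It assumes the commonly described rule (second tone before a fourth tone, fourth tone before first, second and third tones) and deliberately ignores exceptions such as ordinal or word-final uses; the function name and input format are assumptions, not part of any existing template.

```python
def yi_sandhi(hanzi: str, syllables: list) -> list:
    """Return numbered pinyin with tone sandhi applied to 一 (simplified rule)."""
    result = list(syllables)
    for i, (char, _) in enumerate(zip(hanzi, syllables)):
        # Only 一 followed by another syllable is affected.
        if char != "一" or i + 1 >= len(syllables):
            continue
        next_tone = syllables[i + 1][-1]
        if next_tone == "4":
            result[i] = "yi2"
        elif next_tone in ("1", "2", "3"):
            result[i] = "yi4"
    return result


print(yi_sandhi("一般", ["yi1", "ban1"]))
print(yi_sandhi("一定", ["yi1", "ding4"]))
```

The written form stays yi1 in both cases; only the derived spoken form changes, which matches the point that Pinyin records the non-sandhi tone.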


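Hello. Regarding the pronunciation of 黑人僧: I checked the 粵語審音配詞字庫, which gives t͡ʃɐŋ˥, so I have reverted the change for now. If my edit is wrong, please point it out. Also, some strange entries have appeared on the Chinese Wiktionary recently; could I trouble you to deal with them? Thank you. --Hahahaha哈 (talk) 06:05, 5 March 2014 (UTC)

Hello Hahahaha哈. Jyutping 'z' is /ts/. Apart from being one of the palatalised variants of /ts/ (along with /tɕ/), /tʃ/ does not exist in Guangzhou Cantonese. As for the Chinese edition, thanks for the heads-up; I will deal with it when I have time. Wyang (talk) 06:10, 5 March 2014 (UTC)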


That last character, shouldn't that be 憎? Or is that the joke?  :) Also in the hanzi box (but not the lemma) at 乞人憎. ‑‑ Eiríkr Útlendi │ Tala við mig 01:01, 8 March 2014 (UTC)

That is part of where the pun is. There was a typo in 乞人憎 which I have fixed. This is a xiehouyu in Chinese, the first part being 非洲和尚 (a monk from Africa), and the last part being 黑人僧 (a black monk) (another synonymous way of putting it) - 乞人憎 (makes people hate) (the near-homophone). Someone might say "something is really 非洲和尚", to mean "something is really annoying". Cheers, Wyang (talk) 03:40, 8 March 2014 (UTC)

and

I hardly ever edit hanzi entries. Is the Hanzi heading supposed to be retained? ---> Tooironic (talk) 02:20, 11 March 2014 (UTC)

I'd leave them for now for each single character entry. There's no decision on changing this yet. --Anatoli (обсудить/вклад) 03:57, 11 March 2014 (UTC)
I don't know as I rarely edit hanzi entries too. I just dislike the current format. It is too distant from the ideal logical format I have in mind. Wyang (talk) 07:17, 11 March 2014 (UTC)

apostrophe in Zhuyin for gè'àn - ㄍㄜˋ 'ㄢˋ


Could you fix this please :). --Anatoli (обсудить/вклад) 07:02, 14 March 2014 (UTC)

Also, do you mind adding Zhuyin to {{Pinyin-IPA}}, next to Pinyin? I hope it's not too hard for you. The table is a little too tall; maybe it could be simplified a bit with bullets, considering that we will include dialects as well. --Anatoli (обсудить/вклад) 07:05, 14 March 2014 (UTC)
Hi. The apostrophe issue is fixed. I have added Zhuyin to Pinyin-IPA and made it a little more compact (The extra line at the top and bottom of the table is something I plan to remove when my bot gets granted bot rights). Wyang (talk) 13:20, 14 March 2014 (UTC)
Thanks a lot! --Anatoli (обсудить/вклад) 13:42, 14 March 2014 (UTC)

Template:cmn-erhua form of

This template has triggered a script error for a while now. Could you please fix that? —CodeCat 15:27, 16 March 2014 (UTC)

You reported the wrong culprit (Template:zh-compound/code). Now done. Wyang (talk) 22:40, 16 March 2014 (UTC)

Putting a homophone field in the pronunciation header template

Hi Wyang, is it possible to put a homophone field in the pronunciation header template? I think it would be useful. Here are two sets of entries that could benefit from it: 營利/营利 VS 盈利 and 迷路 VS 麋鹿. ---> Tooironic (talk) 00:07, 18 March 2014 (UTC)

See Russian homophones привести́ (privestí) and привезти́ (priveztí). They can simply be added manually with:
  • * Homophones: 營利, 营利 (yínglì) at the bottom of the "====Pronunciation====" section. --~~
    Yes I'm aware of that. But I was hoping there was a way to integrate it into the new pronunciation template. It looks strange and ugly to have it listed as a bullet-point under the lovely box. Here's another example: 便利 VS 遍歷/遍历. ---> Tooironic (talk) 00:24, 18 March 2014 (UTC)
Hi, you can create the page Template:Pinyin-IPA/hom/PINYIN to include homophones. Anything with that Pinyin will show the homophones field. For an example please see any of the above, or Template:Pinyin-IPA/hom/yìzhì - 意志. Cheers, Wyang (talk) 00:40, 18 March 2014 (UTC)
I think homophones should be manually parameterised then, so that the entries rather than the template are maintained - 遍歷/遍历 - {{Pinyin-IPA|biànlì|遍歷/遍历|a=y}} maybe? But if we keep the simple bulleted style then homophones could fit nicely into the format. Of course, pinyin entries should be kept up-to-date with homophones (no problem listing currently missing entries). Actually, homophones in Chinese are an issue we should discuss separately. There could be too many, and fitting them into the template will become problematic. --Anatoli (обсудить/вклад) 00:58, 18 March 2014 (UTC)
Maybe you could use categories instead? DTLHS (talk) 01:14, 18 March 2014 (UTC)
(E.C.) Possibly. It seems the structure of templates with homophones is unsustainable. Editors should be able to add or remove them manually in entries, or ignore them altogether and let categories or Pinyin entries list them. If we are moving away from one complexity in Chinese entries, such as the "rs" value, we shouldn't create a new one. Topolectal pronunciation/transliteration should be optional, of course. --Anatoli (обсудить/вклад) 01:24, 18 March 2014 (UTC)
Re:Anatoli: How about now (collapsed)...? Pinyin entries can be made to have zero information, only links to these templates (in another format). Re:DTLHS: Too many of them... Probably looking at >10000 of these. Templates are probably easier to manage. Wyang (talk) 01:18, 18 March 2014 (UTC)
How is creating 10000 subtemplates easier than creating 10000 categories? DTLHS (talk) 01:29, 18 March 2014 (UTC)
It might be harder to do manipulations of the data... For example, if one is interested in finding out all near-homophones (minimal pairs wrt tones) of shi4shi4 (i.e. shiNshiN), templates would seem easier in producing the list, no? Wyang (talk) 01:42, 18 March 2014 (UTC)
I don't think so- either way you're passing the pinyin through a module that can generate and parse it any way you like. The only disadvantage is you can't add terms we don't have an entry for yet. DTLHS (talk) 01:45, 18 March 2014 (UTC)
Yep. Wyang (talk) 02:29, 18 March 2014 (UTC)
Not sure about your first question. Could you give me a link? Who will maintain the templates? Pinyin entries are much simpler and they have been used to find homophones or choose the right Hanzi entry, anyway. If a template could read Pinyin entries, then it's probably better, but it seems too complex to me, anyway. In short, the status quo is better for homophones, IMHO. --Anatoli (обсудить/вклад) 01:24, 18 March 2014 (UTC)
@Wyang: Perhaps it's time you change your position on Pinyin entries? :) They work exactly as Pinyin indices in published dictionaries. I saw your expanded example. It looks OK but would be hard to maintain in the long run. --Anatoli (обсудить/вклад) 01:37, 18 March 2014 (UTC)
I meant the template format if you have a look at 犀利. The Pinyin entries in their current state cannot be called by the pronunciation template to generate a list of homophones. The Pinyin information should be kept centralised somewhere, such that both Pinyin entries and character entries can call these templates without needing to synchronise their contents (especially homophones, which can be quite a headache to keep identical). The conversion would not be hard, and it would make Pinyin entries even more unjustified. Wyang (talk) 01:42, 18 March 2014 (UTC)
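For illustration only, the near-homophone lookup discussed above (minimal pairs with respect to tone, e.g. shi4shi4 versus other tone patterns of "shishi") can be sketched in Python. This is not how any Wiktionary template or module works; the sample data, the numbered-Pinyin input format and all function names are assumptions made for the sketch.

```python
import re
from collections import defaultdict

# Hypothetical sample data: (entry title, numbered Pinyin).
# Not taken from any real template or category.
ENTRIES = [
    ("事實", "shi4shi2"),
    ("適時", "shi4shi2"),
    ("逝世", "shi4shi4"),
    ("事事", "shi4shi4"),
    ("實施", "shi2shi1"),
    ("歷史", "li4shi3"),
]

def toneless(pinyin):
    """Drop the tone digits 1-5, so 'shi4shi4' becomes 'shishi'."""
    return re.sub(r"[1-5]", "", pinyin)

def build_index(entries):
    """Map each toneless key to the (title, pinyin) pairs that share it."""
    index = defaultdict(list)
    for title, pinyin in entries:
        index[toneless(pinyin)].append((title, pinyin))
    return index

def homophones(pinyin, index):
    """Titles whose numbered Pinyin is identical, tones included."""
    return [t for t, p in index.get(toneless(pinyin), []) if p == pinyin]

def near_homophones(pinyin, index):
    """Entries with the same syllables but a different tone pattern."""
    return [(t, p) for t, p in index.get(toneless(pinyin), []) if p != pinyin]
```

Either storage choice (subtemplates or categories) could feed such an index; the grouping step is the same once the Pinyin is available as data.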

"Yi" in Zhuyin

In our entry for 意義, the Zhuyin for "yi" is written as ㄧ, whereas in the 國語辭典 entry it is written as 一. Which is considered correct? ---> Tooironic (talk) 04:29, 24 March 2014 (UTC)

Isn't it ㄧ not 一 in the 國語辭典 entry too? Wyang (talk) 04:37, 24 March 2014 (UTC)
Yes, it's ㄧˋ ㄧˋ as well in 國語辭典 entry. If you copy-paste, you'll see ㄧˋ ㄧˋ. Interesting that it appears horizontally. Does the symbol appear horizontally in horizontal writing and vertical in vertical, similar to the Japanese elongation symbol , which appears as a vertical stroke in vertical writing? I've only seen Zhuyin symbol as a vertical sign before. --Anatoli (обсудить/вклад) 05:19, 24 March 2014 (UTC)
Theoretically it should be ㄧ in horizontal writing and 丨 in vertical writing. In reality there are often exceptions to this rule. Wyang (talk) 05:22, 24 March 2014 (UTC)
Interesting that in w:Bopomofo 瓶子 appears in vertical as:
  1. (appears horizontally, in vertical writing ?!, can't render here)
  2. ㄥˊ
  3. ˙
and in horizontal as ㄆ丨ㄥˊ ㄗ˙ (the opposite of what you suggested). Note ㄧ and the position of the neutral marker as well. Is there any rule in these examples? --Anatoli (обсудить/вклад) 05:29, 24 March 2014 (UTC)
Ah, yeah. I tried to search for the official rules when I edited Module:PinyinBopo-convert, but did not find anything very useful. There are also multiple versions of Zhuyin. The Wikipedia example might be what the official rule (if there is one) considers correct. I don't know about the rule for tonelessness. As the example above shows, in reality there are often exceptions to how ㄧ/丨 are supposed to be used, if Wikipedia is correct (chances are it is). Wyang (talk) 05:40, 24 March 2014 (UTC)
Thanks. Wikipedia doesn't describe it either. It seems like with the neutral tone marker, there is no real consistency. --Anatoli (обсудить/вклад) 06:01, 24 March 2014 (UTC)



If 错觉 has two readings - cuòjué and cuòjiào (?), how would you make an entry using your template? --Anatoli (обсудить/вклад) 08:09, 24 March 2014 (UTC)

I only know the former pronunciation. What does cuòjiào mean? Wyang (talk) 09:42, 24 March 2014 (UTC)
I got it from less reputable dictionaries and I thought it was a valid variant. Anyway, I don't recall how you handle words with multiple readings, such as 瘦削. What's the right way? --Anatoli (обсудить/вклад) 10:21, 24 March 2014 (UTC)
My plan after User:Wyangbot gets granted a bot flag by a bureaucrat is to finish the format change on pages using Pinyin-IPA (i.e. [12]). Currently there are >3000 pages using the old format of Template:Pinyin-IPA, which requires each syllable to be fed into the template separately. Once that is done, I will modify Template:Pinyin-IPA to make it accept alternative readings as second, third, ... parameters and enable one to write comments for each pronunciation (like Taiwan/Mainland, standard/colloquial, if one needs to), and use that to end the template awkwardness in Category:cmn:Variant pronunciations pages. Template:cmn-new will also be modified, so that one can use the parameter |py2=... to add a second pronunciation, although it would be better if one modifies the page afterwards, since the readings are often used in different contexts. By the way, 觉 is one of the few characters in Standard Chinese which show different literary and colloquial readings, with jiao4/jiao2 being the colloquial reading (limited to 睡觉), and jue2 being the literary reading (all other situations). People speaking other dialects may use the colloquial reading in compounds in which Standard Chinese normally uses the literary one, eg. 觉得jiao2de, 自觉zi4jiao2, and this would typically be considered heavily accented or colloquial. I haven't heard 错觉 being pronounced cuo4jiao4 or cuo4jiao2 though. Wyang (talk) 22:48, 24 March 2014 (UTC)
Multiple pronunciations often have different statuses. E.g. Russian до́гово́р (dógovór) when stressed on the first syllable is considered less educated, and so is свекла́ (sveklá), which is also an alternative spelling of the more standard свёкла (svjókla). Manual feeding of templates is fine with me but I'd like to see an example. At Wiktionary, it's OK to list all acceptable but verifiable forms, even if they are colloquial. As for different contexts, words could be split into etymologies, like 得了. No rush. I can see you're busy. Please consider that we will also need templates for words which ARE NEVER USED in Mandarin - Cantonese, Min Nan (including non-Han scripts - Latin, Cyrillic, Arabic), where Pinyin/Zhuyin may not be appropriate. See Talk:老番. Perhaps Cantonese 佢哋 could be a good example of how this type of entry is going to look (after a change).
Keep Wiktionary:Votes/pl-2014-04/Unified Chinese in mind as well.
BTW, 指指点点 doesn't show tone sandhi. --Anatoli (обсудить/вклад) 23:14, 24 March 2014 (UTC)
It shows tone sandhi in IPA, but not in Pinyin. I only did tone sandhi for Pinyin for words containing 一 and 不, not other cases, since the effects cannot be represented well by Pinyin tone marks. In the case of 指指点点, all syllables undergo tone sandhi: the first three undergo third-to-second tone sandhi, which you can represent using the acute accent, but the last syllable undergoes third-to-half-third (half third: only the first half of third tone, only dipping, no rising), which you cannot represent using Pinyin tone marks. There are also the half fourth tone and the tone sandhi of neutral syllables, for which there are no Pinyin diacritics available either.
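As a rough illustration of the 一 and 不 case only, here is a Python sketch that applies the usual textbook rule to numbered Pinyin: 一 becomes second tone before a fourth tone and fourth tone before first, second or third tones, and 不 becomes second tone before a fourth tone. It is not the code used by the pronunciation template. The function names, the one-syllable-per-character input and the omission of ordinals, numerals and following neutral tones are all simplifying assumptions.

```python
def tone_of(syllable):
    """Return the trailing tone digit as an int, or None if there is none."""
    return int(syllable[-1]) if syllable[-1:].isdigit() else None

def apply_yi_bu_sandhi(hanzi, syllables):
    """Apply the sandhi of 一 and 不 to a list of numbered Pinyin syllables.

    Assumptions of this sketch: exactly one syllable per character; ordinal
    or numeral uses of 一 (e.g. 第一) and following neutral tones are ignored.
    """
    if len(hanzi) != len(syllables):
        raise ValueError("expected one syllable per character")
    result = list(syllables)
    # The last syllable has nothing after it, so it is never changed here.
    for i, char in enumerate(hanzi[:-1]):
        following = tone_of(syllables[i + 1])
        if char == "一" and syllables[i] == "yi1":
            if following == 4:
                result[i] = "yi2"
            elif following in (1, 2, 3):
                result[i] = "yi4"
        elif char == "不" and syllables[i] == "bu4" and following == 4:
            result[i] = "bu2"
    return result
```

For 一般 this turns yi1 ban1 into yi4 ban1, matching the yìbān reading discussed earlier on this page. Third-tone sandhi is deliberately left out, for the reason given above.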
For Cantonese-only entries like 老番 and 佢哋, would a pronunciation template like the one in Talk:老番 (either collapsed or uncollapsed) speak your mind? Wyang (talk) 23:28, 24 March 2014 (UTC)
Yes, I think so. You could release the pronunciation template without having to wait for the vote. Since you're not breaking anything with it. I guess "==Mandarin==" and "lǎofān" looks confusing on 老番, even if Mandarin usage might be attestable as well.--Anatoli (обсудить/вклад) 00:23, 25 March 2014 (UTC)
OK, I have done so at 老番. Wyang (talk) 01:02, 25 March 2014 (UTC)

Homophones bug

Hi Wyang. Why do homophones for 董事 show up in its pronunciation header, but not for all the shìshí entries, like 事實, 適時, 侍食, 試食? ---> Tooironic (talk) 12:18, 27 March 2014 (UTC)

@Tooironic: I haven't created those yet... I've only done A-G so far - [13]. I will create the rest when I have time. You can create these lists yourself: create the Template:Pinyin-IPA/hom/PINYIN page (like Template:Pinyin-IPA/hom/shìshí) and save with the following text:

Leave SIMP1 empty if TRAD1 == SIMP1. eg. Template:Pinyin-IPA/hom/shìshí has the following text:


Cheers, Wyang (talk) 22:49, 27 March 2014 (UTC)


Was this intentional? The edit summary was "formatting", but the edit removed the entire language section, citations and all. - -sche (discuss) 04:16, 28 March 2014 (UTC)

It's not a Chinese word. Wyang (talk) 04:25, 28 March 2014 (UTC)
We have to allow a minimum of non-Hanzi Mandarin. "OK" and other all-caps Roman words are included in Chinese dictionaries and OK was used to create 卡拉OK (a Chinese invention), so we should keep OK#Mandarin. --Anatoli (обсудить/вклад) 04:45, 28 March 2014 (UTC)
Sorry but I have to revert the bot's edit. --Anatoli (обсудить/вклад) 04:50, 28 March 2014 (UTC)
Oh, well. Although it's something I don't agree with. Also, that "OK" has nothing to do with the OK in 卡拉OK. Wyang (talk) 04:52, 28 March 2014 (UTC)
Possibly, but "OK" is spoken quite a lot by Chinese speakers (it's arguably the most common word in the world!). Even if it can be argued to be "code-switching", nobody has found reasonably acceptable Hanzi to render the sounds in a way that was accepted by the majority; besides, "OK" is so easy to type compared to anything else. OK in 卡拉OK is usually pronounced identically, the Chinese way, that's all (with some variations in both). Of course, they have nothing in common otherwise. IMHO, rendering foreign "/k/" sounds seems problematic in standard Chinese, with some exceptions, since many words have "j" via Cantonese or otherwise; even "卡" is not common for foreign /ka/. --Anatoli (обсудить/вклад) 05:03, 28 March 2014 (UTC)


Hi there. Do you know why there is a blank line under the pronunciation header? ---> Tooironic (talk) 06:32, 29 March 2014 (UTC)

Oh, it seems it's in all the Mandarin entries. ---> Tooironic (talk) 06:35, 29 March 2014 (UTC)
Yeah, I'm not sure. It's probably related to my edits to the pronunciation template. I'll see what I can do. By the way, the template now can handle variant pronunciations (eg. 骨頭, 普遍) and can generate Mainland-Taiwan differences automatically (eg. 星期, 乳酪). Cheers, Wyang (talk) 06:42, 29 March 2014 (UTC)
Hi. The HSK categories are without cmn. prefix. --Anatoli (обсудить/вклад) 07:58, 29 March 2014 (UTC)
Looking good. Thanks for your hard work. I hope you can fix that blank line issue though. ---> Tooironic (talk) 11:31, 29 March 2014 (UTC)

Template:Pinyin-IPA, Template:Pinyin-IPA/essence, Template:Pinyin-IPA/code

The code in these templates is pretty much unreadable right now, it's just a big giant blob of code. Could you clean it up please? —CodeCat 18:14, 29 March 2014 (UTC)

Not sure if this is what you meant: [14] - is it clearer now? Wyang (talk) 21:43, 29 March 2014 (UTC)
Yes, although I would have done it slightly differently myself. —CodeCat 22:01, 29 March 2014 (UTC)
I've reorganised a lot of the code in these templates, to make it easier to maintain. The /essence template really contained the same code 5 times with some small variations, so I split that code out into a separate template, Template:Pinyin-IPA/table. The /essence template is no longer needed now. —CodeCat 00:07, 30 March 2014 (UTC)
Yes, the variant pronunciation feature added two days ago involved a lot of duplications. I wanted to put it entirely into Module:Pinyin-IPA when I added it, but I just opted for the easier option out of laziness. Thanks for doing that. By the way, there was a minor error in your code of the main template, which is now fixed. Wyang (talk) 22:36, 30 March 2014 (UTC)

More specifically

Using User:Wyang/歷史 as an exemplar, this is the xml which I would be processing - the most current revision of the article:

      <comment>/* Chinese */ add {{temp|zh-hanzi-box|[[历史]]|[[歷]][[史]]}}, rm Wikipedia, etymology, irrelevant to the proposal</comment>
      <text xml:space="preserve" bytes="438">==Chinese==

|c=lik6 si2
|w=5liq sr


# {{cx|obsolete}} [[record]]s of past events; [[historical]] records
# [[history]], [[past]]
# past [[experience]]s of a person, the history of a person
# [[historiography]], the [[study]] of history, usually {{l|cmn|歷史學|tr=lìshǐxué}}</text>

For my current project this would result in the title (User:Wyang/歷史) being added to the list of words for "Chinese". When creating captcha images, having mixed scripts can result in text in one script appearing much smaller than the other, usually illegibly. This can be worked around, but it's an additional investment of time and effort. Every wiki whose language would be collapsed to "Chinese" would end up with, possibly, all the words in that classification being used.
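A minimal Python sketch of the kind of processing described above, assuming the word lists are built by reading each page's wikitext from the dump and grouping titles by their level-2 (L2) language headers. The function names and the in-memory page format are hypothetical; this is not the actual script.

```python
import re
from collections import defaultdict

# A level-2 header has exactly two '=' on each side, e.g. "==Chinese==".
# "===Noun===" is rejected because the first character after "==" must not be '='.
L2_HEADER = re.compile(r"^==([^=].*?)==[ \t]*$", re.MULTILINE)

def l2_languages(wikitext):
    """Return the language names found in the L2 headers of one page."""
    return [m.group(1).strip() for m in L2_HEADER.finditer(wikitext)]

def build_word_lists(pages):
    """Group page titles by language.

    `pages` is an iterable of (title, wikitext) pairs, e.g. as read from a dump.
    """
    lists = defaultdict(list)
    for title, text in pages:
        for language in l2_languages(text):
            lists[language].append(title)
    return lists
```

Under this approach a page whose only L2 header is "==Chinese==" lands in the "Chinese" list and nowhere else, which is the behaviour at issue in this thread.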

My particular project is very WMF-focused, and you can easily say that it would not matter on other sinitic WMF projects. But Wiktionary's data is not intended solely for use inside the WMF. A researcher may wish to use a wiktionary dump to create pools of zh-classical 'words', or a teacher might wish to create a booklet of Min-Nan zoological terms, or a developer might pull solely Cantonese translations and want solely Cantonese senses to go with them. Doing so under the proposed model would not be possible using the dump data, because the relevant information is carried solely within the parsed templates. Working with the dump takes about 12 minutes to build my 1612 word lists; building the same from the API apparently missed about 4 million entries, and took 36 hours.

In the city I live in, Richmond in BC, Canada, the majority of people speak one or another Sinitic language, but most students must speak English in school. Even very young students use Wiktionary to clarify both their English and their Chinese language use. While I do not have specific evidence, I would expect a speaker of a Chinese language would look on the page for (example) Cantonese first, and Chinese second (or not at all.) In my opinion, Wiktionary should strive to help that student find what they are looking for on their first attempt. - Amgine/ t·e 06:07, 31 March 2014 (UTC)

Hi, Amgine. Thanks for the clarification. I see what you mean in your comment now. Let me get back to you in two or three hours. Wyang (talk) 06:20, 31 March 2014 (UTC)

@Amgine: Hi, sorry about the delay.

  • I agree that data maintenance of multi-scripted (digraphic) languages is typically particularly troublesome, and people on Wiktionary working with those languages (eg. Serbo-Croatian, Chinese, Japanese) can certainly relate to that. However, digraphia in Serbo-Croatian, Chinese and Japanese is unrelated to the amalgamation or separation of its varieties. If the grouping of Serbo-Croatian is not in place, the issue of multiple scripts would still pose a problem for your captcha work, as both the Cyrillic and Latin alphabets are used to write the Serbian language, with Serbian being the only European language which has synchronic digraphia. Similarly, both Simplified Chinese and Traditional Chinese are used to write every Chinese variety. Pulling out all entries of a Chinese topolect, before and after the amalgamation of varieties, would both inevitably run into the problem of having to deal with both sets of Chinese characters. The difference in font size between simplified and traditional is probably not significant, if any, fortunately.
  • As you probably know, the Chinese varieties share a common written form - in the past it was Classical Chinese, and now it is Written vernacular Chinese. Consequently there is not much point in generating a topolect-specific captcha, say a Wu-language captcha. It would be more realistic to generate a captcha based on Written vernacular Chinese, or just Chinese characters in general. I'm not sure about the details of your captcha project. Do you use the presence of '== ==' language headers to pull out all entries in a particular language? If so, then the merger would be great for your project, since it only applies to Chinese character-scripted entries here. You could pull out the title of every page which contains the header '==Chinese==' in its content, as they are guaranteed to be the same script.
  • If you have attempted generating captcha for non-Mandarin Chinese topolects using Wiktionary data, you may have noticed that at present, such data is remarkably meagre on Wiktionary. For example, Category:Wu nouns only has 10 pages, Category:Gan nouns has 1 page, and Category:Xiang nouns has only 1 page as well. Generating page title data for Category:Min Nan nouns would have resulted in a terrible mix of three scripts, whereas the unified Chinese approach would eliminate this script multiplicity, as said above. I am curious, though, regarding how you would handle Japanese data on Wiktionary? It is written in three (Kanji, Hiragana, Katakana) different scripts here... well actually, four (plus Romaji) if you pull out everything. It must be a headache to try to analyse this.
  • I'm not sure I agree with your point on language self-identification. Most of the Chinese people I know identify their speeches to be 'Chinese' when asked. It is when people enquire further which division of Chinese it is that they give the 'Cantonese' or 'Mandarin' answer.

Cheers, Wyang (talk) 09:42, 31 March 2014 (UTC)

For my specific project it is, in fact, important to identify which script is used, as the identities of the Wikipedia communities are in part based on their use. It may be offensive to, for example, a Bosnian wikipedian if xe is given a Cyrillic captcha, while it would not matter for sh.wikipedia and would possibly be offensive not to do so on sr.wikipedia. My personal opinion for Chinese languages would be to use vernacular Chinese, as you suggest, but how would such be identifiable under your proposal? I do use the L2 header ('== ==') to identify the language, but this is exactly why your proposal is a problem, as I will explain below.
As I understand your proposal, script-wise the article titles would probably not face great difficulties. The breadth of characters available in any given family member language may, however, be more limited than the total number of entries in "Chinese". Being able to easily identify a relevant vocabulary - in some ways to limit the expressions to those in common use by the target reader population - is often an important reuse of Wiktionary data. Would it be possible to include in your model an unambiguous method of identifying language codes which would commonly be expected to use the entry?
Yes, I have generated word lists for all L2 on en.WT; for Wu I found exactly 45 entries, 19 in Gan, and two for Xiang. Although Min-Nan does include a mix of scripts, this appears to be normal across the spectrum for written Min-Nan, although I found references to efforts to standardize Hokkien in any of several writing systems. With your proposal, each of the Chinese languages would suddenly seem to have a very large number of entries if one assumed that, for example, Min-Nan = Min-Nan + Chinese. But it doesn't. Likewise I, as a person not familiar with these languages but working with the en.Wiktionary data, would not know if Bopomofo entries are Chinese or not, or if terms written in Taiwanese Kana or Latin are included. If they are not, would Min-Nan suddenly consist solely of entries in these other writing systems?
For generating word lists for Min-Nan your proposal would have no effect on reducing the multiplicity of scripts actually used in written Min-Nan. It would, however, possibly erroneously limit (and/or expand) the list of terms found for Min-Nan in en.Wiktionary data. Consolidating terms under a single L2 header "Chinese", while excluding Chinese family language headers, will likely result in confusing data for later use. Having an unambiguous language code list identifying which languages in which a term is in common use would reduce this confusion, but not entirely alleviate it. Put another way, it is likely to cause future errors, requiring a greater investment of effort in order to use Wiktionary data.
I think what I am trying to say is that although English may occasionally use words from many related languages, especially German, French, and Latin, these words are not commonly considered 'part' of the English language except here on en.Wiktionary. These words and phrases make up a much larger vocabulary for English than is actually recognized or understood by a large percentage of the population, even though the terms may follow the linguistic rules and rôles of English. I approach the concept of Chinese written language and Japanese Kanji in much this way - intelligibly part of the larger classification, but not always part of the vernacular - which may be completely ignorant.
For this project I am using all entries in any writing system, so all four writing systems of Japanese are valid. The generation of captcha images, however, is failing mostly due to the Kanji, which have a high percentage of illegibility due to the complexity of the characters versus the distortion effect used. This is also a problem with Chinese scripts - the highly refined characters become illegible when even slightly distorted. For other analyses I have done with Wiktionary data having multiple writing systems is a distinct benefit, allowing use of a larger corpus of source documents. The limitations become creativity and time, rather than what can be analysed. - Amgine/ t·e 15:22, 31 March 2014 (UTC)

Hi, Amgine. Thanks for the reply. I would like to mention a few things:

  • Content under the heading '==Chinese==' will not be absentmindedly assigned to every ISO-coded Chinese variety under the proposal. The unambiguous language code list for a term is the pronunciation template {{zh-pron}}, and pronunciation in each variety will be fed into that template. In the xml code above, the varieties for which pronunciations have been given include: Mandarin, Cantonese, Min Nan and Wu, hence the entry will be categorised into those respective categories, sorted by the appropriate romanisation. One could parse through the template code to extract the topolect-specific page titles, eg. regex \{\{zh\-pron\n\|([^}]+\n)*\|c=(.+)(\n[^}]+)*\n\}\} or something similar, to extract all the Cantonese pages. Even easier perhaps, one could recursively extract the titles of all pages from the category Category:Cantonese parts of speech and its subcategories, which would be much more convenient.
  • I concur with your point of using the appropriate script so as to avoid offending specific subpopulations of a larger speaker population. However, the circumstance may be different for languages with speech-writing separation. The imposition of a modern literary standard is in place in all countries which designate Chinese as one of the official languages, and texts written in Written vernacular Chinese would be understandable to any educated person. The scope of other orthographies would be very limited. For example, the Pe̍h-ōe-jī romanisation of Min Nan is generally only understood by some seniors. Young people in Taiwan, who are fully conversant in Min Nan, are mostly illiterate in Pe̍h-ōe-jī. Even if they are able to read it, it is unlikely that they would have the appropriate input method for it.
  • Generating Chinese character captchas seems quite uncommon, and I can imagine it will be much more difficult than for the Latin alphabet. A Google image search suggests most have basically unobscured characters, or just characters in different fonts. Most Chinese fora are simply not bothered and just opt to use Latin-alphabet captchas. For the captcha project, I think the best approach would be to generate captchas based on Written vernacular Chinese. It could be produced by pulling out titles of all entries containing '==Chinese=='. Alternatively, you might want to pull out titles which are used in both Simplified Chinese and Traditional Chinese, as the reader population for any Chinese variety will be a mix of the two, and it may not be possible for them to have the input method for both sets of characters. This could be done by parsing through the simp-trad form template ({{zh-hanzi-box}}, as alternatives have been made obsolete by User:Wyangbot), and generating all mainspace pages transcluding the template but lacking a second parameter - eg. \{\{zh\-hanzi\-box\|([^\|]+)\}\}. Or, you could use a frequency list for Chinese characters (eg. [15]) and do combinations and modifications on characters which lack a simp-trad distinction. The text probably does not have to be meaning-conveying for Chinese character captchas.
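The two template checks suggested in the points above can be sketched in Python as follows. The patterns are a looser variant of the regexes quoted there, not the exact ones. The parameter names `c` and `w` are taken from the XML sample earlier in this thread; the function names are hypothetical, and templates or piped links nested inside the parameters are not handled.

```python
import re

# Body of a {{zh-pron ...}} call; nested templates are not handled (assumption).
ZH_PRON = re.compile(r"\{\{zh-pron\s*\|([^{}]*)\}\}")
# {{zh-hanzi-box|X}} with exactly one parameter, i.e. no simp-trad distinction.
SINGLE_PARAM_BOX = re.compile(r"\{\{zh-hanzi-box\|([^|{}]+)\}\}")

def zh_pron_params(wikitext):
    """Return the named parameters of the first {{zh-pron}} call as a dict."""
    match = ZH_PRON.search(wikitext)
    if match is None:
        return {}
    params = {}
    for part in match.group(1).split("|"):
        key, sep, value = part.partition("=")
        if sep:  # skip anything that is not of the form key=value
            params[key.strip()] = value.strip()
    return params

def has_cantonese(wikitext):
    """True if the page gives a Cantonese reading via the c= parameter."""
    return "c" in zh_pron_params(wikitext)

def lacks_simp_trad_distinction(wikitext):
    """True if {{zh-hanzi-box}} is called with a single parameter."""
    return SINGLE_PARAM_BOX.search(wikitext) is not None
```

Splitting the template body into key=value pairs once, rather than running one regex per topolect, keeps the check to a single pass over each page.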

Cheers, Wyang (talk) 00:08, 1 April 2014 (UTC)

Thank you for this, Wyang. I've linked some of this information on the captcha bugs.
Having a second regex inside the L2 loop to check for the presence of the zh-pron template doubles the amount of work the parsing script is required to perform. It also more than doubles the amount of content processing, and a basic test of your example page 1000000 shows a time-to-process increase of just over 18x on average. (I could do a formal benchmark if you would like.) The template {{zh-pron}} is not documented. The codes used are not unambiguous, and do not follow a reference standard. What this means is it cannot be trusted to reliably identify which languages the entry can be used for, nor can any metadata processor future-proof their code.
Using the API to recursively iterate over Category:Cantonese parts of speech would take, I estimate, two or three days. Multiply this for each language which is at least equal in size. This time expense is prohibitive; it is not an option. Additionally, previous parsing for European languages found about 98% of terms were properly categorized; the remaining 1-2% were uncategorized or miscategorized.
Yesterday a new dump of wiktionary was produced, and I'm working on automating a process to update the word lists generated from it. However, I will be recommending that we not use en.Wiktionary data in the future, and instead derive word lists from the wikipedia dumps.
- Amgine/ t·e 15:55, 1 April 2014 (UTC)

Post-recommendation discussions

Hi, User:Amgine. I have added some more detailed descriptions of the template {{zh-pron}}. Please forgive me if it is insufficiently detailed and unambiguous; the template itself rests on the assumption that a unified Chinese approach is agreed upon. I am wondering what the outputs for your runs are like? Are they page title lists for entries satisfying a particular criterion, without any page content or history information? If so, there are probably better ways to achieve this. Wyang (talk) 04:27, 2 April 2014 (UTC)

I have, previously, processed en.Wiktionary content for many different purposes ranging from a mediawiki gadget to [DICT output] (as a proof of concept) to various structured dump processing scripts for linguistic research and cross-referencing all wiktionaries to a private corpus. My current project's output is a simple list of terms, a couple of quick hacks which produce output like this from the 2014/03/28 dump of en.WT.
In short, I manipulate Wiktionary content in many different ways for diverse clients. Unlike many project members, I am aware of how exceptionally relevant en.WT data can be in real-world applications, both inside and out of acadæmia. And how useless.
To answer your question directly regarding the current request, I am creating lists of terms or phrases which are considered vulgar or obscene, and lists of terms which are *not* considered vulgar or obscene which meet the further requirements of being not confusable, single scripted, and/or non-spoofed (invisible || single- or multi-script equivalencies to non-linguistic terms.) This is related to Mediawiki bugs #32695 #5309 (primary), #63216, #63217, #62960 (prototypes via GSOC 2014) and of course mw:CAPTCHA. - Amgine/ t·e 05:35, 2 April 2014 (UTC)
@Amgine: Thanks, I see. For the current project, it seems Mandarin.txt is a mix of Simplified Chinese, Traditional Chinese, Latin letters (some with diacritics), numbers, and special symbols. Just a thought: In multi-script cases, you could perhaps use AWB for simple tasks like generating word lists. This is my take on Mandarin.txt (via recursively extracting pages under Category:Mandarin parts of speech three times) and Mandarin_NoSimpTradDistinction.txt (using the second latest en.WT dump, finding mainspace transclusions of {{zh-hanzi-box}} lacking a second parameter). In both cases non-Chinese characters and symbols have been filtered out. These are effectively Written vernacular Chinese wordlists, and the latter is probably good for producing captchas targeted at speakers of Chinese varieties. Wyang (talk) 06:24, 2 April 2014 (UTC)
Not sure that AWB can run as an unattended event on a *nix server, but I'll ask Reedy about how automatable the process could be. - Amgine/ t·e 16:23, 2 April 2014 (UTC)
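Note: the filtering step mentioned above (dropping titles that contain non-Chinese characters or symbols) could be sketched roughly as follows. This is only an illustration, not the method actually used for Mandarin.txt; the function name `han_only_titles` and the choice of Unicode blocks are assumptions.

```python
import re

# Hypothetical filter: keep only titles made up entirely of Han ideographs.
# The ranges below are an assumption about what counts as "Chinese characters".
HAN_ONLY = re.compile(
    "["
    "\u3400-\u4dbf"          # CJK Unified Ideographs Extension A
    "\u4e00-\u9fff"          # CJK Unified Ideographs
    "\U00020000-\U0002a6df"  # CJK Unified Ideographs Extension B
    "]+"
)


def han_only_titles(titles):
    """Return the titles that consist solely of Han ideographs."""
    return [title for title in titles if HAN_ONLY.fullmatch(title)]


if __name__ == "__main__":
    sample = ["中文", "abc", "卡拉OK", "123", "漢字"]
    print(han_only_titles(sample))
```

A script like this runs unattended on any server with Python, which may matter if AWB cannot be automated.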


Why remove counter? --kc_kennylau (talk) 09:19, 2 April 2014 (UTC)

'Counter' is superseded by 'Classifier' per the discussions at Template talk:cmn-new#New PoS and Wiktionary:Beer_parlour/2014/April#Measure_word. Wyang (talk) 10:17, 2 April 2014 (UTC)

Moving classifiers back to counters

Thank you so much for this and sorry for the confusion and making you work! You seem to be able to do the formatting work with your bot as well for the Japanese. --Anatoli (обсудить/вклад) 22:43, 3 April 2014 (UTC)

No worries. I have posted at Wiktionary:Beer parlour/2014/April#Measure word. Yes, I put 'simplification of the headword templates' on my tasks-to-do list for the bot (at the end of the list). :) Wyang (talk) 23:16, 3 April 2014 (UTC)


I think something went wrong here. —CodeCat 13:40, 8 April 2014 (UTC)

Fixed. --kc_kennylau (talk) 14:41, 8 April 2014 (UTC)
Undone. --kc_kennylau (talk) 14:42, 8 April 2014 (UTC)
OK. There was only |sort2= but not |sort= in the previous version, which is why it got confused. Thanks people for fixing it. Wyang (talk) 04:12, 9 April 2014 (UTC)

Japanese counters -> classifiers

Great job, thank you! --Anatoli (обсудить/вклад) 06:43, 3 April 2014 (UTC)

Erroneous deletion of "References" headers

@Wyang: I just ran across this a second time, and realized that Wyangbot is the one doing it: diff for one example. I can't remember the earlier example of where else I've seen this, but just now checking Wyangbot's contribs, I also found diff and diff. Could you look into this? ‑‑ Eiríkr Útlendi │ Tala við mig 19:55, 8 April 2014 (UTC)

"rs" is not references but "radical sort". Mandarin entries should now be sorted by numbered pinyin ("pint") instead, per an earlier agreement with all Chinese editors. Suffixes "in simplified script"/ "in traditional script" are removed in topical categories. The bot should actually replace rs with pint, IMO. Maybe Wyang wants to do it in stages? --Anatoli (обсудить/вклад) 20:11, 8 April 2014 (UTC)
  • @Atitarev: Anatoli, have another look -- Wyangbot is deleting the ===References=== header from some, but not all, Japanese entries that are above a Mandarin section that was edited by the bot. I'm not sure why Wyangbot is only doing this some of the time. ‑‑ Eiríkr Útlendi │ Tala við mig 20:14, 8 April 2014 (UTC)
I see, thanks. Hopefully, @Wyang: can explain and help fix it. --Anatoli (обсудить/вклад) 22:48, 8 April 2014 (UTC)
There was an error in the code looking for empty reference sections, and I have fixed this. Sorry and thanks. @Eirikr: Could you please have a look at bot edits of the 'references' section of other articles in your watchlist? I will search for other affected articles too once the new en.wikt dump is available in about ten days' time. Thanks. Wyang (talk) 04:27, 9 April 2014 (UTC)
  • Cheers, yes, I'm slowly working down the contribs list. Any idea when this bug was introduced, so I have an idea when to stop?  :) ‑‑ Eiríkr Útlendi │ Tala við mig 06:56, 9 April 2014 (UTC)
  • The automated changes were started approximately 24 hours ago. I am automatically re-adding the references header to lines of <references/> not preceded by ===(=)References===(=). I have checked all pages linking to {{ja-kref}}, which is used in the reference section of ~65 pages. All reference headers followed by bullet-pointed references are immune from the attack. It seems those are the major types of Japanese references... Wyang (talk) 07:04, 9 April 2014 (UTC)
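Note: the repair described above (re-adding a References header before any <references/> line that is not already preceded by one) might look something like the sketch below. It is not Wyangbot's actual code; the function name `restore_reference_headers` and the default header level are assumptions.

```python
import re

# Matches ===References=== or ====References==== on a line of its own.
HEADER_RE = re.compile(r"^={3,4}\s*References\s*={3,4}\s*$")


def restore_reference_headers(wikitext, level=3):
    """Insert a References header before each bare <references/> line.

    A <references/> line is considered "bare" when the nearest preceding
    non-blank line is not a level-3 or level-4 References header.
    """
    marker = "=" * level
    out = []
    for line in wikitext.split("\n"):
        if line.strip() == "<references/>":
            # Look back over what has been emitted so far, skipping blank lines.
            previous = next((l for l in reversed(out) if l.strip()), "")
            if not HEADER_RE.match(previous):
                out.append(f"{marker}References{marker}")
        out.append(line)
    return "\n".join(out)
```

Running it a second time on its own output changes nothing, so it should be safe to apply to pages that were never affected.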

Categories and sorting

@Wyang: I see you have already done some work on this, thanks! Are you able to set the bot to do the following, e.g. (if you're not doing it already)?

  1. to add "pint" value to each Mandarin category after a pipe "|", e.g. [[Category:cmn:Beginning Mandarin|er2]]
  2. to add "pint" (if missing), e.g. "|sort=er2" to any contexts and labels, e.g. {{temp|context|slang|lang=cmn|sort=er2}}, {{temp|cx|slang|lang=cmn|lang=cmn|sort=er2}}, {{temp|label|slang|lang=cmn|lang=cmn|sort=er2}}.
  3. to remove " in simplified script" and " in traditional script" from category name inside entries? It's OK if some categories are red-linked.
  4. to replace "rs" value with "pint" in |sort= (contexts and labels) and in categorizations?, e.g. [[Category:cmn:Countries in traditional script|尸02尼日爾]] to [[Category:cmn:Countries|ni2ri4er3]] --Anatoli (обсудить/вклад) 06:31, 9 April 2014 (UTC)
It is doing #3 now. I held off doing the rest because I wasn't sure that this is the best approach. I know it is how it is traditionally done, but there is a lot of duplication involved, and I am sure there is a better way to do this (section DEFAULTSORT) now that Lua is possible. Maybe we can embed some magic in {{zh-hanzi-box}} or {{Pinyin-IPA}} to make it look for the first parameter of Template:Pinyin-IPA. I don't know though. Wyang (talk) 06:51, 9 April 2014 (UTC)
Thank you. I don't know myself, sorry. I've only listed things that I think need to be done. I've been fixing some of them manually. Obviously after conversion, you may get duplications like [[Category:cmn:Countries|ni2ri4er3]] appearing twice. Sorting in categories is a pain in Chinese and Japanese, but without "|pint" the categories will have the first character as the header, e.g. compare 百 with 八 in Category:cmn:Cardinal numbers. --Anatoli (обсудить/вклад) 23:09, 9 April 2014 (UTC)
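Note: requests #1, #3 and #4 above, plus the duplicate problem just mentioned, can be illustrated with a rough sketch. It only handles cmn: topical category links, takes the numbered pinyin ("pint") as an argument rather than deriving it, and the function name `normalise_categories` is made up for the example.

```python
import re

# [[Category:cmn:Name]] or [[Category:cmn:Name|sortkey]]
CATEGORY_RE = re.compile(r"\[\[Category:(cmn:[^\]|]+)(?:\|[^\]]*)?\]\]")
SCRIPT_SUFFIX_RE = re.compile(r" in (?:simplified|traditional) script$")


def normalise_categories(wikitext, pint):
    """Drop script suffixes, set the sort key to `pint`, remove duplicates."""
    seen = set()

    def rewrite(match):
        name = SCRIPT_SUFFIX_RE.sub("", match.group(1).strip())
        link = f"[[Category:{name}|{pint}]]"
        if link in seen:
            return ""  # the same category was already emitted once
        seen.add(link)
        return link

    out = []
    for line in wikitext.split("\n"):
        new_line = CATEGORY_RE.sub(rewrite, line)
        # Skip lines that held nothing but a duplicate category link.
        if not new_line.strip() and line.strip():
            continue
        out.append(new_line)
    return "\n".join(out)
```

With the example from #4, both script-specific links collapse into a single [[Category:cmn:Countries|ni2ri4er3]].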



Is the Pinyin section "dān一 [Phonetic: dānyī]" expected here? --Anatoli (обсудить/вклад) 02:12, 10 April 2014 (UTC)

Thanks. Done. Wyang (talk) 02:28, 10 April 2014 (UTC)


What's your source on this as a 輕聲? I've never heard it pronounced this way, and all the dictionaries and online sources I check don't indicate it is. ---> Tooironic (talk) 22:33, 11 April 2014 (UTC)

It's a colloquial variant, especially when 體育 is followed by other nouns, as in 體育場, 體育館, 體育頻道, 體育新聞, 體育人生, 體育中心, 體育總局. It is listed in 《大漢俄詞典》: "体育 tǐyu физическое воспитание, физическая культура; [физкультура и] спорт; физкультурный, спортивный 體育運動 физкультурное движение, физкультура и спорт 體育比賽 спортивные соревнования". Wyang (talk) 01:10, 12 April 2014 (UTC)

Your bot breaks stuff

See diff, diff. Keφr 07:08, 15 April 2014 (UTC)

Done. Wyang (talk) 07:11, 15 April 2014 (UTC)
Happened again at diff. Keφr 15:16, 15 April 2014 (UTC)
Done. I'm waiting for this dump to finish, and will check all 'nolink's using that. Wyang (talk) 23:13, 15 April 2014 (UTC)
You should check Category:Pages with script errors in the meantime. Keφr 17:06, 19 April 2014 (UTC)
Those are using the wrong template. All Korean adjectives and verbs (lemma forms) have to end in 다. I will fix them later. Wyang (talk) 08:51, 22 April 2014 (UTC)

Chinese entries by Lo Ximiendo


When you have a moment, please check Mandarin entries by Special:Contributions/Lo_Ximiendo. --Anatoli (обсудить/вклад) 23:03, 15 April 2014 (UTC)

Most of them are quite minor edits. I have checked some and will keep an eye on it. Wyang (talk) 08:52, 22 April 2014 (UTC)


Please help me to do some testing because I'll be simplifying this Module by a lot. I'm also thinking about putting the data on a separate page but I don't know enough Lua to be able to do that. --kc_kennylau (talk) 10:23, 22 April 2014 (UTC)

Hi. It is good to see someone willing to tackle that module. My programming experience prior to Lua here is close to non-existent and I'm not familiar with everything in Lua, but I am happy to give whatever help I can regarding these modules. Another reason as to why I haven't got around to doing the rewrite, is that Module:zh is chiefly used substitutively. When you make changes to Module:zh, please make sure that the various functions in it still work correctly like before when called via {{cmn-new}}. Please let me know if I can be of any help. Wyang (talk) 11:37, 22 April 2014 (UTC)

Convert to {{cmn-pinyin}}


Could I ask you for a favour, please? Could you convert all pinyin entries in Category:Mandarin pinyin with diacritics to use {{cmn-pinyin}} with Wyangbot to make single-syllable pinyin be the same as multisyllabic ones, e.g. biào? There are other problems with those entries, but that's a first step. Optionally, if it's not too hard, could you also remove anything (short definitions, descriptions) after {{pinyin reading of}} or brackets? I know the entries are far from perfect. --Anatoli (обсудить/вклад) 01:28, 23 April 2014 (UTC)

This is an example of the edit I'm after: diff. (Please tell me if I'm pushing my luck here, LOL) --Anatoli (обсудить/вклад) 01:32, 23 April 2014 (UTC)
That format was quite stupid, see diff. --Anatoli (обсудить/вклад) 01:34, 23 April 2014 (UTC)
Sure, running now. Please see Special:Contributions/Wyangbot. Wyang (talk) 01:47, 23 April 2014 (UTC)
Thanks, mate! --Anatoli (обсудить/вклад) 01:56, 23 April 2014 (UTC)


{{cmn-new}} failed on . I have provided the reading manually in 埼玉. --Anatoli (обсудить/вклад) 03:45, 23 April 2014 (UTC)

I don't know this character... Nonetheless, it's added to Module:zh/data now. I sometimes use |p1=... (qí) for characters I don't know (and hence characters that are likely to fail). Wyang (talk) 03:51, 23 April 2014 (UTC)
OK, thanks again. I wasn't sure if you need to know about any missing character. I'll keep using |p1=..., etc. 埼 may be a Japanese invention, even if it's not marked so. BTW, on Chinese Wikipedia I tried to change 栃木 to 枥木/櫪木 (simp/trad) but was corrected. is a Japanese character (or ancient Chinese), AFAIK but it's now used in Chinese, at least in Wikipedia. --Anatoli (обсудить/вклад) 04:41, 23 April 2014 (UTC)
埼 referred to "bent coastline" in Classical Chinese. 栃 is kokuji, not shinjitai. The Chinese-language version of the official government website uses the unchanged character too, therefore the Wikipedia people did not change the title. Wyang (talk) 04:53, 23 April 2014 (UTC)
Thanks for the explanation. --Anatoli (обсудить/вклад) 05:07, 23 April 2014 (UTC)

玉 radical in Module:zh/data

Why did you categorize everything under the 王 radical instead? --kc_kennylau (talk) 04:34, 23 April 2014 (UTC)

It's the opposite as the 'sortkeys' function is to be used in conjunction. 玉 was written like 王 in the seal script and the two radicals were traditionally merged under one radical [16]. Unihan does the same, as does Wiktionary. Wyang (talk) 04:43, 23 April 2014 (UTC)
Unihan in your link categorizes everything under Jade as opposed to King. --kc_kennylau (talk) 05:03, 23 April 2014 (UTC)
It's the opposite as the 'sortkeys' function is to be used in conjunction. Wyang (talk) 05:07, 23 April 2014 (UTC)


Hi, could you let me know what happened here? I tried to use the automatic template but it came up with "Module error". Thanks. ---> Tooironic (talk) 22:47, 23 April 2014 (UTC)

Hi. It's to do with @kc kennylau:'s changes to Module:zh. Please see Template talk:cmn-new#Module error. Wyang (talk) 23:29, 23 April 2014 (UTC)


Would it be feasible to compact all the Category:Mandarin headword-line templates to one module, like some other languages do? --kc_kennylau (talk) 07:45, 24 April 2014 (UTC)

I agree, though it might be better to start to do it after the unification vote is finished. Wyang (talk) 12:18, 24 April 2014 (UTC)


  1. Where is this function used?
  2. Why from f.args[13] to f.args[22]?

--kc_kennylau (talk) 12:31, 24 April 2014 (UTC)

It was used in a previous version when I tried to generate the content of the Etymology section all in one go. But it's been replaced by compdecompetym. It's gone now. Wyang (talk) 12:37, 24 April 2014 (UTC)


  1. Where is this function used? Why f.args[i]?
  2. Why ignore if comp=21?

--kc_kennylau (talk) 03:09, 25 April 2014 (UTC)

  1. Superseded by Module:Pinyin-IPA, deleted.
  2. It is because of the rules in Pinyin orthography (loosely followed here). Trisyllabic ones mostly have unspaced Pinyin, irrespective of comp= value. Wyang (talk) 00:42, 26 April 2014 (UTC)



If you accept the nomination, could you please edit Wiktionary:Votes/sy-2014-05/User:Wyang for admin and set your languages and the time zone, please? This also has to be on your user page. I believe you also need to make yourself contactable via email but I'm not 100% sure about this. The vote can start after your acceptance or whenever it's edited to be open. Good luck! --Anatoli (обсудить/вклад) 11:50, 25 April 2014 (UTC)

Thanks Anatoli. I have accepted the nomination, specified languages and timezone, and enabled email. What should I do next? Wyang (talk) 00:39, 26 April 2014 (UTC)
Welcome to sysophood, Wyang. Please add an entry at Wiktionary:Administrators. —Stephen (Talk) 22:16, 17 May 2014 (UTC)
Thanks Stephen! Wyang (talk) 22:30, 17 May 2014 (UTC)


There seems to be a mistake with the template here. AFAIK, 發瘋 is only pronounced as fāfēng, even in Taiwan. ---> Tooironic (talk) 22:17, 25 April 2014 (UTC)

Same goes for 發春. ---> Tooironic (talk) 22:19, 25 April 2014 (UTC)

Weird, though, the character comes up as both fà and fā, did you notice? ---> Tooironic (talk) 22:20, 25 April 2014 (UTC)

It's again to do with @kc kennylau:'s unfaithful simplification edits of Module:zh and Module:Pinyin-IPA. I have fixed them. To disable varpron, you can replace the character with Pinyin. To Kenny: Not all compounds of characters which are pronounced differently in Mainland and Taiwan should be interpreted by default as variant pronunciations. An example is 发/髮 (not 發). Basically if a character is used in the pronunciation template, it will be interpreted as being pronounced differently across the strait. cmn-new should not keep every character in data/MT, but only a subset (which is now lost in the code). Wyang (talk) 00:20, 26 April 2014 (UTC)
Fixed. --kc_kennylau (talk) 01:01, 26 April 2014 (UTC)

Mistake by User:Wyangbot

diff. This has happened on quite a few pages. —Mr. Granger (talkcontribs) 13:18, 1 May 2014 (UTC)

Fixed. Wyang (talk) 13:24, 1 May 2014 (UTC)
Yeah right. --kc_kennylau (talk) 13:53, 1 May 2014 (UTC)
That one done. Wyang (talk) 23:16, 1 May 2014 (UTC)


Hi Wyang, do you know why there is a "}}" under the Pronunciation header? ---> Tooironic (talk) 09:12, 2 May 2014 (UTC)

Oops, my bad. Fixed now. Wyang (talk) 09:31, 2 May 2014 (UTC)
Please also wikify Pinyin. --Anatoli (обсудить/вклад) 09:52, 2 May 2014 (UTC)
Hyperlinked now. Wyang (talk) 11:30, 2 May 2014 (UTC)
Thank you. --Anatoli (обсудить/вклад) 11:40, 2 May 2014 (UTC)
That's better, thanks. Is there a way to make the pinyin link green for me so I can auto-generate pinyin entries? ---> Tooironic (talk) 13:13, 2 May 2014 (UTC)
I'm not familiar with how accelerated creation works, unfortunately. You could perhaps refer to User:Conrad.Irwin or other users who are familiar with it. Wyang (talk) 13:30, 2 May 2014 (UTC)


What does this line do? --kc_kennylau (talk) 07:42, 3 May 2014 (UTC)

There was originally a leading whitespace at example_transform when the module used iterative word assignment. You rewrote it to use mw.text.split but did not remove the leading whitespace, hence it causes problems when the capitalisation feature appends a '^' at the start of translit. When Anatoli reported that the capitalisation feature failed, I saw that the problem was caused by '^ %l' (instead of '^%l'), but didn't realise what caused that. I have fixed it now. Wyang (talk) 07:55, 3 May 2014 (UTC)
I see. --kc_kennylau (talk) 08:39, 3 May 2014 (UTC)
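Note: the bug explained above can be reproduced in a few lines. This is a Python analogue rather than the module's Lua, and `build_translit`, `capitalise_marked` and the `strip_leading` switch are hypothetical names used only for illustration. The point is that a '^' marker only works when a letter follows it immediately, so a leftover leading space turns '^b' into '^ b' and the marker is never resolved.

```python
import re


def capitalise_marked(translit):
    """Uppercase the character that directly follows a '^' marker."""
    return re.sub(r"\^(\w)", lambda m: m.group(1).upper(), translit)


def build_translit(syllables, capitalise=False, strip_leading=True):
    # Assembling word by word with a separator in front of each item
    # leaves a leading space, as the old iterative code did.
    joined = "".join(" " + syllable for syllable in syllables)
    if strip_leading:
        joined = joined.lstrip()  # the fix: drop the leftover whitespace
    if capitalise:
        joined = "^" + joined     # capitalisation marker at the start
    return capitalise_marked(joined)


if __name__ == "__main__":
    print(build_translit(["běi", "jīng"], capitalise=True))
    print(build_translit(["běi", "jīng"], capitalise=True, strip_leading=False))
```

The second call leaves the marker in place, which is the failure that was reported.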


Hi Wyang, was just wondering how I can indicate both mainland and Taiwan pronunciations in the new template, e.g. with 垃圾食品? ---> Tooironic (talk) 11:13, 3 May 2014 (UTC)

@Tooironic: Using the character instead of the pinyin to represent the character that would have two pronunciations? I'm not sure with the new template. --kc_kennylau (talk) 11:16, 3 May 2014 (UTC)
Yes, you could use the characters themselves: [17], provided they are in Module:zh/data#MT (most of the var prons should be there). Wyang (talk) 11:27, 3 May 2014 (UTC)
Looks good. Is there a way to indicate to the reader which is mainland pronunciation and which is Taiwan pronunciation? ---> Tooironic (talk) 23:41, 4 May 2014 (UTC)
@Tooironic: It's visible in the expanded mode. --Anatoli (обсудить/вклад) 23:43, 4 May 2014 (UTC)
The context tags are in the "expanded" mode of the pronunciation template. Wyang (talk) 23:42, 4 May 2014 (UTC)
Oh I see. Thanks! ---> Tooironic (talk) 23:44, 4 May 2014 (UTC)
At 日期, Min Nan POJ (ji̍t-kî,li̍t-kî) is hyperlinked as one. Did I do something wrong? It should be ji̍t-kî,li̍t-kî, not ji̍t-kî,li̍t-kî. --Anatoli (обсудить/вклад) 23:56, 4 May 2014 (UTC)
Should be '/' instead of ',' for Min Nan (since most Min Nan dictionaries seem to use '/'): [18]. Wyang (talk) 23:58, 4 May 2014 (UTC)
Thanks. I may have used a comma somewhere. I don't remember now. I'd prefer to have a bit more consistency, since Mandarin uses comma. Too many differences and things to remember. :)
BTW, converting existing Cantonese, Min Nan to ==Chinese== is quite time consuming. I'm not complaining but it will take time. We should try and get more people doing it. @Jamesjiao:, @Tooironic:, @Kc kennylau: do you think you can help a bit? You don't really have to know Cantonese, just need to know what to do and know Mandarin. Some entries are tricky but this diff was a quite straightforward and simple edit on 多士, which merged Cantonese and Mandarin into one Chinese entry, which now has both Mandarin and Cantonese pronunciations and categories. --Anatoli (обсудить/вклад) 00:18, 5 May 2014 (UTC)
I have made it accept ',' as well. (The use of '/' should be encouraged nevertheless, I think) Wyang (talk) 01:26, 5 May 2014 (UTC)
Do we have a list of all the entries that need to be converted from Mandarin to Chinese? ---> Tooironic (talk) 04:24, 5 May 2014 (UTC)
@Tooironic: Converting all by hand would be a huge task, but it's better to convert varieties (topolects) first, because they duplicate info and it's hard to do by bot, e.g. starting at Category:Cantonese_nouns (I'm on letter "H") and other parts of speech, and a much bigger list: Category:Min Nan nouns (verbs, adjectives, etc.) (only in Hanzi, don't do romanised forms). Please take a look at some complete entries to see how it's done. You can use Wyang's {{zh-new}} for this: "|c=" stands for the Cantonese reading (romanisation syllables should have spaces) and |mn= for Min Nan POJ. Wyang might be able to do most of the remaining Mandarin entries by bot. Do only multisyllabic ones for the moment; single-character ones can be done later, and they are much more complicated and messy. --Anatoli (обсудить/вклад) 04:33, 5 May 2014 (UTC)
The multisyllabic ones that need to be done manually are listed at User:Wyang/worklist. Wyang (talk) 04:34, 5 May 2014 (UTC)
Cantonese native here, how may I help all of you? --kc_kennylau (talk) 08:52, 5 May 2014 (UTC)
Hi Kenny, thanks, please read the above. We have described what's needed. --Anatoli (обсудить/вклад) 09:21, 5 May 2014 (UTC)

Category:Pages with script errors

We have occupied this category, I half and you half! :D --kc_kennylau (talk) 10:16, 4 May 2014 (UTC)

Gosh, so much formatting silliness in these articles. I have done my half, all yours now. :) Wyang (talk) 11:15, 4 May 2014 (UTC)

Two questions

Hi Frank,

I have two questions.

Wikipedia seems to use Pe̍h-ōe-jī and Tâi-lô as synonyms but they are not, apparently. What's the difference, and which is more common or standard? Is it POJ?

I find the use of Category:zh:Variant pronunciations and Category:cmn:Variant pronunciations a bit confusing and misleading. They are not topical categories and only applicable to Mandarin, not Cantonese, Min Nan, etc. I think it makes sense to use Category:Mandarin variant pronunciations instead in this case, as Wikitiki89 suggested on my talk page. It's not urgent but we'll probably have to change this. --Anatoli (обсудить/вклад) 01:10, 6 May 2014 (UTC)

POJ and TL are two different romanisation schemes. POJ is the more popular one, although the TW government seems to be promoting the latter and the amount of material printed using TL has been increasing.
For the latter, I have changed the category generated by {{cmn-pron}} to Category:Mandarin variant pronunciations. Wyang (talk) 01:15, 6 May 2014 (UTC)
Thank you. I might make an entry for Tâi-lô later. --Anatoli (обсудить/вклад) 01:28, 6 May 2014 (UTC)

variant pronunciation

Hi Wyang, I noticed a problem with the pronunciation template. It seems that comes up automatically as mainland=xuè and Taiwan=xiě (e.g. in the entry I just created for 血運), but actually xuè is the standard pronunciation, while xiě is a (very) common variant in both mainland and Taiwan. Currently the template gives the impression that xuè is only used in mainland, and xiě only used in Taiwan, which is incorrect. Would we be able to fix this? ---> Tooironic (talk) 02:37, 6 May 2014 (UTC)

I would just use xuè,xiě (both equally) with xuè being the more common pronunciation. It's probably complicated, some words may use one or the other only, is that right? --Anatoli (обсудить/вклад) 05:22, 6 May 2014 (UTC)
Xiě is definitely the most common pronunciation; very few Chinese pronounce it as xuè. As it stands now the design of the pronunciation header is incorrect in that it assumes that there is regionality when there is none. ---> Tooironic (talk) 10:03, 6 May 2014 (UTC)
xue4 is the standard in Mainland, and xue3 and xie3 are variants. I have changed the template and module. @Tooironic: How does 血運 look now? Wyang (talk) 12:26, 6 May 2014 (UTC)
Looks perfect! We can now use this entry as a template for future 血-words. ---> Tooironic (talk) 22:46, 6 May 2014 (UTC)

Two more questions

The {{zh-usex}} at 男尊女卑 transliterates 地 as "d'e" instead of "de", although I told it to use "de".

Also, do you think 封建社会 should be a separate entry (is it a SoP?)? It's included in CEDIC dictionary but that's hardly a great indication. Sorry to trouble you! --Anatoli (обсудить/вклад) 05:19, 6 May 2014 (UTC)

My fault, I'll try to fix it now. --kc_kennylau (talk) 07:19, 6 May 2014 (UTC)
Fixed. --kc_kennylau (talk) 07:26, 6 May 2014 (UTC)
封建社会 is a word, not SoP. Wyang (talk) 12:26, 6 May 2014 (UTC)
I agree that 封建社會 is probably a word. The Chinese love the word 社會 and use it very flexibly, sometimes even creating words in the process. Also, both MOE and 現代漢語規範詞典 list it. ---> Tooironic (talk) 22:48, 6 May 2014 (UTC)

Chinese idioms

Hi Wyang. Was just wondering how Chinese idioms are dealt with under the new system? I tried making a new entry at 厚積薄發 but it didn't turn out well. Any suggestions? ---> Tooironic (talk) 10:00, 6 May 2014 (UTC)

@Tooironic: What do you mean it didn't turn out well? --kc_kennylau (talk) 10:05, 6 May 2014 (UTC)
Look at the weirdness in the categories at the bottom of the page. ---> Tooironic (talk) 10:06, 6 May 2014 (UTC)
You should have used id instead of idiom. However, I have added codes to adapt to names already, so you're safe to use idiom now. --kc_kennylau (talk) 10:23, 6 May 2014 (UTC)
Yes, Kenny is right. Wyang (talk) 12:26, 6 May 2014 (UTC)
Thanks very much. I don't usually add idioms but this one is one of my favourites. :) ---> Tooironic (talk) 22:45, 6 May 2014 (UTC)

Two questions

  1. Look at 唔該: the Yale romanization is not functioning properly because the grave accent cannot be displayed on top of the letter m. Any idea how to fix this?
  2. What parameters should be included in {{zh-noun}}? I feel so awful deleting every detail in the head.

Above. --kc_kennylau (talk) 09:26, 7 May 2014 (UTC)

#1 is due to the <tt> formatting. We could remove that, although it wouldn't look as nice typographically. 2) I would go for no parameter at all. Anything included would be duplicative of something that is already present. Wyang (talk) 09:32, 7 May 2014 (UTC)

return your head

Did you understand this phrase I put in the edit summary? --kc_kennylau (talk) 10:42, 8 May 2014 (UTC)

A calque of Chinese ……你個頭. Wyang (talk) 10:43, 8 May 2014 (UTC)
Does this phrase exist in Mandarin? --kc_kennylau (talk) 10:48, 8 May 2014 (UTC)
Yes. Wyang (talk) 10:48, 8 May 2014 (UTC)
What would an appropriate translation be? --kc_kennylau (talk) 11:10, 8 May 2014 (UTC)
my arse. See 你妹. Wyang (talk) 11:12, 8 May 2014 (UTC)
Interesting that Russian uses a similar swearword (zh:你妈!) in such cases - твою́ ма́ть! (tvojú mátʹ!) "your mother!" (in the accusative case - object), well "fuck" is implied and is also used explicitly: ёб твою́ ма́ть (jób tvojú mátʹ)! --Anatoli (обсудить/вклад) 05:21, 9 May 2014 (UTC)

Cantonese - done, Min Nan - to do

Cantonese multisyllabic entries are now converted/merged/fixed to use "Chinese" L2 (every PoS, if they used the proper templates)! Now it is the turn of Min Nan, a much larger set. I'm not familiar with Min Nan; I can tread carefully and check [19], but I may not be able to spot wrong entries (transliteration, senses, etc.). Do you think you can run your bot again (I saw you merged Min Nan entries as well)? Min Nan entries seem a bit more complicated than Cantonese, though. --Anatoli (обсудить/вклад) 04:59, 9 May 2014 (UTC)

Thanks for all the hard work, 安德利 :) I definitely will, when I have time. Wyang (talk) 05:07, 9 May 2014 (UTC)
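You're welcome, 方智. I'm enjoying learning Chinese more and more, and making new entries is now easier too, thanks to you. :) --Anatoli (обсудить/вклад) 05:15, 9 May 2014 (UTC)
Who is 方智? Also, good luck! --kc_kennylau (talk) 09:03, 9 May 2014 (UTC)
Take a guess! Thank you. --Anatoli (обсудить/вклад) 12:07, 9 May 2014 (UTC)
Your Chinese is pretty good, haha. Wyang (talk) 01:21, 10 May 2014 (UTC)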
Why can't we make the category for Cantonese Jyutping act more like the category for Mandarin Pinyin? --Lo Ximiendo (talk) 03:43, 10 May 2014 (UTC)
I'm probably the wrong person to ask for this...I'm against having Mandarin Pinyin in the way they are now, or having any romanised entries at all. Wyang (talk) 05:28, 10 May 2014 (UTC)
@Lo Ximiendo: Monosyllabic ones are allowed, polysyllabic ones probably not - Wiktionary:Votes/2013-11/Jyutping. That vote was controversial. --Anatoli (обсудить/вклад)

a potential issue

Hi Wyang, I noticed that at the new entry I created for 休養 it appears that it is not linked to any category. What's going on here? ---> Tooironic (talk) 01:26, 10 May 2014 (UTC)

Hi, please use {{zh-pron}} instead of {{Pinyin-IPA}}. Mandarin audios have the parameter '|ma=', which works exactly like the parameter '|a=' in Pinyin-IPA. Please see my change on that page. Wyang (talk) 01:29, 10 May 2014 (UTC)
Gotcha, thanks. ---> Tooironic (talk) 02:36, 10 May 2014 (UTC)


Does this exist in Mandarin? --kc_kennylau (talk) 12:51, 11 May 2014 (UTC)

Yes. I have added some examples there but they are in part 18+. Wyang (talk) 00:15, 12 May 2014 (UTC)

cat=con vs. cat=conj

It's a minor issue, but I just made the above change to zh-pron in 4 entries to empty Category:Mandarin con and its sister categories: you evidently told your bot to use "con", and zh-pron didn't recognize it. Fortunately I knew where to find the correct abbreviation, but others won't- so someone will probably make that or similar mistakes as long as there's no list in the documentation. Chuck Entz (talk) 00:21, 12 May 2014 (UTC)

Thanks for letting me know. I have enabled 'con' and 'conjunction' as valid aliases of 'conjunctions'. Wyang (talk) 00:24, 12 May 2014 (UTC)
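Note: the alias handling described above could be sketched as a lookup table that maps every accepted abbreviation to one canonical category suffix, and refuses anything unknown instead of silently creating a category such as "Mandarin con". The table below covers only the conjunction aliases mentioned in this thread; `POS_ALIASES` and `category_name` are hypothetical names, not the template's actual internals.

```python
# Hypothetical alias table; only the conjunction aliases come from the thread.
POS_ALIASES = {
    "con": "conjunctions",
    "conj": "conjunctions",
    "conjunction": "conjunctions",
    "conjunctions": "conjunctions",
}


def category_name(cat, language="Mandarin"):
    """Resolve a |cat= value to a full category name, or raise if unknown."""
    key = cat.strip().lower()
    if key not in POS_ALIASES:
        raise ValueError(f"unrecognised part-of-speech code: {cat!r}")
    return f"Category:{language} {POS_ALIASES[key]}"


if __name__ == "__main__":
    print(category_name("con"))
    print(category_name("conj", language="Cantonese"))
```

Listing the accepted keys in the documentation would address the point about editors not knowing which abbreviation is correct.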

Wu Chinese transliteration

Sorry, Frank, I'm making too many mistakes so far. I'll make a list of words and my transliteration attempts for you to check. Is that okay? --Anatoli (обсудить/вклад) 01:07, 12 May 2014 (UTC)

Please if you can. It's all right since the Wu pronunciations are quite unintuitive for anyone not familiar with it. Wyang (talk) 01:11, 12 May 2014 (UTC)

Middle Chinese and Hakka transliterations

Hi Frank,

Is there a way to transliterate Middle Chinese? I've merged but not happy about Middle Chinese (ŋấ, ngɑ̌) and Hakka (a big list with a reference to a dictionary). I'd like to do . It also has Middle Chinese transliterations: *xaù, *xǎu and a list of Hakka. Not sure about the best way to add them. --Anatoli (обсудить/вклад) 14:41, 13 May 2014 (UTC)

Please use the |mc= parameter in {{zh-pron}}. Please use this page to look up MC pronunciations, the parameter value is "中古声母(1 syl)-中古韵母(1 syl)-中古等(1 or 3 syl, no "等" character)-中古开合(1 syl)-中古摄(1 syl)-中古声调(1 syl)-中古反切(2 syl)". Multiple readings ("后一条") are separated by ",". Please see my edits at and . Wyang (talk) 05:01, 14 May 2014 (UTC)
Hi. I did a couple but adding Middle Chinese transliterations seems such a hassle using [[20]] Perhaps, we should just adopt one or two of the transliterations there without extra info? Same with Hakka, actually, perhaps a simple list would do, not sure if every word/character can be found in the used references. --Anatoli (обсудить/вклад) 01:29, 20 May 2014 (UTC)
The process of extracting those values can be automated. The "one or two of the transliterations there" are for Old Chinese, not Middle Chinese. Wyang (talk) 01:47, 20 May 2014 (UTC)
I meant Middle Chinese, e.g. value "疑歌一开果上五可" for . If this can be automated, this would be wonderful but that's for single-character words? --Anatoli (обсудить/вклад) 01:54, 20 May 2014 (UTC)
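Note: the |mc= value format described above (initial, final, division, openness, rhyme group, tone, fanqie, written without separators, with multiple readings split by ",") can be taken apart mechanically. The sketch below assumes that "syl" means one character, so a reading is 8 characters long with a one-character division and 10 with a three-character one. The field names and the function `parse_mc` are illustrative and not part of {{zh-pron}}.

```python
from typing import List, NamedTuple


class MCReading(NamedTuple):
    initial: str      # 中古声母
    final: str        # 中古韵母
    division: str     # 中古等 (1 or 3 characters, without "等")
    openness: str     # 中古开合
    rhyme_group: str  # 中古摄
    tone: str         # 中古声调
    fanqie: str       # 中古反切 (2 characters)


def parse_mc(value: str) -> List[MCReading]:
    """Split an |mc= value into its readings and named fields."""
    readings = []
    for item in value.split(","):
        item = item.strip()
        if len(item) == 8:
            division_length = 1
        elif len(item) == 10:
            division_length = 3
        else:
            raise ValueError(f"unexpected length for MC reading: {item!r}")
        division = item[2:2 + division_length]
        rest = item[2 + division_length:]
        readings.append(
            MCReading(item[0], item[1], division, rest[0], rest[1], rest[2], rest[3:])
        )
    return readings


if __name__ == "__main__":
    for reading in parse_mc("疑歌一开果上五可"):
        print(reading)
```

A check like this could also be used to validate values produced by an automated extraction before they are saved.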


Was just wondering if you had a suggestion about how to translate the extended meaning of 備胎? The best I could do was "a possible replacement for one's current partner". It's a terrible translation, but I'm not sure if there is any equivalent for this in English. ---> Tooironic (talk) 04:17, 14 May 2014 (UTC)

Aha, good one. Don't think an exact equivalent exists in English - maybe "a backup", "a second choice", "a just-in-case", "a plan B", "a contingency"? Wyang (talk) 04:26, 14 May 2014 (UTC)


Hi Frank,

Could you check this entry please - specifically word boundaries and Min Nan transliteration? There is some Wu specific grammar and words I don't understand in this usage example. --Anatoli (обсудить/вклад) 00:06, 15 May 2014 (UTC)

I have checked Wu. It doesn't seem to be used in Min Nan. Which bit of the grammar do you not understand? 立(站)-辣(在)-窗口頭(窗口前)-額(的)-搿(這)-個-人-是-㑚(你)-經理,對- 𠲎(嗎)? Wyang (talk) 03:13, 15 May 2014 (UTC)
Thanks for adding Mandarin, I understand now. I hoped there is a Min Nan reading, also for 你们, even if the words are not used in Min Nan. Should 窗口頭(窗口前) be split or is it synonymic to 窗口? Also, Qian Nairong says is also pronounced as "whu23" by young people, normally "ngu34", that's "3ngu" and "3hhu", right? Which tone is right? Can I add the alternative "hhu" pronunciation? --Anatoli (обсудить/вклад) 03:22, 15 May 2014 (UTC)
You (plural) in Min Nan is . Yes, 我 is 3ngu and 3hhu. 窗口頭 is a word, 頭 is a suffix in that word, like 木頭. Wyang (talk) 03:28, 15 May 2014 (UTC)


Is this sound even present? --kc_kennylau (talk) 10:18, 15 May 2014 (UTC)

喔唷 (ōyō), 哼唷 (hēngyō), 哎哟 (āiyou, āiyō) --Anatoli (обсудить/вклад) 10:34, 15 May 2014 (UTC)
Yep. Wyang (talk) 23:12, 15 May 2014 (UTC)
Zhuyin is failing, though, see 唷喔 or ōyō. Could you please add? I think it's "|ㄛ". --Anatoli (обсудить/вклад) 23:29, 15 May 2014 (UTC)
Yes, "yō" is definitely "|ㄛ" in Zhuyin: [21]--Anatoli (обсудить/вклад) 23:32, 15 May 2014 (UTC)
Fixed. Wyang (talk) 00:22, 16 May 2014 (UTC)

Pinyin-IPA to zh-pron 2

When converting topolects to the new format, the longest time is to convert from using {{Pinyin-IPA}} to {{zh-pron}}. Could you run a bot to change those on existing Mandarin entries? I don't know if it's hard and if it may cause other problems, though. --Anatoli (обсудить/вклад) 01:18, 16 May 2014 (UTC)

Hi, you can use {{Pinyin-IPA/a}}. Replace 'Pinyin-IPA' with 'subst:Pinyin-IPA/a', and add a |cat= parameter at the end. :) Wyang (talk) 02:09, 16 May 2014 (UTC)
I'm not sure what you mean. I am only doing it manually (copy/paste) or re-generate with {{zh-new}}, which adds {{zh-pron}}. Can you show, please? --Anatoli (обсудить/вклад)



Replace it with


Wyang (talk) 02:19, 16 May 2014 (UTC)

I got it, thanks. Used on 草書 + c=, mn=. --Anatoli (обсудить/вклад) 02:55, 16 May 2014 (UTC)
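The text replacement Wyang describes above (swap 'Pinyin-IPA' for 'subst:Pinyin-IPA/a' and append a |cat= parameter) could be sketched roughly as follows. This is only an illustration of the string rewrite, not the bot or template code actually used; the function name to_subst_form and the sample cat value "n" are hypothetical, and the values {{Pinyin-IPA/a}} really accepts for |cat= are not given in this thread.

```python
import re

# Matches a simple, un-nested call such as {{Pinyin-IPA|cǎoshū}}.
# The literal "|" right after the name means {{Pinyin-IPA/a|...}} is left alone.
PINYIN_IPA_CALL = re.compile(r"\{\{Pinyin-IPA\|([^{}]*)\}\}")


def to_subst_form(wikitext: str, cat: str) -> str:
    """Rewrite {{Pinyin-IPA|...}} as {{subst:Pinyin-IPA/a|...|cat=<cat>}}.

    `cat` is a placeholder for whatever category value the editor wants;
    the accepted values are an assumption here.
    """
    def rewrite(match: re.Match) -> str:
        args = match.group(1)
        return "{{subst:Pinyin-IPA/a|" + args + "|cat=" + cat + "}}"

    return PINYIN_IPA_CALL.sub(rewrite, wikitext)


if __name__ == "__main__":
    print(to_subst_form("{{Pinyin-IPA|cǎoshū}}", "n"))
```

On save, the wiki software would then expand the subst: call, which is presumably where the conversion to {{zh-pron}} happens.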

Other Topolects for 加油

Hi, when are you adding the pronunciations for Wu, Gan, Hakka, Min Dong and Xiang on the entry for 加油? --Lo Ximiendo (talk) 01:55, 16 May 2014 (UTC)

Wyang is very busy with merging topolects. I'm also hassling him to add Wu pronunciations, which I attempt to do myself. For some topolects without a developed transliteration system it's especially complicated, and a transliteration may not even be available. If IPA or a sound recording is found, then it's possible, but this information has to be found first. Having said this, we still need a straightforward way to add topolects that are not handled yet, for cases where IPA or a sound recording is found.
My attempt with Wu: "1ka yeu" (probably wrong), Hakka: "ka-yû". --Anatoli (обсудить/вклад) 02:06, 16 May 2014 (UTC)
I'd probably go for "4ka yeu". JamesjiaoTC 02:18, 16 May 2014 (UTC)
In Shanghainese Wu, it follows phrase tone sandhi rules, as its individual parts are evident. It's ka44 hhieu23. Wyang (talk) 02:28, 16 May 2014 (UTC)
Is the /ɦ/ really there? I can't hear it myself (doesn't mean it doesn't exist though). JamesjiaoTC 02:44, 16 May 2014 (UTC)
I can't add |w=ka44 hhieu23 (Module error). --Anatoli (обсудить/вклад) 02:43, 16 May 2014 (UTC)
I don't hear /ɦ/ either. I've got a little book on Shanghainese. They speak very fast, though, and I don't seem to get Wu sounds well. Here's a nice recording on [22], the site Wyang gave me.--Anatoli (обсудить/вклад) 03:01, 16 May 2014 (UTC)
/ɦ/ is the slight constriction of the glottis in the recording. Apart from the constriction, the presence of 'hh' also causes the tone to be lower when the character is pronounced in isolation. Compare 椅 i and 夷 hhi, as well as 矮 a and 鞋 hha. Null-initial and /ɦ/ are found in complementary distribution, occurring in characters which had voiceless and voiced initials in MC respectively. 油 (you2) had a voiced initial in MC, which is why it is tone 2 in Mandarin (阳平) not tone 1 (阴平). 幽 (you1) would have had a voiceless initial in MC, and its Shanghainese pronunciation would therefore lack 'hh' and be just 'ieu'. Wyang (talk) 04:42, 16 May 2014 (UTC)
Makes a bit of sense but how do you know if it's 阳平 or 阴平 tone? Do you know Middle Chinese pronunciation for these characters? For 油 Wu minidict only shows "yeu" 平/1. So it can be either 1yeu or 3hhieu? --Anatoli (обсудить/вклад) 05:23, 16 May 2014 (UTC)
For 油 it is 3hhieu (MD: yeu 平/1, 阳平), and for 幽 it is 1ieu (MD: ieu 平/1, 阴平). You can use the MC pronunciation or other dialectal information. For the level tone it is easy, Mandarin 1st tone = 阴平, 2nd tone = 阳平; Cantonese 1st tone = 阴平, 4th tone = 阳平. So compare: 幽 (M you1, C jau1, W 1ieu), 油 (M you2, C jau4, W 3hhieu). Wyang (talk) 05:28, 16 May 2014 (UTC)
So, you basically can use Mandarin pronunciation + 平/去/入 from MD to determine the tone of isolated hanzi? I was only relying on MD for tones when I couldn't use Qian's book. --Anatoli (обсудить/вклад) 05:34, 16 May 2014 (UTC)
You don't need to use Mandarin. The voicedness of the initial and 平上去入 is enough for knowing which tonal category the character belongs to in Shanghainese. MiniDict's 'y' is 'hhi', so for the voiced initial 'hh', the tonal category of 油 is tone 3 (voiced, 平, i.e. light level). Wyang (talk) 05:37, 16 May 2014 (UTC)
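As a rough illustration of the level-tone rule in the last few replies, the sketch below encodes only what is stated there: Mandarin tone 1 or Cantonese tone 1 indicates 阴平 (Shanghainese tone 1, no 'hh'), while Mandarin tone 2 or Cantonese tone 4 indicates 阳平 (Shanghainese tone 3, with 'hh'). It covers null-initial level-tone syllables only; the function and table names are hypothetical and not part of any Wiktionary module.

```python
# Level-tone registers as listed in the discussion above.
MANDARIN_LEVEL = {1: "阴平", 2: "阳平"}
CANTONESE_LEVEL = {1: "阴平", 4: "阳平"}


def shanghainese_level_syllable(final: str, mandarin_tone: int) -> str:
    """Build the Wu transliteration of a null-initial level-tone syllable.

    `final` is the Wu final without any initial, e.g. "ieu".
    Raises KeyError if the Mandarin tone is not a level tone (1 or 2).
    """
    register = MANDARIN_LEVEL[mandarin_tone]
    if register == "阴平":
        # Voiceless MC initial: tone 1, no 'hh'.
        return "1" + final
    # Voiced MC initial: tone 3, written with 'hh'.
    return "3hh" + final


if __name__ == "__main__":
    print(shanghainese_level_syllable("ieu", 1))  # 幽
    print(shanghainese_level_syllable("ieu", 2))  # 油
```

As Wyang notes, Mandarin is only one convenient source for the register; the voicing of the MC initial together with 平上去入 is what actually determines the category.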
I still find it hard to convert what I find in wu-minidict to what you have described. I'm not giving up but it's kind of difficult to combine learning and editing. Even if I get an audio file to listen to Shanghainese words, I can now pick up only some tones; phrasal tones make little sense. I'm more or less comfortable with reproducing and picking up Mandarin tones, but I never really bothered with IPA, since I used pinyin and characters. And I'm still a bit uncomfortable with numbers used to represent tones in IPA but I'm getting more understanding. My exposure to Cantonese is much shorter; I used lessons and listened to recordings but I'm not comfortable with Cantonese tones. Still, Cantonese doesn't sound as alien as Shanghainese, and my former Chinese classmates taught me some too. After the merger, I'll do a bit more Shanghainese. Sorry for bugging you about transliterations and thank you very much for your help. If it's not a burden, I'll keep adding words to my list of words to transliterate in Wu.
On the topic of Xiang, Gan, etc. Since there's so little documentation, no standard or official transliteration, are we going to handle those at all? If yes, in what way? Currently, there's almost nothing in Wiktionary, outside Mandarin and major popular topolects - Cantonese, Min Nan, Wu and Hakka. What if there's a sourced audio-recording or IPA in Xiang Chinese? Can we have a simple framework for those? E.g. as simple as x=IPA(key): /siɔ̃44 ny31/, etc. in 湘语#Pronunciation? Just a thought. --Anatoli (обсудить/вклад) 01:50, 20 May 2014 (UTC)
Shanghainese is a bit unusual among Chinese dialects. It arose as sort of a creole of different Wu and Mandarin dialects in the past century, which is why its phonology is a lot simplified compared with the neighbouring dialects. Its tone system is on the verge of breakdown (or from another perspective, on the path to a pitch accent system), and there is so much homophony and multisyllabification. For example, the listener wouldn't know whether the person who said 我买/卖过汽车 has the experience of buying or selling a car. No worries about the transliteration checks.
Tones are hard to get used to, especially when there are too many of them in the language.
With regard to the other groups, the only one with some printed romanisation material would probably be Min Dong. The romanisation is Foochow Romanized or "Bàng-uâ-cê" (same characters as POJ). However, the phonology of Min Dong is notoriously difficult, arguably the hardest in theory among Chinese dialects. There are complex sandhi rules not only for tones, but for initials and finals (!) as well (see how Fuzhou dialect#Rimes has two sets of values for each rime). Luckily I had some exposure to it before. The amount of printed material using that romanisation is meagre, although I am looking for ways of obtaining that material either electronically or in print.
The other ones - I would just set the parameter |x=, |g= to IPA. The parameter will be passed to a function which converts numbers to superscripts: x=siɔ̃44 ny31. Audios can be added using |xa=, |ga=; see 中国. Wyang (talk) 02:22, 20 May 2014 (UTC)
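The number-to-superscript conversion mentioned above can be pictured with a short sketch. The real conversion lives in the Wiktionary module behind {{zh-pron}}, which is not shown in this thread; the Python below is only an assumed equivalent, and the name superscript_tones is hypothetical.

```python
# Translation table from ASCII digits to their Unicode superscript forms.
SUPERSCRIPTS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")


def superscript_tones(ipa: str) -> str:
    """Replace every digit in an IPA string with its superscript form.

    Non-digit characters, including combining diacritics, are left untouched.
    """
    return ipa.translate(SUPERSCRIPTS)


if __name__ == "__main__":
    print(superscript_tones("siɔ̃44 ny31"))
```

With this approach the editor only types plain tone digits in |x=, |g= and similar parameters, and the display form is generated consistently.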
Would |x=IPA=/siɔ̃44 ny31/ be OK for Xiang or just |x=siɔ̃44 ny31 ? --Anatoli (обсудить/вклад) 02:27, 20 May 2014 (UTC)
It's a shame Wu/Shanghainese has so few resources. The site you gave me - [23] doesn't use consistent spelling and there's so little about grammar. Min Dong seems scary and there must be very little written in this dialect or only in Roman letters. Another problem is that dialectal words may not pass RFV if they only appear in chats or dubious websites and the pronunciation/transliteration provided is amateurish or otherwise incompatible with the way we write IPA/transliteration here. So, some dialects, even big ones, may miss out completely. --Anatoli (обсудить/вклад) 02:36, 20 May 2014 (UTC)
We could always resort to dictionaries perhaps, such as the Comprehensive Dictionary of Chinese Dialects I mentioned before or the Comprehensive Dictionary of Modern Chinese Dialects. Wyang (talk) 03:43, 20 May 2014 (UTC)
It's not easy to access them, I don't see myself mass-adding entries in smaller dialects, I may become more comfortable with Wu later, and Min Nan and Cantonese are available enough. I think we should create a simple enough framework, though (like you said x=IPA(x)). Please also answer my question above about the format of Xiang IPA or let me know if you're undecided yet. --Anatoli (обсудить/вклад) 03:50, 20 May 2014 (UTC)
Just |x=siɔ̃44 ny31, since there is no romanisation for it. Wyang (talk) 03:52, 20 May 2014 (UTC)
OK. I've added in 湘語/湘语, please make it display and categorise (Xiang nouns) if you can, and other topolects we might include in the future (IPA only). --Anatoli (обсудить/вклад) 04:07, 20 May 2014 (UTC)
OK, |x=, |g=, |j= enabled now.[24] Wyang (talk) 04:22, 20 May 2014 (UTC)

Thanks but 湘語/湘语 doesn't seem to work - I mean categorisation. I think they should also be visible in collapsed mode as well--Anatoli (обсудить/вклад) 04:27, 20 May 2014 (UTC)

Categorised now. Gan, Jin and Xiang promoted (it looks a bit weird though, having a mix of romanisations and ipas). Wyang (talk) 04:37, 20 May 2014 (UTC)
Thank you. I think it looks OK given the lack of romanisation and because there could be multiple IPA for other varieties. We can document it later.
Without actually suppressing any dialect, there should probably be a technical limit on what can go into {{zh-pron}}, and can be added to PoS categories. What if a small regional entry with a pronunciation is added by a contributor, e.g. Sichuanese Mandarin 横顺 (huan2 sen1) (=反正) or an even smaller, less known dialect? The Wiktionary principle is all words in all languages, though. What do you think? --Anatoli (обсудить/вклад) 04:56, 20 May 2014 (UTC)
I agree. We could account for those by allowing things like |m=Sichuan=IPA (in the future). Wyang (talk) 06:29, 20 May 2014 (UTC)


Please check the Mandarin pronunciation of 一次方程. --kc_kennylau (talk) 14:31, 18 May 2014 (UTC)

Checked, it is correct. Wyang (talk) 23:21, 18 May 2014 (UTC)


Keep it going, please :) There are verbs, adverbs, adjectives... --Anatoli (обсудить/вклад) 07:27, 20 May 2014 (UTC)

Seems to be all done now. Wyang (talk) 08:10, 20 May 2014 (UTC)
Good job! There are some multisyllabic adjectives, interjections, pronouns and prepositions. I have just cleaned a few proper nouns. Well, when all varieties are done, you can do Mandarin? --Anatoli (обсудить/вклад) 08:45, 20 May 2014 (UTC)
You are right... For some reason I erroneously filtered some articles off the list. I'm now generating a still-to-do list from the dump, and I'm probably looking at >100 pages here. Wyang (talk) 10:52, 20 May 2014 (UTC)
Could you run your AWB again, please? It's just not efficient to do it manually.
There are only two Min Dong entries - 平話/平话.--Anatoli (обсудить/вклад) 13:06, 21 May 2014 (UTC)
No problem, but probably tomorrow since it's quite late now. Would you like to use that tool too? It is very simple. I have shared my file at Wyang (talk) 13:21, 21 May 2014 (UTC)
I have saved the file but I have no idea how AWB works and I don't have it. You'd probably have to spend much time explaining. Tomorrow's fine or any other time, as long as you're planning to do it. --Anatoli (обсудить/вклад) 13:28, 21 May 2014 (UTC)
At any rate, if you would like to learn to use it any time, I'm more than happy to help. All you do is download it, put the file in (File > Open Settings), log in (File > Log in/Profiles > Add), and run (Start > Start). I will do some when I have time. Wyang (talk) 23:25, 21 May 2014 (UTC)


Why use pinyin to sort both simplified and traditional versions in Chinese topic categories? --kc_kennylau (talk) 12:23, 20 May 2014 (UTC)

I don't know. What do you reckon? Wyang (talk) 00:21, 21 May 2014 (UTC)

Pinyin spacing

Should 土衛七 be Tǔwèiqī or Tǔwèi qī or Tǔwèi Qī? --kc_kennylau (talk) 12:33, 20 May 2014 (UTC)

I'm not familiar with the orthography rules of Pinyin. This website might be helpful. Wyang (talk) 00:22, 21 May 2014 (UTC)
I suggest "Tǔwèiqī", the same with other Saturnian or Jovian moons. --Anatoli (обсудить/вклад) 00:42, 21 May 2014 (UTC)

房卡, 房號, 房号, 房型

Just bringing this to your attention. All these entries have come up with "(At least one of the forms in the hanzi box is uncreated...)" at the top of the page. ---> Tooironic (talk) 04:29, 21 May 2014 (UTC)

It goes away if you save the page with an empty edit. It's a server lag problem. I'm using my bot to do null edits on these, so it should go away soon. Wyang (talk) 04:32, 21 May 2014 (UTC)


How to link to one page while having the transliteration with spaces? For example in 愛怎麼著怎麼著. --kc_kennylau (talk) 11:14, 21 May 2014 (UTC)

I believe that is not possible currently... Well, unless you modify Module:zh-usex. :) Wyang (talk) 11:17, 21 May 2014 (UTC)
I have already implemented this function. Please update the documentation accordingly if you like. :) --kc_kennylau (talk) 12:31, 21 May 2014 (UTC)
Thanks. I have expanded Template:zh-usex/documentation. It seems simp_word[i] fails to accept the new tricks though, when I tried to add 愛怎麼著怎麼著 as an example there. Wyang (talk) 12:56, 21 May 2014 (UTC)
Where is the example that failed? --kc_kennylau (talk) 13:12, 21 May 2014 (UTC)
At that documentation page now. Wyang (talk) 13:23, 21 May 2014 (UTC)
Done. --kc_kennylau (talk) 13:35, 21 May 2014 (UTC)

個/个 and topolect merger

Frank, could you please edit the entry yourself, specifically the Wu transliteration, perhaps some usage examples? Just one of them is okay - , I'll fix the other one (trad./simp.).

I have added a few Wu entries without updating the checklist. Some are from the Wu dictionary (astronomy, weather), so I have some confidence about the tones, but initials/consonants may need checking, although the IPA generated looked similar (not identical) to the Wiktionary methods you designed. I also used existing verified entries for reference. I can't easily access the dictionary, though. So, other entries still need more attention - both tones and the rest. Would you prefer me to add any new Wu entry to the checklist? Thanks for regularly checking it! It's really helpful.

I'd like to do more Wu, and I'd appreciate it if you check my edits. The more entries we have, the easier it gets to add more content.

I'll leave the remaining work on topolect merger to you, since you're better equipped with tools and skills (there are still remaining multisyllabic entries but you need to update your list, since I have done a few) but I will work gradually on single-characters entries, they probably can't be done automatically?

I think all remaining Min Nan entries without Mandarin equivalent should get {{cx|Min Nan|lang=zh}} in front of the definition. What do you think? --Anatoli (обсудить/вклад) 01:56, 23 May 2014 (UTC)

No problem for the checklist. Please add anything you are unsure about, or anything that is not supported by those references.
I have expanded . I propose that we allow the use of the header "Definitions" for hanzi entries; it makes things a lot clearer and editing a lot easier.
I will get on with the merger job... There were 177 remaining the last time I checked. Should be done soon. Wyang (talk) 06:54, 23 May 2014 (UTC)

Null-initial syllables

Should it be /ɥy/ instead of /y/ for the word 語 in 語言? --kc_kennylau (talk) 09:28, 23 May 2014 (UTC)

Do you mean when it is null-initial? Or any /y/? Wyang (talk) 09:43, 23 May 2014 (UTC)
Yes, any null-initial. --kc_kennylau (talk) 09:47, 23 May 2014 (UTC)
I don't have a strong preference for this... To me they are just different ways of looking at the phonotactics. I prefer /y/, as I think there isn't a semivocalic component that is worth notating, but that might just be my idiosyncrasy. If you change it, make sure you change /i/ and /u/ as well. Wyang (talk) 10:00, 23 May 2014 (UTC)
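To make the two notations concrete, here is a small sketch of what the alternative analysis would do to null-initial syllables. The pairing y → ɥ comes from the question above; the parallel pairs i → j and u → w are the usual IPA glides and are an assumption about what "change /i/ and /u/ as well" would mean. The function name add_glide is hypothetical, and the sketch assumes the syllable is passed without slashes or tone numbers.

```python
# High vowels and the semivowels that would precede them in the
# alternative (glide-initial) notation.
GLIDES = {"i": "j", "u": "w", "y": "ɥ"}


def add_glide(syllable: str) -> str:
    """Prefix a null-initial syllable with the matching glide.

    Syllables that begin with anything other than i, u or y are
    returned unchanged.
    """
    first = syllable[:1]
    if first in GLIDES:
        return GLIDES[first] + syllable
    return syllable


if __name__ == "__main__":
    for s in ("y", "i", "u", "ku"):
        print(s, "->", add_glide(s))
```

Whichever notation is chosen, applying it to all three high vowels keeps the transcription consistent, which is the point of Wyang's caveat.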


I think this should be pronounced húr, right? ---> Tooironic (talk) 17:19, 23 May 2014 (UTC)

Yes, you are right. Thanks, added. Wyang (talk) 00:43, 24 May 2014 (UTC)